The company is a major American investment bank.
The company realised that a huge amount of opportunity and business value was locked up in its data, especially when data assets are shared and combined across the company.
It wanted to unlock that value in a way that was cost-effective, removed pain points and enabled data reuse.
But the organisation did not want to go down the route of building a monolithic data lake, which would be too tightly coupled and hence difficult to scale.
Another major challenge was balancing value and risk: the more freely available data is, the higher its potential to create value, but also the greater the possible risk to the organisation.
Their central question was: how can we make data easy to share across the organisation while maintaining control?
The bank settled on a data mesh approach, which centres on three key principles: a domain-driven design approach, a product-thinking mindset (treating data as products) and decentralised data storage and sharing.
They chose to build their mesh on AWS, treating data infrastructure as a platform delivered through their platform services. Using higher-level AWS managed services gave them the ability to deliver self-service infrastructure to their data domains.
Proper data discovery is one of the cornerstones of a data mesh. They used AWS Glue to build a data cataloguing capability, and Amazon CloudWatch for monitoring, to ensure effective metrics were placed around the data.
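As an illustration of the cataloguing step, the sketch below builds the `TableInput` structure that the AWS Glue `create_table` API expects when registering a Parquet dataset. The dataset name, columns and S3 location are hypothetical examples, not details from the bank's implementation.

```python
# Sketch: registering a dataset in the AWS Glue Data Catalog.
# The "trades" dataset, its columns and the bucket are illustrative only.

def build_glue_table_input(name, columns, s3_location):
    """Build the TableInput dict expected by glue.create_table,
    here for a Parquet-backed external table."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [{"Name": c, "Type": t} for c, t in columns],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }

table_input = build_glue_table_input(
    "trades",
    [("trade_id", "string"), ("notional", "double"), ("trade_date", "date")],
    "s3://example-domain-bucket/trades/",
)

# With AWS credentials in place, a domain team would then register it:
# import boto3
# glue = boto3.client("glue")
# glue.create_table(DatabaseName="trading_domain", TableInput=table_input)
```

Keeping the table definition as a plain function makes it easy for each domain team to register its own datasets through self-service tooling rather than a central team.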
A metadata catalogue is used to track the provenance and movement of data, so that its quality can be trusted and reported on to regulators. Whenever data moves or changes, the catalogue is automatically updated, so it always reflects the data actually available.
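The pattern can be sketched in miniature: an in-memory catalogue where every registration or move appends a provenance record, so the catalogue never drifts from the data it describes. This is a simplified stand-in for the bank's actual catalogue, with hypothetical dataset names and locations.

```python
# Minimal sketch of a provenance-tracking metadata catalogue.
# Every change to a dataset automatically appends a lineage record.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CatalogueEntry:
    dataset: str
    location: str
    lineage: list = field(default_factory=list)  # provenance trail

class MetadataCatalogue:
    def __init__(self):
        self._entries = {}

    def register(self, dataset, location):
        self._entries[dataset] = CatalogueEntry(dataset, location)
        self._record(dataset, "registered", location)

    def move(self, dataset, new_location):
        # Updating the location and recording lineage happen together,
        # so the catalogue always reflects the actual data available.
        self._entries[dataset].location = new_location
        self._record(dataset, "moved", new_location)

    def _record(self, dataset, action, detail):
        self._entries[dataset].lineage.append({
            "action": action,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def provenance(self, dataset):
        return self._entries[dataset].lineage

cat = MetadataCatalogue()
cat.register("trades", "s3://raw/trades/")
cat.move("trades", "s3://curated/trades/")
```

In practice the update would be driven by events from the storage layer rather than direct method calls, but the invariant is the same: data cannot move without the catalogue recording it.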
They shifted their thinking around data to work in a product-centric fashion, taking a "you build it, you own it, you run it" approach to ownership of data products. Data products are the central concern, with data lakes organised around them (not the other way around).
This meant that business teams own the data products end-to-end, with support from data specialists (rather than the other way around).
With the data mesh in place, the bank’s data architecture is now aligned with its data product strategy.
This means that data product owners, who understand their own domain data better than anyone, are empowered to make risk-based decisions about managing their data. This facilitates rapid decision-making and minimises wait times for data consumers.
The bank is able to make critical datasets easy to discover and use for all other domains in the business, whether for business reporting and analytics or for feeding data into machine learning models. Overall, there is also a much higher level of observability across the business.
This has lowered data risks, because data is now shared across the business rather than being copied and moved around it. This is how the organisation solved the paradox of enabling collaboration and availability while reducing risk.