“Failure rates for analytics, AI, and big data projects = 85% – yikes!”
Brian O’Neill, Founder, Designing for Analytics (source)
Enterprises are rightly in pursuit of the promise of big data: democratised access to real-time insights into your customers, your products and your business.
These insights can then provide the best possible basis for making killer business decisions and driving real competitive advantage.
But, as our friend points out in the introductory quote, these data investments rarely deliver what they promise.
In this article, I want to explore the key reasons why the promise of data has not yet been achieved and articulate some of the main benefits of an approach to data that is shaking up the data world: the data mesh!
Has anyone ever seen a truly data-driven enterprise business?
Many enterprises are investing huge sums in next-generation technology, which, on paper, should deliver the promised value but, in practice, deliver mediocre results at a small scale.
There are a few core reasons for this.
Enterprise data platforms are nearly always centralised (in lakes, warehouses etc.).
This can work in smaller organisations with fewer data sources and fewer data consumers, but in an enterprise organisation (with large numbers of both) you run into a few problems.
Firstly, so many different kinds of data accumulate that ingesting it and making sense of it all gets harder and harder.
Secondly, growing numbers of consumers (with ever-larger lists of use cases) also need to slice and dice the data in increasingly different ways to get what they need. Response times get slower as more people pile into the same data platform.
Your sophisticated platform ends up being a data jumble sale where it’s hard for the prospective buyer to find the valuable nugget they need.
Enterprise platform architectures are monolithic and built around highly-coupled data pipelines.
This is akin to how applications used to be built: monolithic apps were constructed with high levels of dependency between the different components. This meant that when it was time to add new features or scale parts of the app, the entire application would have to be redeployed.
It’s like having to repaint your entire house every time you want to give your front door a quick coat.
The result is the platform cannot change and/or scale at the pace required for real-time data analytics.
When the data systems are centralised, highly-coupled and monolithic, the teams that run them must be structured in the same way.
So, teams are often split into domain-specific teams at each end (where the data comes in and where it comes out) with a big team of generic data engineers in the middle.
The key flaw of this setup is that it breaks the chain of accountability from one end of the platform to the other.
This means that, firstly, teams focus on their individual task at the expense of the wider view across the whole platform (and, indeed, the whole business). And, secondly, the data teams are disconnected from the business goals and objectives.
How are they supposed to know which tasks to prioritise or how to improve the platform if they don’t know the direction the business is headed?
Together, the result is bottlenecks at different siloes, leading to overwhelmed teams, massive backlogs of work and frustrated consumers.
This will not lead you to the promised land of data!
But these problems are familiar.
The software development and infrastructure space was in the exact same predicament six or seven years ago.
The problems of centralisation and monolithic design were solved with microservices: breaking applications down into small, independent functional services that could be changed and deployed independently.
Similarly, the problems of siloes and hyper-specialisation were solved with the popularisation of DevOps, an approach to software development that prioritised cross-functional teams and end-to-end accountability.
The same evolution is now happening in the world of data and is symbolised by a new approach called the data mesh.
The data mesh is a paradigm that questions the foundational assumption of data in the enterprise: that the data sources, the platform itself and the data team have to be centralised.
Instead, the data mesh approach decentralises the whole thing, supporting distributed, democratised, self-serve access to data organised by business domain, not by pipeline stage.
The data mesh is founded on a few core principles:
Inspired by Eric Evans’ concept of domain-driven design, the data mesh turns how we think about ownership in the world of data upside-down.
Rather than thinking in terms of pipeline stages (i.e. data source teams shipping data to a central data lake to be sifted through by a centralised data team, who then prepare it for data consumers), we think about data in terms of domains (e.g. marketing or finance).
This is much more useful from a business perspective as it maps much more closely to the actual structure of your business.
Importantly, domains can be followed from one end of the business to the other, meaning teams are accountable from end-to-end and that their processes can be scaled without impacting other teams.
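To make the contrast concrete, here is a toy sketch of how a mesh-style catalogue might organise data products by business domain rather than by pipeline stage. All names (domains, teams, paths, the `DataProduct` type) are invented for illustration, not taken from any particular data mesh implementation:

```python
from dataclasses import dataclass

# A data product is owned end-to-end by a single domain team,
# rather than passing through source -> lake -> consumer hand-offs.
@dataclass(frozen=True)
class DataProduct:
    name: str          # e.g. "campaign-performance"
    domain: str        # the owning business domain, e.g. "marketing"
    owner_team: str    # the cross-functional team accountable end-to-end
    output_port: str   # where consumers self-serve the data (hypothetical path)

# Consumers discover data by domain, not by queuing up behind a central data team.
catalog = [
    DataProduct("campaign-performance", "marketing", "marketing-data",
                "s3://mesh/marketing/campaigns"),
    DataProduct("monthly-close", "finance", "finance-data",
                "s3://mesh/finance/close"),
]

def products_in_domain(domain: str) -> list[DataProduct]:
    """Look up every data product owned by a given business domain."""
    return [p for p in catalog if p.domain == domain]
```

A consumer looking for marketing data can call `products_in_domain("marketing")` and go straight to the owning team’s output port, with no centralised intermediary in the loop.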
Thinking about data in terms of domains enables a shift towards Product Thinking.
Product Thinking emphasises solving the customer’s problem as the main priority of any task or project: keeping your eye on the business goal rather than getting lost in technicalities.
The paradigm shift is for these data domains and their teams to start thinking about themselves as a ‘mini enterprise’ that is building a product (high-quality, accessible data sets) that will make their customers (lines of business, other data teams etc.) deliriously happy!
The result is cross-functional teams of techies and business folk (the mini enterprise) who are fully accountable for delivering data products that meet the needs of the business as a whole.
The one problem you might foresee with such domain-specific mini enterprises is that there would be a lot of duplication of effort (particularly with each needing their own data pipelines and infrastructure).
But, again, this is a solved problem in the software development space.
The data mesh approach is to leverage the cloud and automation to create templates for self-serve data infrastructure that any team can instantly spin up.
This ‘universal interoperability layer’ means each domain can handle their own pipelines, while maintaining company-wide data and security standards.
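As a rough illustration of that idea (the function, policy names and layout below are all invented for this sketch), the self-serve layer can be thought of as a template that stamps out per-domain infrastructure while baking in company-wide standards:

```python
# Company-wide standards applied to every domain's stack,
# so decentralisation doesn't mean anarchy. Values are illustrative.
GLOBAL_POLICIES = {
    "encryption": "at-rest-and-in-transit",
    "pii_handling": "masked-by-default",
    "schema_registry": "required",
}

def provision_domain_infra(domain: str, pipelines: list[str]) -> dict:
    """Spin up a domain's self-serve data infrastructure from a template.

    Each domain gets its own storage and pipelines (no shared monolith),
    but every stack inherits the same global security and data standards —
    the 'universal interoperability layer'.
    """
    return {
        "domain": domain,
        "storage": f"warehouse/{domain}",        # domain-scoped storage
        "pipelines": {p: "provisioned" for p in pipelines},
        "policies": dict(GLOBAL_POLICIES),       # standards come with the template
    }

# The marketing team helps themselves -- no ticket to a central platform team.
marketing = provision_domain_infra("marketing", ["ingest-ads", "attribution"])
```

The design point is that the template, not a central team, is what enforces consistency: each domain moves at its own pace, yet every stack it spins up arrives with the same policies attached.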
From a technical perspective, the major benefits are speed, agility, access and scalability.
Speed: with each domain able to meet their own needs and take full responsibility, dependencies are minimised and everyone can plough ahead at full pelt!
Agility: just like with microservices, each node in the data mesh can be updated and built upon without needing to redeploy the whole platform.
Access: data source teams are no longer dependent on a centralised (and overworked!) data team to ingest their data; they can just get on with it themselves. Similarly, consumers know exactly where to go (i.e. which domain) to get what they need and can help themselves.
Scalability: decoupling the components of the platform makes enterprise-grade scaling possible, which is foundational to seeing real business value from your data investments beyond a few small experimental projects.
These technical benefits translate into massive business benefits:
Enable ML and AI at scale: when teams can serve their own data needs at speed and scale, you can much more easily establish innovation centres across the business using ML and AI to experiment and innovate.
Unleash innovation: by decentralising the whole data kit and caboodle, democratising access with scalable self-serve capabilities and placing the emphasis on providing a first-class consumer experience, the data mesh opens the door to serial, distributed, democratised innovation.
Align data with business objectives: this is critical! By eliminating the siloes that have traditionally separated the data engineers from the business folk, these two ‘sides’ can start to mutually inform each other: business objectives set the direction for data projects, the results of which then inform business objectives. It’s a virtuous circle!
In my opinion, the data mesh is inaugurating the kind of revolution that the software development world has seen over the last ten years or so with the advent of DevOps, SRE and the like.
Do you agree? Disagree? Drop me a message if you want to discuss!