As enterprises continue to relentlessly search out new sources of revenue, data science is gaining more and more attention as a powerful source of innovation and business value.
And that’s why many are making it a top investment target for 2022. Yet, it’s a complex domain where most companies are struggling to see real results.
As investments pour in and new technologies emerge, what are the most relevant and important trends to look out for?
In this blog, I’ll highlight the five juiciest enterprise data science trends we’re seeing that will have a deep and lasting impact on the enterprise in 2022.
Let’s jump in.
MLOps is the buzzword that designates the various attempts to standardise the tools and processes needed for the large-scale deployment of machine learning in production.
It’s a much-needed field: cracking the ML code would deliver substantial business value for any enterprise, but most are struggling to do so.
But it’s also a very young field. So while there are lots of products that solve part of the problem (and many pie-in-the-sky frameworks few enterprises could realistically achieve), there are no standardised best practices and tools that companies can apply consistently for success.
The result is that currently many of the wheels on the ML bus start to fall off when companies try to scale: they can’t provide high-quality data access, automate testing to check outputs, track the drift of their models over time or easily retrain or retire existing models.
Everyone has a bit of something, but there’s no consensus on how to bring those ‘somethings’ together into a complete framework.
But that is changing.
The pieces of the puzzle are slowly being brought together. And in 2022, I expect the many ‘somethings’ that exist to start cohering into an approach that gives companies a reliable handrail when attempting to scale out their own ML efforts.
Given the relatively low cost of storage and compute, companies have tended to solve difficult machine learning problems by simply throwing more data at the model.
They try to get at a useful output by increasing the quantity of data they train their model on, rather than increasing the quality of the data points they already have.
But people are starting to realise that emphasising volume at the cost of quality may be a missed opportunity.
There is a growing trend towards ‘data-centric’ AI that prioritises cleaning the data as much as possible to get the best results.
Often, a smaller, extremely clean training set fed to a simpler model can yield much better results than masses of data thrown at the latest deep learning architecture.
Another key driver for this will be TinyML: a term that describes implementing ML on hardware with limited compute and storage. The sharp limits on the horsepower of local devices force us back to using smaller, simpler models.
With simpler models, we are much better off with a smaller amount of cleaner data than the other way around. After all, you can’t use your smart fridge to perform deep learning on a ton of dirty data (yet?!).
People have been over-emphasising data volume at the expense of data quality. The trend I see is the penny starting to drop on the importance of data quality instead, with smaller, simpler models being deployed with super-clean data to get great results.
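As a rough illustration of what ‘cleaning’ can mean in practice, the sketch below removes three common sources of noise from a labelled training set: rows with missing values, exact duplicates, and feature vectors that appear with conflicting labels. The function name and the row layout (a tuple of features plus a label) are assumptions made for this example.

```python
def clean_training_set(rows):
    """Return a de-noised copy of `rows`.

    Each row is (features, label), where `features` is a tuple.
    """
    # 1. Drop rows with a missing label or any missing feature value.
    complete = [(f, y) for f, y in rows if y is not None and None not in f]

    # 2. Collect every label seen for each distinct feature vector.
    labels_by_features = {}
    for f, y in complete:
        labels_by_features.setdefault(f, set()).add(y)

    # 3. Keep one copy of each feature vector, and only if its label is unambiguous.
    return [
        (f, next(iter(ys)))
        for f, ys in labels_by_features.items()
        if len(ys) == 1
    ]


raw = [
    ((1, 2), "a"),
    ((1, 2), "a"),     # exact duplicate
    ((3, 4), "b"),
    ((3, 4), "c"),     # same features, conflicting label
    ((5, None), "a"),  # missing feature value
    ((6, 7), None),    # missing label
    ((8, 9), "b"),
]

print(clean_training_set(raw))
```

Dropping conflicting rows outright is the bluntest option. In a real project they would more likely be sent back for re-labelling, which is where much of the data-centric effort goes.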
More and more businesses want the capabilities that data science brings, but there is nowhere near enough talent to go around.
As a result, a trend has emerged towards the ‘downskilling’ of data science: automating key aspects to create low- or no-code ML models that can be used by those without maths and data backgrounds.
Known most popularly as AutoML, this trend will only accelerate as the popularity of data science continues to skyrocket, with more and more businesses hoping to pluck low-hanging fruit using low-code models.
But this will come at a cost.
Using a complex tool when you don’t understand how it works can be dangerous, especially when using it to inform or make key business decisions.
ML models feature many nuances and biases that, if not properly considered, can massively skew the outputs in terms of scale, distribution, representation and so on.
Unskilled practitioners using drag-and-drop ML platforms are simply not going to be able to make the adjustments necessary.
I expect the trend towards the ‘downskilling’ of AI to continue, but with it some pretty eye-popping (and possibly well-publicised) mistakes will be made at scale.
Ethics and responsibility are growing concerns for all enterprises, not only for the sake of external credibility but because of the genuine business value they can bring.
And the Environmental, Social and Governance (ESG) score is one of the most popular ways of assessing a firm’s credibility in this domain. Yet, to date, ESG scores have been fairly arbitrary, with accrediting organisations deciding which data points are relevant based largely on their own biases rather than on objective measures of ethical probity.
Their models might sound scientific but, in my experience, if you peek under the hood it is a fairly arbitrary process.
The trend I expect to see is that, as the importance of ESG scores increases, they become more and more aligned with a credible scientific process that has data science, artificial intelligence and machine learning at its core.
We could, for example, use natural language processing to parse the media for the best examples of ESG excellence and then reverse-engineer the most relevant factors to score enterprises on.
Taking a scientific, data-driven approach will help enterprises to get a better view of the sustainability of their own business and will help accreditors to provide a more credible and robust service.
Knowing your customer is the first task of any prospective business.
What do they think of our business? Our products? Our competitors?
Traditionally, companies have sought answers to these questions by hiring someone to survey their customers for them.
This is, however, a pretty expensive method that carries massive sampling biases: you don’t end up surveying your customers as a whole but, rather, the segment of your customers that doesn’t mind filling in surveys, which is a very different prospect.
Yet, your average enterprise already holds the keys to the castle: mountains of data from each and every customer interaction.
In 2022, we’ll see a trend towards companies investing much more in turning that mountain of data into gold.
Advances in the power and accessibility of data science approaches mean you can interrogate both internal and public data to give a comprehensive appraisal of your customers and their experience with your company.
Using only data that companies already have, it’s now relatively straightforward to gather simple but incredibly valuable metrics such as:
- How many people spend time looking for product X then leave?
- How far do people go along the buying process before dropping off?
- Which key interactions drive brand loyalty?
- What kinds of customer experience attract those with higher lifetime value to the business?
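The second question in the list, how far people get through the buying process, can be sketched from nothing more than an event log. The step names, the `(session_id, step)` event format and the function name below are hypothetical; the sketch also assumes that a session which reached a later step passed through all the earlier ones.

```python
FUNNEL = ["view_product", "add_to_cart", "checkout", "purchase"]


def funnel_report(events, funnel=FUNNEL):
    """Return (step, sessions_reached, drop_off_from_previous_step) per step."""
    # Record the deepest funnel step each session reached.
    furthest = {}
    for session_id, step in events:
        if step in funnel:
            depth = funnel.index(step)
            furthest[session_id] = max(furthest.get(session_id, -1), depth)

    # Count the sessions that got at least as far as each step.
    reached = [
        sum(1 for depth in furthest.values() if depth >= i)
        for i in range(len(funnel))
    ]

    report = []
    for i, step in enumerate(funnel):
        previous = reached[i - 1] if i else reached[0]
        drop_off = 0.0 if previous == 0 else 1 - reached[i] / previous
        report.append((step, reached[i], round(drop_off, 2)))
    return report


events = [
    ("s1", "view_product"),
    ("s2", "view_product"), ("s2", "add_to_cart"),
    ("s3", "view_product"), ("s3", "add_to_cart"),
    ("s3", "checkout"), ("s3", "purchase"),
    ("s4", "view_product"),
]

for step, sessions, drop_off in funnel_report(events):
    print(f"{step:<14} sessions={sessions} drop-off={drop_off:.0%}")
```

In this toy log, half of the sessions that viewed a product never added it to the cart, which is the kind of signal that would otherwise need a survey to surface.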
And this trend is going through the roof as a result of the switch to digital that has been amplified by the Covid-19 pandemic.
These trends will see data science proliferate further across the enterprise: from how companies run their technology to how they track their performance and relate to their customers.
Many enterprises want to get started with data science, but they need some help automating and scaling it.
Mesh-AI is a global consultancy that uses data, machine learning and artificial intelligence to deliver transformative outcomes for enterprise organisations.
Our mission is to make data your competitive advantage.
Get in touch to see if we can help you to upgrade your data science capability!