3 Jul

Dark Data: How to Think About Collecting Data More Sustainably

Viral Patel

To say organisations are increasingly focusing on data would be an understatement. More and more businesses are realising the value of a data-driven approach, investing in data leadership and building data competence at all levels of the organisation.

The latest figures from the International Data Corporation (IDC) illustrate this clearly. It predicts the total amount of digital data created by 2025 will rise to 175 zettabytes (175 trillion gigabytes), with each of the world’s 7.3 billion people producing 1.7MB every minute. This, unsurprisingly, leads to a surge in data-centre workloads and further demand for modern data platforms on the cloud.

With organisations growing their data portfolio in all facets of the business, what does this mean from a sustainability perspective? How can the tech industry enable this continued proliferation of data in a green and clean way? How can you be both a data-driven organisation and a sustainable one? 

An Explosion of Data

We have reached an age where organisations beyond the traditional data-heavy sectors, such as banks and telecoms firms, want to use data to drive business value and devise strategies. And the more that quality data is used, the more quality data will be produced.

For example, we recently saw the Denver Nuggets crowned NBA Champions. As an organisation, the NBA now ingests and analyses around 10 million data points per game. While volume is a sign of progress, this eruption of data brings its own sustainability challenges.

Understanding Data Sustainability

Businesses are generally unaware of the sheer volume and various types of data they are collecting. This leaves us in the murky world of dark data. The term dark data is described as "any information assets organisations collect, process, and store during regular business activities but generally fail to use for other purposes". 

Let’s put this into perspective. The Economist says between 70% and 90% of data captured by an organisation is dark data. If we consider that data centres are responsible for 2.5% of human-produced carbon emissions (more than estimates of aviation’s 2.1%), we can start to understand the extent of the impact on the climate from storing dark data. These data centres have a large carbon footprint, requiring energy to build, run and cool (with some exciting innovations for how to do so). 

Not only does this data use energy unnecessarily, it also costs money to transmit and store, all without being used for any real business opportunities or insights. For organisations to maximise their potential, business and technology leads must ensure that their data is used sustainably.

How is Dark Data Created?

A 2020 Seagate report, Rethink Data, found that only 32% of data in enterprises is being put to use, and listed the most common reasons for dark data in large enterprises:

  1. Regulatory requirements
  2. Multiple platforms hosting data
  3. Later usage

Regulatory Requirements

The introduction of regulations, such as GDPR, forces organisations to follow more stringent standards for collecting and storing confidential or sensitive data. Many of these regulations require data to be stored for a set period of time. However, most organisations do not have a process for removing data that has expired or no longer needs to be kept. This means more data is being stored than is needed, at an unnecessary cost both to the climate and to the business.
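To make the idea of a removal process concrete, here is a minimal sketch in Python of a retention check. Everything in it is an assumption made for illustration: the `Record` type, the `expired_records` helper and the retention periods in `RETENTION_DAYS` are hypothetical, and real periods would come from your legal and compliance teams.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    created: date


# Hypothetical retention periods in days, for illustration only.
RETENTION_DAYS = {"invoice": 7 * 365, "web_log": 90}


def expired_records(records, today, retention_days=RETENTION_DAYS):
    """Return the records whose retention period has elapsed as of `today`.

    Records with an unknown category are never returned, so they are
    left for manual review instead of being flagged for deletion.
    """
    expired = []
    for record in records:
        days = retention_days.get(record.category)
        if days is None:
            continue  # no policy defined: do not flag
        if record.created + timedelta(days=days) < today:
            expired.append(record)
    return expired
```

A check like this only flags candidates. Whether they are deleted, archived or escalated is a governance decision that sits outside the code.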

Multiple Platforms Hosting Data

Organisations tend to sign up for multiple SaaS platforms that are integrated into their business. Most of the data hosted on these platforms is used for short-lived purposes, but it is often not cleaned up once it is no longer required. With duplicate data being stored with no real value, businesses are wasting funds through poor data governance.
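One way to spot exact duplicates across platforms is to compare content hashes. The sketch below is illustrative only: `find_duplicates` is a hypothetical helper, and it assumes you can already export each item as a name and its raw bytes. It will not catch near-duplicates, such as the same table exported in a different format.

```python
import hashlib
from collections import defaultdict


def find_duplicates(items):
    """Group item names that share identical content.

    `items` is an iterable of (name, content_bytes) pairs. Returns a
    sorted list of name groups, one per set of byte-identical items.
    """
    by_digest = defaultdict(list)
    for name, content in items:
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Keep only digests seen more than once, in a stable order.
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)
```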

Later Usage

One reason for storing data is its potential to provide valuable insights at a future date. As data is forever changing, this raises the question: would data stored in earlier times still provide helpful insights? Keeping it just in case increases the volume of data that is stored but not actively used.

Implications of Dark Data

With so much data being collected and stored unnecessarily, there is a significant impact beyond just financing this:

  • Cost - Significant costs are incurred to pay for server space to host unused data.
  • Compliance - While this data isn’t being used, it is still vulnerable to cyberattacks. Is the unused data stored within the same regulatory guidelines as the data that you are utilising? A cybersecurity breach can lead not only to regulatory fines and increased insurance premiums, but also to reputational damage and potential revenue loss.
  • Inaccurate data analysis - When gathering large amounts of data, how do you know, for example, that you’re training your model on correct data sets and not redundant data? If the unused data is not governed in the right way, it can blur the lines between what you want to utilise and what isn’t useful.

How to Become Data Sustainable

What steps can you take to optimise your data and be more sustainable? There are four core areas to consider when thinking about collecting data more sustainably:

Effective Storage

Cloud storage has become more affordable and allows organisations to store terabytes of data at minimal cost. However, to maximise business value, what matters is storing the data that will deliver business insights. As part of data transformation, it is good practice to eliminate data with no value and, in true DevOps fashion, build that clean-up into your continuous lifecycle.

Implementing a robust storage management strategy for your data will enable you to focus on the data that will drive business value and eliminate data that is not required. This will not only reduce the energy needed for storage, as you are now storing only useful data rather than dark data, but will also bring cost savings and reduce the admin involved in finding and mining the data that your business actually needs.
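A first step towards such a strategy is knowing how much stored data is sitting idle. The following is a rough sketch, assuming you can obtain an inventory of datasets with their size and last-access date. The `dark_data_share` helper and the idle threshold are hypothetical, and "not accessed recently" is only a proxy for dark data.

```python
from datetime import date


def dark_data_share(datasets, today, max_idle_days):
    """Report datasets not accessed within `max_idle_days`.

    `datasets` is an iterable of (name, size_bytes, last_accessed)
    tuples. Returns (idle_names, idle_fraction), where idle_fraction
    is the share of total stored bytes held by the idle datasets.
    """
    total_bytes = 0
    idle_bytes = 0
    idle_names = []
    for name, size_bytes, last_accessed in datasets:
        total_bytes += size_bytes
        if (today - last_accessed).days > max_idle_days:
            idle_names.append(name)
            idle_bytes += size_bytes
    idle_fraction = idle_bytes / total_bytes if total_bytes else 0.0
    return idle_names, idle_fraction
```

The output is a starting point for conversations with data owners, not a deletion list: some idle data must still be retained for regulatory reasons.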


Petabytes of data are now being transmitted, which has in turn doubled internet traffic. Cloud platforms such as Azure and AWS can serve local copies of data rather than retrieving copies from different data centres worldwide.

This gives businesses access to a regional replica of their data and avoids having to create inter-connected global links between different regions. This networking infrastructure from cloud providers ensures enterprises remain compliant without having to own their own data centres, which would come at great cost.

Cloud Native Solutions

Cloud providers such as Azure and AWS are, more than ever, offering customers a plethora of cloud-native data solutions to help drive their data transformation. Rather than building dedicated resources, it is worth considering cloud-native solutions to help drive your business value and become more sustainable. A great example is Vasakronan, which adopted digital twins using Azure and expects to reduce its energy consumption costs while leveraging Azure cloud-native solutions.


Speak to product owners in the respective teams and departments. Review the data that is being collected and stored, and determine how much of it has no value and can be removed without compromising governance or compliance. Discussions with key members of staff can help identify data that can be removed, bringing you a step closer to storing only valuable data and cutting consumption costs. Culturally, it can also increase data literacy across your organisation and lead to the collection and use of more quality data. Improving data literacy and ownership is important for more than just sustainability and reducing the carbon footprint, so this may well contribute to a wider and no less important conversation.


Taking the steps to become a well-governed, data-driven organisation also helps you transform into a more sustainable one. It is well worth laying the groundwork by reviewing the data your organisation currently collects. Only by focusing on the key data points that are pertinent to conveying your story, fostering innovation in products and business lines, and driving your business value can you become lean and sustainable.

One consideration we haven’t touched on, but will do in the future, is how to track and measure your data sustainability. To ensure the concepts and processes listed above are working, it’s important to record and reduce the emissions created by how your business collects and stores data.

By driving more focused data to deliver efficient results, businesses are also becoming sustainable and working towards a net zero goal. 

You can read more about how data and digital twins are carving a path to net zero in the energy industry.

Find out how we are helping TotalEnergies to harness data and support their energy transition.

Finally, check out the key takeaways from our race to net zero dinner with leaders from major energy and utilities firms.
