Data sustainability – The next big thing!
polypoly polyVerse
2021-07-12

The crux with data: our digitised world is based on data. The mantra of recent years has been: the more data you collect, the better. After all, “Big Data” promises a treasure trove that boosts profit and innovation. However, it is not enough just to collect data; it must also be analysed.

As the piles of data continue to grow, companies and institutions can no longer keep up with evaluating them. In a survey by Splunk, 57% of business and IT professionals stated that the volume of data is growing faster than their company can keep up with.

It is time for a rethink. How? Through sustainability. You may be asking yourself: “Sustainability? Wasn't that something about environmental protection?” Yes, you are right, but here we mean economically oriented sustainability. In the words of Pufé, it is not about “[...] generating profits which then flow into environmental and social projects, but about generating profits in an environmentally and socially compatible way”.¹

With regard to the sustainability of data, there are three aspects to consider, which we look at more closely below:

  • the economic aspect
  • the ecological aspect and
  • the social aspect

Collecting all the data you can get your hands on is a really expensive hobby!

Collecting and storing data incurs enormous costs, many of which we may not even be aware of. Before we get into the specifics of these costs, however, here are a few brief definitions to aid understanding.

Data can be categorised into the following three types:

  • Business critical data (clean data)
  • ROT data
  • Dark data

Business-critical data is data that is indispensable for the economic continuity and growth of a company. It has been made usable, hence the term “clean data”. It accounts for 14% of the total.

32% of the data is ROT data. This is data that exists multiple times, is outdated or is otherwise not needed, and is therefore worthless. ROT stands for “redundant, obsolete and trivial”.

Dark data is unclassified, i.e. unused data whose content and usefulness are unknown. This group is growing ever faster, and its share already stands at 54%.

As mentioned above, collecting and storing data is decidedly expensive, as a calculation example from Com-Magazin shows: for a comparatively modest company that has accumulated about 250 terabytes of data, the total cost of storing that data is estimated at 1.25 million US dollars per year. Broken down, this means:

$175,000 to store clean data (14%)
$400,000 to store ROT data (32%)
$675,000 to store dark data (54%)
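The three figures follow directly from applying each category's share to the estimated total. A minimal Python sketch of that allocation, assuming the shares and the $1.25 million total from the example above (the function name `cost_by_category` is ours, chosen for illustration):

```python
# Annual storage cost split by data category (figures from the Com-Magazin example).
TOTAL_COST_USD = 1_250_000  # estimated annual storage cost for ~250 TB
SHARES = {"clean": 0.14, "ROT": 0.32, "dark": 0.54}  # share of total data volume

def cost_by_category(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Allocate a total cost in proportion to each category's share of the data."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must add up to 100%")
    return {name: round(total * share) for name, share in shares.items()}

costs = cost_by_category(TOTAL_COST_USD, SHARES)
for name, cost in costs.items():
    print(f"{name:>5}: ${cost:,}")

# Everything except clean data is, in the article's words, "data rubbish".
waste = costs["ROT"] + costs["dark"]
print(f"ROT + dark data: ${waste:,} ({waste / TOTAL_COST_USD:.0%} of the total)")
```

This reproduces $175,000, $400,000 and $675,000 and shows that 86% of the budget ($1,075,000) goes on data that is not business-critical. Dividing the total by 250 terabytes gives about $5,000 per terabyte per year.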

As a reminder, only clean data is relevant for the continued existence of the company; the rest is – excuse the expression – really expensive data rubbish.

Cost is not the only issue here, however. The assumed 250 terabytes correspond to about 580 million files. The content of more than half of these, i.e. just over 300 million files, is completely unknown. Many of these files contain personal information, i.e. information relevant to the GDPR. 
Welcome to data protection hell!
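The file figures can be checked in a few lines. A rough Python sketch, assuming decimal units (1 TB = 10¹² bytes) and assuming that the 54% dark-data share by volume also holds for the number of files, which is what the “just over 300 million” figure implies:

```python
# Rough file arithmetic for the 250 TB example (decimal units: 1 TB = 10**12 bytes).
TOTAL_BYTES = 250 * 10**12
TOTAL_FILES = 580_000_000
DARK_SHARE = 0.54  # assumption: the dark-data share by volume also holds by file count

avg_file_bytes = TOTAL_BYTES / TOTAL_FILES
dark_files = round(TOTAL_FILES * DARK_SHARE)

print(f"average file size: ~{avg_file_bytes / 1000:.0f} kB")
print(f"files with unknown content: ~{dark_files / 1e6:.0f} million")
```

Under these assumptions, the average file is about 431 kB and roughly 313 million files have unknown content.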

To sum up: the uncontrolled collection and storage of data not only causes high costs, but also creates trouble with regard to data protection.

An expensive hobby that consumes a lot of CO₂

The amount of data collected and stored is not decreasing; quite the opposite. According to IWD, the amount of data worldwide grows by about 27% every year, and an IWD forecast puts the worldwide data volume at 175 zettabytes in 2025. To put this into perspective: one zettabyte corresponds to one billion terabytes. A 90-minute film in standard quality requires about 500 megabytes of storage space, so one zettabyte is equivalent to about two trillion films, a two followed by twelve zeros. This enormous amount of data presents us with a further problem: the energy that data centres require results in the production of vast amounts of CO₂.
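The “two trillion films” comparison is a simple unit conversion. A short Python sketch, assuming decimal SI prefixes (1 MB = 10⁶ bytes, 1 TB = 10¹² bytes, 1 ZB = 10²¹ bytes) and the 500 MB film size quoted above:

```python
# Unit conversion behind the "two trillion films" comparison (decimal SI units).
MB = 10**6
TB = 10**12
ZB = 10**21

FILM_BYTES = 500 * MB  # 90-minute film in standard quality

# One zettabyte is one billion terabytes.
terabytes_per_zb = ZB // TB

films_per_zb = ZB // FILM_BYTES
print(f"films per zettabyte: {films_per_zb:,}")  # 2,000,000,000,000

FORECAST_2025_ZB = 175
films_2025 = FORECAST_2025_ZB * films_per_zb
print(f"films in {FORECAST_2025_ZB} ZB: {films_2025:,}")
```

By the same arithmetic, the 175 zettabytes forecast for 2025 would correspond to about 350 trillion such films.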

Since 2010, the energy demand of German data centres has increased by 15% to 12 billion kWh/a, about 2% of Germany's total electricity consumption. The trend is upward: by 2025, electricity demand is expected to reach around 16.4 billion kWh/a. As a reminder: over 50% of this data is dark data, i.e. data that nobody uses. According to estimates, the global energy demand for storing dark data in 2020 resulted in 5.8 million tonnes of CO₂. By way of comparison, this is roughly the amount of CO₂ a car would emit driving around the earth 575,000 times.
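The figures above can be put side by side in a quick back-of-the-envelope check. A Python sketch, assuming an equatorial earth circumference of about 40,075 km (our assumption, not part of the quoted estimate); it computes the projected growth in demand and the per-kilometre emissions that the car comparison implies:

```python
# Back-of-the-envelope check of the energy and CO2 figures quoted above.
DEMAND_NOW_KWH = 12.0e9   # German data centres, kWh per year
DEMAND_2025_KWH = 16.4e9  # projected demand in 2025

growth = DEMAND_2025_KWH / DEMAND_NOW_KWH - 1
print(f"projected increase in demand: {growth:.0%}")

DARK_DATA_CO2_T = 5.8e6          # tonnes of CO2 for storing dark data (2020 estimate)
LAPS = 575_000                   # car trips around the earth in the comparison
EARTH_CIRCUMFERENCE_KM = 40_075  # assumption: equatorial circumference

total_km = LAPS * EARTH_CIRCUMFERENCE_KM
grams_per_km = DARK_DATA_CO2_T * 1e6 / total_km  # tonnes -> grams
print(f"implied car emissions: {grams_per_km:.0f} g CO2/km")
```

The projected demand rises by about 37%, and the car comparison implicitly assumes roughly 250 g of CO₂ per kilometre.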

To sum up: the uncontrolled collection and storage of data is an ecological mistake.

The social aspect of data

According to the Splunk survey mentioned above, 80% of executives see data as a success factor for their company. This is also reflected in the price that data commands.

However, the one group that does not share in this value is all of us: the data producers. We can use services and platforms supposedly for free, but the price we pay for them is not transparent to us. We users have lost control over our data, and we often don't even know the consequences because they are usually indirect. The result: our privacy is violated.

To sum up: the uncontrolled collection and storage of data violates the privacy of users.

The uncontrolled collection and storage of data has many disadvantages. Yet data is important for our society and therefore indispensable. So what is the alternative?

Data sustainability as an image factor

Why collect data that you neither need nor can use? Why treat the GDPR as a cost centre rather than as a competitive advantage with which you can credibly strengthen your image? This represents huge potential for European companies, because the precious commodity is not data, but the insights that can be drawn from it.

Instead of collecting the same data two or three times over on central servers and analysing it there with algorithms, it would be much cheaper and more effective to send the algorithms to where the data is. How? By using the supercomputers in our pockets: our smartphones. A decentralised infrastructure makes it possible to analyse data directly on any end device. The data then never leaves the device; only the insights do.
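The principle of sending the algorithm to the data can be illustrated with a small simulation. The following Python sketch is purely hypothetical: the `Device` class, the `average_daily_steps` analysis and the step-count data are invented for illustration and are not the polyPod's actual API. It only shows that the raw records stay on each device and that only a computed result travels back:

```python
# Hypothetical sketch: send the algorithm to the data, return only the insight.
from dataclasses import dataclass
from statistics import fmean
from typing import Callable

@dataclass
class Device:
    """A hypothetical end device holding its owner's raw data locally."""
    owner: str
    step_counts: list[int]  # raw personal data; never handed out directly

    def run(self, analysis: Callable[[list[int]], float]) -> float:
        # The analysis executes locally; only its result leaves the device.
        return analysis(self.step_counts)

def average_daily_steps(data: list[int]) -> float:
    """Example analysis shipped to the device: reduces raw data to one number."""
    return fmean(data)

devices = [
    Device("alice", [8000, 10000, 12000]),
    Device("bob", [4000, 6000]),
]

# Central side: collects one insight per device instead of the raw records.
insights = [device.run(average_daily_steps) for device in devices]
print(insights)
print(f"population average: {fmean(insights):.0f}")
```

Note that averaging the per-device averages weights every device equally, regardless of how many days of data each one holds; a real analysis would have to decide whether that is the intended insight.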

And that's where we at polypoly come in, with our polyPod. This is GDPR as technology – developed by us, a data cooperative owned by European citizens.

The polyPod makes economic sense, because only the information that is really needed is requested, and knowledge rather than raw data is stored.
The savings potential is immense!

The polyPod is ecologically sensible: the CO₂ savings potential is certainly not immense, but still relevant.

Last but not least, the polyPod is social, because it makes a digital income possible. When the algorithms come to the end devices, users decide who may use their computing power to gain insights, and they can also set the price.

If you would like more information about the polyPod, you can find it in our polyBlog: https://polypoly.coop/en-de/blog

Sources:
¹ Pufé 2014, p. 16
Com-Magazin
www.iwd.de
Study by the Borderstep Institute in cooperation with the Fraunhofer Institute IZM
www.storage-insider.de