We talk an awful lot here at Aprigo about how data growth is exploding, and apparently we’re not alone. This morning, while catching up with my RSS feeds, I started to notice a trend. First, I saw an article from The Economist entitled “Data, data everywhere” with the subhead “Information has gone from scarce to superabundant. That brings huge new benefits, says Kenneth Cukier—but also big headaches”. The article goes on to talk about how the Sloan Digital Sky Survey has archived 140 terabytes of data since its telescope started work in 2000, while its successor will collect that much data every five days.
The article then goes into Pro and Con mode:
All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account.
But they are also creating a host of new problems. Despite the abundance of tools to capture, process and share all this information—sensors, computers, mobile phones and the like—it already exceeds the available storage space (see chart 1). Moreover, ensuring data security and protecting privacy is becoming harder as the information multiplies and is shared ever more widely around the world.
The takeaway: Data growth is big.
The obvious problem: It costs money to store this stuff, and people aren’t going to stop creating files.
The real problem: Lots of data makes it insanely difficult to keep up with security and privacy.
Just a few days before that, The Economist ran another article on data growth in its print edition: “All too much: Monstrous amounts of data”. The article follows a very similar format, with examples of things that produce huge amounts of data (this time the Large Hadron Collider, which generates 40 TB of data per second), followed by what that means.
Only 5% of the information that is created is “structured”, meaning it comes in a standard format of words or numbers that can be read by computers. The rest are things like photos and phone calls which are less easily retrievable and usable. But this is changing as content on the web is increasingly “tagged”, and facial-recognition and voice-recognition software can identify people and words in digital files.
The takeaway: 95% of all data is unstructured.
The obvious problem: It’s difficult to use and retrieve data when it is not structured.
The real problem: With all this unstructured data being created, copied, and shared, it becomes nearly impossible to find the good stuff among the garbage and to secure the sensitive stuff.
And again, in the same print edition of The Economist, another article about the data pandemonium appeared: “The data deluge – Businesses, governments and society are only starting to tap its vast potential”. Same deal: examples of things creating lots of data (a retail supply chain firm and flying drones in Iraq), followed by the problems all that data creates:
Everywhere you look, the quantity of information in the world is soaring. According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Merely keeping up with this flood, and storing the bits that might be useful, is difficult enough. Analysing it, to spot patterns and extract useful information, is harder still. Even so, the data deluge is already starting to transform business, government, science and everyday life (see our special report in this issue). It has great potential for good—as long as consumers, companies and governments make the right choices about when to restrict the flow of data, and when to encourage it.
and
But the data deluge also poses risks. Examples abound of databases being stolen: disks full of social-security data go missing, laptops loaded with tax records are left in taxis, credit-card numbers are stolen from online retailers. The result is privacy breaches, identity theft and fraud. Privacy infringements are also possible even without such foul play: witness the periodic fusses when Facebook or Google unexpectedly change the privacy settings on their online social networks, causing members to reveal personal information unwittingly. A more sinister threat comes from Big Brotherishness of various kinds, particularly when governments compel companies to hand over personal information about their customers. Rather than owning and controlling their own personal data, people very often find that they have lost control of it.
The takeaway: The more data you have, the bigger the risk (kind of like Mo’ Money Mo’ Problems… but, you know, with data).
The obvious problem: We’ve lost control. All hope is lost.
So there we go. It’s not just us talking about what data growth means to businesses; the prestigious writers at The Economist are in the same boat (only they have cooler examples). Companies do need to worry about where they’re going to put all this digital stuff, but that isn’t the big worry. The big worry is finding the important data among the junk, and figuring out how to let the right people get to it while keeping the wrong people away from it.




















