Craptastic analysis turns 2.8 zettabytes of Big Data into 2.8 ZB of FAIL

You know EVERYTHING about me, but can you show relevant ads? Nope


Open ... and Shut We can't seem to get enough of Big Data. In its Digital Universe in 2020 report (PDF), IDC forecasts Big Data-related IT spending to rise 40 per cent each year between 2012 and 2020, as the digital universe, now at 2.8 zettabytes (ZB), or 2.8 trillion GB, explodes to 40 ZB.

That's very, very Big Data. It's a pity, therefore, that we currently analyse a mere 0.5 per cent of it all.

Not that all of these data are useful. IDC expects that by 2020, just 33 per cent of the world's data will be useful if analysed. But the gap between the 0.5 per cent of data we actually analyse today and the 33 per cent that could be useful if analysed is unlikely to narrow dramatically. We like to think of ourselves as hyper-analytical, what with our quantified selves and "measure everything" approaches to business.
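For anyone who wants to check the sums, here is a rough back-of-the-envelope sketch using only the IDC figures quoted above. It assumes decimal (SI) units, and the variable and function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic on the IDC figures quoted in this column.
# Assumption: decimal (SI) units, so 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
GB_PER_ZB = 10**21 // 10**9  # one trillion GB per ZB


def zb_to_gb(zettabytes: float) -> float:
    """Convert zettabytes to gigabytes (decimal units)."""
    return zettabytes * GB_PER_ZB


universe_today_zb = 2.8        # size of the digital universe today
universe_2020_zb = 40.0        # IDC's forecast for 2020
analysed_share_today = 0.005   # 0.5 per cent is analysed today
useful_share_2020 = 0.33       # 33 per cent could be useful by 2020

# Volume actually analysed today, and volume that could be useful in 2020.
analysed_today_zb = universe_today_zb * analysed_share_today
useful_2020_zb = universe_2020_zb * useful_share_2020

print(f"Digital universe today: {zb_to_gb(universe_today_zb):.2e} GB")
print(f"Analysed today:         {analysed_today_zb:.3f} ZB")
print(f"Useful by 2020:         {useful_2020_zb:.1f} ZB")
```

On these numbers we analyse roughly 0.014 ZB today (about 14 billion GB), against some 13.2 ZB that could be worth analysing by 2020. The two percentages apply to different years and different totals, so treat this as an indication of scale rather than a like-for-like ratio.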

But, as I've argued before, we're actually quite inept at analysing data, be the data big or small.

Not only are we bad at regulating our intake of information, to paraphrase Nick Carr, but we're also really bad at separating signal from noise. In hindsight, we think we see clearly, but even then we tend to miss the point.

Despite a world awash in data, consider just two failures to analyse these data properly:

  • Social networks hold immense quantities of data, and account for a significant chunk of the new data being created, according to IDC, yet "social networks aren’t using [our data] to create for us a social experience that is more like our real world, and frankly more in tune with our human-ness," as O'Reilly's Jim Stogdill argues. Heck, Facebook and its data-hoarding ilk can't even serve up moderately useful ads, despite the tremendous amount of data they have collected on me.
  • Despite an ever-rising amount of surveillance and its associated data, there's little evidence that it actually reduces crime. For example, a man was recently gunned down near Columbus Circle in New York City. The shooting was caught on CCTV, but the footage neither stopped the killer nor has it helped to catch him. Glyn Moody is hence perhaps right to worry about the threat a surveillance society poses to the rights of citizens, especially in light of its apparent futility.

Ironically, IDC calls out surveillance footage and social media as two of the best opportunities to apply Big Data. Perhaps things will get dramatically better by 2020, but it feels like the technologies we use to analyse our data tend to be backward-oriented, rather than real-time. This is changing, with the advent of Apache Drill and other technologies, but perhaps the real focus should be on highlighting what is happening, and then letting business managers intuit from these trends what they should do.

Compounding the problem of volume is where that volume is shifting: emerging markets.

Importantly, the areas with the most data to analyse are planning to spend the least amount of money to do so. So while the US will spend around $1.80/GB and Western Europe $2.50/GB to manage Big Data, China (~$1.30/GB) and India (~$0.95/GB) are much lower. As IDC notes, this disparity can partly be explained by differing economic conditions but "also represents differences in the sophistication of the underlying IT, content, and information industries — and may represent a challenge for emerging markets when it comes to managing, securing, and analysing their respective portions of the digital universe."

Open-source data solutions will help to lower the costs of Big Data storage and analysis for all regions, but the far bigger problem is still knowing what to do with all these data. We still don't have much of a clue. Will we become dramatically better at managing our data by 2020? What do you think? ®

Matt Asay is vice president of corporate strategy at 10gen, the MongoDB company. Previously he was SVP of business development at Nodeable, which was acquired in October 2012. He was formerly SVP of biz dev at HTML5 start-up Strobe (now part of Facebook) and chief operating officer of Ubuntu commercial operation Canonical. With more than a decade spent in open source, Asay served as Alfresco's general manager for the Americas and vice president of business development, and he helped put Novell on its open source track. Asay is an emeritus board member of the Open Source Initiative (OSI). His column, Open...and Shut, appears three times a week on The Register. You can follow him on Twitter @mjasay.

Biting the hand that feeds IT © 1998–2022