Is there anything sexier than hyper-converged systems?

First MapReduce, then Hive/Impala, but what’s next?


Comment Everyone likes hyper-converged systems. They are cool, dense, fast, energy-saving, agile, scalable, manageable, easy-to-use, and whatever else you want. But you know what? They have their limits too.

They are good for average workloads, indeed a large range of workloads, but not for those that need huge amounts of one specific resource to the detriment of the others; Big Data is a prime example.

Data grows (steadily ... and exponentially) and nothing gets thrown away. Since data adds up, the concept of the “data lake” has taken shape. Even systems created for big data are starting to sense this problem and system architects are beginning to think differently about storage.

I’m going to look at Hadoop because it gives a good example of a hyper-converged infrastructure.

Today, most Hadoop clusters are built on top of HDFS (the Hadoop Distributed File System). HDFS’s characteristics make this filesystem much cheaper, more reliable, and more scalable than many other solutions but, at the same time, it’s constrained by the design of the cluster itself.
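To put some rough numbers on that trade-off, here’s a back-of-envelope sketch (the figures are purely illustrative) of how HDFS’s default three-way replication turns cheap commodity disk into fault-tolerant capacity, while keeping storage welded to the very nodes that do the computing.

```python
# Back-of-envelope sketch of the HDFS trade-off. All numbers are
# illustrative assumptions, not sizing advice.

RAW_TB_PER_NODE = 48     # e.g. 12 x 4 TB commodity SATA drives (assumed)
NODES = 20               # assumed cluster size
REPLICATION = 3          # HDFS default dfs.replication

raw_capacity = RAW_TB_PER_NODE * NODES
usable_capacity = raw_capacity / REPLICATION

print(f"Raw capacity:    {raw_capacity} TB")
print(f"Usable capacity: {usable_capacity:.0f} TB at replication factor {REPLICATION}")

# Three copies of every block means any two replica-holding nodes can fail
# without data loss, but only about a third of the raw disk is addressable,
# and every extra terabyte arrives bolted to more CPU and RAM, because in a
# hyper-converged cluster storage and compute live in the same boxes.
```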

CPU/RAM/network/capacity ratios are important for designing well-balanced systems, but things change so rapidly that what you implement today could become very inefficient tomorrow. I know we are living in a commodity-hardware world right now, but despite the short lifespan of modern hardware I’m not convinced that enterprises are willing to change their infrastructures (and spend boatloads of money) very often.
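To make that concrete, here’s a rough sketch (all figures hypothetical) of how a node that looks nicely balanced for a disk-bound MapReduce-style profile starts to look lopsided once a memory-hungry Spark-style profile turns up.

```python
# Hypothetical node spec and workload profiles, to illustrate how a ratio
# that looks balanced for one workload looks lopsided for the next.

node = {"cores": 16, "ram_gb": 128, "disks": 12}

# Rule-of-thumb resource demands per concurrent task (assumed figures).
profiles = {
    "MapReduce-style (disk-bound)": {"ram_gb_per_task": 2,  "disks_per_task": 1},
    "Spark-style (memory-bound)":   {"ram_gb_per_task": 16, "disks_per_task": 0.25},
}

for name, p in profiles.items():
    by_cores = node["cores"]
    by_ram   = node["ram_gb"] // p["ram_gb_per_task"]
    by_disks = int(node["disks"] / p["disks_per_task"])
    tasks = min(by_cores, by_ram, by_disks)
    bottleneck = {by_cores: "CPU", by_ram: "RAM", by_disks: "disks"}[tasks]
    print(f"{name}: {tasks} concurrent tasks, limited by {bottleneck}")

# The same box that keeps all 12 spindles busy under the MapReduce profile
# can only run 8 Spark-style tasks before RAM runs out, leaving half its
# cores idle.
```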

Look at what's happening. Two years ago it was all about MapReduce, then it was all about Hive/Impala and the like; now it’s all about Spark and other in-memory technologies. What’s next?

Whatever, my first question is: “Can they run on the same cluster?”

Yes, of course, because the underlying infrastructure has evolved as well: Hadoop 2.6 is built around YARN, which decouples resource management from the processing engines so that different frameworks can share one cluster.
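If you want to see that cohabitation for yourself, YARN’s ResourceManager exposes a REST API that lists applications by type. A minimal sketch, assuming the (hypothetical) ResourceManager address below and the requests library:

```python
# Sketch: ask the YARN ResourceManager which kinds of applications are
# sharing the cluster. The RM address is a placeholder.
from collections import Counter
import requests

RM = "http://resourcemanager.example.com:8088"   # hypothetical RM address

resp = requests.get(f"{RM}/ws/v1/cluster/apps", timeout=10)
resp.raise_for_status()
apps = (resp.json().get("apps") or {}).get("app", [])

# Count applications by type: MAPREDUCE, SPARK, TEZ, etc. can all appear,
# because YARN schedules them against the same pool of containers.
by_type = Counter(app.get("applicationType", "UNKNOWN") for app in apps)
for app_type, count in by_type.most_common():
    print(f"{app_type:12s} {count}")
```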

But the real question is: “Can they run with the same level of efficiency on the same two-year-old cluster?” Mmmm, probably not.

And then another question arises: “Can you update that cluster to meet the new requirements?”

Well, this is a tough one to answer. Capacity grows, but you don’t normally need to process all the data at the same time; on the other hand, applications, business needs, and workloads change very quickly, making it difficult to build a hyper-converged cluster that serves them all efficiently.
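A crude way to picture the problem: capacity (and the compute bundled with it) keeps compounding, while the slice of data that is actually hot at any moment stays small. The growth rate, hot fraction, and cores-per-petabyte figures below are assumptions chosen purely for illustration.

```python
# Illustrative arithmetic only: growth figures and the "hot" fraction are
# assumptions, not measurements.

usable_pb_today = 1.0    # assumed usable capacity today
annual_growth = 0.6      # assumed 60% yearly data growth
hot_fraction = 0.10      # assumed share of data actively processed
cores_per_pb = 800       # assumed cores bundled with each usable PB

for year in range(4):
    capacity = usable_pb_today * (1 + annual_growth) ** year
    cores_bought = capacity * cores_per_pb
    cores_needed = capacity * hot_fraction * cores_per_pb
    print(f"Year {year}: {capacity:5.2f} PB usable, "
          f"{cores_bought:6.0f} cores deployed, "
          f"~{cores_needed:5.0f} cores' worth of hot data")

# Because compute scales with capacity rather than with the hot working set,
# most of those cores are paid for whether or not the workloads need them.
```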

Things get even more complicated if the big data analytics cluster becomes an enterprise-wide utility. Classic Hadoop tools are not the only ones in play, and the many departments in your organisation have different views and need to run different analyses on different data sets (often derived from the same raw data); that, after all, is one of the advantages of a data lake.

