Cray puts super stake in the big data ground

Crunch this


Big data may or may not pan out for the users, but it is a bit of a boom for IT vendors, who are scrambling to prove their data analytics chops and go for the easiest money in the market these days. And to that end, supercomputer maker Cray is setting up a dedicated division to chase big data biz.

The division, called YarcData, is a bit of a private joke. YARC is short for "Yet Another Router Chip", the architectural name that Cray slapped onto the high-radix router at the heart of the experimental "BlackWidow" supercomputer. That router was commercialized as none other than the "Gemini" XE interconnect inside its latest XE6 Opteron-based massively parallel supers, as well as the XK6 hybrid Opteron-Tesla machines. Yarc is also Cray spelled backwards, so presumably the new division is "a tad Cray."

Cray already had a knowledge management practice, but has decided to create a proper division – pulling in employees from research and development, marketing, sales, services, and support, and dedicating them to creating and supporting hardware and software for running big data and analytics workloads (as distinct from the kinds of simulation workloads that Cray's gear generally runs).

"Cray is best known for building supercomputers that can run massive scientific and engineering simulations, and from that work we have developed unique technologies and amassed significant experience working with some of the largest data-intensive environments in the world," explained Peter Ungaro, Cray's president and CEO, in a statement announcing the new division. "This makes our entry into the big data market a natural evolution."

Cray has hired a manager from outside the company to run the division: Arvind Parthasarathi, who was named senior vice president and general manager of YarcData. Prior to joining Cray, Parthasarathi was senior vice president and general manager of Informatica's Master Data Management (MDM) business unit, and he was previously vice president of product management for the company's data quality products. (Which means, by the way, that Parthasarathi has a keen understanding of the fact that the biggest problem that big companies have with big data projects is that their information is largely garbage.)

Before joining Informatica, Parthasarathi was director of product management at i2 Technologies (now part of JDA Software), running its RFID, product information management, supply chain integration, and supply chain event management products. He started his career at Oracle, where he was a product line manager in charge of the software giant's Intel Technologies division. Parthasarathi has a BS in computer science from the Indian Institute of Technology and an MS in computer science from the Massachusetts Institute of Technology.

So here's the fun bit: trying to figure out what Cray is actually going to do in the big data racket. Cray did not speak of such things today, of course, but here's what is obvious from El Reg's systems desk. First, Cray can build server clusters with tens of thousands of cores and wonking great clustered file systems, with a high-speed XE interconnect linking nodes to each other. If you could beef up a Cray XE blade with some disk drives, you could make a hell of a Hadoop cluster.

Also, the Cray Linux Environment (a variant of SUSE Linux) has a nifty feature called Cluster Compatibility Mode, which makes the XE interconnect look like a standard Ethernet controller as far as Linux applications are concerned. CLE 4.0, the latest release, supports the Java JDK 1.6.0 and can therefore run Java applications.
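To give a flavour of what that means in practice, here's a minimal sketch – not Cray code, and the hostname and port are invented for illustration – of a bog-standard Java TCP client, the kind of vanilla Linux networking code that Cluster Compatibility Mode is meant to let run unmodified, because the Gemini hardware presents itself to applications as an ordinary network interface:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    // A plain TCP client in JDK 1.6-era syntax. Under Cluster Compatibility
    // Mode the XE interconnect looks like ordinary Ethernet, so code like this
    // needs no Cray-specific changes. The hostname and port are hypothetical.
    public class PlainSocketClient {
        public static void main(String[] args) throws Exception {
            Socket socket = new Socket("nid00042", 9000); // made-up compute node name
            try {
                PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));
                out.println("ping");                // send a request over plain TCP/IP
                System.out.println(in.readLine());  // read the reply
            } finally {
                socket.close();
            }
        }
    }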

And the Hadoop MapReduce framework and its HDFS file system together form a humongous Java app. At the moment, Hadoop tops out at around 4,000 nodes, and Cray could certainly help the open source Apache project do a better job of scaling across more nodes. There's no reason why the open source R stats program could not be parallelized, as Revolution Analytics has done, and run across a Cray XE6 super – in conjunction with Hadoop, chewing on the reduced data.
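For the curious, this is roughly what such a plain-Java Hadoop workload looks like: the classic word-count example from the standard Hadoop tutorials, sketched here against the 0.20-era MapReduce API (the vintage shipping in distributions like CDH3 at the time). Nothing in it is Cray-specific; the input and output arguments are just whatever HDFS directories you point it at.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts emitted for each distinct word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // pre-aggregate before the shuffle
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir must not yet exist
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The map and reduce halves run wherever the framework schedules them; the only thing a big, fast cluster changes is how many of them run at once and how quickly the shuffle between them completes – which is exactly where an interconnect like Gemini could earn its keep.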

Supercomputer rival Silicon Graphics has been going on about how its shared memory parallel supers, the UV 1000 Xeon-based machines, can scale Windows Server 2008 R2 across 256 cores and 2TB of memory – the upper limit of that Microsoft operating system – making it an ideal box for running big databases for online transaction processing and data warehousing. Since last fall, SGI has been selling variants of its Rackable rackish servers with the Cloudera CDH3 commercial Hadoop distribution. SGI has taken down a number of Hadoop deals with as many as 1,200 nodes each in the quarter ended in December.

Cray would have to do some substantial engineering on the XE interconnect to create a shared memory architecture that could match the Windows Server scalability SGI has. But for parallel commercial workloads like Hadoop, and maybe even NoSQL data stores, the engineering job is doable. ®

