Cray puts super stake in the big data ground

Crunch this


Big data may or may not pan out for the users, but it is a bit of a boom for IT vendors, who are scrambling to prove their data analytics chops and go for the easiest money in the market these days. And to that end, supercomputer maker Cray is setting up a dedicated division to chase big data biz.

The division, called YarcData, is a bit of a private joke. YARC is short for "Yet Another Router Chip", the architectural name that Cray slapped onto the high-radix router at the heart of the experimental "BlackWidow" supercomputer. This was commercialized as none other than the "Gemini" XE interconnect inside its latest XE6 Opteron-based massively parallel supers as well as the XK6 hybrid Opteron-Tesla machines. Yarc is also Cray spelled backwards, so presumably the new division is "a tad Cray."

Cray already had a knowledge management practice, but has decided to create a proper division. It is pulling in employees from research and development, marketing, sales, services, and support and dedicating them to creating and supporting hardware and software for running big data and analytics workloads (as distinct from the kinds of simulation workloads that Cray's gear generally runs).

"Cray is best known for building supercomputers that can run massive scientific and engineering simulations, and from that work we have developed unique technologies and amassed significant experience working with some of the largest data-intensive environments in the world," explained Peter Ungaro, Cray's president and CEO, in a statement announcing the new division. "This makes our entry into the big data market a natural evolution."

Cray has hired a manager from outside the company to run the division: Arvind Parthasarathi, who was named senior vice president and general manager of YarcData. Prior to joining Cray, Parthasarathi was senior vice president and general manager of Informatica's Master Data Management (MDM) business unit, and he was previously vice president of product management for the company's data quality products. (Which means, by the way, that Parthasarathi has a keen understanding of the fact that the biggest problem that big companies have with big data projects is that their information is largely garbage.)

Before joining Informatica, Parthasarathi was director of product management at i2 Technologies (now part of JDA Software), running its RFID, product information management, supply chain integration, and supply chain event management products. He started his career at Oracle, where he was a product line manager in charge of the software giant's Intel Technologies division. Parthasarathi has a BS in computer science from the Indian Institute of Technology and an MS in computer science from the Massachusetts Institute of Technology.

So here's the fun bit: trying to figure out what Cray is actually going to do in the big data racket. Cray did not speak of such things today, of course, but here's what is obvious from El Reg's systems desk. First, Cray can build server clusters with tens of thousands of cores and honking clustered file systems, with a high-speed XE interconnect linking nodes to each other. If you could beef up a Cray XE blade with some disk drives, you could make a hell of a Hadoop cluster.

Also, the Cray Linux Environment (a variant of SUSE Linux) has a nifty feature called Cluster Compatibility Mode, which makes the XE interconnect look like a standard Ethernet controller as far as Linux applications are concerned. CLE 4.0, the latest release, supports the Java JDK 1.6.0 and can therefore run Java applications.

And the Hadoop MapReduce algorithm and its HDFS file system are, together, a humongous Java app. At the moment, Hadoop tops out at around 4,000 nodes, and Cray could certainly help the open source Apache project do a better job scaling across more nodes. There's no reason why the open source R stats program could not be parallelized, as Revolution Analytics has done, and run across a Cray XE6 super, working in conjunction with Hadoop and chewing on the reduced data.

Supercomputer rival Silicon Graphics has been going on about how its shared memory parallel supers, the UV 1000 Xeon-based machines, can scale Windows Server 2008 R2 across 256 cores and 2TB of memory – the upper limit of that Microsoft operating system – making it an ideal box for running big databases for online transaction processing and data warehousing. Since last fall, SGI has been selling variants of its Rackable rackish servers with the Cloudera CDH3 commercial Hadoop distribution. In the quarter ended in December, SGI closed a number of Hadoop deals, some with as many as 1,200 nodes.

Cray would have to do some substantial engineering to the XE interconnect to create a shared memory architecture that could match the Windows Server scalability that SGI has. But on parallel commercial workloads like Hadoop, and maybe even on NoSQL data stores, the engineering job is doable. ®

Biting the hand that feeds IT © 1998–2022