EMC wants to be the Linux of big data

Opens up Chorus tool, borgs agile coders Pivotal Labs

To broaden its reach in the big-data arena, disk-array maker EMC's Greenplum division, which peddles data warehousing and Hadoop appliances and software, announced that it will open source its Chorus management and collaboration tools. EMC also has acquired Pivotal Labs, experts in agile programming, to help it build better big-data software and, equally importantly, help others do so.

EMC has always been serious about data, but in case you haven't noticed it, the company is now very serious about big data and the software that is used to chew it up and regurgitate useful bits of information.

"Having database-kernel developers doing a UI was not working out really well," conceded Luke Lonergan, CTO at the Greenplum division, in an interview with El Reg after EMC made its announcements in a webcast presentation hosted in San Francisco and New York.

About a year ago, Greenplum hired Pivotal Labs, which was founded in 1989 and which has a couple hundred code-slingers who could teach the database programmers some new tricks. They got the Chorus product back on track, and then EMC pulled a Victor Kiam and liked the company so much it bought it today for an undisclosed sum.

Greenplum previewed the new Chorus 2.0 tool in December 2011 as a central feature of its Unified Analytics Platform. The idea is to take data warehouses running the Greenplum variant of PostgreSQL and Hadoop clusters running either Greenplum HD (the open source distro) or Greenplum MR (the open-core version from MapR Technologies that EMC resells) and mash them up and glue them together using the Chorus collaboration environment.

Gelsinger: Open source Chorus 'is a big step for us'

Chorus 2.0 has a Facebook-style collaboration interface to data sets and analytics tools so people can share data. It also has a full metadata search so researchers can do data exploration in either structured or unstructured data.

Equally importantly, Chorus 2.0 can spin up a sandbox inside a data warehouse or Hadoop cluster, or spin up a data mart inside of a VMware virtual machine, so different "data scientists" can chew on different parts of the data and not create physically separate data silos running on other machines.

The current Chorus 1.2 does not know how to talk to Hadoop, and it can't spin up a personal sandbox for an analyst. Chorus 2.0 will also have integrated data visualization tools to help analysts and other big-data users get a feel for the shape of the data so they know where they might need to drill down more to try to understand some aspect of their business better.

Chorus 2.0 has been in beta testing for the past four months, says Lonergan, and during a tour of the Pivotal Labs facility in San Francisco that was part of the webcast, one of the code-slingers said that the product was in release-candidate phase right now. Lonergan later confirmed to El Reg that Chorus 2.0 will ship on March 23.

During that tour of Pivotal Labs – the company also has offices in New York and had an office in Singapore for a while – it was shown how the company has teams of a dozen or so people coding away on projects with pairs of programmers coding together on parts of the code.

Musical chairs

Every day or so, the programmers play musical chairs, and over the course of a week or so, everyone has been teamed up with everyone else on that development team – the Chorus team, for example, has ten people on it.

The idea is that both coders in a pair do some programming, and no one programmer becomes a subject-matter expert on any piece of the code. Everyone gets to know all of the code this way – not by studying it, but by working on it.
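The arithmetic of that rotation is easy to check. What follows is only an illustrative sketch, not Pivotal Labs' actual scheduling method, which the webcast did not describe: it uses the standard round-robin "circle method" to show that a ten-person team needs nine rotations before everyone has paired with everyone else. The function name `pair_rotation` and the developer names are hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def pair_rotation(team: List[str]) -> List[List[Pair]]:
    """Return one list of pairs per round, using the round-robin circle method.

    Assumption: this is a generic scheduling technique, not a description
    of how Pivotal Labs actually assigns pairs.
    """
    members: List[Optional[str]] = list(team)
    if len(members) % 2:
        # Odd head count: whoever is paired with None works solo that round.
        members.append(None)
    n = len(members)
    schedule = []
    for _ in range(n - 1):
        # Pair the first seat with the last, the second with the
        # second-to-last, and so on.
        schedule.append([(members[i], members[n - 1 - i]) for i in range(n // 2)])
        # Keep the first seat fixed and rotate everyone else by one seat.
        members = [members[0], members[-1]] + members[1:-1]
    return schedule


if __name__ == "__main__":
    team = [f"dev{i}" for i in range(1, 11)]  # hypothetical ten-person team
    for day, pairs in enumerate(pair_rotation(team), start=1):
        print(f"Round {day}: {pairs}")
```

With ten names the schedule has nine rounds of five pairs each, which covers all 45 possible pairings exactly once.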

Every time the code changes, a build is done to the code. If it fails any tests, it is immediately flagged as failing and everyone on the team can see the issue – there is tremendous peer pressure to get the code fixed. You make iterative changes in the code, and you fix things as you go along rather than waiting until the end of a protracted development process.

EMC did not disclose the price it paid to acquire Pivotal Labs, but said that the company would remain an independent unit, much as Greenplum, VMware, RSA Security, and others have been left reasonably untouched by the EMC mothership after being acquired.

Pivotal Labs is privately held and sells a tool called Pivotal Tracker, a scheduling system for agile programming that forces developers to break their work down into small chunks, called stories, that they work on in teams. There are 240,000 developers using the Pivotal Tracker tool today, and EMC said in a statement that it was committed to investing in this tool and letting Pivotal Labs do what it does.

Pivotal Labs is big on Ruby on Rails. In fact, according to Lonergan, it has been instrumental in getting Greenplum to port the Chorus tool from the Java back-end used with the 1.2 release to Ruby on Rails with the 2.0 release.

Scott Yara, senior vice president of products at the Greenplum unit, said that as Greenplum got exposed to the coders at Pivotal Labs and the new techniques, its own programmers started thinking outside of the box about Chorus, social media, open source, and what the product could be.

As far as bringing social media to the Chorus tool, which the company started mulling four years ago, before EMC even came a-calling, Yara said that this "seemed like a stretch."

But as time went by, "people kept pushing us," said Yara, and they started thinking about the big platforms that have established themselves in the past couple of years – Linux, Java, Hadoop, and Android, just to name a few – and they all have one thing in common: they are open source. And thus the idea was born to take the Chorus tool open source and position it as a platform for integrating big-data applications.

"This is a big step for EMC," explained Pat Gelsinger, president and COO of EMC's Information Infrastructure Products group, which includes Greenplum and a bunch of other products. "We've helped open source, but we have never been open source."

EMC did not provide a lot of details about the OpenChorus project, but the company said that it planned to have the code open sometime in the second half of this year.

Unlike Hadoop and other big-data projects, where the open sourcing was done to solicit help with actually completing the code and ruggedizing it for commercial use, EMC said that it was taking the Java and Android models, where the development work would be done largely by the sponsoring company.

The opening up of the Chorus source code is about making companies comfortable in investing in Chorus – they know it can survive any vendor – and getting developers to code applications that work through it and bring extensions to the tool itself. EMC is not looking for help on coding Chorus per se, but it sounds like it could have used some.

Lonergan would not reveal whether EMC has decided which license the Chorus tool will be distributed under, but he hinted that the kind of "open" licenses used by Apache projects were appealing and the more restrictive GNU General Public License was not. "Our objective is to have a license that makes this partner-friendly and community-building," Lonergan said.

It will be interesting to see how other big-data players – IBM, Oracle, Teradata, and a slew of other smaller players such as Cloudera, Hortonworks, and so on – will participate in the OpenChorus community and link their products into the tools. Maybe they will play, and maybe they won't. ®
