Oracle rolls its own NoSQL and Hadoop

A supremely confident Ellison mounts the Big Data elephant

OpenWorld There's no shortage of ego at Oracle, as evidenced by the effusion of confidence behind the company's OpenWorld announcement of the not-so-humbly named Big Data Appliance.

And then there were the o'erweening keynote presentations by some of the software giant and systems player's top brass on Monday, which included a montage of what Oracle has done so far this year, and hints at things ahead, and which ended with co-president and CFO Safra Catz intoning: "We are big data. And we're also the cloud."

Well, with Oracle owning big chunks of the database, middleware, application, and operating system markets, Catz's pronouncement settles any doubt about the future of information technology and Oracle's place in it. Just send your checks and surrender terms to Larry Ellison care of Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065.

But before Oracle swallows the entire IT market, it has to prove that it actually is big data. Or, more precisely, that its engineered systems can do the kind of MapReduce work that enterprises are increasingly using to cope with their unstructured data.

And so Oracle is creating yet another engineered system to put in its arsenal of things for its direct sales force to sell: the Big Data Appliance.

Although Thomas Kurian, executive vice president of product development, announced the Big Data Appliance as part of his keynote in the wake of some brief peppy talk by co-president Mark Hurd, details are a little sparse. Conceptually, here's what the Big Data Appliance looks like:

Oracle Big Data Appliance

Block diagram of Oracle's Big Daddy, er Data, Appliance

The underlying hardware for the Big Data Appliance is Oracle's Exadata x86 clusters, which support a parallel implementation of the Oracle 11g R2 database running on top of Oracle's RHEL-ish clone of Linux. Oracle Enterprise Linux and Oracle's twist on the open source Xen hypervisor are the appliance's underlying layer.

Oracle is grabbing the open source Hadoop MapReduce tool from the Apache Software Foundation and doing its own distribution for this specific machine; Hadoop is a Java program and runs atop Oracle's own JVMs in the Exadata.

Kurian did not say if Oracle was using the HDFS file system that is normally paired with Hadoop. He had very little to say about the Oracle NoSQL Database, which is a distributed key-value store, although Oracle's announcement predictably says that it is "easy to install, configure and manage, supports a broad set of workloads, and delivers enterprise-class reliability backed by enterprise-class Oracle support."

The announcement, however, neglects to explain, as did Kurian, exactly what Oracle's NoSQL Database is. The company's sparse page outlining the NoSQL Database says that it "scales horizontally to hundreds of nodes with high availability and transparent load balancing."

Only two weeks ago, in a conference call with Wall Street analysts going over Oracle's first quarter financial results for fiscal 2012, Ellison, the company's cofounder and CEO, didn't seem to think that Oracle needed any other kind of database for dealing with unstructured data.

"Oracle has always stored both structured and unstructured data. This is really nothing new. We are constantly adding features to our database to support the storage and searching of unstructured as well as structured data. Autonomy was a shock to us," he said at the time, referring to the company that HP is in the process of buying for $10.3bn for its expertise in coping with unstructured data.

"We looked at the price and thought it was absurdly high," Ellison continued. "We had no interest in making the Autonomy acquisition. We think we're much better off with a couple of smaller acquisitions and continuing to innovate in that area so that the unstructured data and the structured data both find their way into an Oracle database, where it's secure, it's scalable, it runs on Exadata. We think we really don't want to have two separate databases."

Listen, my children, and you shall hear...

Ellison then gave a brief history of the database business, explaining that first there were relational databases, then object relational databases, and now we need to cope with unstructured data.

He neglected to point out, however, that Oracle has a collection of databases that rivals those from IBM (in number, if not in functionality), including the Essbase and TimesTen databases that are now at the heart of its new Exalytics BI appliance, announced yesterday, plus MySQL, Berkeley DB, and rdb from the Alphas and VAXes.

Oracle is not adverse to adding a NoSQL database to the collection, but Ellison sure gave the impression that what the company wanted to do was keep everything inside of the Oracle database – by which he meant 11g R2. As it turns out, Oracle's NoSQL is based on the Berkeley DB key/value database, Oracle confirmed separately today.

The Big Data Appliance stack also includes the Oracle Data Integrator Application Adapter for Hadoop, which links the NoSQL database and the Oracle database to applications, and Oracle Loader for Hadoop, which transforms datasets created by MapReduce-crunching into formats that are native to Oracle databases so they can be sucked into 11g R2.

Although the chart above doesn't show it, the Big Data Appliance also includes the R programming language, a popular open source statistical-analysis tool. This R engine will integrate with 11g R2, so presumably if you want to do statistical analysis on unstructured data stored in and chewed by Hadoop, you will have to move it to Oracle after the chewing has subsided.

This approach to R-Hadoop integration is different from that announced last week between Revolution Analytics, the so-called Red Hat for stats that is extending and commercializing the R language and its engine, and Cloudera, which sells a commercial Hadoop setup called CDH3 and which was one of the early companies to offer support for Hadoop. Both Revolution Analytics and Cloudera now have Oracle as their competitor, which was no doubt no surprise to either.

In any event, the way they do it, the R engine is put on each node in the Hadoop cluster, and those R engines just see the Hadoop data as a native format that they can do analysis on individually. As statisticians do analyses on data sets, the summary data from all the nodes in the Hadoop cluster is sent back to their R workstations; they have no idea that they are using MapReduce on unstructured data.

Oracle did not supply configuration and pricing information for the Big Data Appliance, and also did not say when it would be for sale or shipping to customers. The company did say that it would sell the individual software components in the appliance separately, as it does for other elements of its engineered systems. ®

Other stories you might like

  • 381,000-plus Kubernetes API servers 'exposed to internet'
    Firewall isn't a made-up word from the Hackers movie, people

    A large number of servers running the Kubernetes API have been left exposed to the internet, which is not great: they're potentially vulnerable to abuse.

    Nonprofit security organization The Shadowserver Foundation recently scanned 454,729 systems hosting the popular open-source platform for managing and orchestrating containers, finding that more than 381,645 – or about 84 percent – are accessible via the internet to varying degrees thus providing a cracked door into a corporate network.

    "While this does not mean that these instances are fully open or vulnerable to an attack, it is likely that this level of access was not intended and these instances are an unnecessarily exposed attack surface," Shadowserver's team stressed in a write-up. "They also allow for information leakage on version and build."

    Continue reading
  • A peek into Gigabyte's GPU Arm for AI, HPC shops
    High-performance platform choices are going beyond the ubiquitous x86 standard

    Arm-based servers continue to gain momentum with Gigabyte Technology introducing a system based on Ampere's Altra processors paired with Nvidia A100 GPUs, aimed at demanding workloads such as AI training and high-performance compute (HPC) applications.

    The G492-PD0 runs either an Ampere Altra or Altra Max processor, the latter delivering 128 64-bit cores that are compatible with the Armv8.2 architecture.

    It supports 16 DDR4 DIMM slots, which would be enough space for up to 4TB of memory if all slots were filled with 256GB memory modules. The chassis also has space for no fewer than eight Nvidia A100 GPUs, which would make for a costly but very powerful system for those workloads that benefit from GPU acceleration.

    Continue reading
  • GitLab version 15 goes big on visibility and observability
    GitOps fans can take a spin on the free tier for pull-based deployment

    One-stop DevOps shop GitLab has announced version 15 of its platform, hot on the heels of pull-based GitOps turning up on the platform's free tier.

    Version 15.0 marks the arrival of GitLab's next major iteration and attention this time around has turned to visibility and observability – hardly surprising considering the acquisition of OpsTrace as 2021 drew to a close, as well as workflow automation, security and compliance.

    GitLab puts out monthly releases –  hitting 15.1 on June 22 –  and we spoke to the company's senior director of Product, Kenny Johnston, at the recent Kubecon EU event, about what will be added to version 15 as time goes by. During a chat with the company's senior director of Product, Kenny Johnston, at the recent Kubecon EU event, The Register was told that this was more where dollars were being invested into the product.

    Continue reading

Biting the hand that feeds IT © 1998–2022