Size doesn't matter to database thrusting Clustrix

Easily pleased, quickly done

Clustrix clustered server nodes loaded with Intel SSDs chew through parallelised database queries in a flash.

At a press event in San Jose, database startup Clustrix introduced its new CEO and described its technology. CEO Robin Purohit joined Clustrix 30 days ago, coming from HP and Mercury, and Veritas before that. His arrival saw the founders Paul Mikesell and Sergei Tsarev step back from day-to-day operations at the 45-strong company.

Clustrix is a parallelised clustered database (CDS) designed for online transaction processing (OLTP). It runs on a set of servers, giving it high performance and fault tolerance.

Let's hear it from the company's mouth:

CDS … can handle queries from simple point selects and updates to complicated SQL joins and aggregates. It is optimised for highly transactional OLTP workloads and also works for OLAP queries. The Clustrix architecture can start small and expand seamlessly with business needs to arbitrary scale. Tables can range from 0 to billions of rows in size. Workloads can range from a few to hundreds of thousands of transactions per second. It can handle simple key / value operations to full ACID-compliant transactional SQL.

Clustrix's intellectual property is mainly its Sierra database software, but it has also designed server nodes clustered using InfiniBand. These are needed to run the software, as this is not a database that runs on commercial off-the-shelf (COTS) servers, although the nodes are x86 engines.

Co-founder Paul Mikesell was the founder and director of engineering at Isilon, where he designed, architected, and developed all of Isilon’s products up to the EMC purchase. Also, chief technology officer Aaron Passey comes from Isilon. Where Isilon is clustered nodes for files (unstructured data), Clustrix uses clustered hardware node technology for structured databases.

This is the cutting edge and flash is the blade

According to Clustrix a traditional monolithic database cannot scale "simply by bolting on an expandable storage layer. A distributed storage engine with a traditional planner and execution environment does not allow sufficient concurrency to scale a table to billions of rows and still obtain reasonable performance."

Local queries with local locking

The company needed to bridge the gap somehow between these two positions and saw that data and node locality was the way to go:

The key observation to be made is that local queries can be satisfied with local locking, local data, and local cache. A query operating on local data need not talk to other nodes. Locks on the data structures can be very short lived. Operations on different bits of data can be completely independent and operate with perfect parallelism.

The amount of total concurrency supported becomes a simple function on the number of independent data stores that contain that data. The magic then becomes the engine that ties these independent, high performance data stores into a global single-instance database.
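The idea in that quote can be pictured with a toy sketch. This is not Clustrix's Sierra engine; `LocalStore`, `ShardedTable` and the hash-based placement are hypothetical names and choices made up for illustration. It only shows why operations on different bits of data need not contend: each store has its own lock and its own rows, so a point query touches exactly one store.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LocalStore:
    """One independent data store with its own rows and its own lock (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.rows = {}

    def upsert(self, key, value):
        # The lock is local to this store and held only for the write itself.
        with self.lock:
            self.rows[key] = value

    def get(self, key):
        with self.lock:
            return self.rows.get(key)


class ShardedTable:
    """Routes each key to exactly one store; no global lock exists (assumed design)."""

    def __init__(self, num_stores):
        self.stores = [LocalStore() for _ in range(num_stores)]

    def _store_for(self, key):
        # Simple hash placement, chosen for the sketch only.
        return self.stores[hash(key) % len(self.stores)]

    def upsert(self, key, value):
        self._store_for(key).upsert(key, value)

    def get(self, key):
        return self._store_for(key).get(key)


table = ShardedTable(num_stores=4)
with ThreadPoolExecutor(max_workers=8) as pool:
    # Writes to keys on different stores never wait on each other.
    list(pool.map(lambda k: table.upsert(k, k * 10), range(1000)))
```

In this toy, adding stores adds independent locks, which is the sense in which concurrency tracks the number of independent data stores. The hard part the company talks about, tying those stores into one global database, is not attempted here.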

There's a lot of local processing going on that has to be co-ordinated, and COTS server engines can't cut it. Hence the somewhat specialised Sierra hardware engines. These use Intel SSDs, not Fusion-io-style PCIe flash. It took three and a half years for Mikesell and Tsarev to get the core software written and the hardware nodes designed and specced.

The nodes are 2U enclosures with Intel processors, multi-level cell Intel SSDs and an Intel flash controller inside them.

Map Reduce for structured data

Purohit says: "We do Map Reduce for the structured world," bringing the query to the data, not the data to the query, so to speak. The pay-off, he says, is that CDS is faster, simpler to manage, scales more linearly, and is less expensive than an Oracle alternative.
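Purohit's "query to the data" line can be pictured roughly as follows. This is a minimal sketch under loud assumptions: the `NODES` data, the function names and the thread pool standing in for a cluster are all invented for illustration and say nothing about how Sierra actually plans or executes SQL. Each node aggregates its own rows, and only the small partial results travel to be merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of an orders table held by one node.
NODES = [
    [("alice", 30), ("bob", 20)],
    [("alice", 5), ("carol", 50)],
    [("bob", 10)],
]


def local_aggregate(rows):
    """Map step, run where the rows live: SUM(amount) grouped by customer."""
    partial = {}
    for customer, amount in rows:
        partial[customer] = partial.get(customer, 0) + amount
    return partial


def merge(partials):
    """Reduce step: combine the per-node partial sums into one result."""
    total = {}
    for partial in partials:
        for customer, amount in partial.items():
            total[customer] = total.get(customer, 0) + amount
    return total


def distributed_sum_by_customer(nodes):
    # Threads stand in for cluster nodes; raw rows never leave local_aggregate.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    return merge(partials)


result = distributed_sum_by_customer(NODES)
```

Here `result` holds one total per customer, built only from the per-node partial sums.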

He argues that Oracle's Exadata system was not built for internet scale and is expensive: "It costs around $1m versus Clustrix' starting price of $140K." This is not really an apples-for-apples comparison, as Exadata is for data warehousing and business intelligence whereas Clustrix is for OLTP, the heart of Oracle's business.

The CDS cluster is effectively an appliance and is managed as a single database instance, according to Purohit.

Fifteen customers are evaluating Clustrix technology and "they have never lost any data." PhotoBox in Europe is one tester; an online backup company in the Mozy and Carbonite class is another. The backup company uses Clustrix to store its metadata, but not the basic backup file data.

Three-year lead?

Clustrix will launch itself next year and will then probably seek a third funding round after the current Series B funding of $12m runs out. Purohit will spend some of the Series B money building a sales and marketing team. He reckons Clustrix has a three-year lead over competitors, and his biggest issue is growing Clustrix's business capabilities and infrastructure fast enough to generate sales, and so both sustain Clustrix's lead and bring the company to profitability.

We might see future Clustrix node technology using Intel PCIe flash. The hardware is clever but not that complex; it's the Sierra database engine software that does the business. This provides a big enough barrier to entry for competitors, with the InfiniBand-connected, flash-enhanced servers making it that little bit more difficult.

A lesson we can draw: spindle-based storage is no longer fast enough for advanced data-hungry servers, which will have two-tier memories: DRAM with NAND. This is the cutting edge and flash is the blade. It's getting sharper and sharper and will surely become a permanent fixture of server design. ®


Biting the hand that feeds IT © 1998–2022