For both performance and capacity reasons, companies running large transaction processing systems, whether they are tickled directly by Web users or just end users working behind the company firewall, sometimes have to partition their production relational databases. This practice, called sharding, is a pain in the neck. Actually, it is several pains in the neck. And ScaleBase has some software unguents to cope with it.
ScaleBase is a startup founded in Israel and now located in Boston that is rolling out its first product, called the Database Load Balancer, today. As the name suggests, it is a proxy server that sits in front of the actual database; in this case, the tool breaks a monolithic relational database into chunks and spreads them out across multiple physical servers. (That's the sharding part.)
Companies have been doing sharding for years, and it is a particularly popular technique for goosing the performance of databases working on very large servers or spread across many clustered nodes. However, if you shard your database yourself, you essentially have to rewrite the entire data access layer of a database management system, which is basically what Oracle Real Application Clusters, or RAC, does. Homegrown sharding algorithms often spread data over a fixed number of nodes, and reports and applications based on the database have to be tweaked to be aware of the shards. Backing up and tuning each database node also have to be done more or less by hand.
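To see why a fixed node count is a trap, consider a minimal sketch of the kind of homegrown placement rule described above. It assumes integer keys placed by a simple modulo; the function names and the key range are hypothetical, chosen for illustration, and are not taken from ScaleBase or any particular product.

```python
def shard_index(key, node_count):
    """Place an integer key on a node with a plain modulo rule (illustrative assumption)."""
    return key % node_count


def keys_that_move(keys, old_count, new_count):
    """Count keys whose node changes when the cluster is resized."""
    return sum(
        1 for key in keys
        if shard_index(key, old_count) != shard_index(key, new_count)
    )


if __name__ == "__main__":
    keys = range(1000)
    moved = keys_that_move(keys, old_count=4, new_count=5)
    print(f"{moved} of {len(keys)} keys change nodes when going from 4 to 5 nodes")
```

Under this rule, growing from four nodes to five relocates most of the data, which is one reason hand-built sharding layers tend to stay stuck at whatever node count they started with.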
The ScaleBase Database Load Balancer wants to trick those applications, backup programs, and report writers into thinking they are talking to one database even though they are legion. The automated sharding software is the brainchild of Doron Levari, currently the CEO at ScaleBase, and Liran Zelkha, vice president of business development. Levari has been a database administrator for 15 years, and ran Aluna, a database consulting firm that was eventually sold to Matrix, the largest system integrator in Israel. Zelkha has worked on a number of large-scale database and cloud projects and kept running into the same issues of performance and scalability.
"The third time we wrote a sharding layer for a customer, we knew we were on to something," Zelkha tells El Reg with a laugh.
The Database Load Balancer looks exactly like a MySQL or Oracle database would at the network level to any application. But it shards the database across multiple nodes, and does so automatically. The database proxy then accepts SQL commands and depending on what those commands are, it either runs the query against the appropriate subset of the database or across all the shards at once. You don't have to change one line of your application code, but you may have to work out a different license with your database vendor.
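The routing decision described above, one shard when the query can be pinned down and every shard when it cannot, can be sketched roughly as follows. This is not ScaleBase's implementation, which the company has not detailed here; the `ShardRouter` class, the CRC32-based placement, and the explicit `shard_key` argument are all assumptions made for the sake of a small, runnable illustration. A real proxy would have to parse the SQL to find the key.

```python
import zlib


class ShardRouter:
    """Toy router: picks one shard for keyed queries, all shards otherwise."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # CRC32 is used only because it is stable across runs; the hashing
        # scheme is an assumption, not something the source specifies.
        digest = zlib.crc32(str(key).encode("utf-8"))
        return self.shard_names[digest % len(self.shard_names)]

    def route(self, sql, shard_key=None):
        """Return the list of shards that should receive this statement."""
        if shard_key is not None:
            return [self.shard_for(shard_key)]
        # No key available: scatter the query to every shard and let the
        # caller gather and merge the results.
        return list(self.shard_names)


if __name__ == "__main__":
    router = ShardRouter(["shard0", "shard1", "shard2"])
    print(router.route("SELECT * FROM orders WHERE customer_id = 42", shard_key=42))
    print(router.route("SELECT COUNT(*) FROM orders"))
```

The application still sends ordinary SQL to one address; only the proxy knows how many databases sit behind it.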
The Database Load Balancer is packaged up in a virtual machine that is compatible with Amazon's EC2 compute cloud (that's where the 500 beta testers have been playing around with it since it quietly went into beta in January) as well as VMware's ESXi hypervisor. The database shards themselves can be run inside virtual machines or on bare metal; the Database Load Balancer doesn't care. Each node of the ScaleBase tool can manage from 8 to 12 database nodes, according to Zelkha, and the server running the Database Load Balancer is "nothing too fancy", with just a two-socket machine with four-core x64 processors and 16GB of memory doing the trick.
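Taking Zelkha's figure of 8 to 12 database nodes per balancer node at face value, a back-of-the-envelope sizing looks like the sketch below. The function name is hypothetical, and the calculation ignores any extra balancer nodes a customer might add for high availability.

```python
import math


def balancers_needed(db_nodes, per_balancer=8):
    """Balancer nodes required if each one manages `per_balancer` database nodes.

    The default of 8 is the conservative end of the 8-to-12 range quoted above.
    """
    if db_nodes < 0 or per_balancer < 1:
        raise ValueError("db_nodes must be >= 0 and per_balancer must be >= 1")
    return math.ceil(db_nodes / per_balancer)


if __name__ == "__main__":
    for per_balancer in (8, 12):
        count = balancers_needed(24, per_balancer)
        print(f"24 database nodes at {per_balancer} per balancer: {count} balancer node(s)")
```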
MySQL is just a start
At the moment, the Database Load Balancer supports the open source MySQL database, now controlled by Oracle, and Zelkha says that the next database to get front-ended and sharded will likely be Oracle's eponymous database. Depending on customer demand, ScaleBase will add support for IBM's DB2, Microsoft's SQL Server, and other open source databases such as PostgreSQL. The sharding program was written in Java and requires a Java SE6-compliant runtime to operate. While all of the beta testers have deployed the tool on top of Linux, the program will run atop AIX, Solaris, HP-UX, and any other box that has the right Java support. Customers should cluster their ScaleBase sharding nodes for high availability, of course, and the architecture recommends having standby servers for each shard as well.
The Database Load Balancer is not for everyone, says Zelkha. It is aimed at databases that are 50GB or larger and that have to field tens to hundreds of requests per second. Depending on the configuration that customers use to shard the database, they can cut response time to between one-quarter and one-half of whatever it was on a monolithic database setup and get in position for linear scaling as they need to add data and therefore nodes to their sharded database clusters. (ScaleBase has posted some initial performance metrics to give you a feel for it.) Zelkha warns that on databases that are under 10GB in size, using the tool will probably hurt performance.
At the moment, ScaleBase is targeting transaction processing and hyperscale Web applications with the tool, but over time it will make tweaks that help companies build clustered data warehouses. Within the next twelve months, ScaleBase will add support for at least one more database – either Oracle 11g or SQL Server. It could do better than this, depending on how sales take off and on the technical issues with front-ending these databases.
Unlike a lot of software vendors, ScaleBase has also published its price list. A development version of the Database Load Balancer costs $1,500 per year per back-end database node and comes with 9x5 business hour support. The Enterprise Edition allows for the ScaleBase tool to be installed on multiple servers and has 24x7 support; it costs $5,000 per back-end database node, or $4,500 per node if you order ten or more. The Premium Edition boosts support response to one-hour turnaround (from four hours in the Enterprise Edition) and adds phone support as well (instead of just web and email); it costs $6,000 per database node, or $5,400 if you order ten or more. ®
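For anyone pricing out a cluster, the arithmetic on that list works out as in the sketch below. It makes two assumptions the price list as quoted does not spell out: that the ten-node discount applies to every node in the order, and that the development version carries no volume discount. The names are hypothetical.

```python
# Per-node list prices in US dollars: (standard, ten-or-more).
# The development version is listed with no volume discount, so both entries match.
PRICE_LIST = {
    "development": (1500, 1500),
    "enterprise": (5000, 4500),
    "premium": (6000, 5400),
}
VOLUME_THRESHOLD = 10


def list_price(edition, nodes):
    """Total list price for `nodes` back-end database nodes of one edition."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    standard, volume = PRICE_LIST[edition]
    # Assumption: the discounted rate applies to all nodes once the order hits ten.
    per_node = volume if nodes >= VOLUME_THRESHOLD else standard
    return per_node * nodes


if __name__ == "__main__":
    for edition, nodes in [("enterprise", 8), ("enterprise", 12), ("premium", 10)]:
        print(f"{edition}, {nodes} nodes: ${list_price(edition, nodes):,}")
```

On those assumptions, eight Enterprise nodes list at $40,000 and twelve at $54,000.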