Open source at America's famous Los Alamos Lab: Pragmatism as its nucleus
Its 20,000-node cluster uses outdated MariaDB – for very good reasons
Established 80 years ago this year, Los Alamos National Laboratory remains most famous for its central role in developing the first atomic bomb. But that belies the breadth of scientific research it has undertaken since, encompassing physics, chemistry and biology, and addressing the threat of COVID-19.
Despite the breadth of subjects the US Department of Energy institution researches, its thousands of scientists have one need in common: data. One of the people ensuring they get it is Steven Senator, a scientist at the lab's high-performance computing division.
"High performance is perhaps an overused buzzword, but in this context, what it means is we build clusters that service physicists, computational biologists, and computational chemists," he told the MariaDB OpenWorks conference this week in New York City.
"The ones might look at protein folding and the spread of diseases, most recently. These are not contrived examples; they are real experiments, real models and simulations that my customers use."
The $4.6 billion-budget organization employs around 18,000 people in its labs, and around two-thirds of them need regular access to high-performance computing to do their work. It's a tough workload to manage.
Senator started his computing career with Tandem, which built systems to support telcos and stock exchanges around the world. Although the impact of downtime may not be as immediate in scientific research, the relative economic impact may be just as severe.
"Our service levels impact those users: whether they can submit a job; whether they can debug their code; whether they can get the job running and look at it while it's calculating," he told The Register.
"All of that means time to solution. It's not the metric of dollars, like the FinTech community or the transaction community, but it's cost of labor in a relatively small community of high value scientists."
Older isn't necessarily bad
It might surprise some, then, that the 20,000-node cluster, dubbed Trinity, running on Cray hardware relies on an open-source community edition of MariaDB. It's a recent, fully patched community version, not the latest MariaDB Enterprise Server. But the requirements for security and reliability make it the necessary choice.
"The code that runs on our systems has oversight. We can actually look at the open source code and run that through both auditing, oversight, testing, and all sorts of other inspection mechanisms needed for our environment," Senator said.
"Some of this is arm's length measurement to look at capabilities and failure modes. The fact that the code is inspectable, that it's open source, is quite a big deal."
The choice of MariaDB, as opposed to the related open-source alternatives such as MySQL or PostgreSQL, was also influenced by its ability to execute “fine-grained locking.”
“It means I can take a backup of the database within seconds, rather than within minutes. And when the database is locked, I can't update it. I can't add a user but even more importantly, I might not be able to proceed with some of the calculations. If you have coarse grained locking, it's like locking everyone in the room. And no one can even enter the room,” Senator said.
Fine-grained locking, which allows a more detailed approach to locking users in or out of the system, is now supported by MySQL, but at the time it was a differentiator for MariaDB, he said.
“If you compared traditional MySQL dump to MariaDB backup, at least in our environment, [in MySQL] you're talking about tens of minutes to do a backup for the data. Because of the fine grained locking work [in MariaDB], I can get a complete backup done in under a minute and usually approximately 35 seconds. Without that, I could not hit my service level agreements,” he said.
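To make Senator's "locking everyone in the room" analogy concrete, here is a toy sketch in Python. It is not how MariaDB or MySQL implement backups; the `CoarseStore` and `FineStore` classes and the `writes_admitted_during_backup` helper are hypothetical names invented for illustration. One store guards every row with a single lock, the other gives each row its own lock, and the helper counts how many writes get through while a backup is copying rows.

```python
import threading


class CoarseStore:
    """Hypothetical store: one lock guards every row (coarse-grained)."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self._lock = threading.Lock()

    def update(self, key, value, timeout=0.01):
        # Give up if the single lock is not free within the timeout.
        if not self._lock.acquire(timeout=timeout):
            return False
        try:
            self._rows[key] = value
            return True
        finally:
            self._lock.release()

    def backup(self, during_copy=None):
        snapshot = {}
        with self._lock:  # the whole "room" stays locked until the copy ends
            for key in list(self._rows):
                snapshot[key] = self._rows[key]
                if during_copy:
                    during_copy(key)
        return snapshot


class FineStore:
    """Hypothetical store: one lock per row (fine-grained)."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self._locks = {key: threading.Lock() for key in self._rows}

    def update(self, key, value, timeout=0.01):
        lock = self._locks[key]
        if not lock.acquire(timeout=timeout):
            return False
        try:
            self._rows[key] = value
            return True
        finally:
            lock.release()

    def backup(self, during_copy=None):
        snapshot = {}
        for key in list(self._rows):
            with self._locks[key]:  # only the row being copied is locked
                snapshot[key] = self._rows[key]
                if during_copy:
                    during_copy(key)
        return snapshot


def writes_admitted_during_backup(store, keys):
    """Count writes to *other* rows that succeed while a backup is copying."""
    admitted = 0

    def try_write(key_being_copied):
        nonlocal admitted
        other = next(k for k in keys if k != key_being_copied)
        if store.update(other, "changed"):
            admitted += 1

    store.backup(during_copy=try_write)
    return admitted


rows = {"alice": 1, "bob": 2, "carol": 3}
coarse_admitted = writes_admitted_during_backup(CoarseStore(rows), list(rows))
fine_admitted = writes_admitted_during_backup(FineStore(rows), list(rows))
print(coarse_admitted, fine_admitted)  # prints "0 3"
```

With the single lock, every write attempted during the backup times out; with per-row locks, all three go through. The toy fine-grained backup does not produce a consistent point-in-time snapshot, which real database engines handle with additional machinery that this sketch leaves out.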
A mix of software but does it work?
Operating systems employed in the center’s supercomputers are a mixture of Red Hat and SUSE Linux.
The flagship cluster runs the community database because of its age and the conservative nature of the scientific organization, but prototype and other production systems have begun to use MariaDB’s Enterprise Edition. The application on the Trinity super will move to an Enterprise flavor of MariaDB as that system retires, Senator said.
The difference between the two editions is an issue for some in the open-source community. Like MongoDB and CockroachDB, MariaDB employs a tailored licensing agreement, which is open source in the sense that users can see the code but places restrictions on the commercialization of products based on the code.
The difference between supposed pure and more business-minded or restrictive open source has caused a schism among some open-source developers and commentators, though it's not a priority or headache for Senator.
"I read columns on the topic. While I’m not sure I have enough knowledge to characterize the community, from my perspective, as an engineer, as long as I can see the code, it doesn't matter. To me it's [not] what your intentions are; it’s your actions," he explained.
"Your actions as put into code are either sufficient to give me a feature that I need with benefits or it's not. If it happens to benefit some other company then it doesn't matter to me. Either way, it's about the code, not about what motivated someone to act appropriately."
LANL runs its high-performance systems in its own datacentres. Reflecting on a MariaDB conference itinerary that placed a heavy emphasis on modern cloud databases, Senator encouraged fellow engineers to remember the basics.
“Being able to burst into the cloud is wonderful, but don't neglect the core engine in your car. Just be able to have a performance database that you can restore to hit your service level agreements,” he said. ®