CERN swells storage space beyond 1EB for LHC's latest ion-whacking experiments
A petabyte or more a day of readings? No problem, pal
In preparation for its latest round of ion-smashing tests, CERN boosted its storage array for the experiments to more than one million terabytes in total size.
The facility's data store now exceeds an exabyte of raw capacity — with much of it on hard disk drives and an "increasing fraction of flash drives," the European super-lab's team explained in a report.
It's one thing to increase capacity; it's another to be able to access it in a timely fashion, as Andreas Peters, who heads up CERN's EOS storage system, explained: "It is not just a celebration of data capacity, it is also a performance achievement, thanks to the reading rate of the combined data store crossing, for the first time, the 1TB/s threshold."
The upgrade, which added 289 PB of capacity since last year, was made to support the latest round of heavy-ion experiments within CERN's 27-kilometre Large Hadron Collider, which kicked off last week. These experiments involve smashing heavy ions together at nearly the speed of light to study the fundamental building blocks of the known universe.
As we understand it, these experiments, which will take place over several years at the ring-shaped particle collider near Geneva, Switzerland, will produce a prodigious amount of data — in excess of 600 PB — which has to be processed before being committed to long-term tape storage. During the last heavy-ion run between 2015 and 2018, CERN said it processed an average of one petabyte of data a day.
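For a sense of scale, here is a quick back-of-the-envelope sketch of what those figures mean as sustained throughput. It assumes decimal (SI) units and a perfectly even data rate, neither of which CERN has confirmed; the function names are ours, purely for illustration.

```python
# Decimal (SI) units, as storage vendors typically quote them (an assumption).
TB = 10**12
PB = 10**15

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(bytes_per_day: int) -> float:
    """Average throughput in GB/s needed to move bytes_per_day in one day."""
    return bytes_per_day / SECONDS_PER_DAY / 10**9


def seconds_to_read(total_bytes: int, bytes_per_second: int) -> float:
    """Time in seconds to read total_bytes at a constant rate."""
    return total_bytes / bytes_per_second


# One petabyte a day, the average CERN quoted for the 2015-2018 run.
daily_rate = sustained_rate_gb_per_s(1 * PB)

# One petabyte read at the 1 TB/s combined rate quoted for the data store.
pb_read_time = seconds_to_read(1 * PB, 1 * TB)

print(f"1 PB/day is about {daily_rate:.1f} GB/s sustained")
print(f"1 PB at 1 TB/s takes about {pb_read_time:.0f} s")
```

In other words, a petabyte a day averages out to a little under 12 GB/s around the clock, while the headline 1 TB/s read rate could in principle chew through a petabyte in under 17 minutes.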
While a petabyte of data might seem like a lot, it doesn't actually take up that much physical space: using high-capacity disks, it's now possible to cram a petabyte worth of storage into a single chassis. An exabyte of storage, however, is another matter entirely, requiring rows of racks full of disk shelves to contain.
CERN says its disk storage array features approximately 111,000 devices, most of which are hard drives but with increasing amounts of flash in the mix. The system runs on EOS, an open source platform developed by CERN for use with the Large Hadron Collider and other scientific workloads.
We've asked for more information on just how large the disks are and how many of each type CERN is using; we'll let you know if we hear back.
While you'd only need 100,000 10 TB drives to hit that raw exabyte mark, the array wasn't built overnight. In fact, it has grown 56x from the initial 18 PB storage system in 2010, and between 2020 and today the system has more than doubled in capacity.
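The sums are easy enough to check. The sketch below assumes decimal units (1 EB = 1,000 PB = 1,000,000 TB) and treats the array as exactly one exabyte; CERN says only that raw capacity exceeds that mark, so the average device size it produces is a floor, not a spec.

```python
# Decimal (SI) units; an assumption, since CERN's post doesn't say.
TB = 10**12
PB = 10**15
EB = 10**18

# How many 10 TB drives make up one raw exabyte?
drives_needed = EB // (10 * TB)

# 56x growth from the initial 18 PB system of 2010.
grown_capacity_pb = 18 * 56

# Implied average device size if roughly 111,000 devices hold exactly 1 EB.
# Real capacity is somewhat higher, so treat this as a lower bound.
avg_device_tb = EB / 111_000 / TB

print(f"10 TB drives for 1 EB: {drives_needed:,}")
print(f"18 PB x 56 = {grown_capacity_pb:,} PB")
print(f"Implied average device size: at least {avg_device_tb:.1f} TB")
```

That works out to 100,000 drives, 1,008 PB, and an average of about 9 TB per device, which squares with a fleet that still includes plenty of older, smaller disks.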
According to the post, with more than a hundred thousand disks humming along, drive failures are a regular occurrence. A report [PDF] from a few years back said CERN was replacing 30 failed drives each week, necessitating a fair bit of planned resilience using different data replication methods.
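To put that replacement rate in perspective, here is a rough annualized figure. Big caveat: the 30-a-week number comes from an older report, while the 111,000-device count is today's, so dividing one by the other is our assumption and the result is only an order-of-magnitude guide.

```python
# Figures quoted in the article; combining them is an assumption,
# because the failure count predates the current fleet size.
FAILED_DRIVES_PER_WEEK = 30
FLEET_SIZE = 111_000
WEEKS_PER_YEAR = 52

failures_per_year = FAILED_DRIVES_PER_WEEK * WEEKS_PER_YEAR
failures_per_day = FAILED_DRIVES_PER_WEEK / 7

# Fraction of the fleet replaced per year, as a percentage.
annual_failure_pct = failures_per_year / FLEET_SIZE * 100

print(f"Roughly {failures_per_day:.1f} drive swaps a day")
print(f"{failures_per_year:,} failures a year")
print(f"About {annual_failure_pct:.1f}% of the fleet annually")
```

On those numbers, that's four or so swaps a day and somewhere around 1.4 percent of the fleet a year. The fleet was presumably smaller when the report was written, which would push the true percentage for that period higher.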
The announcement comes just weeks after CERN ditched its time series database and monitoring platform in favor of one from VictoriaMetrics after researchers ran into performance issues with InfluxDB and Prometheus. ®