Mapping the universe at 30 Terabytes a night

Jeff Kantor, on building and managing a 150 Petabyte database


Interview It makes for one heck of a project mission statement. Explore the nature of dark matter, chart the Solar System in exhaustive detail, discover and analyze rare objects such as neutron stars and black hole binaries, and map out the structure of the Galaxy.

The Large Synoptic Survey Telescope (LSST) is, in the words of Jeff Kantor, LSST data management project manager, "a proposed ground-based 6.7 meter effective diameter (8.4 meter primary mirror), 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night." Phew.

When it's fully operational in 2016, the LSST will: "Open a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects.

"The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy."

In its planned 10-year run, the LSST will capture, process and store more than 30 Terabytes (TB) of image data each night, yielding a 150 Petabyte (PB) database. Talking to The Reg, Kantor called this the largest non-proprietary dataset in the world.
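As a quick sanity check on those figures (a sketch using only the numbers quoted above, and assuming observing every night, which real surveys won't manage), the raw image total over ten years comes out somewhat below 150PB, so the final database evidently also carries processed catalogues and re-releases on top of the raw pixels:

```python
# Back-of-envelope check of the survey's data volume, using the
# article's figures: ~30 TB of image data per night, 10-year run.
TB_PER_NIGHT = 30
NIGHTS_PER_YEAR = 365   # optimistic: assumes no nights lost to weather
YEARS = 10

raw_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS
raw_pb = raw_tb / 1000
print(f"Raw image data over the survey: ~{raw_pb:.0f} PB")
# The quoted 150 PB database additionally holds derived catalogues
# and the annual re-processed data releases.
```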

Data management is one of the most challenging aspects of the LSST. Every pair of 6.4GB images must be processed within 60 seconds in order to provide astronomical transient alerts to the community. To achieve this, the Data Management System comprises a number of key elements. These are:

  • the Mountain/Base facility, which does initial data reduction and alert generation on a 25 TFLOPS Linux cluster with 60PB of storage (in year 10 of the survey)
  • a 2.5 Gbps network that transfers the data from Chile (where the telescope itself will be based) to the US, and distributes it within the US
  • the Archive Center, which re-reduces the data and produces annual data releases on a 250 TFLOPS Linux cluster with 60PB of storage (in year 10 of the survey)
  • the Data Access Centers, which provide access to all of the data products, as well as 45 TFLOPS of computing and 12PB of storage available to end users.
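The figures above imply some punishing sustained rates. A rough budget (our own arithmetic on the article's numbers, not LSST's published requirements) shows what the 60-second alert window and the 2.5 Gbps link each demand:

```python
# Rough throughput budget derived from the figures quoted above.

# Alert pipeline: a 6.4 GB image pair must be processed within 60 s.
pair_bytes = 6.4e9
alert_window_s = 60
rate_mb_s = pair_bytes / alert_window_s / 1e6
print(f"Sustained processing rate: ~{rate_mb_s:.0f} MB/s")

# Chile-to-US link: 30 TB of new data per night over 2.5 Gbps.
night_bytes = 30e12
link_bps = 2.5e9
transfer_hours = night_bytes * 8 / link_bps / 3600
print(f"Shipping one night's data at full line rate: ~{transfer_hours:.0f} hours")
```

Notably, moving a night's 30TB at 2.5 Gbps takes on the order of a full day, so the international link effectively has to run near saturation around the clock.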

So what's a time-critical system of this magnitude written in?

The data reduction pipelines are developed in C++ and Python. They rely on approximately 30 off-the-shelf middleware packages/libraries for parallel processing, data persistence and retrieval, data transfer, visualization, operations management and control, and security. The current design is based on MySQL layered on a parallel, fault-tolerant file system.
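The article doesn't spell out the algorithms, but transient alerts of this kind are conventionally produced by difference imaging: subtract a deep template of the sky from each new exposure and flag pixels that have brightened significantly. A minimal sketch of the idea (function name, threshold, and noise model are our own; real pipelines add PSF matching, astrometric alignment, and artifact rejection):

```python
import numpy as np

def find_transients(new_image: np.ndarray, template: np.ndarray,
                    n_sigma: float = 5.0) -> np.ndarray:
    """Return pixel coordinates that brightened significantly vs. the template."""
    diff = new_image - template
    noise = np.std(diff)      # crude noise estimate; real code models the PSF
    return np.argwhere(diff > n_sigma * noise)

# Toy demonstration: a flat field plus noise, with one injected transient.
rng = np.random.default_rng(0)
template = rng.normal(100.0, 1.0, size=(256, 256))
new = template + rng.normal(0.0, 1.0, size=(256, 256))
new[42, 77] += 50.0           # inject a fake brightening source
print(find_transients(new, template))
```

The 60-second budget means this kind of subtraction and detection must run in parallel across the whole focal plane, which is where the C++ cores and the 25 TFLOPS cluster come in.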

