NASA's hyperwall wonderwall uses virtual flash SAN

Distributed NVMe SAN solves slow off-node access latency issues


Case study How do you get fast parallel data access for 128 compute nodes doing simulation processing off a slow, albeit massively parallel-access, data set?

You could employ flash cache burst buffering, as DDN might propose, or try out an NVMe flash drive-based virtual SAN, which is what NASA Ames did in a visualisation situation.

NASA's Advanced Supercomputing (NAS) facility is located at the NASA Ames Research Center. The High-End Computing Capability Project (HECC) is about enabling scientists and engineers to use large-scale modelling, simulation, analysis and visualisation to help ensure NASA space missions are successful.

As part of that, the project has developed a Hyperwall, a 16 column by eight row grid of vertically mounted display screens, to show an active visualisation of a simulation at large scale. One example is ECCO (Estimating the Circulation and Climate of the Ocean), which involves flow field pathfinding. Simulations like this involve large and high-dimensional data sets produced by NASA supercomputers and instruments.

Scientists can use different tools, viewpoints, and parameters to display the same data or datasets and rattle through visualisations to check the results.

The problem that Excelero's NVMesh sorted was how to drive the hyperwall's 128 display screens, and the 130 compute nodes (128 plus two spares) behind them, fast enough. The work is based on processing a huge data set with an enormous number of small and random IOs. A Lustre file system backed by disk drives was inadequate, as the theoretically 80GB/sec file system delivered only hundreds of MB/sec of throughput.

NASA Ames Hyperwall visualisation multi-screen display

A 2TB flash drive was placed in each compute node behind the hyperwall. Then programmers split the data set up into 2TB or smaller pieces and copied them to each compute node. Processing then had to take note of data locality during the compute and interactive phases of the visualisation process, which complicated the programming.
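To see why locality complicated the programming, here is a minimal sketch of the bookkeeping such a scheme implies. It assumes the data set is cut into contiguous pieces of exactly 2TB (decimal), one per node in order; the function name `owners` and that layout are illustrative assumptions, not NASA's actual code.

```python
from typing import List, Tuple

TB = 10**12
PIECE_BYTES = 2 * TB  # assumption: one contiguous 2TB (decimal) piece per node


def owners(offset: int, length: int,
           piece_bytes: int = PIECE_BYTES) -> List[Tuple[int, int, int]]:
    """Split a read of [offset, offset + length) into per-node parts.

    Returns a list of (node, local_offset, nbytes) tuples, one for each
    node whose local piece holds part of the requested byte range.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    parts = []
    end = offset + length
    while offset < end:
        # Which node holds this byte, and where inside its local piece?
        node, local = divmod(offset, piece_bytes)
        # Read up to the end of the request or the end of this piece.
        nbytes = min(end - offset, piece_bytes - local)
        parts.append((node, local, nbytes))
        offset += nbytes
    return parts
```

Any read that crosses a piece boundary comes back as more than one part, and every part not held by the calling node means an off-node fetch that the application has to arrange for itself.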

Flow field pathfinding involves two techniques:

  • In-core methods are used on data that is in memory or on fast, local media such as flash
  • Out-of-core techniques are applied when the data to be manipulated is not local to the compute node, meaning a longer access time

Slow simulations meant less effective scientist and engineer use of the hyperwall visualisations. That was the issue that Excelero claims its NVMesh technology fixed.

ECCO simulation display

NVMe virtual flash SAN

If the 2TB flash drives in all 128 nodes are aggregated into a pool, forming a single 256TB logical device (a virtual flash SAN) accessed by RDMA, then every flash drive is effectively local to every compute node. For the compute node apps, directly accessing network device targets and leveraging RDMA gives them parallel read access.

NASA Ames' visualisation group installed NVMesh and got the features of centrally managed block storage – logical volumes, data protection and failover – without a SAN's traditional performance limitations. Excelero says this transforms the performance, economics and the feasibility of multiple use cases in visualisation, analytics/simulation, and burst buffer use.

NVMesh has three main modules:

  • The Storage Management Module is a centralised web-based GUI and RESTful API that control system configuration,
  • A Target Module is installed on any host that shares its NVMe drives, validating initial connections from clients to the drives, and then keeping out of the data path,
  • A Client Block Driver runs on every host/image that needs to access NVMesh’s logical block volumes.

In converged deployments, the Client and Target Modules co-exist on the same server.

SLS visualisation

Ephemeral flash

Excelero says that, in this instance, because the simulation data is protected in the main Lustre file system, the 256TB device was treated as ephemeral, although using non-volatile media.

So it was created without data protection as a RAID-0 logical volume striped across all 128 nodes/devices. For simplicity, the device was attached to a single node, formatted with an XFS file system and populated with data. The file system was then unmounted and mounted (read-only) on all 128 compute nodes.
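As a rough illustration of what RAID-0 striping across 128 devices means, the sketch below maps a byte offset in the logical volume to a node and an offset on that node's drive. The 128KiB stripe unit, the decimal terabytes and the names `locate` and `pool_capacity` are assumptions made for the example; the article does not describe NVMesh's actual on-disk layout.

```python
from typing import NamedTuple

TB = 10**12
NODES = 128
DRIVE_BYTES = 2 * TB        # one 2TB drive per node, as in the article
STRIPE_UNIT = 128 * 1024    # assumption: stripe unit size is not given


class Location(NamedTuple):
    node: int           # which node's drive holds the byte
    drive_offset: int   # byte offset on that drive


def pool_capacity(nodes: int = NODES, drive_bytes: int = DRIVE_BYTES) -> int:
    """RAID-0 has no redundancy, so capacity is the sum of the drives."""
    return nodes * drive_bytes


def locate(logical_offset: int, nodes: int = NODES,
           stripe_unit: int = STRIPE_UNIT) -> Location:
    """Map a logical volume offset to (node, drive offset) under RAID-0."""
    if logical_offset < 0:
        raise ValueError("offset must be non-negative")
    # Count whole stripe units before this byte, and the remainder inside one.
    stripe_index, within = divmod(logical_offset, stripe_unit)
    # Units are dealt round-robin across nodes; each full pass is one row.
    row, node = divmod(stripe_index, nodes)
    return Location(node, row * stripe_unit + within)
```

With these numbers `pool_capacity()` gives the 256TB quoted above, and consecutive stripe units land on consecutive nodes, which is why a large read is spread across many drives at once.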

NVMesh logical block volumes can be used with clustered file systems and can also be protected against host or drive failures.

Latency matters

This NVMeshing added 5μsec data access latency, mostly from the network, over local NVMe drive latency.

There was no need for data locality constraints in programming; all data accesses behave as if they are local to every compute node. Preliminary results with fio (flexible IO tester) benchmarks running in all 128 nodes demonstrated over 30 million random read 4K IOPS. The average latency for those IOs was 199μsec; the lowest value was 8μsec. Throughput has been measured at over 140GB/sec of bandwidth at 1MB block size.
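A quick back-of-envelope check puts those fio figures in per-node terms. The sketch assumes "4K" means 4,096 bytes and that a GB is 10^9 bytes, and it applies Little's law (concurrency = throughput × latency) to estimate how many IOs were in flight on average; the variable names are illustrative, and since the quoted numbers are all "over", the results are lower bounds.

```python
# Figures quoted in the article.
NODES = 128
TOTAL_IOPS = 30_000_000     # 4K random reads per second, cluster-wide
AVG_LATENCY_S = 199e-6      # average latency per IO, in seconds
BLOCK_BYTES = 4 * 1024      # assumption: "4K" means 4,096 bytes


def bandwidth_gb_per_s(iops: float, block_bytes: int) -> float:
    """Data rate implied by an IOPS figure, in decimal GB per second."""
    return iops * block_bytes / 1e9


def ios_in_flight(iops: float, latency_s: float) -> float:
    """Little's law: average concurrency = throughput x average latency."""
    return iops * latency_s


per_node_iops = TOTAL_IOPS / NODES
small_block_bw = bandwidth_gb_per_s(TOTAL_IOPS, BLOCK_BYTES)
cluster_in_flight = ios_in_flight(TOTAL_IOPS, AVG_LATENCY_S)
per_node_in_flight = cluster_in_flight / NODES

if __name__ == "__main__":
    print(f"IOPS per node:        {per_node_iops:,.0f}")
    print(f"4K bandwidth (GB/s):  {small_block_bw:.2f}")
    print(f"IOs in flight, total: {cluster_in_flight:,.0f}")
    print(f"IOs in flight, node:  {per_node_in_flight:.1f}")
```

On those assumptions the 4K result works out at about 234,000 IOPS per node and roughly 123GB/sec in aggregate, with around 47 IOs in flight per node on average.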

Excelero points out that, when utilising native NVMe queuing mechanisms, this method completely bypasses the CPU of a target host (one with NVMe drives), preserving processing power for applications.

The result is that visualisations run more smoothly and faster, and NASA Ames scientists and engineers can interact with them more naturally, thus becoming more productive. ®


Biting the hand that feeds IT © 1998–2022