GOAST in the machine: Lenovo’s genomics powerhouse speeds discovery
High performance with low overhead
Paid Feature The capabilities of genomic sequencing and analysis have expanded at an incredible pace over the last two decades. That combined work has enabled targeted innovation at the individual level, through personalized medicine, and at the population scale, where genomic evolution and genetic conditions can now be studied in far greater depth.
With so many technological points of progress along the way, the wider field of genomics can finally begin to standardize on best practices, tools, and platforms. Many of these foundational technical elements have been, and are still, iterative in development. One example of this iterative innovation is the revolutionary Genome Analysis Toolkit (GATK), a result of collaboration between the Broad Institute and Intel in 2017.
The GATK package includes everything from AVX-512 optimizations for critical algorithms (Smith-Waterman and Pair-HMM in particular) to a broader codesign-focused reference architecture.
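For readers unfamiliar with Smith-Waterman, the sketch below shows the basic idea in plain Python: a dynamic-programming table that scores the best local alignment between two sequences. It is only an illustration under simple assumptions. The function name and the match, mismatch, and gap values are chosen for the example, and it uses a linear gap penalty with no vectorization, so it is not the AVX-512 implementation described above.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local-alignment score between sequences a and b.

    Illustrative scoring values; linear gap penalty (an assumption made
    for brevity, not the scheme used by any particular toolkit).
    """
    prev = [0] * (len(b) + 1)   # previous row of the scoring table
    best = 0
    for ca in a:
        curr = [0]              # first column is always zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap          # gap in sequence b
            left = curr[j - 1] + gap    # gap in sequence a
            cell = max(0, diag, up, left)  # local alignment never goes below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
shared_core = smith_waterman_score("TTACGTTT", "GGACGGG")
```

Every cell depends only on its three neighbours, which is the kind of regular inner loop that lends itself to wide vector instructions.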
To put this in real-world context, GATK running on Intel’s Select Solutions for Genomic Analysis reference architecture dramatically reduced runtimes for genome processing: from 160 hours for a whole genome and 4-6 hours for an exome down to just under 11 hours for the genome and under half an hour for a single exome. This led to a 5X performance boost with GATK and was certified by the Broad Institute for its own cutting-edge genomics work.
That is nothing short of revolutionary for a field that requires fast time to solution to maintain a rapid pace of discovery. As population genetics becomes more central to understanding the roots of diseases and genetic conditions, the need for vastly scalable, efficient, and high-performance computing systems and software is growing. That’s why it has taken another milestone in iterative progress to push genetic analysis to the next level, this time via Lenovo’s effort to build on the successes of GATK with a ready-to-roll reference architecture that has next-generation sequencing and analysis in mind.
Lenovo has innovated on top of Intel’s GATK and Select Solutions for Genomic Analysis platform with its GOAST reference architecture for demanding genomics workloads. Through careful optimization and tuning between hardware and software, Lenovo has shown a 27X improvement over GATK for whole genomes and a 40X improvement for exomes. GOAST goes beyond hardware-software co-design and optimization; it includes novel elements like the GOAST Scaler, which can determine the amount of HPC horsepower needed for user-defined workloads.
Even without detailed, workload-specific optimizations, Lenovo’s GOAST can process one whole genome in 18 minutes and a single exome in 30 seconds. None of this relies on novel hardware or expensive top-end interconnects or elements that require sophisticated refactoring of codes. To take that a step further, Lenovo is enabling a GOAST-driven datacenter to process 30 genomes each day or 1,000 individual exomes, opening the door for more robust, speedy genomic insights. All of this happens with GOAST, hand-in-hand with expertise from the Lenovo teams who took the Broad Institute and Intel’s efforts to a new level.
These innovations touch every part of a research organization, from researchers and developers, who can gain insight faster and scale workloads according to need, to financial managers at research institutions, who can use elements like GOAST Scaler to predict HPC usage and requirements and budget accordingly.
For those who manage HPC systems, GOAST brings high performance with low overhead with configurations including the Lenovo GOAST Plus 8-socket system, which has been shown to deliver whole genome sequencing results in 18 minutes. For other researchers, a 2-socket Lenovo GOAST base system might be the acceleration needed for genomic analysis. No matter what scale, time is of the essence and sparking a fast deployment is part of that—making a GOAST system all the more appealing for life sciences organizations.
The Power of GOAST
Few researchers can speak to the power of GOAST quite like Dr. Miguel Vazquez, head of Genomics and Informatics at the Barcelona Supercomputing Center (BSC). His teams saw a 40X improvement in time to result using their GOAST system. “What used to take 30 hours on our systems now takes just under an hour — around 45 minutes,” Vazquez says.
In addition to other avenues of genetic research, Vazquez and BSC teams have been exploring some of the most pressing questions in genomic analysis through their work on the international Pan-Cancer Analysis of Whole Genomes project. This ambitious undertaking required integrative analysis of 2,658 whole-cancer genomes and corresponding tissues for 38 types of cancer.
This work evaluated specific mutations to understand cancer drivers and had a major scientific impact in showing the genetic roots of cancer—and how much we still don’t know about potential causes and potential treatments. The Pan-Cancer Analysis of Whole Genomes project took two years, even with the sizable supercomputing power of world-class machines like the Mare Nostrum system at BSC and other European machines. Looking back, Vazquez says if teams had used just a single Lenovo GOAST Plus node, the analysis would have taken 6 months.
In addition, a 50x whole exome run in a single node can take about two hours but on the BSC team’s GOAST machine, researchers were able to capture each sample in 1.5 minutes, an 80X speedup.
“With Lenovo GOAST, we got a taste of just how much we could increase throughput capacity,” he adds. “For instance, the solution has the potential to help us process 32 whole genomes per node each day, or 351,000 whole exome samples annually. Achieving similar throughput levels at the Mare Nostrum would significantly increase the number of research projects that we can support and help scientists get their results even faster.”
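The figures quoted by BSC can be sanity-checked with some simple arithmetic. The short Python sketch below is only a back-of-the-envelope illustration, not anything supplied by Lenovo or BSC. It assumes a node runs around the clock, one sample at a time, for 365 days a year, and the helper names are invented for the example.

```python
MINUTES_PER_DAY = 24 * 60


def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Ratio of the old runtime to the new runtime."""
    return baseline_minutes / accelerated_minutes


def samples_per_year(minutes_per_sample: float, days: int = 365) -> float:
    """Samples processed per year, assuming continuous back-to-back runs."""
    return MINUTES_PER_DAY / minutes_per_sample * days


# Whole genome: about 30 hours before, about 45 minutes on the GOAST system.
genome_speedup = speedup(30 * 60, 45)

# 50x whole exome: about two hours before, 1.5 minutes per sample after.
exome_speedup = speedup(2 * 60, 1.5)

# Annual exome throughput implied by 1.5 minutes per sample.
exomes_per_year = samples_per_year(1.5)

# Time per genome implied by 32 whole genomes per node each day.
minutes_per_genome = MINUTES_PER_DAY / 32

print(f"genome speedup:     {genome_speedup:.0f}X")
print(f"exome speedup:      {exome_speedup:.0f}X")
print(f"exomes per year:    {exomes_per_year:,.0f}")
print(f"minutes per genome: {minutes_per_genome:.0f}")
```

Under those assumptions the numbers line up with the quotes: 40X for the whole genome, 80X for the exome, roughly 350,000 exomes a year, and 45 minutes per genome at 32 genomes a day.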
In this era of rapid genomic progress, from systems to software and analytics, the key is to build iteratively. Just as BSC and other research centers built on top of the work of other institutions and technology partners, Lenovo too is taking a step forward. GOAST is built on technologies that have been time-tested at scale, and by making them more scalable and robust, Lenovo will continue to power the next generation of sequencing and analysis.
Sponsored by Lenovo.