
Come back, AI. All is forgiven: We know we've mocked you in the past, but we need help analyzing 29,000 papers on COVID-19, coronaviruses

Please develop machine-learning algos to analyse this text for a vaccine

A dataset of more than 29,000 scientific papers focused on COVID-19, and the coronavirus family as a whole, has been publicly shared to ultimately help the medical world thwart the bio-nasties.

Specifically, it is hoped AI-based tools can be developed to comb through this COVID-19 Open Research Dataset (CORD-19) and dig up vital clues and insights on how to treat and contain the virus.

Folks at America's Georgetown University’s Center for Security and Emerging Technology led a White-House-driven effort to get Microsoft, the US National Institutes of Health’s National Library of Medicine, and the Chan Zuckerberg Initiative to collect acres of relevant literature for the dataset.

That was then passed to the Allen Institute for AI (AI2) – a research lab set up by the late Paul Allen, co-founder of Microsoft – so that the data could be transformed into a machine-readable format for algorithms to process.

“This dataset will be best utilized by natural-language processing researchers,” Doug Raymond, general manager of Semantic Scholar, at AI2, told The Register.

Semantic Scholar, released in 2015, is a specialised search engine that uses a variety of AI techniques to unearth information from mountains of journal articles. CORD-19 is now available on the Semantic Scholar site.

“Users can use this comprehensive dataset to answer a variety of questions – what research has been done before, what the research describes, how to reproduce results, and more,” Raymond added. “We are providing articles from hundreds of journals, many from China and Korea. Without using this new master resource, it’s very difficult for researchers to find comprehensive information. We will continue to provide updates to this dataset as the situation evolves and new research is released.”

The dataset is also hosted on Kaggle, a Google-owned outfit known for its online coding competitions. Kaggle has set up a new challenge calling upon “artificial intelligence experts” to help develop new text-and-data-mining tools to pore over CORD-19.

“It’s difficult for people to manually go through more than 20,000 articles and synthesize their findings,” said Anthony Goldbloom, co-founder and CEO of Kaggle.

“Recent advances in technology can be helpful here. We’re putting machine readable versions of these articles in front of our community of more than 4 million data scientists. Our hope is that AI can be used to help find answers to a key set of questions about COVID-19.”

The hope is boffins in the medical community can use AI tools – whether from Semantic Scholar or custom built – with the dataset to get the info they need to combat the ongoing coronavirus pandemic, whether that’s finding out how the virus mutates and spreads, which types of drugs are most effective, or how to develop a potential vaccine.

Smaller tech companies are trying to do their bit during the pandemic, too. Paperspace, a cloud company that rents out GPUs and CPUs to run machine-learning models, announced it was offering some of its hardware resources for free.

"We're not looking to support anything specific so long as the research is related to COVID-19," a Paperspace spokesperson told El Reg. "We're really hopeful machine learning and deep learning can accelerate a solution."

Cloud instances spun up on older GPU models such as Nvidia's Quadro M4000 and P5000, released in 2015 and 2016 respectively, are free. Prices have been slashed for more powerful chips, including the V100 and P100.

“Decisive action from America’s science and technology enterprise is critical to prevent, detect, treat, and develop solutions to COVID-19,” Michael Kratsios, Uncle Sam's CTO, said.

“The White House will continue to be a strong partner in this all hands-on-deck approach. We thank each institution for voluntarily lending its expertise and innovation to this collaborative effort, and call on the United States research community to put artificial intelligence technologies to work in answering key scientific questions about the novel Coronavirus.” ®
