Come back, AI. All is forgiven: We know we've mocked you in the past, but we need help analyzing 29,000 papers on COVID-19, coronaviruses

Please develop machine-learning algos to analyse this text for a vaccine

A dataset of more than 29,000 scientific papers focused on COVID-19, and the coronavirus family as a whole, has been publicly shared to ultimately help the medical world thwart the bio-nasties.

Specifically, it is hoped AI-based tools can be developed to comb through this COVID-19 Open Research Dataset (CORD-19) and dig up vital clues and insights on how to treat and contain the virus.

Folks at Georgetown University's Center for Security and Emerging Technology in the US led a White House-driven effort to get Microsoft, the US National Institutes of Health's National Library of Medicine, and the Chan Zuckerberg Initiative to collect acres of relevant literature for the dataset.

That was then passed to the Allen Institute for AI (AI2) – a research lab set up by the late Paul Allen, co-founder of Microsoft – so that the data could be transformed into a machine-readable format for algorithms to process.

“This dataset will be best utilized by natural-language processing researchers,” Doug Raymond, general manager of Semantic Scholar, at AI2, told The Register.

Semantic Scholar, released in 2015, is a specialised search engine that uses a variety of AI techniques to unearth information from mountains of journal articles. CORD-19 is now available on the Semantic Scholar site.

“Users can use this comprehensive dataset to answer a variety of questions – what research has been done before, what the research describes, how to reproduce results, and more,” Raymond added. "We are providing articles from hundreds of journals, many from China and Korea. Without using this new master resource, it’s very difficult for researchers to find comprehensive information. We will continue to provide updates to this dataset as the situation evolves and new research is released."

The dataset is also hosted on Kaggle, a Google-owned outfit known for its online coding competitions. Kaggle has set up a new challenge calling upon “artificial intelligence experts” to help develop new text-and-data-mining tools to pore over CORD-19.

“It’s difficult for people to manually go through more than 20,000 articles and synthesize their findings,” said Anthony Goldbloom, co-founder and CEO of Kaggle.

“Recent advances in technology can be helpful here. We’re putting machine readable versions of these articles in front of our community of more than 4 million data scientists. Our hope is that AI can be used to help find answers to a key set of questions about COVID-19.”

The hope is that boffins in the medical community can use AI tools, whether from Semantic Scholar or custom-built, with the dataset to get the info they need to combat the ongoing coronavirus pandemic, whether that's finding out how the virus mutates and spreads, or which types of drugs are most effective in developing a potential vaccine.

Smaller tech companies are trying to do their bit during the pandemic, too. Paperspace, a cloud company that rents out GPUs and CPUs to run machine-learning models, announced it was offering some of its hardware resources for free.

"We're not looking to support anything specific so long as the research is related to COVID-19," a Paperspace spokesperson told El Reg. "We're really hopeful machine learning and deep learning can accelerate a solution."

Cloud instances spun up on older GPU models, such as Nvidia's Quadro M4000 and P5000, released in 2015 and 2016 respectively, are free. Prices have been slashed for more powerful chips, including the newer P100 and V100.

“Decisive action from America’s science and technology enterprise is critical to prevent, detect, treat, and develop solutions to COVID-19,” Michael Kratsios, Uncle Sam's CTO, said.

“The White House will continue to be a strong partner in this all-hands-on-deck approach. We thank each institution for voluntarily lending its expertise and innovation to this collaborative effort, and call on the United States research community to put artificial intelligence technologies to work in answering key scientific questions about the novel coronavirus.” ®


Biting the hand that feeds IT © 1998–2021