
Computers vs Ebola: Scientists use big data to predict future disease hotspots

And it all boils down to seeing what bats are up to

A team of scientists have used a machine learning algorithm to build a model that predicts how likely a bat species is to carry Ebola and other filoviruses.

Filoviruses are a group of long, filament-shaped viruses that encode their genome on a single strand of RNA. Ebola is the best-known example; Marburg virus is another. Both are lethal and spread through contact with the bodily fluids of an infected person.

The last Ebola outbreak happened in 2014 and resulted in 11,310 deaths, according to the World Health Organisation. Bats are the primary suspects for spreading the disease. The first case in that outbreak was reported to have possibly been passed to humans by bats.

Scientists led by the Cary Institute of Ecosystem Studies have turned to machine learning in the hope of preventing the future spread of Ebola.

The paper published in the PLOS Neglected Tropical Diseases journal said that machine learning was “particularly well-suited to comparative studies” because it was unaffected by sample biases, hidden interactions and collinearity in data sets.

“The model allows us to move beyond our own biases and find patterns in the data that only a machine can,” said David Hayman, co-author of the study and researcher at the Institute of Veterinary, Animal and Biomedical Sciences at Massey University.

Massey University said the model predicts where Ebola risks lie based on the intrinsic traits of filovirus-positive bats, instead of looking only at where the last outbreak occurred.

First, the researchers had to build a profile of a bat that is likely to carry filoviruses by looking at the biological characteristics of 21 bat species known to carry them.

A large dataset was compiled covering 1,116 bat species and 57 variables, ranging from diet and reproductive behaviour to migratory patterns and population size. A binary code marked species not known to carry a filovirus as 0 and known carriers as 1.
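The labelling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the species names, trait names and values are all invented, and only two of the 57 variables are mimicked.

```python
# Hypothetical trait table: species names and values are invented for illustration.
TRAITS = {
    "species_a": {"group_size": 5000, "litter_size": 2},
    "species_b": {"group_size": 40, "litter_size": 1},
    "species_c": {"group_size": 1200, "litter_size": 1},
}

# Hypothetical set of species with a recorded filovirus detection.
KNOWN_CARRIERS = {"species_a"}

# Hypothetical subset of trait columns to use as model inputs.
FEATURES = ["group_size", "litter_size"]


def build_rows(traits, carriers, features):
    """Return (names, X, y): one feature vector per species,
    with label 1 for a known carrier and 0 otherwise."""
    names = sorted(traits)
    X = [[traits[name][f] for f in features] for name in names]
    y = [1 if name in carriers else 0 for name in names]
    return names, X, y


names, X, y = build_rows(TRAITS, KNOWN_CARRIERS, FEATURES)
for name, row, label in zip(names, X, y):
    print(name, row, label)
```

Note that a 0 here means "not known to carry a filovirus", not "known to be free of one", which is why a model trained on such labels can flag unsampled species as likely carriers.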

Researchers used a machine learning method known as “generalized boosted regression” to build an algorithm that can identify the patterns of features that make a bat a filovirus carrier.
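To give a feel for how boosting works, here is a toy sketch in pure Python. It is an assumption-laden simplification rather than the researchers' implementation: it fits one-split trees ("stumps") to the residuals of a logistic model and adds them up with a small learning rate. The data, the function names and the settings (`n_rounds`, `learning_rate`) are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error
    on the residuals. Returns (feature_index, threshold, left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:
        raise ValueError("no feature has more than one distinct value")
    return best[1:]


def fit_boosted(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for 0/1 labels with log-loss and stump learners.
    Assumes y contains both classes, so the starting log-odds is finite."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # starting score: log-odds of the carrier rate
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss: label minus current predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lval, rval = fit_stump(X, residuals)
        stumps.append((j, thr, lval, rval))
        scores = [s + learning_rate * (lval if row[j] <= thr else rval)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Predicted probability that a species with these traits is a carrier."""
    base, lr, stumps = model
    score = base + sum(lr * (lval if row[j] <= thr else rval)
                       for j, thr, lval, rval in stumps)
    return sigmoid(score)


# Invented toy data: [group_size, litter_size] per species, 1 = known carrier.
X = [[5000, 2], [3000, 2], [4000, 1], [6000, 2],
     [50, 1], [20, 1], [100, 1], [10, 2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_boosted(X, y)
print(predict_proba(model, [4500, 2]))  # large colony: scored as a likely carrier
print(predict_proba(model, [30, 1]))    # small colony: scored as unlikely
```

Each round nudges the model towards the examples it still gets wrong, which is how many weak rules combine into a stronger predictor. Production libraries use deeper trees, subsampling and more careful leaf updates than this sketch does.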

The algorithm was used to pick out potential filovirus-positive bat species with an estimated 87 per cent accuracy.

Bat species that were more likely to harbour filoviruses tended to live in large groups, reach sexual maturity at an earlier age, have larger offspring and give birth to more than one pup.

The researchers predicted that bat species not previously identified as filovirus carriers, but which could harbour the viruses, are widely distributed outside Africa. Potential Ebola hotspots included Thailand, Burma, Malaysia, Vietnam, and north-east India.

"Maps generated by the algorithm can help guide targeted surveillance and virus discovery projects. We suspect there may be other filoviruses waiting to be found. An outstanding question for future work is to investigate why there are so few filovirus spillover events reported for humans and wildlife in Southeast Asia compared to equatorial Africa," said John Drake, co-author of the study and researcher at the School of Ecology at the University of Georgia. ®
