Don't believe the hype: Today's AI unlikely to best actual doctors at diagnosing patients from medical scans
Majority of academic studies into hospital image processing aren't subjected to clinical testing
Don’t fall for overblown claims that AI algorithms are just as good as, or even better than, human doctors at diagnosing diseases from medical images. That's according to a study published in The British Medical Journal on Wednesday.
A group of researchers, led by Imperial College London, studied 91 peer-reviewed papers that applied deep-learning algorithms, mostly convolutional neural networks, to medical scans to look for signs and symptoms of illnesses ranging from cancer to glaucoma. Ten of the studies were based on real trials; the remaining 81 were purely academic.
The majority of these 81 papers, 69 in fact, boasted that AI delivered superior, or at least comparable, performance to clinicians on a particular problem, whether that was spotting cancerous tumors in breasts or cirrhosis scarring in liver tissue. Only two conceded that doctors were better than the machines, and 14 said the machine-learning models could perhaps aid humans in diagnosis.
It’s no wonder that these types of studies are accompanied by splashy headlines claiming that computers are more accurate than real human doctors. But read the small print. Many of these papers may report impressive numbers, but the testing is often limited to data sets the papers' authors compiled themselves. Only six of the 81 AI algorithms were tested on real patient data in clinical settings.
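For readers unfamiliar with the convolutional networks most of these papers rely on, their core building block is simply a small filter slid across an image to produce a map of features. The following toy sketch, which is not drawn from any of the studies (the kernel and the tiny "scan" are invented purely for illustration), shows that operation in plain Python:

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (strictly, cross-correlation, as in most
    deep-learning frameworks): slide the kernel over the image and sum the
    elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge filter applied to a made-up "scan" whose right half is bright:
scan = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]
feature_map = conv2d(scan, edge_kernel)  # lights up where intensity jumps
```

A real diagnostic model stacks thousands of such filters, with weights learned from training data rather than hand-picked, which is exactly why independent access to that code and data matters when checking the claims made for it.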
The sample sizes used to train and test the models are often small. Hell, in some cases fake data is generated because it's often difficult to obtain real records and body scans from patients due to privacy concerns. The average number of human experts pitted against each algorithm was four.
What’s more troubling is that these studies are often very difficult to replicate. Full access to the data sets fed to the machine-learning models was unavailable in 95 per cent of the studies. The source code for the algorithms themselves was unavailable 93 per cent of the time. That makes it basically impossible to verify and build upon the claims and findings.
The Imperial College researchers reckon that around two thirds of the 81 studies were likely highly biased. Many of them were “non-randomised,” neglecting to take into account the age, sex, and medical history of the patients, some of whom were imaginary anyway.
“We found only one randomised trial registered in the US despite at least 16 deep learning algorithms for medical imaging approved for marketing by the Food and Drug Administration (FDA),” the Imperial study stated.
Although deep learning is fancy and exciting, drawing in investors and developers across industry and academia, it’s still premature to claim it is better at medical screening than real healthcare professionals. Clinical trials often take years before drugs or medical devices are deemed effective, and machine-learning code cannot shortcut that process.
“At present, many arguably exaggerated claims exist about equivalence with or superiority over clinicians, which presents a risk for patient safety and population health at the societal level, with AI algorithms applied in some cases to millions of patients,” the paper concluded.
“Overpromising language could mean that some studies might inadvertently mislead the media and the public, and potentially lead to the provision of inappropriate care that does not align with patients’ best interests.”
The researchers aren’t completely down in the dumps over medical AI technology, however. Mahiben Maruthappu, co-author of the study and CEO of Cera Care, a startup focused on healthcare for the elderly, told The Register:
“Machine learning, when developed in a robust manner and well evaluated, can be transformative for many parts of healthcare, from how we triage patients in A&E, to diagnosis, to recommendations on prescriptions, to advice to patients on lifestyle modification.
"At a time when health systems face unprecedented pressure, such solutions could be invaluable, when delivered in a safe and effective manner.” ®