Machine learning the hard way: IBM Watson's fatal misdiagnosis
The doctor won't see you now
Opinion It started in Jeopardy and ended in loss. IBM's flagship AI Watson Health has been sold to private equity for an undisclosed sum thought to be around a billion dollars, or a quarter of what the division cost IBM in acquisitions alone since it was launched in 2015.
Not the first nor the last massively expensive tech biz screw-up, but isn't AI supposed to be the future? Isn't IBM supposed to be good at this?
It all started so well. One of Watson's early set pieces was taking a complex set of symptoms and finding the most probable diagnosis out of an encyclopaedic knowledge of rare diseases. A different challenge marked its demise. Like a corpse with a broken neck, 15 bullet holes and a strong smell of cyanide, it raised the question: which massive failure actually finished it off?
A good doctor takes a comprehensive history, so let's start at birth. The first most people knew of IBM Watson was in 2011, when it used its natural language processing and capacious knowledge models to win Jeopardy, an American TV game show.
It was a PR and marketing coup, and IBM lost no time in launching Watson Health off the back of it. With this sort of automated analytics, the company promised, doctors could diagnose more accurately, more swiftly and at less expense. Treatments would be more precisely targeted for more patients. A revolution in healthcare was underway.
That's a lot to promise off the back of a gameshow, but IBM was bullish. Shiny new Manhattan headquarters were unveiled with an "immersion room" that put punters and journalists inside a planetarium-like dome where Watson could display its fabulous talents. Institutional and commercial partners were signed up for co-development, and the magic of turning hype into reality kicked off.
In 2019, IEEE Spectrum, the flagship mag of the American professional body for electrical and electronics engineering, reported that around 50 partnerships had been announced between IBM Watson and healthcare organisations since launch.
It listed 20 of the highest profile ones, with such august bodies as the Mayo Clinic, American national organisations for cancer, cardiology and oncological research, and numerous hospitals and companies. None had produced usable tools or apps.
At the time (we can't speak to today), Watson Health didn't seem to work. It is usually possible to fudge such misfortune in business technology, because stats can be dressed up, returns on investment left agreeably fuzzy, and sufficient figleaves plucked and donned for CIOs to move on without shame to the next failure. Medicine uses real statistics. It publishes. It checks outcomes, because it's not selling widgets, it's trying to keep people not dead, ideally happily so.
When clinical trials were published, Watson came up short every time. It didn't matter what field it was in, it consistently scored less well than human clinicians – sometimes under 50 per cent – and demonstrated some alarming blind spots in suggested treatments. Medical professionals had enough to worry about without babysitting a broken AI: it got dropped.
You can learn the rules for Jeopardy in a minute. Becoming a doctor takes 10 years. Becoming the best doctor you can be takes a lifetime. Medical data, whether in the literature or in test results, is meaningless or misleading without a lot of implied context. Watson Health couldn't work across lots of fields at once; that needs a general intelligence which AI currently does not have. It needed to evolve from the ground up with experts in each specialism, letting them set the rules, the ways of working, what knowledge mattered and why. You can't do that with an immersion room. If you promise and do not deliver, you won't get the help you need to make things better. Once you've blown trust, you're toast.
IBM tried to bypass that by buying other companies with successful AI medical products, to absorb their goodness, but Watson Health consistently rejected the transplants. Companies that could prosper on their own against small, nimble, focused competition couldn't thrive when sewn into Watson's marketing-led system. Customers left, and the newly acquired employees were dumped when the numbers fell apart.
IBM's Watson Health failed at the time, like so much AI/ML, because it didn't know what the question was – ironic, since the game of Jeopardy at which it excelled is all about deducing questions from data. It wanted to automate the highest skilled aspects of healthcare, diagnosis and treatment, but the problem wasn't one of getting the most data and the best algorithm. Rather, the problem was one of meaning.
A good doctor sees the patient, not the symptoms. Watson saw the symptoms of inefficiency and lack of capability. It did not see the process of care and making whole, where doctors, not data, were what needed to be understood.
Fortunately, this sad case does not mean AI-based tools can't work in medicine, nor even that Watson Health won't, nor that the evaluation and adoption of AI has been slowed. Ask the right questions – such as "why do some patients get readmitted, and can we spot them early?" – and you get the right answers. AI in healthcare is being adopted, and it shows every sign of deserving its place.
It only wins that place when it's held to the same standards as any other aspect of healthcare. AI/ML in general business will only succeed once we learn to spot the Watsons. Set early tests with clear goals and clear results. Set time limits – it took Watson Health many years to die, but it was failing within three. If you can't deliver on proof of concept, stop. If you can't even decide on a proof of concept, don't even start.
Perhaps the best legacy Watson Health has left from its short, troubled time with us is a game in the spirit of Jeopardy. Let's call it Hindsight. Just enter "IBM Watson" and a year between 2011 and 2021 into the search engine of your choice, and see if you can work out just how badly things had gone by that year from the mix of IBM-led hype and reality-led news reports. There's a pattern there. It's worth learning – after all, it cost enough. ®