AI in Medicine? It's back to the future, Dr Watson

Why IBM's cancer projects sound like Expert Systems Mk.2

Analysis "OK, the error rate is terrible, but it's Artificial Intelligence – so it can only improve!"

Of course. AI is always "improving" – as much is implied by the cleverly anthropomorphic phrase, "machine learning". Learning systems don't get dumber. But what if they don't actually improve?

The caveat accompanies almost any mainstream story on machine learning or AI today. But it was actually being expressed with great confidence forty years ago, the last time AI was going to "revolutionise medicine".

IBM's ambitious Watson Health initiative will unlock "$2 trillion of value," according to Deborah DiSanzo, general manager of Watson Health at IBM.

But this year it has attracted headlines of the wrong kind. In February, the cancer centre at the University of Texas put its Watson project on hold, after spending over $60m with IBM and consultants PricewaterhouseCoopers. Earlier this month, StatNews published a fascinating investigative piece into the shortcomings of its successor, IBM's Watson for Oncology. IBM marketing claims this is "helping doctors out-think cancer, one patient at a time".

The StatNews piece is a must-read if you're thinking of deploying AI, because it's only tangentially about Artificial Intelligence, and actually tells us much more about the pitfalls of systems deployment, and cultural practice. In recent weeks Gizmodo and MIT Technology Review have also run critical looks at Watson for Oncology. In the latter, the system's designers despaired at the claims being made on its behalf.

"How disappointing," wrote tech books publisher O'Reilly's books editor, Andy Oram, to net protocol pioneer Dave Farber. "This much-hyped medical AI is more like 1980s expert systems, not good at diagnosing cancer."

What does he mean?

Given how uncanny it is that so much of today's machine learning mania echoes earlier hypes, let's take a step back and examine the fate of one showpiece Artificial Intelligence medical system, and see if there's anything we can learn from history.


The history of AI is one of long "winters" of disinterest punctuated by brief periods of hype and investment. MYCIN, developed by Edward Shortliffe, was a backward-chaining system designed to help clinicians. It emerged early on in the first "AI winter".

MYCIN used AI to identify the bacteria causing infections, and based on information provided by a clinician, recommended the correct dosage for the patient.

MYCIN also bore the hallmarks of experience. The first two decades of AI had been an ambitious project to encode all human knowledge in symbols and rules, so they could be algorithmically processed by a digital computer. Despite great claims made on its behalf, this had yielded very little of use. Then in 1973, the UK withdrew funding for AI from all but three UK universities. The climate had gone cold again.

AI researchers were obliged to explore new approaches. The most promising seemed to be to give the systems constraints – simplifying the problem space. Micro-worlds, artificially simple situations, were one approach; and Terry Winograd's block-stacker SHRDLU was one example. From Micro-worlds came rules-based "expert systems". MYCIN was such a rules-based system. Comprising 150 IF-THEN statements, MYCIN made inferences from a limited knowledge base.

There's a detailed description of MYCIN here (PDF).

At the time, MYCIN's defenders claimed that no "expert" could outperform it, and it prompted a wave of enthusiasm. MYCIN had much to commend it. It was honest, giving the user a probability figure and a full trace of all the evidence.
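For readers who haven't met a backward-chaining rules engine, here is a minimal sketch in Python of the general idea: start from a hypothesis, work backwards through IF-THEN rules to the findings a clinician supplied, and keep a trace of every step. The rules, findings and numbers are invented for illustration and are not taken from MYCIN's knowledge base. The confidence arithmetic (weakest premise multiplied by the rule's weight) is a simplified stand-in, not MYCIN's actual certainty-factor calculus.

```python
# Toy backward-chaining engine. Every rule, finding and number here is
# hypothetical; none of it comes from MYCIN's real knowledge base.

# Each rule: (conclusion, [premises], confidence attached to the rule)
RULES = [
    ("organism_is_gram_negative_rod", ["stain_gram_negative", "shape_rod"], 0.9),
    ("suspect_e_coli", ["organism_is_gram_negative_rod", "site_urinary_tract"], 0.7),
]

# Findings supplied by the clinician, each with a confidence between 0 and 1.
FACTS = {
    "stain_gram_negative": 1.0,
    "shape_rod": 0.8,
    "site_urinary_tract": 1.0,
}


def prove(goal, trace, depth=0):
    """Work backwards from a goal and return a confidence in [0, 1]."""
    indent = "  " * depth
    if goal in FACTS:
        trace.append(f"{indent}{goal}: given by clinician ({FACTS[goal]:.2f})")
        return FACTS[goal]
    best = 0.0
    for conclusion, premises, rule_cf in RULES:
        if conclusion != goal:
            continue
        trace.append(f"{indent}{goal}: trying rule IF {' AND '.join(premises)}")
        # Simplification: a conjunction is only as strong as its weakest premise.
        weakest = min(prove(p, trace, depth + 1) for p in premises)
        best = max(best, weakest * rule_cf)
    trace.append(f"{indent}{goal}: concluded with confidence {best:.2f}")
    return best


trace = []
confidence = prove("suspect_e_coli", trace)
print(f"suspect_e_coli: {confidence:.2f}")
print("\n".join(trace))
```

The point of the trace is that the user can see which rules fired and which findings they rested on, rather than being handed a bare verdict.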

With today's AI experiments that purport to be able to tell if you're gay (and in future, the Stanford creator of one claims, your political views), the headline conceals a probability, derived after much training (or "learning").

Unlike almost all mainstream publications, which rarely if ever report the failure rate, we do. For example, the AI system that claimed to identify masked faces had a surprising failure rate: with cap and scarf it was between 43 and 55 per cent accurate. "On a practical level it isn't that awesome," we noted.

Biting the hand that feeds IT © 1998–2022