AI + ML

39 episodes of 'CSI' used to build AI's natural language model

The show's predictability makes it the ideal robo-cop training tool

Thu 2 Nov 2017 // 06:02 UTC

A group of University of Edinburgh boffins have turned CSI: Crime Scene Investigation scripts into a natural language training dataset.

Their aim is to improve how bots understand what's said to them – natural language understanding.

Drawing on 39 episodes from the first five seasons of the series, Lea Frermann, Shay Cohen and Mirella Lapata have broken the scripts up as inputs to an LSTM (long short-term memory) model.

The boffins used the show because of its worst flaw: a rigid adherence to formulaic scripts that makes it utterly predictable. Hence the name of their paper: “Whodunnit? Crime Drama as a Case for Natural Language Understanding”.

“Each episode poses the same basic question (i.e., who committed the crime) and naturally provides the answer when the perpetrator is revealed”, the boffins write. In other words, identifying the perpetrator is a straightforward sequence labelling problem.
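To make the sequence labelling idea concrete, here is a minimal sketch, not the researchers' code. It assumes each sentence of a script gets a label of 1 if it mentions the perpetrator and 0 otherwise; the `Sentence` type, the `label_episode` function and the toy script are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """One sentence of an episode script (hypothetical structure)."""
    text: str
    mentions: set = field(default_factory=set)  # characters referred to


def label_episode(sentences, perpetrator):
    """Return one 0/1 label per sentence: 1 if the perpetrator is mentioned."""
    return [1 if perpetrator in s.mentions else 0 for s in sentences]


# Toy, made-up script; not taken from any real episode.
episode = [
    Sentence("The victim was found in the garage.", {"victim"}),
    Sentence("The neighbour says he heard nothing.", {"neighbour"}),
    Sentence("The gardener had a key to the house.", {"gardener"}),
    Sentence("The neighbour's prints are on the door.", {"neighbour"}),
]

labels = label_episode(episode, perpetrator="neighbour")
print(labels)  # [0, 1, 0, 1]
```

A model trained on this framing would read the sentences in order and try to predict each label as it goes, which is what lets it "guess" part-way through an episode.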

What the researchers wanted was for their model to follow the kind of reasoning a viewer goes through in an episode: learn about the crime and the cast of characters, then start to guess who the perp is. They also wanted to see whether the model could outperform humans.

The human sample was small – just three individuals – but those who worry robots will replace humans can at least take heart that we can still outperform the AI in answering “whodunnit?”

While humans can outperform the LSTM model in precision, we're also more cautious: the researchers' model would put in its first guess (right or wrong) at the 190th sentence in an episode, whereas humans typically waited for 300 sentences.
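The trade-off between guessing early and guessing accurately can be illustrated with a small sketch. It assumes per-sentence 0/1 guesses scored against per-sentence 0/1 ground truth; the `first_guess` and `precision` helpers and the toy data are hypothetical, and whether the paper computes its figures in exactly this way is an assumption.

```python
def first_guess(predictions):
    """1-based position of the first sentence flagged as mentioning the perp."""
    for position, predicted in enumerate(predictions, start=1):
        if predicted == 1:
            return position
    return None  # the guesser never committed


def precision(predictions, gold):
    """Share of positive guesses that were actually correct."""
    guessed = sum(predictions)
    if guessed == 0:
        return 0.0
    correct = sum(1 for p, g in zip(predictions, gold) if p == 1 and g == 1)
    return correct / guessed


# Made-up six-sentence episode: 1 means the perpetrator is mentioned.
gold     = [0, 0, 1, 0, 1, 1]
eager    = [0, 1, 1, 0, 1, 1]  # commits early, with one wrong guess
cautious = [0, 0, 0, 0, 1, 1]  # waits longer, then gets it right

print(first_guess(eager), precision(eager, gold))        # 2 0.75
print(first_guess(cautious), precision(cautious, gold))  # 5 1.0
```

In this toy case the eager guesser behaves like the model in the article (earlier, less precise) and the cautious one like the human viewers (later, more precise).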

“Once humans guess the perpetrator, however, they are very precise and consistent”, the researchers write; when models got the identification right, they guessed sooner. The models also picked out “first mentions” of perpetrators quickly.

The best way to confuse the AI, it turned out, was to have no perpetrator at all: in the one episode of the 39 involving a suicide, human viewers worked out the twist about two-thirds of the way through, while the model kept guessing right to the end.
