DeepMind AI tool helps historians restore ancient texts

'Ithaca' brings deep learning to the Greek epigrapher set


AI software can help historians interpret and date ancient texts by reconstructing works destroyed over time, according to a new paper published in Nature.

A team of computer scientists and experts in classical studies led by DeepMind and Ca' Foscari University of Venice trained a transformer-based neural network to restore inscriptions written in ancient Greek between the 7th century BC and the 5th century AD. The model, named "Ithaca" after the home of legendary Greek king Odysseus, can also estimate when a text was written and where it might have originated.

By recovering fragments of text on broken pieces of pottery or blurry scripts, for example, researchers can begin translating them and learn more about ancient civilizations.

Thea Sommerschield, co-author of the paper and an epigrapher of ancient Greek and Roman inscriptions, told The Register in a joint statement with co-author and DeepMind scientist Yannis Assael that inscriptions are vital records. Everything from religious calendars to laws and leases can be preserved in them.

Why ancient Greek? The researchers said the variable content and available context in the Greek epigraphic record made it an "excellent challenge" for language processing. The large body of digitized written texts currently available also helped, since that is essential for training the model.

"These documents are one of the most important bodies of evidence for the history, language, religion, politics, and mentality of the ancient world," Sommerschield and Assael said. The duo hope that Ithaca will pave the way for researchers to study history with new AI techniques.

"Just as microscopes and telescopes extended the range of what scientists can do, [Ithaca is] providing historians with further tools to aid their discoveries and improve our collective understanding of history and culture. We hope that this work may set a new standard for the field of digital epigraphy, by using advanced deep-learning architectures to support the work of ancient historians," the pair told us.

First, the text needs to be transcribed by scanning an image of an old object or script. The transcription is then fed into Ithaca for analysis. The model predicts the lost or illegible characters needed to restore the damaged words. It generates and ranks a list of its top predictions; epigraphers can then scroll through them and judge whether the model's guesses seem accurate or not.
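To make the "generate and rank" step concrete, here is a minimal Python sketch of how a list of candidate restorations could be turned into a ranked shortlist. It is not Ithaca's code or API: the function name `rank_candidates`, the raw-score input, and the placeholder candidates are all assumptions made for illustration.

```python
import math


def rank_candidates(log_scores, k=3):
    """Return the k most probable candidates as (candidate, probability) pairs.

    `log_scores` is a hypothetical mapping from candidate restoration to a raw
    model score; a real system would produce these scores itself.
    """
    # Softmax: subtract the maximum first for numerical stability.
    top = max(log_scores.values())
    weights = {cand: math.exp(score - top) for cand, score in log_scores.items()}
    total = sum(weights.values())
    probabilities = [(cand, w / total) for cand, w in weights.items()]
    # Highest probability first, so an expert reads the best guesses first.
    probabilities.sort(key=lambda pair: pair[1], reverse=True)
    return probabilities[:k]


# Illustrative placeholder scores, not real model output.
shortlist = rank_candidates({"restoration A": -1.0,
                             "restoration B": -0.2,
                             "restoration C": -3.0}, k=2)
for candidate, probability in shortlist:
    print(f"{candidate}: {probability:.2f}")
```

The point of returning a shortlist rather than a single answer is that the final call stays with the human expert, which matches the workflow described above.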

The best results come when human and machine work together. Working alone, experts were 25 per cent accurate at restoring damaged texts, but when they collaborated with Ithaca the accuracy level jumped to 72 per cent. For comparison, Ithaca's accuracy on its own is about 62 per cent. It is also 71 per cent accurate at pinpointing where a text was written, and can date works to within 30 years of their creation between 800BC and 800AD.

Ithaca was trained on over 63,000 Greek inscriptions containing over three million words from The Packard Humanities Institute's Searchable Greek Inscriptions public dataset. The team masked portions of the text and tasked the model with filling in the blanks. Ithaca analyses other words in a given sentence for context when generating characters.
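The masking step can be sketched in a few lines of Python. This is an illustration of the general technique, not the team's actual training pipeline: the function name `mask_random_span`, the `#` placeholder character, and the `max_len` parameter are assumptions introduced here.

```python
import random


def mask_random_span(text, max_len=10, mask_char="#", rng=None):
    """Hide one random contiguous span of `text`.

    Returns (masked_text, hidden_target, start_index). A model would be shown
    `masked_text` and trained to predict `hidden_target`.
    """
    if not text:
        raise ValueError("text must not be empty")
    rng = rng or random.Random()
    # Never hide more characters than the text contains.
    length = rng.randint(1, min(max_len, len(text)))
    start = rng.randint(0, len(text) - length)
    target = text[start:start + length]
    masked = text[:start] + mask_char * length + text[start + length:]
    return masked, target, start


# Illustrative English stand-in for an inscription line.
masked, target, start = mask_random_span(
    "alliance of the athenians and thessalians", rng=random.Random(0))
print(masked)
print(target)
```

Because the hidden characters come from intact text, the correct answer is always known, so the model's guesses can be scored automatically during training.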

For example, when restoring the ancient Greek word for "alliance", it looked at the words "Athenians" and "Thessalians", which refer to two ancient peoples that banded together to fend off the Spartans. It's likely these three words appeared together in the same sentence in uncorrupted inscriptions the model saw in its training phase.

"Ithaca is trained to restore up to half of the missing text. In our experiments, and in the case of restoration specifically, we report results with up to 10 characters missing," Sommerschield and Assael told us.

"Ithaca has been used to re-date key texts of Classical Athens, thus contributing to topical debates in Ancient History. We hope many more such discoveries will follow, and that historians will include Ithaca in their workflow. The response we received from independent third-party historians has been very positive and enthusiastic."

DeepMind is now adapting its model to other ancient writing systems, such as Akkadian from Mesopotamia, Demotic from ancient Egypt, Mayan from Central America, and ancient Hebrew. "We hope that models like Ithaca can unlock the cooperative potential between AI and the humanities, transformationally impacting the way we study and write about some of the most significant periods in human history," the biz said.

You can see a demo of Ithaca here and find the code for Ithaca here. ®

