Software has been trained by academics to produce different styles of biblical text, after swotting up on dozens of English translations of the scriptures.
The neural network – developed at Dartmouth College and Indiana University Bloomington in the US – is an interesting demonstration of artificially intelligent code poring over writing in one form and using it to craft prose in another.
Ultimately, the technique could be used, for example, to have a computer automatically turn complicated information into an easy-to-understand explanation.
Speaking to The Register this week, Keith Carlson, a Dartmouth PhD student and coauthor of a paper describing the research, published on Wednesday in the journal Royal Society Open Science, said: “There are a few things that make the Bible such a great dataset for language tasks.
“It has been translated into many languages and there are even many versions for major languages, so it is useful simply because of the breadth.”
In fact, machine translation systems like Google Translate do appear to be trained on the Bible. This is because passages in, say, a French version can be linked directly to the corresponding passages in a German edition, allowing a neural network to connect the meanings of the French and German words they contain. Sometimes these passages spookily resurface when users ask the system to translate odd sentences between particularly rare languages.
The Bible is also handily cut up into small chunks, labelled with verse numbers that match across different versions. “Systems for many NLP tasks require this alignment; using the Bible eliminates the need for an automatic alignment algorithm, which may introduce errors,” Carlson added.
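To see why shared verse numbers make alignment trivial, here is a minimal sketch that pairs two versions by joining on the verse reference. The data layout (a dictionary keyed by a hypothetical `(book, chapter, verse)` tuple) and the function name `align_by_verse` are illustrative assumptions, not the researchers' code; the ASV entry for Genesis 1:2 is deliberately left out to show that unmatched verses are simply skipped.

```python
def align_by_verse(source, target):
    """Return (source_text, target_text) pairs for verses present in both versions.

    Each version is assumed to be a dict mapping a (book, chapter, verse)
    tuple to that verse's text, so alignment is just a key lookup.
    """
    shared_refs = sorted(source.keys() & target.keys())
    return [(source[ref], target[ref]) for ref in shared_refs]


kjv = {
    ("Genesis", 1, 1): "In the beginning God created the heaven and the earth.",
    ("Genesis", 1, 2): "And the earth was without form, and void; and darkness "
                       "was upon the face of the deep. And the Spirit of God "
                       "moved upon the face of the waters.",
}
asv = {
    # Genesis 1:2 omitted on purpose: it has no partner, so it is skipped.
    ("Genesis", 1, 1): "In the beginning God created the heavens and the earth.",
}

for src, tgt in align_by_verse(kjv, asv):
    print(src, "->", tgt)
```

No fuzzy sentence matching is needed: the verse reference does all the work, which is the point Carlson makes about avoiding error-prone automatic alignment.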
Instead of machine translation, however, the boffins for this experiment focused on style transfer. It’s a similar problem to machine translation, they argued in their paper: “Style transfer can naturally be viewed as a machine translation problem where the source language and target language are simply different textual styles.”
They trained a neural network to rewrite biblical verses in different prose styles by feeding it various versions of the Bible, such as the King James Version, commissioned in 1604 and published in 1611, and the more modern American Standard Version (ASV), completed in 1901. In all, thirty-three English translations were scraped from Bible Gateway, a website that lets users search for passages across different versions of the Bible.
The resulting training dataset contained more than 1.5 million unique pairings of a source verse with its counterpart in a different Bible version.
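A rough sketch of how verse-level pairings multiply across many versions, and why deduplication matters, is below. It assumes each version is a dictionary keyed by verse reference, counts both directions (KJV to ASV and ASV to KJV) as separate pairings, and skips verses worded identically in both versions; all three choices, plus the "REPRINT" version, are hypothetical illustrations rather than the paper's documented procedure.

```python
from itertools import permutations


def build_training_pairs(versions):
    """Collect unique (source_text, target_text) pairs across every ordered
    pair of versions, matching verses by their shared reference."""
    pairs = set()  # a set drops exact duplicate pairings automatically
    for src_name, tgt_name in permutations(versions, 2):
        src, tgt = versions[src_name], versions[tgt_name]
        for ref in src.keys() & tgt.keys():
            # Assumption: identical wordings show no change of style, so skip them.
            if src[ref] != tgt[ref]:
                pairs.add((src[ref], tgt[ref]))
    return pairs


gen_1_1 = ("Genesis", 1, 1)
versions = {
    "KJV": {gen_1_1: "In the beginning God created the heaven and the earth."},
    "ASV": {gen_1_1: "In the beginning God created the heavens and the earth."},
    # Hypothetical extra version whose wording matches the KJV exactly.
    "REPRINT": {gen_1_1: "In the beginning God created the heaven and the earth."},
}

pairs = build_training_pairs(versions)
print(len(pairs))  # 2
```

Three versions give six ordered version pairs, but only two distinct pairings survive once duplicates and identical wordings are removed. With 33 versions there are 33 × 32 = 1,056 ordered version pairs, so the raw count per verse grows quickly before any deduplication is applied.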
Neural networks and God
Here’s an example input, taken from the Bible in Basic English (BBE) edition and given to the recurrent neural network as a test, followed by the output the AI produced in the style of the ASV.
Input (BBE): "Then the Levites took down the ark of the Lord and the chest in which were the gold images, and put them on the great stone: and the men of Beth-shemesh made burned offerings and gave worship that day before the Lord."
Output (ASV): “And the Levites brought down the ark of Jehovah, and the chest in which were the golden images, and put them upon the great stone: and the men of Beth-shemesh burned incense, and worshipped that day before Jehovah.”
For the output to be a fair rendering, the overall meaning of the text has to be preserved. Still, it’s difficult to understand how the recurrent neural network generated the text, and which features it learns to pick up on, Carlson admitted.
“It operates by reading the entire input sentence and creating a vector representing it," he said. "To create the output it then uses this vector to produce a single word at a time, conditioned on any words it has already produced. The features learned by neural networks are notoriously tricky to interpret.”
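Carlson's description matches the classic encoder-decoder (sequence-to-sequence) design. The toy below, in plain Python, shows only that data flow: an encoder reads the whole sentence into one vector, then a decoder emits one word at a time. The vocabulary, sizes, random weights and function names are all hypothetical, and nothing is trained, so the output is gibberish; the researchers' real model would be trained on the verse pairs and may include extras such as attention.

```python
import math
import random

# Hypothetical toy setup: tiny vocabulary, small hidden size, random untrained weights.
VOCAB = ["<s>", "</s>", "and", "the", "levites", "brought", "down", "ark", "of", "jehovah"]
HIDDEN = 8
rng = random.Random(0)  # fixed seed so every run gives the same weights


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


EMBED = {w: [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for w in VOCAB}
W_ENC = rand_matrix(HIDDEN, HIDDEN)      # encoder recurrence
W_DEC = rand_matrix(HIDDEN, HIDDEN)      # decoder recurrence
W_OUT = rand_matrix(len(VOCAB), HIDDEN)  # hidden state -> one score per word


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(weights, state, x):
    """One simple recurrent update: new_state = tanh(weights @ state + x)."""
    return [math.tanh(h + xi) for h, xi in zip(matvec(weights, state), x)]


def encode(words):
    """Read the entire input sentence and squeeze it into a single vector."""
    state = [0.0] * HIDDEN
    for w in words:
        state = rnn_step(W_ENC, state, EMBED.get(w, [0.0] * HIDDEN))
    return state


def decode(sentence_vector, max_len=10):
    """Emit one word at a time. The sentence vector is the starting state, and
    the recurrent state carries every word produced so far."""
    state, prev, output = sentence_vector, "<s>", []
    for _ in range(max_len):
        state = rnn_step(W_DEC, state, EMBED[prev])
        scores = matvec(W_OUT, state)
        best = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]  # greedy pick
        if best == "</s>":
            break
        output.append(best)
        prev = best
    return output


print(decode(encode("the levites brought down the ark".split())))
```

Because the whole sentence must fit into one fixed-size vector before any output is produced, it is hard to say which features that vector encodes, which is the interpretability problem Carlson describes.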
He hopes such systems will eventually help make written information more accessible to different audiences.
“For example, text could be rewritten to be more easily understood by children, or non-native speakers,” he suggested. “Similarly, text could be rewritten to allow a layperson to better understand the meaning of something highly technical, such as an engineering paper or a legal document.
“Aside from accessibility, text could be rewritten to match the style of a particular author. This may just be for curiosity's sake, for example rewriting novels in the style of another author, or serve a more practical function, such as allowing a team of writers producing product descriptions within a company to maintain a consistent stylistic voice across their writing.”
Other researchers have already experimented with the latter. DeepTingle, a neural network developed by researchers at New York University, could spit out text in the style of Chuck Tingle, a gay erotica author. ®