Boffin-loving bots are penning potential new Wikipedia pages to recognize the work of notable scientists who are missing from the online encyclopedia.
Any human with an internet connection can submit new Wikipedia entries, or edit existing ones. Heck, even AI software can tweak and update the encyclopedia's pages – these bots even seem to enter cyber-spats, constantly scribbling over one another's changes.
Editing is one thing. Writing an article from scratch is another: it's still difficult for computers to automatically craft long, coherent passages of text. A group of researchers from Google Brain tried to get a neural network to cough up new pages by summarizing snippets of information scraped from relevant webpages. The results were OK at best; the text, like roast beef in a cheap carvery, was quite dry.
Now, engineers at Primer, a Silicon Valley AI startup focused on natural language processing, have followed Google’s approach, although they have gone a step further and built a knowledge base alongside a text generation model, a technology dubbed Quicksilver.
“For Quicksilver’s architecture we started on the trail blazed by the Google AI team, but our goal is more practical,” said John Bohannon, director of science at Primer. "Rather than using Wikipedia as an academic testbed for summarization algorithms, we’re building a system that can be used for building and maintaining knowledge bases such as Wikipedia."
Here are 100 proposed Wikipedia entries the model has created from various knowledge sources. The eggheads who had the most mentions in web articles or journals but no Wikipedia page, and thus were given the bot treatment, include John Noseworthy, a neurologist and CEO of the Mayo Clinic; Ami Zota, an assistant professor at the George Washington University's Milken Institute School of Public Health; and Andrej Karpathy, an AI boffin at Tesla.
Quicksilver found 40,000 people missing from Wikipedia that it believed deserved pages, including a good number of women scientists. It did this by analyzing 30,000 English Wikipedia articles about scientists and their corresponding entries in Wikidata, a free knowledge base used for Wikimedia projects.
Over three million sentences collected from news articles, plus the names and affiliations of authors on 200,000 scientific papers, were also thrown at the machine-learning software to find out which scientists were most widely mentioned in the news and academia but missing from Wikipedia.
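Stripped of the machine-learning machinery, the filtering step described above amounts to counting name mentions and keeping the most-mentioned people who lack a Wikipedia article. A minimal, illustrative sketch (the function name, inputs, and data shapes are assumptions, not Primer's actual code):

```python
from collections import Counter

def candidates_for_pages(news_sentences, known_wikipedia_names):
    """Toy sketch of the filtering described above: count how often each
    scientist is mentioned in news text, then rank the most-mentioned
    people who have no existing Wikipedia article.

    news_sentences: iterable of (sentence_text, [names mentioned in it])
    known_wikipedia_names: set of names that already have articles
    """
    mentions = Counter()
    for _sentence, names_in_sentence in news_sentences:
        for name in names_in_sentence:
            mentions[name] += 1
    # Most-mentioned first, keeping only people missing from Wikipedia
    return [name for name, _count in mentions.most_common()
            if name not in known_wikipedia_names]
```

In the real system, of course, spotting which names appear in a sentence is itself a hard named-entity-recognition problem; this sketch assumes that step has already been done.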
“The crucial breakthrough for us was using structural data from Wikidata about our seed population of scientists to map them to their mentions in news documents,” Bohannon explained.
"Distant supervision then allowed us to bootstrap models for relation extraction and build a self-updating knowledge base. By adding an [recurrent neural network] trained on Wikipedia articles, it becomes a knowledge base that can describe itself in natural language."
You will be judged
So does the model get to decide who is worthy of a Wikipedia page, based simply on how many times their name crops up in the news? "We are being very careful to not make this judgement," Bohannon told The Register.
"We did explore a model of Wikipedia "notoriety" prediction, but we found that using the existing distribution of personal details from existing Wikipedia articles to determine who "deserves" a page will only reinforce biases.
"Instead we decided to simply extract as much information as possible about scientists from the news. The more information there is, the higher the chances that the person is eligible for an article, in general. Quicksilver gives human editors the information they need to build Wikipedia pages for these individuals, based on information about them published in fully sourced news articles, but it is ultimately up to the editors to decide to build a page."
The generated pages are pretty short, and definitely not as complete as most Wikipedia pages: they lack sections, offering just a brief introduction and a list of events the person has been involved in. They aren't ready to go straight onto Wikipedia; instead, they are intended as stubs to give human editors a head start.
Quicksilver can help netizens maintain entries, too. The idea is that if the knowledge base is kept up to date by regularly inspecting the latest news articles, the model can also refresh information on existing pages.
“As it becomes more and more essential to the world, biased and missing information on Wikipedia will have serious impacts,” Bohannon concluded. "The human editors of the most important source of public information can be supported by machine learning. Algorithms are already used to detect vandalism and identify underpopulated articles. But the machines can do much more." ®