Meta's AI-based Wikipedia successor 'may be the next big break in NLP'

Don't believe everything you read on the internet

Mon 11 Jul 2022 // 21:31 UTC

Meta has open-sourced a machine-learning resource that could one day supplant Wikipedia as the world's biggest publicly available knowledge-verification database.

Dubbed Sphere, it can be used to perform knowledge-intensive natural language processing, or KI-NLP, we're told. In practical terms, that means it can be used to answer complicated questions using natural language, and find sources for claims.

One example given is asking Sphere, "Who is Joëlle Sambi Nzeba?" Wikipedia doesn't have an entry for her, but Sphere said she was "born in Belgium and grew up partly in Kinshasa (Congo). She currently lives in Brussels. She is a writer and slammer, alongside her activism in a feminist movement," and linked to a website where it found that information about her work.

Wikipedia has pretty much served as the corpus of record for KI-NLP, Meta's eggheads wrote in a paper discussing the design of Sphere, describing the volunteer-maintained uber-wiki as "accurate, well-structured, and small enough to use easily in testing environments."

Seeking to build something bigger and better than Wikipedia, though, Meta pulled together content from all over the web – sans wikipedia.org – to form a "universal, uncurated and unstructured knowledge source for multiple KI-NLP tasks at once." The result is Sphere, which is more or less a mountain of processed data that can be queried using a bunch of machine-learning tools.
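The article doesn't spell out how Sphere is queried. As a rough illustration of what querying a mountain of processed passages involves, here is a toy BM25 ranker, a standard lexical retrieval function. This is a minimal sketch assuming an in-memory list of passages and a naive tokenizer; it is not Meta's code, and the `BM25Index` class and `tokenize` helper are hypothetical names chosen for illustration.

```python
import math
from collections import Counter

PUNCT = ".,?!;:\"'()"

def tokenize(text: str) -> list[str]:
    # Naive tokenizer: split on whitespace, strip punctuation, lowercase.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

class BM25Index:
    """Toy in-memory BM25 index over a list of passages (illustrative only)."""

    def __init__(self, passages: list[str], k1: float = 1.5, b: float = 0.75):
        self.passages = passages
        self.docs = [tokenize(p) for p in passages]
        self.k1, self.b = k1, b
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        # Document frequency: how many passages contain each term.
        self.df = Counter(term for d in self.docs for term in set(d))

    def idf(self, term: str) -> float:
        # BM25 idf with +1 inside the log so it never goes negative.
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query: str, i: int) -> float:
        doc = self.docs[i]
        tf = Counter(doc)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and length normalisation (b).
            denom = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / denom
        return total

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        scored = [(self.score(query, i), p) for i, p in enumerate(self.passages)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

passages = [
    "Joëlle Sambi Nzeba is a writer and slammer who lives in Brussels.",
    "Kinshasa is the capital of the Democratic Republic of the Congo.",
    "BM25 is a ranking function used by search engines.",
]
index = BM25Index(passages)
for score, passage in index.search("Who is Joëlle Sambi Nzeba?", k=1):
    print(f"{score:.2f}  {passage}")
```

A real web-scale index holding hundreds of millions of passages would use an inverted index rather than scoring every passage per query, and could pair or replace lexical matching with learned dense embeddings; the linear scan here just keeps the scoring logic visible.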

The team adds that Sphere "can match and outperform baselines grounded in Wikipedia" on some tasks in the KILT AI benchmark. That is to say, on those tasks, systems drawing on Sphere do as well as or better than AI systems built on Wikipedia's content.

The primary aim of Sphere was to see what impact replacing Wikipedia as a source had on the performance of knowledge-intensive systems. While the team did report that Sphere had some issues, its performance indicates that, at the very least, it can add value to KI-NLP tasks beyond what Wikipedia corpora can offer.

The researchers behind Sphere claim their work marks "the first time a general purpose search index improves language models on common sense tasks."

Sphere isn't the only AI platform Meta has released on GitHub: last week it released NLLB-200, which the Facebook parent claimed is the first translation AI to pass the 200-language threshold. Like Sphere, NLLB-200 has been put to use at Wikipedia: Sphere for automatically checking citations in edited articles, and NLLB-200 to improve translation of pages into less commonly spoken languages.

Sphere goes beyond similar web corpora in terms of scale, consisting of 906 million passages and 134 million documents. The next largest by passage and document count is the Internet Augmented Dialog generator, which pulls data from 250 million passages and 109 million documents.
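The two corpora are easier to compare as ratios. This short calculation uses only the figures quoted above; the variable names are illustrative.

```python
# Corpus sizes as reported in the article.
SPHERE = {"passages": 906_000_000, "documents": 134_000_000}
NEXT_LARGEST = {"passages": 250_000_000, "documents": 109_000_000}

# How many times larger Sphere is on each axis.
passage_ratio = SPHERE["passages"] / NEXT_LARGEST["passages"]
document_ratio = SPHERE["documents"] / NEXT_LARGEST["documents"]

# Average number of passages per document in each corpus.
sphere_density = SPHERE["passages"] / SPHERE["documents"]
next_density = NEXT_LARGEST["passages"] / NEXT_LARGEST["documents"]

print(f"passages: {passage_ratio:.1f}x, documents: {document_ratio:.1f}x")
print(f"passages per document: {sphere_density:.1f} vs {next_density:.1f}")
# prints:
# passages: 3.6x, documents: 1.2x
# passages per document: 6.8 vs 2.3
```

So Sphere holds roughly 3.6 times as many passages but only about 1.2 times as many documents, meaning its documents are split into more passages on average (about 6.8 each versus 2.3).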

But the internet contains no controls for quality or accuracy, which the researchers admit is a key problem for actually deploying this thing. "Using Wikipedia as the knowledge source allows researchers to assume the high quality of the corpus documents. When transitioning to a web corpus, we no longer have the certainty that any document is good, truthful or unique," the researchers wrote.

Sphere's creators think future iterations should focus on assessing the quality of the data it retrieves, detecting false claims and contradictions, working out how to prioritize trustworthy sources, and deciding when not to answer a question for lack of information. You know, making it actually useful.
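The researchers don't say how those checks would work. One simple way to picture two of them, prioritizing trustworthy sources and declining to answer, is to weight each retrieved passage's relevance score by a per-source trust value and abstain when even the best weighted score falls below a threshold. This is a minimal sketch assuming that scheme; the `Retrieved` class, the `TRUST` table, the `.example` domains and the threshold values are all hypothetical, not anything Meta has described.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Retrieved:
    passage: str
    source: str   # domain the passage came from
    score: float  # retriever relevance score; higher is better

# Hypothetical per-source trust weights in [0, 1]; a real system would need
# to curate or learn these rather than hard-code them.
TRUST = {"encyclopedia.example": 1.0, "blog.example": 0.5}
DEFAULT_TRUST = 0.3  # assumed weight for sources not in the table

def weighted(hit: Retrieved) -> float:
    # Scale relevance by how much we trust the source.
    return hit.score * TRUST.get(hit.source, DEFAULT_TRUST)

def answer_or_abstain(hits: list[Retrieved], threshold: float) -> Optional[Retrieved]:
    """Return the best trust-weighted hit, or None (abstain) if evidence is too weak."""
    if not hits:
        return None
    best = max(hits, key=weighted)
    return best if weighted(best) >= threshold else None

hits = [
    Retrieved("passage A", "blog.example", 8.0),
    Retrieved("passage B", "encyclopedia.example", 6.0),
]
print(answer_or_abstain(hits, threshold=5.0))  # trusted source wins: 6.0 beats 8.0 * 0.5
print(answer_or_abstain(hits, threshold=7.0))  # None: no hit clears the bar
```

Here the blog passage scores higher on raw relevance but loses once trust is factored in, and raising the threshold makes the system decline to answer rather than guess. Detecting false claims and contradictions would need far more than this, such as comparing what different passages actually assert.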

If it can successfully turn Sphere into a white-box AI with reliable and trustworthy information, Meta said, Sphere "may be the next big break in NLP." ®