AI + ML

Meta's AI translation breaks 200 language barrier

Open-source model improves translation of rarer spoken languages by up to 70%

Wed 6 Jul 2022 // 16:30 UTC

Meta's quest to translate underserved languages is marking its first victory with the open source release of a language model able to decipher 202 languages.

Dubbed NLLB-200 after Meta's No Language Left Behind initiative, the model is the first able to translate so many languages, according to its makers, who built it to improve translation for languages overlooked by similar projects.

"The vast majority of improvements made in machine translation in the last decades have been for high-resource languages," Meta researchers wrote in a paper [PDF]. "While machine translation continues to grow, the fruits it bears are unevenly distributed," they said.

According to the announcement of NLLB-200, the model can translate 55 African languages "with high-quality results." Meta said that before NLLB-200, fewer than 25 African languages were covered by widely used translation tools. When scored using the BLEU metric, which measures how closely machine output matches a human reference translation, NLLB-200 showed an average improvement of 44 percent over other state-of-the-art translation models, Meta said. For some African and Indian languages, the improvement reportedly went as high as 70 percent.
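For readers unfamiliar with BLEU, the following is a minimal sketch of a sentence-level score (clipped n-gram precision combined with a brevity penalty) and of how a relative improvement such as the quoted 44 percent is calculated. It is a simplified illustration, not Meta's evaluation pipeline: the function names and the example scores are hypothetical, and real evaluations typically use corpus-level tooling with standardized tokenization.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate against one reference.

    Returns a value between 0.0 and 1.0. No smoothing is applied, so any
    n-gram order with zero overlap yields a score of 0.0.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total) / max_n

    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum)


def relative_improvement(baseline_score, new_score):
    """Percentage improvement of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score * 100


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", reference))
    print(sentence_bleu("the cat sat", reference, max_n=2))
    # Hypothetical scores chosen only to illustrate the arithmetic.
    print(relative_improvement(25.0, 36.0))
```

Note that a 44 percent improvement here is relative to the baseline score, not a gain of 44 BLEU points.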

Along with its release on GitHub as an open-source model, Meta said it's also providing $200,000 in grants to nonprofits willing to research real-world applications for NLLB-200.

Lofty goals aside, Meta is already putting NLLB-200 to work. The model and other results from the NLLB program "will support more than 25 billion translations served every day on Facebook News Feed, Instagram, and our other platforms."

In addition, Meta has been working with the Wikimedia Foundation to use NLLB-200 as the back end of Wikipedia's Content Translation Tool. By including NLLB-200, the CTT added 10 languages that were unsupported by any other translation tool.

There were hurdles along the way. Meta said it took considerable work to double NLLB's capabilities, which it achieved through "regularization and curriculum learning, self-supervised learning and diversifying back-translation." Meta also made extensive use of model distillation, in which the outputs of a previously trained model are used to train a newer, usually smaller, model.
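One common form of distillation, sequence-level distillation, can be sketched as follows: a trained "teacher" translates a set of source sentences, and the resulting pairs become training data for a "student." The article does not say which distillation variant Meta used, so this is an illustrative assumption. The teacher below is a toy word-lookup function standing in for a real model, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

# Toy stand-in for a large trained translation model (hypothetical).
TOY_LEXICON = {"hello": "bonjour", "world": "monde"}


def toy_teacher(sentence: str) -> str:
    """Translate word by word using a tiny lexicon; unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.lower().split())


def build_distillation_set(
    teacher: Callable[[str], str], sources: List[str]
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the teacher's translation of it.

    The returned (source, teacher_output) pairs would then be used as
    synthetic parallel data to train a smaller student model.
    """
    return [(source, teacher(source)) for source in sources]


if __name__ == "__main__":
    for source, target in build_distillation_set(toy_teacher, ["Hello world"]):
        print(f"{source!r} -> {target!r}")
```

In a real system the student never sees the teacher's weights, only its outputs, which is why the technique can compress a very large model into a much smaller one.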

As part of its open sourcing of NLLB-200, Meta is also releasing the new Flores-200 evaluation dataset it built for the project, seed training data, its 200-language toxicity list, its new LASER3 sentence encoder, the stopes data mining library, dense transformer models with 3.3 billion and 1.3 billion parameters, distilled models with 1.3 billion and 600 million parameters, and NLLB-200 itself, which contains 54.5 billion parameters.
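To give a rough sense of why the smaller and distilled variants matter, the sketch below estimates the storage needed for each model's weights. The parameter counts come from the list above; the assumption of 2 bytes per parameter (16-bit weights) is ours, not Meta's, and the estimate ignores activations and any other runtime overhead.

```python
# Parameter counts as reported in the article.
PARAMETER_COUNTS = {
    "NLLB-200": 54.5e9,
    "3.3B dense": 3.3e9,
    "1.3B dense or distilled": 1.3e9,
    "600M distilled": 600e6,
}


def weight_gigabytes(parameters, bytes_per_parameter=2):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes).

    bytes_per_parameter=2 assumes 16-bit storage, which is an assumption
    made for illustration only.
    """
    return parameters * bytes_per_parameter / 1e9


if __name__ == "__main__":
    for name, count in PARAMETER_COUNTS.items():
        print(f"{name}: roughly {weight_gigabytes(count):.1f} GB of weights")
```

Under these assumptions the full model needs on the order of a hundred gigabytes for weights alone, while the 600 million parameter variant needs only about one.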

Not all communities may welcome the inclusion of their language in NLLB, or other programs for that matter. New Zealand's Māori community faced off against translation companies last year, arguing the entities didn't have a right to buy language data and sell the Māori language back to its speakers. ®
