Meta’s AI will be able to provide over 25 million translations per day. Photo: AFP.
Meta (formerly Facebook) wants to compete with the popular Google Translate with a single Artificial Intelligence (AI) model capable of translating into 200 different languages, including languages such as Kamba, Somali, Indonesian, Basque or Catalan.
The technology, named No Language Left Behind (NLLB-200), is part of a project developed by Mark Zuckerberg’s company that aims to encourage the use of minority languages and foster conversations on its social networks.
Specifically, NLLB-200 can translate into 200 languages that until now were either missing from the most widely used translation tools or did not work properly in them, the company said in a statement released this week.
“The AI modeling techniques we have used are helping to deliver high-quality translations. We trained it using the Research SuperCluster, one of the fastest supercomputers in the world,” says Meta founder and CEO Mark Zuckerberg in a post on his Facebook account.
AI Research SuperCluster, Meta’s supercomputer.
In its attempt to lead the metaverse revolution, Meta has focused part of its efforts (and financial capital) on developing Artificial Intelligence systems to build its virtual community. In recent months, the company has launched several initiatives that point in that direction.
It recently announced the creation of a language model that mimics the way the human brain processes words. Now, the tech firm has gone a step further: a single artificial intelligence model capable of translating into 200 different languages.
Open source: the key to NLLB-200
Mark Zuckerberg allocates funds for the research and development of his NLLB-200 artificial intelligence. Photo: MANDEL NGAN / AFP.
The model on which the Meta engineers relied is practically the same as M2M-100, presented in 2020, which does not need to go through English first in order to translate. In addition, this time the company is open-sourcing the NLLB-200 model and other tools so that other researchers can extend this work to more languages and design more inclusive technologies.
The advances in the NLLB-200 model will make it possible to deliver more than 25 million translations per day in the news section of Facebook, Instagram and the rest of the platforms that make up the technology conglomerate.
With this commitment to the NLLB-200 model, Meta hopes to offer accurate translations that can help detect malicious content and misinformation, protect the integrity of political processes such as elections, and contain cases of sexual exploitation and human trafficking on the Internet.
The problem is that for many language combinations there are no parallel sentences that can serve as a reference translation, which causes some of these translations to include grammatical errors or inconsistencies.
To show the effectiveness of NLLB-200, Meta has released a demo in which we can see a book translated into multiple languages. A text in Burmese, Khmer, Somali or Indonesian can be translated in seconds into Spanish, Catalan, Basque, English, Portuguese, Russian, Ligurian, Turkish, Korean or Simplified Chinese, among others.
To reach the 200 languages included in NLLB-200, the company's Artificial Intelligence researchers had to focus on three aspects: expanding the available training resources, scaling the model size without sacrificing performance, and building mitigation and evaluation tools for 200 languages.
First, the company noted that in order to collect parallel texts for more accurate translations into other languages, it has improved Language-Agnostic Sentence Representations (LASER), its tool for zero-shot transfer.
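As a rough illustration of how parallel texts can be collected with sentence representations, the sketch below pairs each source sentence with its closest target sentence in a shared embedding space. It is a minimal sketch, not Meta's procedure or the LASER API: the function names (`cosine`, `mine_pairs`), the plain cosine threshold and the three-dimensional toy vectors are all assumptions made for the example. A real system would obtain the vectors from a multilingual sentence encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def mine_pairs(src, tgt, threshold=0.9):
    """Pair each source sentence with its nearest target sentence.

    src, tgt: dicts mapping a sentence to its embedding (hypothetical toy
    vectors here). A pair is kept only if its similarity reaches the
    threshold, which is an arbitrary value chosen for illustration.
    """
    pairs = []
    for s_text, s_vec in src.items():
        best_text, best_score = None, -1.0
        for t_text, t_vec in tgt.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_text, best_score = t_text, score
        if best_text is not None and best_score >= threshold:
            pairs.append((s_text, best_text, best_score))
    return pairs

# Toy embeddings, invented for the example (not produced by any real encoder).
src = {"Good morning": [0.9, 0.1, 0.0], "The cat sleeps": [0.0, 0.2, 0.9]}
tgt = {"Buenos días": [0.88, 0.12, 0.02], "Llueve mucho": [0.1, 0.9, 0.1]}

pairs = mine_pairs(src, tgt)
for s_text, t_text, score in pairs:
    print(f"{s_text!r} <-> {t_text!r} (similarity {score:.3f})")
```

With these toy vectors only "Good morning" finds a close enough match; "The cat sleeps" has no counterpart in the target set and is left unpaired, which is the point of applying a threshold.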
Also, to produce precise and grammatically correct output, the company developed toxicity lists for all 200 languages and used them to evaluate and filter errors, reducing the risk of so-called ‘hallucinated toxicity’. This occurs when the system erroneously introduces problematic content during translation.
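The idea of filtering with toxicity lists can be sketched in a few lines. The example below is a simplified assumption, not Meta's implementation: it flags a translation when the output contains a term from the target-language list while the source contains none from the source-language list. The function names and the one-word lists are hypothetical placeholders, and the whitespace-based tokenization would not suit languages written without spaces between words.

```python
import re

def tokens(text):
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))

def toxic_terms(text, wordlist):
    """Terms from the word list that appear in the text."""
    return tokens(text) & wordlist

def is_hallucinated_toxicity(source, translation, src_list, tgt_list):
    """True when the translation contains a listed term but the source does not."""
    return bool(toxic_terms(translation, tgt_list)) and not toxic_terms(source, src_list)

# Placeholder lists with a single mild entry each (hypothetical, for illustration).
spanish_list = {"idiota"}
english_list = {"idiot"}

examples = [
    ("Mi vecino es muy amable", "My neighbor is an idiot"),   # toxicity added
    ("Mi vecino es un idiota", "My neighbor is an idiot"),    # toxicity in source too
    ("Mi vecino es muy amable", "My neighbor is very kind"),  # clean
]

flags = [
    is_hallucinated_toxicity(src, out, spanish_list, english_list)
    for src, out in examples
]
print(flags)
```

Only the first pair is flagged: in the second the offensive term is already present in the source, so the translation is faithful rather than hallucinated.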
On the other hand, Meta acknowledged that there are still “major challenges to be faced” in expanding the model from 100 to 200 languages and focused in particular on three aspects: regularization and curriculum learning, self-supervised learning, and diversifying back-translation (that is, translating the translated text back into the source language).
With information from La Vanguardia.
Source: Clarin