Facebook’s AI can translate between 100 languages
Thursday, October 22, 2020, Category: Technology, Country: World
Facebook is open-sourcing a new AI language model called M2M-100 that can translate between any pair of 100 languages. Of the 4,950 possible language pairs, it translates 1,100 directly. This contrasts with previous multilingual models, which rely heavily on English as an intermediate language. A Chinese-to-French translation, for example, typically passes from Chinese to English and then from English to French, which increases the chance of introducing errors.
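To make the numbers concrete: with 100 languages there are 100 × 99 / 2 = 4,950 unordered pairs, or 9,900 translation directions if Chinese-to-French and French-to-Chinese are counted separately. The short Python sketch below checks that arithmetic and models the English-pivot routing described above. `english_centric_route` is a hypothetical helper for illustration, not part of M2M-100 or any Facebook code.

```python
import math

NUM_LANGUAGES = 100
DIRECT_PAIRS = 1_100  # figure quoted in the article

# Unordered pairs: C(100, 2) = 100 * 99 / 2
pairs = math.comb(NUM_LANGUAGES, 2)
# Each pair can be translated in both directions (zh->fr and fr->zh)
directions = NUM_LANGUAGES * (NUM_LANGUAGES - 1)
direct_share = DIRECT_PAIRS / pairs


def english_centric_route(src: str, tgt: str, pivot: str = "en") -> list[str]:
    """Languages a sentence passes through in an English-centric system.

    Such a system only learns X<->English, so any other pair takes two hops,
    and each hop is another chance to introduce an error.
    """
    if src == tgt:
        return [src]
    if pivot in (src, tgt):
        return [src, tgt]
    return [src, pivot, tgt]


print(pairs, directions, f"{direct_share:.1%}")  # 4950 9900 22.2%
print(english_centric_route("zh", "fr"))  # ['zh', 'en', 'fr']
```

A many-to-many model like M2M-100 removes the middle hop for the pairs it learned directly, which is why a Chinese-to-French request no longer has to pass through English.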
The model was trained on 7.5 billion sentence pairs. To compile a data set that large, the researchers relied heavily on automated curation. They used web crawlers to scrape billions of sentences from the web and had a text-classification tool called FastText identify each sentence's language. (They didn’t use any Facebook data.) They then used LASER 2.0, a program previously developed by Facebook’s AI research lab, which uses unsupervised learning (machine learning that doesn’t require manually labeled data) to match sentences across languages by their meaning.
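The curation step that sorts crawled text by language can be sketched as follows. This is a minimal illustration under stated assumptions, not Facebook’s pipeline: `bucket_by_language`, the duplicate filter and the 0.5 confidence cutoff are hypothetical, and `identify` is a placeholder for a language-ID model such as FastText. A toy heuristic stands in for that model so the example runs on its own.

```python
from collections import defaultdict
from typing import Callable, Iterable


def bucket_by_language(
    sentences: Iterable[str],
    identify: Callable[[str], tuple[str, float]],
    min_confidence: float = 0.5,
) -> dict[str, list[str]]:
    """Group crawled sentences into per-language pools.

    `identify` is a placeholder for a language-ID model and must return
    (language_code, confidence). Empty lines and exact duplicates are skipped,
    and sentences below `min_confidence` are dropped (illustrative cutoff).
    """
    pools: defaultdict[str, list[str]] = defaultdict(list)
    seen: set[str] = set()
    for sentence in sentences:
        text = sentence.strip()
        if not text or text in seen:
            continue
        seen.add(text)
        lang, confidence = identify(text)
        if confidence >= min_confidence:
            pools[lang].append(text)
    return dict(pools)


def toy_identify(text: str) -> tuple[str, float]:
    # Hypothetical stand-in for a trained classifier: CJK ideographs -> "zh",
    # Latin letters -> "en", anything else -> "und" with low confidence.
    if any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return "zh", 0.99
    if any(ch.isascii() and ch.isalpha() for ch in text):
        return "en", 0.8
    return "und", 0.1


crawled = ["猫在睡觉。", "The cat is sleeping.", "The cat is sleeping.", "   ", "???"]
print(bucket_by_language(crawled, toy_identify))
# {'zh': ['猫在睡觉。'], 'en': ['The cat is sleeping.']}
```

Swapping `toy_identify` for a real classifier changes nothing else in the sketch; the per-language pools are what a later matching step would search for sentences that mean the same thing.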
LASER 2.0 turns large, unstructured data sets of sentences into what are known as “embeddings”: lists of numbers (vectors) that represent each sentence. It trains on the available sentence examples within each language and maps out their relationships to one another based on how often, and how close together, they are used. These embeddings help the machine-learning model approximate the meaning of each sentence, which then allows LASER 2.0 to automatically pair up sentences that share the same meaning in different languages.
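The pairing step can be illustrated with toy vectors. This sketch is not LASER 2.0: the three-number vectors stand in for real sentence embeddings, which are much larger, and the scoring rule is a ratio-margin criterion from the bitext-mining literature (cosine similarity divided by the average similarity of each sentence to its k nearest neighbours in the other language). Whether M2M-100’s pipeline scored pairs exactly this way is an assumption, and `mine_pairs`, `k` and `threshold` are hypothetical names and values.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_mean(vec: Vector, candidates: list[Vector], k: int) -> float:
    """Average cosine similarity of `vec` to its k most similar candidates."""
    sims = sorted((cosine(vec, c) for c in candidates), reverse=True)[:k]
    return sum(sims) / len(sims)


def mine_pairs(
    src: list[Vector], tgt: list[Vector], k: int = 2, threshold: float = 1.05
) -> list[tuple[int, int, float]]:
    """Pair sentences that are mutual best matches under a ratio-margin score.

    k and threshold are illustrative values, not tuned settings.
    """
    if not src or not tgt:
        return []
    src_ctx = [knn_mean(x, tgt, k) for x in src]
    tgt_ctx = [knn_mean(y, src, k) for y in tgt]

    def margin(i: int, j: int) -> float:
        # Divide by neighbourhood similarity so "hub" sentences score lower.
        denom = (src_ctx[i] + tgt_ctx[j]) / 2
        return cosine(src[i], tgt[j]) / denom if denom > 0 else 0.0

    pairs = []
    for i in range(len(src)):
        j = max(range(len(tgt)), key=lambda jj: margin(i, jj))
        # Keep the pair only if i is also j's best match (mutual neighbours).
        i_back = max(range(len(src)), key=lambda ii: margin(ii, j))
        score = margin(i, j)
        if i_back == i and score >= threshold:
            pairs.append((i, j, score))
    return pairs


src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # e.g. English sentences
tgt = [[0.1, 1.0, 0.0], [1.0, 0.1, 0.0]]  # e.g. French sentences
for i, j, score in mine_pairs(src, tgt):
    print(f"src {i} <-> tgt {j} (margin {score:.2f})")
# src 0 <-> tgt 1 (margin 1.82)
# src 1 <-> tgt 0 (margin 1.82)
```

The third source sentence has no counterpart in the target set, so it is left unpaired. Requiring a mutual best match also means each target sentence ends up in at most one pair, which plain nearest-neighbour search on cosine similarity would not guarantee.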