Meta uses readings of the Bible in many languages to train its Massively Multilingual Speech (MMS) models. This type of technology could be used in virtual reality and augmented reality applications in a person’s preferred language.
The Massively Multilingual Speech (MMS) models extend text-to-speech and speech-to-text technology from about 100 languages to more than 1,100, over 10 times as many as before. They can also identify more than 4,000 spoken languages, 40 times as many as previous technologies. In addition, Meta has announced that it is open-sourcing its models and code so that the research community can build on this work.
“Many of the world’s languages are in danger of disappearing, and the limitations of current speech generation and recognition technology will only accelerate this trend. We want to make it easy for people to access information and use devices in their preferred language,” explains the company led by Mark Zuckerberg.
Uses of Meta’s voice technology
There are many use cases for speech technology, from augmented and virtual reality to messaging services, that can be used in a person’s preferred language and can understand everyone’s voice.
Gathering audio data for thousands of languages was the first challenge Meta faced, because the largest existing speech data sets cover 100 languages at most. To overcome this, the company turned to religious texts such as the Bible, which have been translated into many different languages and whose translations have been extensively studied in text-based machine translation research.
These translations have publicly available audio recordings of people reading the texts in different languages. As part of the MMS project, Meta created a data set of New Testament readings in over 1,100 languages, which provided an average of 32 hours of data per language.
Controversial use of the Bible
“By considering unlabeled recordings of various other Christian religious readings, we increased the number of available languages to more than 4,000. While this data comes from a specific domain and is often read by male speakers, our analysis shows that our models work equally well for male and female voices. And while the content of the audio recordings is religious, our analysis shows that this does not skew the model to produce more religious language,” the company explains.
However, not everyone agrees with this assessment. Chris Emezue, a researcher at Masakhane, an organization that works on natural language processing for African languages, told MIT Technology Review that although the scope of the research is impressive, using religious texts to train AI models can be controversial. The Bible, he says, contains many biases and misrepresentations.