Many efforts are underway to extend language models and translation models, which were previously studied mainly in English, to multiple languages. Google's mT5 extends the existing T5 (Text-to-Text Transfer Transformer) to a multilingual corpus: by collecting a dataset covering a total of 101 languages and pretraining on it, mT5 achieved improved performance on cross-lingual tasks. Not only the code but also the training scripts and the pretrained models are publicly shared via the GitHub link.
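The released mT5 checkpoints can be loaded through the Hugging Face `transformers` library. Below is a minimal sketch, assuming the public `google/mt5-small` checkpoint; note that the raw pretrained model was trained only on a span-corruption objective (filling in `<extra_id_0>`-style sentinel tokens), so it must be fine-tuned before it is useful on downstream tasks.

```python
# Minimal sketch of loading a public mT5 checkpoint with Hugging Face
# transformers. "google/mt5-small" is an assumption about which released
# checkpoint is most convenient to demo; larger variants exist.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# The pretrained model only knows the span-corruption task: predict the
# text hidden behind the sentinel token <extra_id_0>.
inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For real use, the model would be fine-tuned on a labeled cross-lingual task before generation.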
Meanwhile, Facebook has unveiled M2M-100, a model that enables mutual translation among 100 languages. Existing multilingual translation systems commonly pivot through English first, but M2M-100 improves on this by translating directly between the source and target languages. Through this, when translating from Chinese to French, for example, BLEU reportedly improved by more than 10 points. For reference, M2M-100 is said to have been trained on a total of 2,200 language-pair combinations. Here is a link to an article on this and a link to the GitHub repository.
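The direct source-to-target translation described above can be tried with the publicly released checkpoint via Hugging Face `transformers`. A minimal sketch, assuming the `facebook/m2m100_418M` checkpoint (a smaller public release, not necessarily the exact model evaluated in the paper), translating Chinese directly to French with no English pivot:

```python
# Hedged sketch: direct zh -> fr translation with a public M2M-100 checkpoint.
# "facebook/m2m100_418M" is an assumption about which released checkpoint to
# use; the key point is that no intermediate English translation is involved.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "facebook/m2m100_418M"
model = M2M100ForConditionalGeneration.from_pretrained(model_name)
tokenizer = M2M100Tokenizer.from_pretrained(model_name)

def translate(text: str, src: str, tgt: str) -> str:
    # Tell the tokenizer the source language, then force the decoder to
    # start with the target-language token.
    tokenizer.src_lang = src
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

print(translate("生活就像一盒巧克力。", "zh", "fr"))
```

The same `translate` call works for any of the supported language pairs simply by changing the `src` and `tgt` language codes.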