Data utilization and analytics platform
Advanced Speech Translation Research and Development Promotion Center

The Advanced Speech Translation Research and Development Promotion Center (ASTREC) promotes research and development of multilingual speech translation technology and its social implementation. Our work is based on Japan's Global Communication Plan (GCP), which aims to eliminate the world's language barriers and facilitate human interaction on a global scale, while forming part of a nationwide initiative that includes skilled researchers and engineers both from NICT and private companies. We aim to accelerate open innovation using multilingual speech translation technology to realize an advanced ICT-based society where language barriers do not exist.

In FY2017, we continued to make efforts to reduce the language barriers faced by foreigners visiting Japan for the Tokyo 2020 Olympic and Paralympic Games by improving the accuracy of our multilingual speech translation technology and expanding the range of languages and fields in which it can operate. We also reflected these capabilities in our multilingual speech translation app VoiceTra, and conducted field experiments in collaboration with organizations and private companies from various fields such as disaster prevention, rail travel, shopping, taxi services, medicine, emergency and rescue, and policing.
Some of these experiments have yielded new commercial services.

R&D of multilingual speech recognition technology

As the basis of our speech recognition technology, we built a speech corpus totaling 2,265 hours of recorded speech, including 500 hours of Korean, 542 hours of Thai, and 516 hours of Myanmar. To improve the accuracy of speech translation in fields related to travel and daily life, we increased the size of the Japanese-English bilingual dictionary from 100,000 words to 300,000 words, and the Japanese-Chinese and Japanese-Korean dictionaries from 100,000 words to 210,000 words each. We also added 60,000 new translation words for each of Thai, Vietnamese, Indonesian, Myanmar, Spanish, and French. The improvements made to our speech recognition models significantly increased the recognition accuracy for Japanese, Thai, Vietnamese, Indonesian, and Myanmar, with a reduction of between 28% and 42% in word error rates. These improved models have been incorporated into the VoiceTra field trial system and have been made available to the public.

R&D of multilingual speech synthesis technology

To improve the practicality of our Korean and Vietnamese speech synthesis systems, we increased the scale of the speech corpus used to train the acoustic model for each language to 15,000–20,000 utterances (15–20 hours) for both male and female speakers, corresponding to 2–5 times the size of the original corpus. This resulted in a highly accurate acoustic model and better speech synthesis quality. We also improved the pronunciation accuracy of each language by introducing a new text normalization process that transforms non-phonetic characters like numerals and symbols into strings of phonetic characters that are more suitable for reading.
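The idea behind text normalization can be illustrated with a minimal Python sketch. It is not the ASTREC implementation: the system described here targets Korean and Vietnamese, whereas this example uses English words for readability, and the rule tables (`ONES`, `TENS`, `SYMBOLS`) and the function names `number_to_words` and `normalize` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical rule tables for illustration only (English, not the
# Korean/Vietnamese rules used in the system described in the text).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SYMBOLS = {"%": "percent", "&": "and", "+": "plus"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    head = number_to_words(thousands) + " thousand"
    return head + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Replace numerals and a few symbols with readable words."""
    def spell_number(match):
        digits = match.group(0).replace(",", "")
        value = int(digits)
        if value > 999_999:
            # Out of range for this sketch: read digit by digit.
            return " ".join(ONES[int(d)] for d in digits)
        return number_to_words(value)

    # Numerals first, allowing thousands separators such as 15,000.
    text = re.sub(r"\d{1,3}(?:,\d{3})+|\d+", spell_number, text)
    # Then symbols, padded with spaces so words do not run together.
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, f" {word} ")
    # Collapse any doubled whitespace introduced above.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize("15,000 utterances"))
    print(normalize("42%"))
```

A production normalizer would also need context-dependent rules (dates, times, currency, ordinals) and language-specific readings; the sketch covers only plain integers and three symbols.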
The new and improved speech synthesis system has been incorporated into VoiceTra and made available to the public.

As in the speech recognition field, deep learning approaches have also been introduced to the speech synthesis field in recent years, and have resulted in a higher quality of synthesized speech compared with conventional methods based on hidden Markov models (HMMs). At ASTREC, we have been conducting research on deep learning since 2015 and have developed a new speech synthesis system that utilizes deep neural networks (DNNs). Figure 1 compares the new system with a conventional HMM system, and Figure 2 shows the results of speech synthesis listening tests performed using a DNN acoustic model of a Japanese female speaker that we developed. The speech quality of the DNN system was clearly better, achieving an average opinion score 0.6 points higher than that of the conventional system. The Japanese female voice DNN synthesis system has been made publicly available on VoiceTra.

R&D of machine translation technology

Our translation corpus of spoken language, which covers ten languages and multiple fields including medicine, has been expanded far beyond the original target of one million sentences. Using this translation corpus, we have confirmed that our translation system has made steady improvements in accuracy for all languages. In this way, we were able to surpass our original goal in building the translation corpus (which was to provide the foun-
