2 Multilingual Communication Technology

2-2-5 Spoken Language and Speaker Recognition Technology

Peng SHEN and Xugang LU

Spoken language identification and speaker recognition are key technologies for broadening the application areas of multilingual speech translation systems. In this paper, we give an overview of our latest studies on both and the results we have achieved thus far. For spoken language identification, we introduce techniques that improve performance on short utterances and model robustness against cross-domain and cross-channel problems. For speaker recognition, we introduce our proposed hybrid learning method, which takes into account the characteristics of both generative and discriminative models.

1 Introduction

NICT is engaged in the development of highly practical, low-latency multilingual speech translation technologies that can be used in everyday settings such as public transport, business meetings, and international conferences. These technologies are essential for creating a society without language barriers, in which the people of the world can communicate with each other without worrying about differences in language or ability.

Unlike communication among humans, current speech recognition-based systems require the user to switch the input direction and the language each time before speaking, owing to the limitations of the related technologies. These extra steps, which do not occur in communication between humans, hinder the application of speech-based systems. Developing speech context detection techniques, such as language recognition and speaker recognition, is therefore essential for improving the usability of real-time multilingual speech translation systems.

Fig. 1 Information in acoustic signal and related speech technologies to decode them
(Figure labels: ASR: what is spoken? / Language recognition: what language is spoken? / Accent recognition: where are speakers from? / Speaker recognition: authentication, identification, or who spoke when? / Emotion recognition: happy? sad? angry? / Gender recognition: female or male?)

As one of the most natural means of communication, acoustic speech encodes various kinds of information. Besides linguistic information, non-linguistic or paralinguistic information is also important, for example, speaker information,