…Proceedings of Interspeech '14, pp.338–342, Sept. 2014.
[43] A. Graves, N. Jaitly, and A. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," Proceedings of ASRU '13, pp.273–278, Dec. 2013.
[44] X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong, and W. Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," Proceedings of NIPS '15, pp.802–810, Dec. 2015.
[45] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," Proceedings of ICASSP '15, pp.4580–4584, April 2015.
[46] S. Young, G. Evermann, M. J. F. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, A. Ragni, V. Valtchev, P. Woodland, and C. Zhang, "The HTK book (version 3.5a)," https://www.danielpovey.com/files/htkbook.pdf, Dec. 2015.
[47] K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," Proceedings of Interspeech '13, pp.2345–2349, Aug. 2013.
[48] D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar, X. Na, Y. Wang, and S. Khudanpur, "Purely sequence-trained neural networks for ASR based on lattice-free MMI," Proceedings of Interspeech '16, pp.2751–2755, Sept. 2016.
[49] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," Proceedings of NAACL-HLT '19, pp.4171–4186, June 2019.
[50] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training," Technical report of OpenAI, June 2018.
[51] D. S. Park, W. Chan, Y. Zhang, C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, "SpecAugment: A simple data augmentation method for automatic speech recognition," Proceedings of Interspeech '19, pp.2613–2617, Sept. 2019.
[52] C. Du, H. Li, Y. Lu, L. Wang, and Y. Qian, "Data augmentation for end-to-end code-switching speech recognition," Proceedings of SLT '21, pp.194–200, Jan. 2021.
[53] T. Hayashi, "Overview of end-to-end speech processing and its practice with ESPnet2" (in Japanese), Journal of the Acoustical Society of Japan, vol.76, no.12, pp.720–729, Dec. 2020.
[54] T. Kawahara, "The evolution and state of the art of speech recognition technology: end-to-end models based on deep learning" (in Japanese), Journal of the Acoustical Society of Japan, vol.74, no.7, pp.381–386, July 2018.
[55] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," Proceedings of NIPS '14, vol.27, Dec. 2014.
[56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proceedings of NIPS '17, Dec. 2017.
[57] A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," Proceedings of ICML '06, pp.369–376, June 2006.
[58] D. Bahdanau, K. H. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," Proceedings of ICLR '15, May 2015.
[59] S. Toshniwal, A. Kannan, C. Chiu, Y. Wu, T. N. Sainath, and K. Livescu, "A comparison of techniques for language model integration in encoder-decoder speech recognition," Proceedings of SLT '18, pp.369–375, Dec. 2018.
[60] Y. He, T. N. Sainath, R. Prabhavalkar, I. McGraw, R. Alvarez, D. Zhao, D. Rybach, A. Kannan, Y. Wu, R. Pang, Q. Liang, D. Bhatia, Y. Shangguan, B. Li, G. Pundak, K. C. Sim, T. Bagby, S. Chang, K. Rao, and A. Gruenstein, "Streaming end-to-end speech recognition for mobile devices," Proceedings of ICASSP '19, pp.12–17, May 2019.
[61] A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," Proceedings of ICML '14, pp.1764–1772, June 2014.
[62] H. Sak, A. Senior, K. Rao, O. Irsoy, A. Graves, F. Beaufays, and J. Schalkwyk, "Learning acoustic frame labeling for speech recognition with recurrent neural networks," Proceedings of ICASSP '15, pp.4280–4284, April 2015.
[63] J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition," Proceedings of NIPS '15, pp.577–585, Dec. 2015.
[64] N. Moritz, T. Hori, and J. Le Roux, "Triggered attention for end-to-end speech recognition," Proceedings of ICASSP '19, pp.5666–5670, May 2019.
[65] S. Watanabe, T. Hori, S. Kim, J. R. Hershey, and T. Hayashi, "Hybrid CTC/attention architecture for end-to-end speech recognition," IEEE Journal of Selected Topics in Signal Processing, vol.11, no.8, pp.1240–1253, Dec. 2017.
[66] A. Gulati, J. Qin, C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, "Conformer: Convolution-augmented transformer for speech recognition," Proceedings of Interspeech '20, pp.5036–5040, Oct. 2020.
[67] Q. Zhang, H. Lu, H. Sak, A. Tripathi, E. McDermott, S. Koo, and S. Kumar, "Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss," Proceedings of ICASSP '20, pp.7829–7833, May 2020.
[68] Ministry of Internal Affairs and Communications, "Global Communication Plan" (in Japanese), https://www.soumu.go.jp/main_content/000285578.pdf, April 2014.
[69] Ministry of Internal Affairs and Communications, "Global Communication Plan 2025" (in Japanese), https://www.soumu.go.jp/main_content/000678485.pdf, March 2020.
[70] H. Yamamoto, Y. Wu, C. Huang, X. Lu, P. R. Dixon, S. Matsuda, C. Hori, and H. Kashioka, "The NICT ASR system for IWSLT2012," Proceedings of IWSLT '12, Dec. 2012.
[71] C. Huang, P. R. Dixon, S. Matsuda, Y. Wu, X. Lu, M. Saiko, and C. Hori, "The NICT ASR system for IWSLT 2013," Proceedings of IWSLT '13, Dec. 2013.
[72] P. Shen, X. Lu, X. Hu, N. Kanda, M. Saiko, and C. Hori, "The NICT ASR system for IWSLT 2014," Proceedings of IWSLT '14, Dec. 2014.
[73] M. Fujimoto, "Factored deep convolutional neural networks for noise robust speech recognition," Proceedings of Interspeech '17, pp.3837–3841, Aug. 2017.
[74] M. Fujimoto and H. Kawai, "Comparative evaluations of various factored deep convolutional RNN architectures for noise robust speech recognition," Proceedings of ICASSP '18, pp.4829–4833, April 2018.
[75] M. Fujimoto and H. Kawai, "Noise robust acoustic modeling for single-channel speech recognition based on a stream-wise transformer architecture," Proceedings of Interspeech '21, pp.281–285, Sept. 2021.
[76] Ministry of Internal Affairs and Communications, "Emergency support program for broadcast viewing by persons with hearing impairments" (in Japanese), https://www.soumu.go.jp/menu_news/s-news/01ryutsu09_02000228.html, March 2019.

Masakiyo Fujimoto
Senior Researcher, Advanced Speech Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, Universal Communication Research Institute. Ph.D. (Engineering).
Research interests: speech and acoustic signal processing, speech recognition, machine learning.
Awards: 2003 Awaya Kiyoshi Science Promotion Award (20th), Acoustical Society of Japan; 2011 Yamashita SIG Research Award (FY2010), Information Processing Society of Japan; 2015 Best Paper Award Honorable Mention, IEEE ASRU '15.

2-4 Speech Recognition Technology