There is room for improvement in both prediction accuracy and coverage. By reducing prediction errors as far as possible and further increasing coverage, we aim to build a normalization system that is accurate enough for practical use, and to demonstrate that normalizing input text improves the accuracy of downstream applications such as machine translation.

Shohei Higashiyama
Researcher, Advanced Translation Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, Universal Communication Research Institute
Ph.D. (Engineering)
Research field: natural language processing
[Awards]
2021: The 7th Workshop on Noisy User-generated Text, Best Paper Award
2021: The Association for Natural Language Processing (言語処理学会), FY2020 Paper Award
2014: 2014 International Conference on Computer & Information Sciences, Best Paper Award

Journal of the National Institute of Information and Communications Technology (情報通信研究機構研究報告), Vol. 68, No. 2 (2022), Section 2: Multilingual Communication Technology