As models grow larger, the computational resources required to use them, and the cost of those resources, also grow, which hinders their exploitation through technology transfer and similar channels. For this reason, we are currently training a 20-billion-parameter BERT on a high-quality Japanese corpus built independently at DIRECT. Although the training is still in progress, the model already achieves higher accuracy than the 400-million-parameter BERT we have used to date, and we expect further performance gains from incorporating the trained 20-billion-parameter BERT into systems developed at NICT, such as the large-scale Web information analysis system WISDOM X*8 and MICSUS*9, a multimodal spoken dialogue system for supporting the care of elderly people.

*8 https://www.wisdom-nict.jp/
*9 https://www.youtube.com/watch?v=gCUrC3f9-Go
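As a rough illustration of how such a pre-training run is handed to RaNNC, the sketch below wraps an ordinary PyTorch model in RaNNC's module wrapper. This is a minimal sketch, not the configuration actually used for the 20-billion-parameter BERT: it assumes RaNNC's Python package pyrannc and its RaNNCModule wrapper as shown in the library's public tutorial, and the toy encoder, its sizes, and the dummy objective are illustrative assumptions.

# Minimal sketch: wrap an ordinary PyTorch model in RaNNC so that the
# partitioning across GPUs/nodes is decided automatically.
# Launch with MPI, one process per GPU, e.g.:
#   mpirun -np 8 python pretrain_sketch.py
import torch
import torch.nn as nn
import pyrannc  # RaNNC's Python package (assumed installed)


class ToyEncoder(nn.Module):
    """Stand-in for a BERT-style encoder; sizes are illustrative only."""

    def __init__(self, hidden=1024, layers=24, heads=16, vocab=32000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        return self.head(self.encoder(x))


model = ToyEncoder().to("cuda")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# RaNNC takes the unmodified model definition; no manual model-parallel
# or pipeline-parallel rewrite of the network is required.
model = pyrannc.RaNNCModule(model, optimizer)

x = torch.randn(128, 8, 1024, device="cuda")  # (seq_len, batch, hidden)
loss = model(x).sum()                          # dummy training objective
loss.backward()
optimizer.step()

The point of this usage pattern is that the model code stays a plain nn.Module: RaNNC determines the partitioning and schedules the parallel execution when the script is launched across multiple GPUs.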
Masahiro TANAKA
Senior Researcher, Data-driven Intelligent System Research Center, Universal Communication Research Institute
Ph.D. (Informatics)
Large-scale parallel computing
[Awards]
2022: Minister of Economy, Trade and Industry Prize (top prize, working professionals category), the 35th "Pioneering Originality" Advanced Technology Award, The Sankei Shimbun
2021: First Place at PyTorch Annual Hackathon 2021 (PyTorch Developer Tools & Libraries category)
2016: The 61st Maejima Hisoka Award, Tsushin Bunka Kyokai