
2-2-6 Neural Speech Synthesis Technology

Takuma Okamoto (岡本 拓磨)
Senior Researcher, Advanced Speech Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, Universal Communication Research Institute. Ph.D. (Information Science). Research areas: sound field control and speech synthesis.
Awards: Acoustical Society of Japan 9th Society Activity Contribution Award (2022); 57th Sato Prize Paper Award (2018); 32nd Awaya Prize (2012).
