as $\min L$ (3), where $L_{\mathrm{CE}}$ is the multi-class cross-entropy classification loss in the source domain, and $L_{\mathrm{OT}}$ is the OT loss, which measures the distribution distance between source and target data samples.

Based on the proposed unsupervised adaptation learning, we expect the mismatch between training and testing to be reduced. Fig. 5 illustrates the effect of unsupervised adaptation on language cluster distributions. In this figure, only two languages from a testing data set are shown, with label IDs lang1 and lang5. In Fig. 5(a), because of the difference in recording channels, the clusters belonging to the same language are separated (the pairs lang5 train vs. lang5 test and lang1 train vs. lang1 test). Since the classifier is designed based on the training data set, it is not surprising that the performance of the baseline system degrades on the testing data set. After adaptation, as shown in Fig. 5(b), the clusters of the testing data set are pushed to overlap with those of the training data set for the same language.

In addition to the unsupervised approach, we also proposed using linguistic features to improve the robustness of LID [15]. For LID tasks, not only acoustic features, such as phonotactics information, but also linguistic features, such as contextual information, are important cues for determining a language [1]. Therefore, we proposed a novel transducer-based language embedding approach that integrates an RNN transducer (RNN-T) model into a language embedding extraction framework, as illustrated in Fig. 6. Benefiting from the RNN-T's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks.
Our experimental results showed that, compared with the conformer encoder-based baseline method, the proposed method obtained 38% and 28% relative improvements on in-domain and cross-domain datasets, respectively.

Fig. 4 The proposed unsupervised OT-based adaptation neural network for LID. [Diagram: two parallel branches, each with blocks labeled X-vector extraction, dense layer (twice), L-Norm, softmax, and class labels.]

Fig. 5 Language cluster distributions based on t-SNE [9] for a test set in a cross-domain language recognition task: (a) before adaptation and (b) after adaptation.

42  Research Report of the National Institute of Information and Communications Technology, Vol. 68, No. 2 (2022)  2 Multilingual Communication Technology
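As a rough illustration of the adaptation objective in Eq. (3), the following sketch combines a source-domain cross-entropy loss with an OT loss between source and target samples. It is only a minimal sketch under several assumptions that are not taken from the text: the two losses are combined as a weighted sum with a hypothetical weight `lam`, the OT loss is computed between embedding vectors with a squared Euclidean cost, and entropy-regularized Sinkhorn iterations stand in for whatever OT solver was actually used. All function and parameter names are illustrative.

```python
import math


def cross_entropy(logits, label):
    """Multi-class cross-entropy for one sample, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sinkhorn_ot(source, target, eps=0.1, n_iter=200):
    """Entropy-regularized OT cost between two point sets (uniform weights).

    Assumption: Sinkhorn iterations are used here as a stand-in OT solver.
    A very small eps or very large distances can underflow the kernel.
    """
    n, m = len(source), len(target)
    cost = [[sq_dist(s, t) for t in target] for s in source]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m  # uniform marginals
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport cost <P, C> with plan P = diag(u) K diag(v).
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


def adaptation_loss(src_logits, src_labels, src_emb, tgt_emb, lam=1.0):
    """Hypothetical combined loss: mean source CE + lam * OT(source, target)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(src_logits, src_labels))
    ce /= len(src_labels)
    return ce + lam * sinkhorn_ot(src_emb, tgt_emb)


if __name__ == "__main__":
    src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    tgt = [[x + 2.0, y + 2.0] for x, y in src]  # simulated channel shift
    logits = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
    labels = [0, 1, 0]
    print("matched domains:", adaptation_loss(logits, labels, src, src))
    print("shifted domains:", adaptation_loss(logits, labels, src, tgt))
```

In this toy setting the OT term is near zero when the target embeddings coincide with the source embeddings and grows when they are shifted, which mirrors the intuition of Fig. 5: minimizing the combined loss pulls the test clusters toward the training clusters. No target labels are used, consistent with the unsupervised setting.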