word-visual attention pre-trained model to transform the query and the dataset into the same space (i.e., a joint representation space). We use openly available datasets of text-image pairs, together with our own datasets collected during interactive querying with users, to adapt the pre-trained model to our domain as a downstream task. Hence, when users input their textual queries, we use our cross-modal text-image model to find a set of sample images, which are then used as visual queries to search over the dashcam dataset.

We use various datasets, both gathered from public sources and created by ourselves. We asked two volunteers to label data and created a structured dataset Dk = {(Ik; [Tki])}, where Ik denotes the k-th image and [Tki] represents a list of captions describing that image. Our own dataset contains only such a set Dk = {(Ik; [Tki])}; in practice, we also use the RetroTruck and I4W datasets, with a set of five captions for each image, to generate the pre-trained weights.

Table 2 shows that the system performs well: users found most of their results in the first round, with an average P@10 of 7.18 (i.e., about seven relevant results among ten retrieved results at the first try). Statistically, the system finds the expected results within 15 loops with naïve users and within 10 loops with expert users. Besides, the simulation results confirm the advantage of the interactive GUI when decreasing K in P@K from 200 to 10. Figure 6 illustrates one example of events retrieved by our system with different difficulty levels of se-

Fig. 5 MM-trafficEvent: A cross-modal multi-head attention model

Fig. 6 MM-trafficEvent: A sample of query-result outputs (a) Q: “find an accident made by a van and a red bus” [semantic level = easy], (b) Q: “find an accident where a white truck hit a white van from behind” [semantic level = complex], (c) Q: “find a moment a truck stop closed to the zebra zone where a lot of pedestrians and bicycles are crossing” [semantic level = most complex]. (a)(b) from I4W and RetroTruck datasets, (c) from our dataset

Table 2 Incident querying results using MM-trafficEvent model [19]

164   Research Report of the National Institute of Information and Communications Technology (情報通信研究機構研究報告) Vol.68 No.2 (2022)
4 Smart Data Utilization Platform Technology
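
The P@10 value discussed in the text is read there as a count of relevant results among the ten retrieved ones. A minimal sketch of that computation follows, assuming each query yields a ranked list of image IDs and that relevance labels are available as a set of IDs. The function names and the toy data are hypothetical and are not taken from the MM-trafficEvent system or from Table 2.

```python
from typing import Sequence, Set, Tuple


def hits_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> int:
    """Count how many of the first k retrieved items are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked_ids[:k] if item in relevant_ids)


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots filled by relevant items (hits divided by k)."""
    return hits_at_k(ranked_ids, relevant_ids, k) / k


def mean_hits_at_k(runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average the hit count over several queries; each run is (ranked_ids, relevant_ids)."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(hits_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Toy example (illustrative IDs only): two of the top five results are relevant.
ranked = ["img3", "img7", "img1", "img9", "img4"]
relevant = {"img3", "img1", "img8"}
print(hits_at_k(ranked, relevant, k=5))  # 2
```

Under this reading, an average of 7.18 at K = 10 corresponds to `mean_hits_at_k` over all evaluated queries, while `precision_at_k` gives the same quantity normalized to the range 0 to 1.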
