Submitted: 01 November 2023
Posted: 03 November 2023
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Overall framework
2.2. Text Encoder Module
2.3. Image Encoder Module
2.4. Modal Match Module
2.4.1. Libukui-search module
2.4.2. D-Bus Interaction
3. Results
3.1. Experimental environment
3.2. Classification Results and Analysis
3.2.1. Search for images through text
3.2.2. Search for audio through text
3.2.3. Search for videos through text
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).