Version 1
Received: 23 January 2020 / Approved: 24 January 2020 / Online: 24 January 2020 (15:03:34 CET)
How to cite:
Zhu, L.; Song, J.; Wei, X.; Jun, L. Adversarial Learning Based Semantic Correlation Representation for Cross-Modal Retrieval. Preprints 2020, 2020010288 (doi: 10.20944/preprints202001.0288.v1).
Abstract
With the rapid development of the Internet and the widespread use of smart devices, massive amounts of multimedia data are generated, collected, stored and shared online. This trend has made cross-modal retrieval a hot topic in recent years. Many existing works focus on correlation learning to generate a common subspace for cross-modal correlation measurement, while others use adversarial learning to reduce the heterogeneity of multi-modal data. However, very few works combine correlation learning and adversarial learning to bridge the inter-modal semantic gap and diminish cross-modal heterogeneity. This paper proposes a novel cross-modal retrieval method, named ALSCOR, an end-to-end framework that integrates cross-modal representation learning, correlation learning and adversarial learning. A CCA model, accompanied by two representation models, VisNet and TxtNet, is proposed to capture non-linear correlation. In addition, an intra-modal classifier and a modality classifier are used to learn intra-modal discrimination and minimize inter-modal heterogeneity. Comprehensive experiments are conducted on three benchmark datasets. The results demonstrate that the proposed ALSCOR outperforms state-of-the-art methods.
Keywords
Cross-modal retrieval; Adversarial learning; Semantic correlation; Deep learning
Subject
MATHEMATICS & COMPUTER SCIENCE, Information Technology & Data Management
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.