Wang, Q.; Zhang, W.; Lei, T.; Peng, D. Grouped Contrastive Learning of Self-Supervised Sentence Representation. Appl. Sci. 2023, 13, 9873.
Abstract
This paper proposes Grouped Contrastive Learning of Self-Supervised Sentence Representation (GCLSR), which learns effective and meaningful sentence representations. Previous works take maximizing the similarity between two vectors as the objective of contrastive learning, which suffers from the high dimensionality of the vectors. In addition, most previous works adopt discrete data augmentation to obtain positive samples and directly employ contrastive frameworks from computer vision for contrastive training, which can hamper training because text data are discrete and sparse compared with image data. To address these issues, we propose a grouped contrastive learning framework, GCLSR, which divides the high-dimensional feature vector into several groups and computes each group's contrastive loss separately, making use of more local information and ultimately obtaining a more fine-grained sentence representation. In addition, GCLSR incorporates a new self-attention mechanism and a continuous, partial word vector augmentation (PWVA). Because text data are discrete and sparse, self-attention helps the model focus on informative words by measuring the importance of every word in a sentence. Using PWVA, GCLSR obtains high-quality positive samples for contrastive learning. Experimental results demonstrate that the proposed GCLSR achieves encouraging results on the challenging datasets of the standard semantic textual similarity (STS) task and transfer tasks.
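To make the grouped objective concrete, the following is a minimal sketch, not the authors' released code: the sentence representation is split into groups along the feature dimension, an InfoNCE-style contrastive loss (the standard choice in sentence-level contrastive learning) is computed per group, and the group losses are averaged. The names `num_groups` and `temperature`, and the use of InfoNCE specifically, are illustrative assumptions not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def grouped_contrastive_loss(z1, z2, num_groups=4, temperature=0.05):
    """Sketch of a grouped contrastive objective.

    z1, z2: (batch, dim) representations of two augmented views of the
    same sentences (e.g., produced via an augmentation such as PWVA).
    """
    batch, dim = z1.shape
    assert dim % num_groups == 0, "feature dim must be divisible by num_groups"
    # Split each representation into `num_groups` chunks along the feature axis,
    # so each group carries a local slice of the high-dimensional vector.
    g1 = z1.view(batch, num_groups, dim // num_groups)
    g2 = z2.view(batch, num_groups, dim // num_groups)
    losses = []
    for g in range(num_groups):
        a = F.normalize(g1[:, g], dim=-1)  # (batch, dim // num_groups)
        b = F.normalize(g2[:, g], dim=-1)
        # Cosine-similarity logits; matching rows are the positive pairs,
        # all other rows in the batch serve as negatives (InfoNCE).
        logits = a @ b.t() / temperature   # (batch, batch)
        labels = torch.arange(batch, device=z1.device)
        losses.append(F.cross_entropy(logits, labels))
    # Average the per-group losses into the final training objective.
    return torch.stack(losses).mean()
```

In this sketch, computing one loss per feature group, rather than a single loss over the full vector, is what exposes the more local information the paper attributes to grouping; with `num_groups=1` it reduces to the ordinary single-vector contrastive loss.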
Keywords
Contrastive Learning; Self-attention; Data Augmentation; Grouped Representation; Unsupervised Learning
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.