Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Grouped Contrastive Learning of Self-supervised Sentence Representation

Version 1 : Received: 24 July 2023 / Approved: 25 July 2023 / Online: 26 July 2023 (09:04:32 CEST)

A peer-reviewed article of this Preprint also exists.

Wang, Q.; Zhang, W.; Lei, T.; Peng, D. Grouped Contrastive Learning of Self-Supervised Sentence Representation. Appl. Sci. 2023, 13, 9873.

Abstract

This paper proposes Grouped Contrastive Learning of self-supervised Sentence Representation (GCLSR), which learns effective and meaningful sentence representations. Previous works take maximizing the similarity between two vectors as the objective of contrastive learning, and therefore suffer from the high dimensionality of those vectors. In addition, most previous works obtain positive samples through discrete data augmentation and directly reuse contrastive frameworks from computer vision, which can hamper contrastive training because text data is discrete and sparse compared with image data. To address these issues, we propose a grouped contrastive learning framework, GCLSR, which divides the high-dimensional feature vector into several groups and computes a contrastive loss for each group separately, making use of more local information and ultimately yielding a more fine-grained sentence representation. GCLSR also introduces a new self-attention mechanism and a continuous, partial word vector augmentation (PWVA). For discrete and sparse text data, self-attention helps the model focus on informative words by measuring the importance of every word in a sentence, while PWVA provides high-quality positive samples for contrastive learning. Experimental results demonstrate that GCLSR achieves encouraging results on the challenging datasets of the standard semantic textual similarity (STS) and transfer tasks.
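The grouping idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss, so this sketch assumes an InfoNCE-style loss with in-batch negatives, equal-sized contiguous groups, and a simple average over the per-group losses. The function names, the `temperature` default, and the averaging step are all hypothetical choices made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _info_nce(anchors, positives, temperature):
    """Assumed loss: InfoNCE where row i of `positives` is the positive for
    row i of `anchors`, and every other row serves as an in-batch negative."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def grouped_contrastive_loss(view_a, view_b, num_groups, temperature=0.05):
    """Split each feature vector into `num_groups` contiguous slices, compute
    a contrastive loss per slice, and average the per-group losses.

    `view_a` and `view_b` are lists of equal-length feature vectors holding
    two views of the same batch of sentences (assumed layout).
    """
    dim = len(view_a[0])
    if dim % num_groups != 0:
        raise ValueError("feature dimension must be divisible by num_groups")
    size = dim // num_groups
    losses = []
    for g in range(num_groups):
        lo, hi = g * size, (g + 1) * size
        group_a = [v[lo:hi] for v in view_a]
        group_b = [v[lo:hi] for v in view_b]
        losses.append(_info_nce(group_a, group_b, temperature))
    return sum(losses) / num_groups
```

Under these assumptions, each group is normalized and compared independently, so a mismatch confined to a few dimensions affects its own group's loss rather than being diluted across the full vector. With `num_groups=1` the function reduces to an ordinary whole-vector contrastive loss.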

Keywords

Contrastive Learning; Self-attention; Data Augmentation; Grouped Representation; Unsupervised Learning

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
