Submitted: 21 February 2025
Posted: 24 February 2025
Abstract
Keywords:
1. Introduction
2. Historical Perspective and Foundations
3. Major Methodologies in Self-Supervised Learning
3.1. Contrastive Learning
- MoCo: Momentum Contrast (MoCo) uses a momentum-updated encoder to maintain a dynamic dictionary of negative samples, which stabilizes contrastive learning and allows training on large-scale datasets [37].
- BYOL: Unlike traditional contrastive learning methods, Bootstrap Your Own Latent (BYOL) eliminates the need for negative samples by employing a teacher-student framework in which the teacher is a momentum-updated copy of the student [38].
- SimSiam: This method simplifies the setup further by demonstrating that meaningful representations can be learned without negative samples or momentum encoders, provided a stop-gradient mechanism is used [39].
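As a rough illustration of the ideas in this list, the sketch below computes an InfoNCE-style contrastive loss for one query against one positive key and a set of negative keys, and applies the exponential-moving-average (momentum) update that MoCo and BYOL use for their key or target encoder. The function names, the temperature of 0.2, and the momentum of 0.99 are illustrative assumptions, not settings taken from the cited papers, and plain Python lists stand in for real network outputs and parameters.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE-style loss: negative log-softmax score of the positive key."""
    q = normalize(query)
    keys = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, k) / temperature for k in keys]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def momentum_update(target_params, online_params, momentum=0.99):
    """EMA update: target <- momentum * target + (1 - momentum) * online."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]
```

The loss is small when the query is closer to its positive key than to the negatives. SimSiam drops both the negatives and the momentum encoder, so neither function would be needed there; its stop-gradient is a property of the backward pass and is not shown in this forward-only sketch.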
3.2. Clustering-Based Approaches
- DeepCluster: This method performs iterative k-means clustering in feature space, updating both feature representations and cluster assignments in an alternating fashion [44].
- SwAV: Swapping Assignments between Views (SwAV) introduces a novel clustering-based approach where representations are learned by solving a swapped prediction problem across different augmentations [45].
- SeLa: Self-Labelling (SeLa) uses optimal transport techniques to assign cluster labels, improving the stability and effectiveness of clustering-based learning [46].
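The optimal-transport assignment mentioned for SeLa (and also used by SwAV) is commonly computed with Sinkhorn-Knopp iterations. The following is a minimal sketch under stated assumptions: `scores` is a hypothetical samples-by-clusters matrix of similarity scores, and `epsilon` and `n_iters` are illustrative values. It turns the scores into soft assignments in which every cluster receives roughly the same total mass, which discourages the degenerate solution of placing all samples in one cluster.

```python
import math


def sinkhorn(scores, epsilon=0.5, n_iters=50):
    """Balanced soft cluster assignments via Sinkhorn-Knopp normalization.

    scores: list of rows, one per sample, each holding one score per cluster.
    Returns a matrix whose rows sum to 1 and whose columns sum to about n / k.
    """
    n, k = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: every cluster gets total mass 1 / k.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]
        # Row step: every sample gets total mass 1 / n.
        row_sums = [sum(row) for row in q]
        q = [[x / (row_sums[i] * n) for x in row] for i, row in enumerate(q)]
    # Rescale so that each row is a probability distribution over clusters.
    return [[x * n for x in row] for row in q]
```

In a DeepCluster-style pipeline the assignment step would instead be a k-means pass over the features; the balancing constraint is what distinguishes the optimal-transport formulation.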
3.3. Generative Approaches
- Autoencoders: Autoencoders and their variants, such as Variational Autoencoders (VAEs), learn compressed representations by encoding data into a latent space and reconstructing it from that space [49].
- Generative Adversarial Networks (GANs): GANs train a generator and discriminator in an adversarial framework, leading to the learning of high-quality generative representations [50].
- Masked Image Modeling: Methods such as MAE (Masked Autoencoder) extend BERT-like pretraining ideas to computer vision by masking image patches and training models to reconstruct them.
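The masking-and-reconstruction objective in the last item can be sketched as follows. This is a simplified illustration, not the MAE implementation: patches are represented as plain lists of pixel values, the helper names are hypothetical, the default masking ratio of 0.75 is an assumption chosen to reflect the high ratios typical of this family of methods, and the model that produces `predicted_patches` is left out entirely.

```python
import random


def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Randomly split patch indices into masked and visible sets."""
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    perm = list(range(num_patches))
    rng.shuffle(perm)
    masked = sorted(perm[:num_masked])
    visible = sorted(perm[num_masked:])
    return masked, visible


def masked_mse(target_patches, predicted_patches, masked):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target_patches[i], predicted_patches[i]):
            total += (t - p) ** 2
            count += 1
    # With nothing masked there is nothing to reconstruct.
    return total / count if count else 0.0
```

Restricting the loss to masked positions means the model is scored only on content it could not see, which is what makes the task a prediction problem rather than a copy operation.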
3.4. Predictive Learning Techniques
- BERT: The Bidirectional Encoder Representations from Transformers (BERT) model introduced masked language modeling (MLM), where certain tokens are masked and the model is trained to predict them from the surrounding context on both sides.
- GPT: The Generative Pretrained Transformer (GPT) series follows an autoregressive approach, predicting the next token given previous tokens [54].
- wav2vec: Applied to speech processing, the wav2vec family pretrains models on raw audio by learning to predict future or masked audio segments from their context [55].
- Self-Supervised Learning for Robotics: Predictive tasks in robotics involve learning future states or dynamics to enable better control policies [56].
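The difference between the masked and autoregressive objectives in this list comes down to how training pairs are built from raw text. The sketch below shows both constructions on a tokenized sentence. It is a simplification under stated assumptions: the function names are hypothetical, tokens are plain strings, and every selected token is replaced by `[MASK]`, whereas BERT's published recipe also leaves some selected tokens unchanged or swaps them for random ones.

```python
import random

MASK = "[MASK]"


def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked language modeling: hide some tokens and keep them as labels."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)  # position ignored by the loss
    return inputs, labels


def autoregressive_examples(tokens):
    """Next-token prediction: pair each prefix with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
```

In the masked case the model sees context on both sides of each hidden token; in the autoregressive case it sees only the prefix, which is why the second objective doubles as a text generator.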
3.5. Comparison of SSL Approaches
4. Applications of Self-Supervised Learning
4.1. Computer Vision
- Image Classification: Self-supervised models such as SimCLR, MoCo, and SwAV have achieved performance close to that of fully supervised models on image classification tasks by pretraining on large-scale datasets such as ImageNet without using their labels.
- Object Detection and Segmentation: Pretrained self-supervised models provide strong feature representations that transfer well to downstream tasks like object detection (e.g., Faster R-CNN, YOLO) and segmentation (e.g., Mask R-CNN, DeepLab) [65].
- Video Understanding: Self-supervised video models use temporal consistency and frame prediction to learn representations for action recognition, video retrieval, and anomaly detection in surveillance systems [68].
4.2. Natural Language Processing
- Text Understanding and Generation: Transformer-based models such as BERT, RoBERTa, and GPT utilize self-supervised objectives like masked language modeling and autoregressive text prediction to achieve state-of-the-art performance in tasks such as sentiment analysis, machine translation, and text summarization [72].
- Question Answering and Chatbots: Self-supervised models have significantly improved natural language understanding in conversational AI systems, powering advanced chatbots and virtual assistants [73].
- Semantic Search and Information Retrieval: SSL has enhanced document ranking, question-answering retrieval systems, and knowledge extraction in search engines [74].
- Biomedical and Legal Text Analysis: SSL models have been fine-tuned for specialized domains such as bioinformatics (BioBERT) and legal document processing, where labeled data is scarce [75].
4.3. Speech and Audio Processing
- Speech Recognition: Models like wav2vec 2.0 use contrastive pretraining on raw audio waveforms, achieving significant improvements in automatic speech recognition (ASR) while requiring far less transcribed speech data for fine-tuning [77].
- Speaker Identification and Verification: SSL-based embeddings such as HuBERT enhance speaker recognition and verification systems in real-world applications like voice authentication.
- Music and Sound Classification: SSL models have been applied to tasks such as music genre classification, environmental sound recognition, and audio event detection, improving generalization across diverse datasets [78].
- Speech Enhancement and Denoising: Self-supervised representations help in denoising speech signals, making speech processing models more robust in noisy environments [79].
4.4. Robotics and Reinforcement Learning
- Robot Perception: Self-supervised vision models allow robots to understand and interpret their surroundings, facilitating object recognition, depth estimation, and scene understanding [81].
- Control and Policy Learning: SSL is used in RL to pretrain representations for state estimation and action prediction, improving sample efficiency in robotic control tasks [82].
- Autonomous Navigation: Self-supervised methods enable robots and autonomous vehicles to learn from unlabeled sensor data, enhancing obstacle avoidance and motion planning [83].
- Grasping and Manipulation: SSL has been applied to robotic grasping, allowing robots to learn object interactions through self-generated experiences.
4.5. Healthcare and Biomedical Applications
- Disease Diagnosis and Prognosis: SSL-based models have been applied to X-ray, MRI, and histopathology image analysis, improving diagnostic accuracy while reducing the need for labeled medical images.
- Drug Discovery: SSL has been used to predict molecular properties, accelerating drug discovery by leveraging vast amounts of unlabeled chemical compound data [85].
- Genomics and Bioinformatics: Self-supervised techniques have enabled better representation learning in genomics, leading to improvements in gene sequence analysis and protein structure prediction.
4.6. Finance and Anomaly Detection
- Fraud Detection: Self-supervised models can learn transaction patterns and identify anomalies in financial transactions, improving fraud detection in banking and e-commerce [87].
- Stock Market Prediction: SSL has been used to pretrain models for financial forecasting, leveraging unlabeled market data to identify trends and patterns.
- Cybersecurity: Self-supervised anomaly detection methods help detect unusual network activity and security breaches with minimal labeled data [88].
4.7. Scientific Research and Other Domains
- Astronomy: SSL models help analyze vast amounts of unlabeled astronomical data for galaxy classification and cosmic event detection.
- Climate Science: Self-supervised models are being used to improve climate predictions and analyze satellite imagery for environmental monitoring.
- Material Science: SSL assists in predicting material properties, accelerating discoveries in chemistry and physics [90].
4.8. Summary
5. Challenges and Future Directions
5.1. Challenges in Self-Supervised Learning
5.1.1. Designing Effective Pretext Tasks
- Many contrastive learning methods rely on negative samples, which can be difficult to define optimally [97].
- Clustering-based methods require careful initialization and hyperparameter tuning [98].
- Predictive modeling approaches may lead to trivial solutions where the model learns to exploit shortcuts instead of meaningful representations [99].
5.1.2. Computational Costs and Scalability
5.1.3. Evaluation and Benchmarking
5.1.4. Domain Adaptation and Generalization
5.1.5. Robustness to Noisy and Biased Data
5.1.6. Lack of Theoretical Understanding
5.2. Future Directions in Self-Supervised Learning
5.2.1. Beyond Contrastive Learning: Towards More Efficient Methods
- Novel architectures that reduce the dependency on contrastive loss.
- Hybrid SSL approaches that combine contrastive, clustering, and generative methods.
- Self-distillation techniques for improving SSL efficiency [122].
5.2.2. Self-Supervised Learning for Multimodal Data
5.2.3. Self-Supervised Learning for Small Data Regimes
5.2.4. Integrating SSL with Supervised and Reinforcement Learning
5.2.5. Towards More Human-Like Learning
5.3. Summary
6. Conclusion
References
- Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the CIKM; 2019; pp. 1441–1450. [Google Scholar]
- Bach, F.; Jenatton, R.; Mairal, J.; Obozinski, G. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning 2012, 4, 1–106. [Google Scholar] [CrossRef]
- Denton, E.L.; Chintala, S.; Fergus, R.; et al. Deep generative image models using a laplacian pyramid of adversarial networks. In Proceedings of the Neural Inf. Process. Syst. 2015; pp. 1486–1494. [Google Scholar]
- Tran, D.; Ranganath, R.; Blei, D.M. Hierarchical Implicit Models and Likelihood-Free Variational Inference. In Proceedings of the Neural Inf. Process. Syst. 2017; pp. 2794–2802. [Google Scholar]
- Xu, H.; Zhou, Z.; Qiao, Y.; Kang, W.; Wu, Q. Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation. In Proceedings of the AAAI Conf. Artif. Intell. 2021; pp. 3030–3038. [Google Scholar]
- Hoang, Q.; Nguyen, T.D.; Le, T.; Phung, D. MGAN: Training Generative Adversarial Nets with Multiple Generators. In Proceedings of the Int. Conf. Learn. Represent. 2018; pp. 1–24. [Google Scholar]
- Wu, H.; Zheng, S.; Zhang, J.; Huang, K. Gp-gan: Towards realistic high-resolution image blending. In Proceedings of the ACM Int. Conf. Multimedia; 2019; pp. 2487–2495. [Google Scholar]
- Li, C.L.; Chang, W.C.; Cheng, Y.; Yang, Y.; Póczos, B. Mmd gan: Towards deeper understanding of moment matching network. In Proceedings of the Neural Inf. Process. Syst. 2017; pp. 2203–2213. [Google Scholar]
- Wang, X.; Tang, X. Face photo-sketch synthesis and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 1955–1967. [Google Scholar] [CrossRef]
- Frey, B.J.; Hinton, G.E.; Dayan, P. Does the wake-sleep algorithm produce good density estimators? In Proceedings of the Neural Inf. Process. Syst. 1996; pp. 661–667. [Google Scholar]
- Zhou, X.; Zhou, H.; Liu, Y.; Zeng, Z.; Miao, C.; Wang, P.; You, Y.; Jiang, F. Bootstrap latent representations for multi-modal recommendation. In Proceedings of the WWW; 2023; pp. 845–854. [Google Scholar]
- Liu, Z.; Gui, J.; Luo, H. Good helper is around you: Attention-driven Masked Image Modeling. In Proceedings of the AAAI Conf. Artif. Intell. 2023; pp. 1799–1807. [Google Scholar]
- Liu, M.; Ding, Y.; Xia, M.; Liu, X.; Ding, E.; Zuo, W.; Wen, S. STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2019; pp. 3673–3682. [Google Scholar]
- Wei, C.; Xie, L.; Ren, X.; Xia, Y.; Su, C.; Liu, J.; Tian, Q.; Yuille, A.L. Iterative reorganization with weak spatial constraints: Solving arbitrary jigsaw puzzles for unsupervised representation learning. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2019; pp. 1910–1919. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the Neural Inf. Process. Syst. 2017; pp. 6626–6637. [Google Scholar]
- Liang, X.; Lee, L.; Dai, W.; Xing, E.P. Dual motion gan for future-flow embedded video prediction. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2017; pp. 1744–1752. [Google Scholar]
- Vapnik, V. The nature of statistical learning theory; Springer Science & Business Media, 2013.
- Chen, Y.; Lai, Y.K.; Liu, Y.J. Cartoongan: Generative adversarial networks for photo cartoonization. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2018; pp. 9465–9474. [Google Scholar]
- Wang, G.; Manhardt, F.; Shao, J.; Ji, X.; Navab, N.; Tombari, F. Self6D: Self-Supervised Monocular 6D Object Pose Estimation. In Proceedings of the Eur. Conf. Comput. Vis. 2020. [Google Scholar]
- Yan, S.; Xu, X.; Xu, D.; Lin, S.; Li, X. Image Classification with Densely Sampled Image Windows and Generalized Adaptive Multiple Kernel Learning. IEEE Transactions on Cybernetics.
- Chen, N.; Zhu, J.; Xing, E.P. Predictive subspace learning for multi-view data: a large margin approach. In Proceedings of the NIPS; 2010; pp. 361–369. [Google Scholar]
- Wu, X.D.; Yu, K.; Ding, W.; Wang, H.; Zhu, X.Q. Online Feature Selection with Streaming Features. IEEE Transactions on Pattern Analysis and Machine Intelligence 2013, 35, 1178–1192. [Google Scholar] [PubMed]
- Meng, Y.; Xiong, C.; Bajaj, P.; Bennett, P.; Han, J.; Song, X.; et al. Coco-lm: Correcting and contrasting text sequences for language model pretraining. Neural Inf. Process. Syst. 2021, 34, 23102–23114. [Google Scholar]
- Cover, T.M.; Thomas, J.A. Elements of information theory, 2nd ed.; Wiley-Interscience, 2006. [Google Scholar]
- Li, X.; Gui, J.; Li, P. Random Fourier Features for Kernel Multi-view Discriminant Analysis. In Proceedings of the European Conference on Artificial Intelligence; 2020. [Google Scholar]
- Xia, L.; Huang, C.; Xu, Y.; Zhao, J.; Yin, D.; Huang, J. Hypergraph contrastive collaborative filtering. In Proceedings of the SIGIR; 2022; pp. 70–79. [Google Scholar]
- Zhang, Y.; Yeung, D.Y. A convex formulation for learning task relationships in multi-task learning. In Proceedings of the Conference on Uncertainty in Artificial Intelligence; 2010; pp. 733–742. [Google Scholar]
- Kim, J.; Monteiro, R.D.; Park, H. Group sparsity in nonnegative matrix factorization. In Proceedings of the SIAM International Conference on Data Mining; 2012. [Google Scholar]
- Girdhar, R.; Fouhey, D.F.; Rodriguez, M.; Gupta, A. Learning a predictable and generative vector representation for objects. In Proceedings of the Eur. Conf. Comput. Vis. 2016; pp. 484–499. [Google Scholar]
- Li, N.; Guo, G.D.; Chen, L.F.; Chen, S. Optimal subspace classification method for complex data. International Journal of Machine Learning and Cybernetics 2013, 4, 163–171. [Google Scholar] [CrossRef]
- Yu, J.; Yin, H.; Xia, X.; Chen, T.; Li, J.; Huang, Z. Self-Supervised Learning for Recommender Systems: A Survey. arXiv preprint arXiv:2203.15876, 2022. [Google Scholar]
- Arora, S.; Ge, R.; Liang, Y.; Ma, T.; Zhang, Y. Generalization and equilibrium in generative adversarial nets (gans). In Proceedings of the Int. Conf. Mach. Learn. 2017; pp. 224–232. [Google Scholar]
- Jia, W.; Hu, R.X.; Zhao, Y.; Gui, J.; Zhu, Y.H. Palmprint Recognition Using Band-Limited Minimum Average Correlation Energy Filter. In Proceedings of the International Conference on Hand-Based Biometrics; 2011; pp. 1–6. [Google Scholar]
- Zhou, D.; Burges, C.J. Spectral clustering and transductive learning with multiple views. In Proceedings of the Int. Conf. Mach. Learn. 2007; pp. 1159–1166. [Google Scholar]
- Bishop, C.M. Pattern recognition and machine learning; Springer, New York, 2006.
- El-Nouby, A.; Sharma, S.; Schulz, H.; Hjelm, D.; El Asri, L.; Kahou, S.E.; Bengio, Y.; Taylor, G.W. Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2019; pp. 10303–10311. [Google Scholar]
- Tschannen, M.; Djolonga, J.; Ritter, M.; Mahendran, A.; Houlsby, N.; Gelly, S.; Lucic, M. Self-Supervised Learning of Video-Induced Visual Invariances. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2020; pp. 13806–13815. [Google Scholar]
- Xia, L.; Kao, B.; Huang, C. OpenGraph: Towards Open Graph Foundation Models. arXiv preprint arXiv:2403.01121, 2024. [Google Scholar]
- Hu, K.; Shao, J.; Liu, Y.; Raj, B.; Savvides, M.; Shen, Z. Contrast and order representations for video self-supervised learning. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2021; pp. 7939–7949. [Google Scholar]
- Wang, J.; Kumar, S.; Chang, S.F. Semi-supervised hashing for large-scale search. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2393–2406. [Google Scholar] [CrossRef]
- Haykin, S. Neural networks and learning machines; Prentice Hall, 2008.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2017; pp. 1222–1230. [Google Scholar]
- Maaten, L.v.d.; Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research 2008, 9, 2579–2605. [Google Scholar]
- Pan, J.; Dong, J.; Liu, Y.; Zhang, J.; Ren, J.; Tang, J.; Tai, Y.W.; Yang, M.H. Physics-based generative adversarial models for image restoration and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2449–2462. [Google Scholar] [CrossRef]
- Wang, W.Y.; Mazaitis, K.; Cohen, W.W. A Soft Version of Predicate Invention Based on Structured Sparsity. In Proceedings of the Int. Joint Conf. Artif. Intell. 2015; pp. 3918–3924. [Google Scholar]
- Chen, T.; Luo, C.; Li, L. Intriguing Properties of Contrastive Losses. In Proceedings of the Neural Inf. Process. Syst. Curran Associates, Inc. 2021; 34, pp. 11834–11845. [Google Scholar]
- Naikal, N.; Yang, A.Y.; Sastry, S.S. Informative feature selection for object recognition via sparse PCA. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2011; pp. 818–825. [Google Scholar]
- Dai, Z.; Yang, Z.; Yang, F.; Cohen, W.W.; Salakhutdinov, R.R. Good semi-supervised learning that requires a bad gan. In Proceedings of the Neural Inf. Process. Syst. 2017; pp. 6510–6520. [Google Scholar]
- Hu, Z.; Dong, Y.; Wang, K.; Chang, K.W.; Sun, Y. GPT-GNN: Generative Pre-Training of Graph Neural Networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2020; pp. 1857–1867. [Google Scholar]
- Berthelot, D.; Schumm, T.; Metz, L. Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017. [Google Scholar]
- Feichtenhofer, C.; Fan, H.; Xiong, B.; Girshick, R.; He, K. A large-scale study on unsupervised spatiotemporal representation learning. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2021; pp. 3299–3309. [Google Scholar]
- Chua, T.S.; Tang, J.; Hong, R.; Li, H.; Luo, Z.; Zheng, Y.T. NUS-WIDE: A Real-World Web Image Database from National University of Singapore. In Proceedings of the ACM Conference on Image and Video Retrieval; 2009; pp. 1–9. [Google Scholar]
- Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy; 2017; pp. 39–57. [Google Scholar]
- Peng, H.C.; Long, F.H.; Ding, C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27, 1226–1238. [Google Scholar] [CrossRef]
- Tao, Y.; Gao, M.; Yu, J.; Wang, Z.; Xiong, Q.; Wang, X. Predictive and contrastive: Dual-auxiliary learning for recommendation. TCSS 2022. [Google Scholar] [CrossRef]
- Liu, Z.Q.; Lin, S.L.; Tan, M.T. Sparse Support Vector Machines with Lp Penalty for Biomarker Identification. IEEE-ACM Transactions on Computational Biology and Bioinformatics 2010, 7, 100–107. [Google Scholar] [PubMed]
- Frey, B.J. Graphical models for machine learning and digital communication; MIT Press, 1998.
- Liu, W.; Wang, J.; Ji, R.; Jiang, Y.G.; Chang, S.F. Supervised hashing with kernels. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2012; pp. 2074–2081. [Google Scholar]
- Spurr, A.; Aksan, E.; Hilliges, O. Guiding infogan with semi-supervision. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer; 2017; pp. 119–134. [Google Scholar]
- Cai, X.; Wang, C.; Xiao, B.; Chen, X.; Zhou, J. Regularized Latent Least Square Regression for Cross Pose Face Recognition. In Proceedings of the IJCAI; 2013; pp. 1247–1253. [Google Scholar]
- Xie, Y.; Wang, Z.; Ji, S. Noise2Same: Optimizing A Self-Supervised Bound for Image Denoising. In Proceedings of the Neural Inf. Process. Syst. 2020. [Google Scholar]
- Wu, Y.; Xie, R.; Zhu, Y.; Ao, X.; Chen, X.; Zhang, X.; Zhuang, F.; Lin, L.; He, Q. Multi-view multi-behavior contrastive learning in recommendation. In Proceedings of the DASFAA. Springer; 2022; pp. 166–182. [Google Scholar]
- Lin, K.; Li, D.; He, X.; Zhang, Z.; Sun, M.T. Adversarial ranking for language generation. In Proceedings of the Neural Inf. Process. Syst. 2017; pp. 3155–3165. [Google Scholar]
- Rusu, A.A.; Rabinowitz, N.C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; Hadsell, R. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016. [Google Scholar]
- Goodfellow, I. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2017. [Google Scholar]
- Zhai, D.; Liu, X.; Ji, X.; Zhao, D.; Satoh, S.; Gao, W. Supervised distributed hashing for large-scale multimedia retrieval. IEEE Transactions on Multimedia 2017, 20, 675–686. [Google Scholar] [CrossRef]
- Jia, W.; Cai, H.Y.; Gui, J.; Hu, R.X.; Lei, Y.K.; Wang, X.F. Newborn footprint recognition using orientation feature. Neural Computing and Applications 2012, 21, 1855–1863. [Google Scholar] [CrossRef]
- Zhao, J.; Xiong, L.; Li, J.; Xing, J.; Yan, S.; Feng, J. 3d-aided dual-agent gans for unconstrained face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2380–2394. [Google Scholar] [CrossRef]
- Truong, Q.T.; Salah, A.; Lauw, H.W. Bilateral variational autoencoder for collaborative filtering. In Proceedings of the WSDM; 2021; pp. 292–300. [Google Scholar]
- Zniyed, Y.; Nguyen, T.P.; et al. Enhanced network compression through tensor decompositions and pruning. IEEE Transactions on Neural Networks and Learning Systems 2024. [Google Scholar]
- Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A kernel two-sample test. Journal of Machine Learning Research 2012, 13, 723–773. [Google Scholar]
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A review of feature selection methods on synthetic data. Knowledge and information systems 2013, 34, 483–519. [Google Scholar] [CrossRef]
- Li, J.; Tao, D. Simple exponential family PCA. IEEE Transactions on Neural Networks and Learning Systems 2013, 24, 485–497. [Google Scholar]
- Yan, X.; Misra, I.; Gupta, A.; Ghadiyaram, D.; Mahajan, D. ClusterFit: Improving Generalization of Visual Representations. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2020; pp. 6509–6518. [Google Scholar]
- Duan, R.; Ma, X.; Wang, Y.; Bailey, J.; Qin, A.K.; Yang, Y. Adversarial camouflage: Hiding physical-world attacks with natural styles. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2020; pp. 1000–1008. [Google Scholar]
- Zhu, Y.; Elhoseiny, M.; Liu, B.; Peng, X.; Elgammal, A. A generative adversarial approach for zero-shot learning from noisy texts. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2018; pp. 1004–1013. [Google Scholar]
- Bian, W.; Tao, D. Constrained Empirical Risk Minimization Framework for Distance Metric Learning. IEEE Transactions on Neural Networks and Learning Systems 2012, 23, 1194–1205. [Google Scholar] [CrossRef]
- Zhang, M.; Ding, C.; Zhang, Y.; Nie, F. Feature Selection at the Discrete Limit. In Proceedings of the AAAI Conf. Artif. Intell. 2014; pp. 1355–1361. [Google Scholar]
- Arora, S.; Risteski, A.; Zhang, Y. Do GANs learn the distribution? Some theory and empirics. In Proceedings of the Int. Conf. Learn. Represent. 2018; pp. 1–16. [Google Scholar]
- Moosavi-Dezfooli, S.M.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal adversarial perturbations. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2017; pp. 1765–1773. [Google Scholar]
- Chae, D.K.; Kang, J.S.; Kim, S.W.; Choi, J. Rating augmentation with generative adversarial networks towards accurate collaborative filtering. In Proceedings of the WWW; 2019; pp. 2616–2622. [Google Scholar]
- Cao, J.; Hu, Y.; Zhang, H.; He, R.; Sun, Z. Learning a high fidelity pose invariant model for high-resolution face frontalization. In Proceedings of the Neural Inf. Process. Syst. 2018; pp. 2867–2877. [Google Scholar]
- Li, T.; Ogihara, M. Toward intelligent music information retrieval. IEEE Transactions on Multimedia 2006, 8, 564–574. [Google Scholar]
- Yang, J.; Kannan, A.; Batra, D.; Parikh, D. Lr-gan: Layered recursive generative adversarial networks for image generation. In Proceedings of the Int. Conf. Learn. Represent. 2017; pp. 1–21. [Google Scholar]
- Yuan, F.; He, X.; Karatzoglou, A.; Zhang, L. Parameter-efficient transfer from sequential behaviors for user modeling and recommendation. In Proceedings of the SIGIR; 2020; pp. 1469–1478. [Google Scholar]
- Nutt, C.L.; Mani, D.R.; Betensky, R.A.; Tamayo, P.; Cairncross, J.G.; Ladd, C.; Pohl, U.; Hartmann, C.; McLaughlin, M.E.; Batchelor, T.T.; et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research 2003, 63, 1602–1607. [Google Scholar] [PubMed]
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Proceedings of the Neural Inf. Process. Syst. 2016; pp. 2172–2180. [Google Scholar]
- Lu, C.; Tang, J.; Lin, M.; Lin, L.; Yan, S.; Lin, Z. Correntropy Induced L2 Graph for Robust Subspace Clustering. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2013. [Google Scholar]
- Iyer, G.; Krishna Murthy, J.; Gupta, G.; Krishna, M.; Paull, L. Geometric consistency for self-supervised end-to-end visual odometry. In Proceedings of the CVPR Workshops; 2018; pp. 267–275. [Google Scholar]
- Gui, J.; Sun, Z.; Ji, S.; Tao, D.; Tan, T. Feature Selection Based on Structured Sparsity: A Comprehensive Study. IEEE Transactions on Neural Networks and Learning Systems 2017, 28, 1490–1507. [Google Scholar] [CrossRef] [PubMed]
- Yan, S.; Wang, H. Semi-supervised learning by sparse representation. In Proceedings of the SIAM International Conference on Data Mining; 2009; pp. 792–801. [Google Scholar]
- An, Y.; Xue, H.; Zhao, X.; Zhang, L. Conditional Self-Supervised Learning for Few-Shot Classification. In Proceedings of the Int. Joint Conf. Artif. Intell. 2021; pp. 2140–2146. [Google Scholar]
- Wang, X.; Liu, N.; Han, H.; Shi, C. Self-supervised heterogeneous graph neural network with co-contrastive learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining; 2021; pp. 1726–1736. [Google Scholar]
- Wang, X.; Yu, K.; Dong, C.; Change Loy, C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2018; pp. 606–615. [Google Scholar]
- Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2018; pp. 8798–8807. [Google Scholar]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023. [Google Scholar]
- Pascual, S.; Bonafonte, A.; Serra, J. SEGAN: Speech enhancement generative adversarial network. In Proceedings of the Interspeech; 2017; pp. 3642–3646. [Google Scholar]
- Lester, B.; Al-Rfou, R.; Constant, N. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, 2021. [Google Scholar]
- Wang, X.; Chen, W.; Wang, Y.F.; Wang, W.Y. No metrics are perfect: Adversarial reward learning for visual storytelling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics; 2018; pp. 1–15. [Google Scholar]
- Song, Y.; Ma, C.; Wu, X.; Gong, L.; Bao, L.; Zuo, W.; Shen, C.; Lau, R.W.; Yang, M.H. Vital: Visual tracking via adversarial learning. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2018; pp. 8990–8999. [Google Scholar]
- Alemi, A.A.; Fischer, I. GILBO: one metric to measure them all. In Proceedings of the Neural Inf. Process. Syst. 2018; pp. 7037–7046. [Google Scholar]
- Zhang, Z.Y.; Li, T.; Ding, C. Non-negative tri-factor tensor decomposition with applications. Knowledge and information systems 2013, 34, 243–265. [Google Scholar] [CrossRef]
- Chen, Z.; Ye, X.; Du, L.; Yang, W.; Huang, L.; Tan, X.; Shi, Z.; Shen, F.; Ding, E. AggNet for Self-supervised Monocular Depth Estimation: Go An Aggressive Step Further. In Proceedings of the ACM Int. Conf. Multimedia; 2021; pp. 1526–1534. [Google Scholar]
- Cai, D.; He, X.; Han, J. Semi-supervised discriminant analysis. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2007; pp. 1–7. [Google Scholar]
- Athalye, A.; Carlini, N.; Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the Int. Conf. Mach. Learn. 2018; pp. 274–283. [Google Scholar]
- Hu, H.; Cui, J.; Wang, L. Region-Aware Contrastive Learning for Semantic Segmentation. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2021; pp. 16291–16301. [Google Scholar]
- Gui, J.; Wang, C.; Zhu, L. Locality preserving discriminant projections. In Proceedings of the International Conference on Intelligent Computing; 2009; pp. 566–572. [Google Scholar]
- Yang, M.; Liao, M.; Lu, P.; Wang, J.; Zhu, S.; Luo, H.; Tian, Q.; Bai, X. Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition. arXiv preprint arXiv:2207.00193, 2022. [Google Scholar]
- Belhumeur, P.N.; Hespanha, J.P.; Kriegman, D.J. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720. [Google Scholar]
- Uesaka, T.; Morino, K.; Sugiura, H.; Kiwaki, T.; Murata, H.; Asaoka, R.; Yamanishi, K. Multi-view Learning over Retinal Thickness and Visual Sensitivity on Glaucomatous Eyes. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2017; pp. 2041–2050. [Google Scholar]
- Boureau, Y.; Le Roux, N.; Bach, F.; Ponce, J.; LeCun, Y. Ask the locals: multi-way local pooling for image recognition. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2011; pp. 2651–2658. [Google Scholar]
- Tenenbaum, J.; De Silva, V.; Langford, J. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323. [Google Scholar] [CrossRef]
- Nguyen, T.T.; Chang, K.; Hui, S.C. Supervised term weighting centroid-based classifiers for text categorization. Knowledge and information systems 2013, 35, 61–85. [Google Scholar] [CrossRef]
- Wang, X.; Gupta, A. Generative image modeling using style and structure adversarial networks. In Proceedings of the Eur. Conf. Comput. Vis. 2016; pp. 318–335. [Google Scholar]
- Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research 2003, 3, 1157–1182. [Google Scholar]
- You, Z.H.; Lei, Y.K.; Gui, J.; Huang, D.S.; Zhou, X. Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data. Bioinformatics 2010, 26, 2744–2751. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T. Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2005, 67, 301–320. [Google Scholar] [CrossRef]
- Wei, W.; Huang, C.; Xia, L.; Zhang, C. Multi-Modal Self-Supervised Learning for Recommendation. In Proceedings of the WWW; 2023; pp. 790–800. [Google Scholar]
- Hegde, C.; Indyk, P.; Schmidt, L. A Nearly-Linear Time Framework for Graph-Structured Sparsity. In Proceedings of the Int. Conf. Mach. Learn. 2015; pp. 928–937. [Google Scholar]
- Hu, R.X.; Jia, W.; Zhang, D.; Gui, J.; Song, L.T. Hand shape recognition based on coherent distance shape contexts. Pattern Recognition 2012, 45, 3348–3359. [Google Scholar] [CrossRef]
- Amodio, M.; Krishnaswamy, S. TraVeLGAN: Image-to-image Translation by Transformation Vector Learning. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2019; pp. 8983–8992. [Google Scholar]
- Lin, G.; Gao, C.; Li, Y.; Zheng, Y.; Li, Z.; Jin, D.; Li, Y. Dual contrastive network for sequential recommendation. In Proceedings of the SIGIR; 2022; pp. 2686–2691. [Google Scholar]
- Liu, Z.; Ma, Y.; Schubert, M.; Ouyang, Y.; Xiong, Z. Multi-Modal Contrastive Pre-training for Recommendation. In Proceedings of the ICMR; 2022; pp. 99–108. [Google Scholar]
- Larsen, A.B.L.; Sønderby, S.K.; Larochelle, H.; Winther, O. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the Int. Conf. Mach. Learn. 2016; pp. 1558–1566. [Google Scholar]
- Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2005, 67, 91–108. [Google Scholar] [CrossRef]
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from rgbd images. In Proceedings of the Eur. Conf. Comput. Vis. Springer; 2012; pp. 746–760. [Google Scholar]
- Qin, X.; Yuan, H.; Zhao, P.; Liu, G.; Zhuang, F.; Sheng, V.S. Intent Contrastive Learning with Cross Subsequences for Sequential Recommendation. In Proceedings of the WSDM; 2024; pp. 548–556. [Google Scholar]
- Zhu, F.; Zhu, Y.; Chang, X.; Liang, X. Vision-language navigation with self-supervised auxiliary reasoning tasks. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2020; pp. 10012–10022. [Google Scholar]
- Li, W.J.; Wang, S.; Kang, W.C. Feature learning based deep supervised hashing with pairwise labels 2016. pp. 2016, 1711–1717. [Google Scholar]
- Petzka, H.; Fischer, A.; Lukovnicov, D. On the regularization of Wasserstein GANs. In Proceedings of the Int. Conf. Learn. Represent. 2018; pp. 1–24. [Google Scholar]
- Shetty, R.; Rohrbach, M.; Anne Hendricks, L.; Fritz, M.; Schiele, B. Speaking the same language: Matching machine to human captions by adversarial training. In Proceedings of the IEEE Int. Conf. Comput. Vis. 2017; pp. 4135–4144. [Google Scholar]
- Yu, J.; Gao, M.; Yin, H.; Li, J.; Gao, C.; Wang, Q. Generating reliable friends via adversarial training to improve social recommendation. In Proceedings of the ICDM. IEEE; 2019; pp. 768–777. [Google Scholar]
- Qiao, T.; Zhang, J.; Xu, D.; Tao, D. MirrorGAN: Learning Text-to-image Generation by Redescription. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2019; pp. 1505–1514. [Google Scholar]
- Gui, J.; Liu, T.; Sun, Z.; Tao, D.; Tan, T. Fast supervised discrete hashing. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 490–496. [Google Scholar] [CrossRef] [PubMed]
- Berman, D.; Avidan, S.; Avidan, S. Non-local image dehazing. In Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit. 2016; pp. 1674–1682. [Google Scholar]
- Zniyed, Y.; Nguyen, T.P.; et al. Efficient tensor decomposition-based filter pruning. Neural Networks 2024, 178, 106393. [Google Scholar]
| Method | Key Idea | Strengths and Weaknesses |
|---|---|---|
| Contrastive Learning | Instance discrimination | Highly effective, but computationally expensive |
| Clustering-Based | Iterative clustering refinement | Captures high-level semantics, but sensitive to hyperparameters |
| Generative Approaches | Data reconstruction/generation | Produces detailed representations, but costly to train |
| Predictive Learning | Predict missing data | Flexible and effective, but task design is crucial |
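
To make the "instance discrimination" entry in the table above concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss in plain Python. It is an illustration under stated assumptions, not the formulation of any specific method surveyed here: the function names `l2_normalize` and `info_nce_loss`, the temperature value, and the toy embeddings are all hypothetical choices made for this example.

```python
import math
import random


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(views_a, views_b, temperature=0.1):
    """InfoNCE-style loss (illustrative; names and defaults are assumptions).

    Row i of views_a and row i of views_b are treated as two views of the
    same sample (the positive pair); every other row of views_b acts as a
    negative for anchor i.
    """
    a = [l2_normalize(v) for v in views_a]
    b = [l2_normalize(v) for v in views_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity of the anchor to every candidate.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the positive pair (index i) as the target class.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy data: 8 samples with 16-dimensional embeddings (hypothetical values).
random.seed(0)
base = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
aligned = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in base]
unrelated = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

loss_aligned = info_nce_loss(base, aligned)      # views agree: low loss
loss_unrelated = info_nce_loss(base, unrelated)  # views unrelated: higher loss
```

In this sketch each anchor is compared against every candidate in the batch, so the cost grows quadratically with batch size, which is one way to read the "computationally expensive" remark in the table.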

| Domain | Key Applications |
|---|---|
| Computer Vision | Classification, object detection, segmentation |
| NLP | Language modeling, chatbots, search |
| Speech | ASR, speaker identification, audio classification |
| Robotics | Perception, navigation, manipulation |
| Healthcare | Medical imaging, drug discovery, genomics |
| Finance | Fraud detection, stock market prediction |

| Challenges | Future Directions |
|---|---|
| Pretext task design | Hybrid and domain-agnostic SSL tasks |
| High computational cost | Efficient self-distillation and model compression |
| Evaluation difficulties | Standardized SSL benchmarks and metrics |
| Domain generalization issues | Transfer learning and domain adaptation methods |
| Bias and robustness concerns | Fairness-aware and noise-resistant SSL |
| Theoretical limitations | Mathematical frameworks for SSL understanding |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).