Version 1: Received: 18 March 2019 / Approved: 19 March 2019 / Online: 19 March 2019 (13:11:09 CET)
How to cite:
Liu, M.; Chen, R.; Ai, H.; Chen, Y.; Li, D. Unsupervised Visual Representation Learning for Indoor Scenes with a Siamese ConvNet and Graph Constraints. Preprints 2019, 2019030189
Abstract
Indoor scene recognition is important for intelligent applications such as mobile robots and location-based services (LBS). Wherever we are and whatever we do, we are in a specific scene, and the human brain can discern that scene at a glance. For a machine, the task is harder for two reasons. First, it typically requires large amounts of well-annotated data, and annotation is time-consuming and labor-intensive. Second, effective visual representations are hard to learn because indoor scenes show large intra-category variation and high inter-category similarity. To address these problems, we adopt an unsupervised visual representation learning method that learns from unlabeled data using a Siamese Convolutional Neural Network (Siamese ConvNet) and graph-based constraints. Specifically, we first mine relationships between unlabeled samples with a graph structure and then use these relationships as supervision for representation learning with a Siamese network. A k-NN graph is first constructed by taking each image as a node and linking it to its k nearest neighbors to form the edges. On this graph, cycle consistency and geodesic distance serve as the criteria for mining positive and negative pairs, respectively. That is, images that differ substantially in appearance but lie on the same cycle in the graph are considered to belong to the same category (positive pairs), and two nodes separated by a large geodesic distance, rather than a large Euclidean distance, are regarded as belonging to different categories (negative pairs). A Siamese network then takes the mined pairs as input and learns visual representations of indoor scenes in an unsupervised manner. We evaluated the proposed method on two scene-centric datasets, MIT67 and Places365, and conducted experiments with different numbers of categories to explore its potential. The results demonstrate that semantic visual representations of indoor scenes can be learned in this unsupervised manner. In addition, indoor scene recognition models trained with the learned representations and a few labeled samples achieve performance competitive with state-of-the-art approaches.
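The pair-mining step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`knn_graph`, `hop_distances`, `mine_pairs`), the thresholds `max_cycle_len` and `min_geodesic`, and the toy 2-D features are all assumptions made for illustration. The sketch also simplifies cycle detection to a closed-walk test (shortest path from i to j plus shortest path from j back to i) and measures geodesic distance in hops along directed edges.

```python
import math
from collections import deque
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(features, k):
    """Directed k-NN graph: node i points to its k nearest neighbors."""
    graph = {}
    for i, fi in enumerate(features):
        ranked = sorted(
            (euclidean(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        graph[i] = [j for _, j in ranked[:k]]
    return graph


def hop_distances(graph, source):
    """Geodesic distance from `source`, counted in hops (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mine_pairs(graph, max_cycle_len=4, min_geodesic=3):
    """Return (positives, negatives) as sets of index pairs (i, j) with i < j.

    Assumed criteria (hypothetical thresholds):
    - positive: i and j lie on a short closed walk i -> ... -> j -> ... -> i
      but are not direct neighbors, so they may look quite different;
    - negative: the geodesic distance is large in both directions
      (unreachable nodes count as infinitely far apart).
    """
    all_dist = {i: hop_distances(graph, i) for i in graph}
    positives, negatives = set(), set()
    for i, j in combinations(sorted(graph), 2):
        d_ij = all_dist[i].get(j, math.inf)
        d_ji = all_dist[j].get(i, math.inf)
        on_short_cycle = d_ij + d_ji <= max_cycle_len
        directly_linked = j in graph[i] or i in graph[j]
        if on_short_cycle and not directly_linked:
            positives.add((i, j))
        elif min(d_ij, d_ji) >= min_geodesic:
            negatives.add((i, j))
    return positives, negatives


# Toy features: two well-separated groups of four points each.
features = [(0, 0), (1, 0), (2, 0), (3, 0),
            (10, 10), (11, 10), (12, 10), (13, 10)]
graph = knn_graph(features, k=2)
positives, negatives = mine_pairs(graph)
print("positive pairs:", sorted(positives))
print("number of negative pairs:", len(negatives))
```

In this toy setup, the two end points of each group are never direct neighbors yet are connected through a short closed walk, so they are the pairs mined as positives, while every cross-group pair is unreachable in the graph and is mined as a negative. In a real pipeline the features would come from an image descriptor or a pretrained network rather than 2-D coordinates, and the mined pairs would then be fed to the Siamese network.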
Keywords
indoor scene recognition; unsupervised representation learning; Siamese network; graph constraints
Subject
Computer Science and Mathematics, Robotics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.