ARTICLE | doi:10.20944/preprints202207.0070.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: scene recognition; object detection; scene classification; TF-IDF
Online: 5 July 2022 (08:38:17 CEST)
Indoor scene recognition and semantic information can be helpful for social robots. Recently, researchers in the field of indoor scene recognition have incorporated object-level information and shown improved performance. In line with these advances, this paper demonstrates that scene recognition can be performed using object-level information alone. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. The predicted objects were then used as features to predict room categories. This paper successfully combines an approach conventionally used in computer vision (YOLO) with one from information retrieval, Term Frequency-Inverse Document Frequency (TF-IDF). These approaches could be further helpful in embodied research and dynamic scene classification, which we elaborate on.
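The object-to-room pipeline described in this abstract can be sketched in a few lines; everything below (object labels, room names, the nearest-centroid classifier) is our own illustrative toy, not the paper's implementation:

```python
import math
from collections import Counter

# Hypothetical training data: each scene "document" is the list of object
# labels a detector such as YOLO might output for one indoor image.
train_docs = [
    (["oven", "sink", "fridge"], "kitchen"),
    (["sink", "cup", "oven"], "kitchen"),
    (["bed", "lamp", "pillow"], "bedroom"),
    (["bed", "wardrobe", "lamp"], "bedroom"),
]

def tfidf(doc, df, n_docs):
    """TF-IDF weights for one document given corpus document frequencies."""
    tf = Counter(doc)
    return {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()}

def cosine(a, b):
    dot = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Document frequency of each label over the training corpus.
df = Counter(w for doc, _ in train_docs for w in set(doc))
n = len(train_docs)

# Per-class centroid of TF-IDF vectors.
centroids = {}
for doc, label in train_docs:
    vec = tfidf(doc, df, n)
    acc = centroids.setdefault(label, Counter())
    for w, v in vec.items():
        acc[w] += v

def predict(detected_objects):
    """Assign the room whose TF-IDF centroid is most similar to the query.
    Labels never seen in training are dropped before weighting."""
    q = tfidf([w for w in detected_objects if w in df], df, n)
    return max(centroids, key=lambda c: cosine(q, centroids[c]))
```

Calling `predict(["bed", "lamp"])` on this toy corpus returns `"bedroom"`, since the query's TF-IDF vector overlaps only the bedroom centroid.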
ARTICLE | doi:10.20944/preprints202201.0431.v1
Subject: Earth Sciences, Geoinformatics Keywords: cell phone indoor positioning; scene recognition; building map; map location anchor; YOLOv5; geocoding matching
Online: 28 January 2022 (08:55:08 CET)
At present, indoor localization is one of the core technologies of location-based services (LBS), and numerous scenario-oriented application solutions exist. Because visual features are the main semantic information that helps people understand their environment, many indoor scene recognition techniques are widely adopted. However, the engineering problem of cell phone indoor scene recognition and localization has not been well solved, owing to insufficient semantic constraint information in building maps and the immaturity of building map location anchor (MLA) matching and positioning technology. To address these problems, this paper proposes a cell phone indoor scene recognition and localization method with building map semantic constraints. First, we build a library of geocoded entities for building map location anchors (MLA), which provides users with "immersive" real-world building maps on the one hand and semantic anchor point constraints for cell phone positioning on the other. Second, using an improved YOLOv5s deep learning model deployed on the mobile terminal, we recognize universal MLA elements in building scenes from cell phone camera video in real time. Lastly, the spatial locations of the scene elements obtained from the video are matched with the building MLA to achieve real-time positioning and navigation. The experimental results show that the model recognition accuracy of this method is above 97.2%, and the maximum localization error is within 0.775 m, reduced to 0.5 m after applying the BIMPN road network walking node constraint; the method can thus achieve high positioning accuracy in building scenes rich in MLA element information.
In addition, building map location anchors (MLA) have universal characteristics, and the positioning algorithm based on scene element recognition is compatible with extensions of indoor map data types, so the method has good prospects for engineering applications.
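The MLA matching step can be illustrated with a toy sketch; the anchor library, coordinates, and the simple offset-averaging rule below are all our assumptions for illustration, not the paper's algorithm:

```python
# Hypothetical geocoded MLA library: anchor id -> indoor map coordinates (m).
# In the paper, MLA elements (signs, door plates, etc.) are recognized in the
# camera video by an on-device YOLOv5s model; here they are just strings.
mla_library = {
    "room_301_sign": (12.0, 4.5),
    "fire_exit":     (20.0, 4.5),
    "elevator":      (12.0, 10.0),
}

def locate(recognized_ids, offsets):
    """Estimate the phone position by averaging, over all recognized anchors,
    the anchor's map coordinate shifted back by the estimated
    camera-to-anchor offset (dx, dy)."""
    xs, ys = [], []
    for anchor_id, (dx, dy) in zip(recognized_ids, offsets):
        ax, ay = mla_library[anchor_id]
        xs.append(ax - dx)
        ys.append(ay - dy)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Two anchors seen from the same spot should agree on the position.
pos = locate(["room_301_sign", "elevator"], [(2.0, 0.5), (2.0, 6.0)])
```

A real system would also need the road-network (BIMPN) constraint mentioned above to snap the estimate onto walkable space.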
ARTICLE | doi:10.20944/preprints201909.0088.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: active distribution network; distributed generation; multi-scene analysis; Scene reduction; improved clustering algorithm; bi-level programming; comprehensive security index
Online: 8 September 2019 (16:28:28 CEST)
In recent years, distributed generation technology has developed rapidly, and renewable energy, represented by wind and solar energy, has been widely studied and utilized. To exploit the advantages of Distributed Generation (DG) and meet the challenges that follow grid access, the Active Distribution Network (ADN) is considered the future direction of the traditional distribution network because of its capacity for active management. Multi-scenario analysis is now widely used in research on the optimal allocation of distributed power supply in active distribution networks. To address the problems that may arise when multi-scenario analysis is used to plan DG under uncertainty across large numbers of scenarios, a scenario reduction method based on an improved clustering algorithm is proposed, and its validity and feasibility are verified. At present, there are few studies on the optimal allocation of DG in an ADN under fault conditions; this paper therefore introduces comprehensive security indices. Considering the timing characteristics of DG and the influence of the active management mode, a bi-level programming model is established that minimizes annual life-cycle investment and active power curtailment. The bi-level model is a complex mixed-integer non-linear program; a hybrid algorithm combining the cuckoo search algorithm with the primal-dual interior point method is used to solve it. Finally, simulations on the IEEE 33-node system verify the superiority of the scenario reduction method and the comprehensive security index for optimizing the configuration of DG in an ADN.
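The scenario reduction idea can be sketched with a toy 1-D k-means; a real study would cluster full wind/solar output profiles, and the data and the plain k-means variant below are illustrative only (the paper uses an improved clustering algorithm):

```python
# Toy scenarios: each is (output level, probability). Reduction replaces a
# cluster of similar scenarios by one representative carrying their total
# probability, so the reduced set still sums to probability 1.
scenarios = [(0.10, 0.2), (0.12, 0.2), (0.55, 0.3), (0.60, 0.3)]

def reduce_scenarios(scenarios, k, iters=20):
    """Reduce scenarios with 1-D k-means: each cluster is replaced by its
    probability-weighted centroid, carrying the summed probability."""
    centers = [s[0] for s in scenarios[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for value, prob in scenarios:
            j = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[j].append((value, prob))
        for i, cl in enumerate(clusters):
            if cl:
                total_p = sum(p for _, p in cl)
                centers[i] = sum(v * p for v, p in cl) / total_p
    out = []
    for i, cl in enumerate(clusters):
        if cl:
            out.append((centers[i], sum(p for _, p in cl)))
    return out

reduced = reduce_scenarios(scenarios, k=2)
```

On this toy data the four scenarios collapse to two representatives, roughly (0.11, 0.4) and (0.575, 0.6), preserving total probability.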
ARTICLE | doi:10.20944/preprints202008.0113.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Scene classification; Deep Learning; Convolutional Neural Networks; Feature learning
Online: 5 August 2020 (06:19:27 CEST)
State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures for achieving very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low-dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes additional distance constraints on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground-truth. By extending the typical cross-entropy loss function with a distance learning function, our proposed approach achieves significant gains across a wide set of benchmark datasets in terms of classification, while providing additional evidence related to class membership and classification confidence.
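The extended objective (cross-entropy plus a distance term to class representatives) can be sketched as follows; the loss weighting and the squared-distance form are our assumptions, not necessarily the paper's exact formulation:

```python
import math

# Toy sketch of the combined objective: standard cross-entropy on the
# classification branch plus a squared-distance pull of each embedding
# toward its ground-truth class representative.
def cross_entropy(logits, label):
    # numerically stable: log-sum-exp minus the true-class logit
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def distance_loss(embedding, class_rep):
    return sum((e - c) ** 2 for e, c in zip(embedding, class_rep))

def total_loss(logits, label, embedding, class_reps, weight=0.1):
    """Cross-entropy plus weighted distance to the true class representative."""
    return cross_entropy(logits, label) + weight * distance_loss(embedding, class_reps[label])

# Embedding already sits on its class representative, so only the
# cross-entropy term contributes here.
loss = total_loss([2.0, 0.5], 0, [1.0, 0.0], {0: [1.0, 0.0], 1: [0.0, 1.0]})
```

In training, both terms would be backpropagated jointly; the distance term pulls same-class embeddings together, which is what yields the extra class-membership evidence the abstract mentions.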
ARTICLE | doi:10.20944/preprints202004.0434.v1
Subject: Life Sciences, Other Keywords: higher education; pedagogy; forensic science; VR; learning technologies; crime scene
Online: 24 April 2020 (10:13:58 CEST)
Simulated crime scene investigation is an essential component of forensic science education, but its implementation poses challenges relating to cost, accessibility and breadth of experience. Virtual reality (VR) is an emerging technology which offers exciting prospects for teaching and learning, especially for imparting practical skills. We document here a multidisciplinary experimental study in which a bespoke VR crime scene app was designed and implemented, after which it was tested by both undergraduate student and staff/postgraduate student cohorts. Through both qualitative and quantitative analyses, we demonstrate that VR applications support learning of practical crime scene processing skills. VR-based practical sessions have the potential to add value to forensic science courses through offering cost-effective practical experience and the ability to work in isolation, in a variety of different scenarios. Both user groups reported high levels of satisfaction with the process and reports of adverse effects (motion sickness) were minimal. With reference to user feedback, we proceed to evaluate the scalability and development challenges associated with large-scale implementation of VR as an adjunct to forensic science education.
Subject: Earth Sciences, Geoinformatics Keywords: indoor scene recognition; unsupervised representation learning; Siamese network; graph constraints
Online: 19 March 2019 (13:11:09 CET)
Indoor scene recognition has great significance for intelligent applications such as mobile robots and location-based services (LBS). Wherever we are and whatever we do, we are in a specific scene. The human brain can discern a scene with a quick glance, but for a machine to do the same, plenty of well-annotated data is typically required, which is time-consuming and labor-intensive to produce; moreover, effective visual representations are hard to learn because of the large intra-category variation and inter-category similarity of indoor scenes. To solve these problems, we adopt an unsupervised visual representation learning method that learns from unlabeled data with a Siamese Convolutional Neural Network (Siamese ConvNet) and graph-based constraints. Specifically, we first mine relationships between unlabeled samples with a graph structure, and these relationships then serve as supervision for representation learning with a Siamese network. A k-NN graph is constructed by taking each image as a node and linking its k nearest neighbors to form edges. With this graph, cycle consistency and geodesic distance are used as the criteria for mining positive and negative pairs, respectively: by detecting cycles in the graph, images with large differences but in the same cycle can be treated as the same category (positive pairs), and by computing geodesic distance instead of Euclidean distance between nodes, two nodes with a large geodesic distance can be regarded as belonging to different categories (negative pairs). Visual representations of indoor scenes are then learned by a Siamese network in an unsupervised manner with the mined pairs as inputs. To evaluate the proposed method, we tested it on two scene-centric datasets, MIT67 and Places365.
Experiments with different numbers of categories were conducted to explore the potential of the proposed method. The results demonstrate that semantic visual representations of indoor scenes can be learned in this unsupervised manner. In addition, indoor scene recognition models trained with the learned representations and only a few labeled samples achieve performance competitive with state-of-the-art approaches.
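The graph-based pair mining can be sketched on a toy k-NN graph; the adjacency lists, the threshold, and the single-successor cycle walk below are simplifications for illustration, not the paper's exact procedure:

```python
from collections import deque

# Toy directed k-NN graph over 6 "images" (k = 1 here for brevity);
# adjacency lists stand in for nearest neighbours in feature space.
knn = {0: [1], 1: [2], 2: [0], 3: [4], 4: [5], 5: [3]}

def undirected(graph):
    und = {v: set() for v in graph}
    for v, nbrs in graph.items():
        for u in nbrs:
            und[v].add(u)
            und[u].add(v)
    return und

def geodesic(graph, src):
    """BFS hop distances from src in the undirected k-NN graph."""
    und = undirected(graph)
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for u in und[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

def mine_pairs(graph, neg_threshold=2):
    """Positives: nodes on a common directed cycle (cycle consistency).
    Negatives: pairs whose geodesic distance exceeds the threshold
    (or that are mutually unreachable)."""
    positives, negatives = set(), set()
    nodes = sorted(graph)
    for a in nodes:
        d = geodesic(graph, a)
        for b in nodes:
            if b <= a:
                continue
            if b not in d or d[b] > neg_threshold:
                negatives.add((a, b))
    # cycle consistency: follow directed edges; a walk returning to its
    # start marks every node on it as the same pseudo-category
    for a in nodes:
        path, v = [a], graph[a][0]
        while v not in path and len(path) < len(nodes):
            path.append(v)
            v = graph[v][0]
        if v == a:
            for i in range(len(path)):
                for j in range(i + 1, len(path)):
                    positives.add(tuple(sorted((path[i], path[j]))))
    return positives, negatives

pos, neg = mine_pairs(knn)
```

On this graph the two directed 3-cycles yield six positive pairs, and the nine cross-component pairs (unreachable, hence infinite geodesic distance) become negatives; these pairs would then feed the Siamese network.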
ARTICLE | doi:10.20944/preprints202111.0109.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: background reconstruction; background initialization; background generation; motion detection; background subtraction; scene parsing
Online: 5 November 2021 (09:34:37 CET)
The goal of background reconstruction is to recover the background image of a scene from a sequence of frames showing that scene cluttered by various moving objects. This task is fundamental in image analysis and is generally the first step before more advanced processing, but it is difficult because there is no formal definition of what should be considered background or foreground, and the results may be severely impacted by challenges such as illumination changes, intermittent object motion, and highly cluttered scenes. We propose a new iterative algorithm for background reconstruction, in which the current estimate of the background is used to guess which image pixels are background pixels, and a new background estimate is then computed from those pixels only. We show that the proposed algorithm, which uses stochastic gradient descent for improved regularization, is more accurate than the state of the art on the challenging SBMnet dataset, especially for short videos with low frame rates, and is also fast, reaching an average of 52 fps on this dataset when parameterized for maximal accuracy, using GPU acceleration and a Python implementation.
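The iterative scheme (classify pixels using the current background estimate, then re-estimate from the background pixels only) can be sketched in 1-D; the tolerance test and median initialization below are our simplifications, and the paper's actual method additionally uses stochastic gradient descent:

```python
# Minimal 1-D sketch: each "frame" is a row of pixel intensities, and a
# moving object (value 200) occludes a different pixel in each frame.
frames = [
    [10, 10, 200, 10],
    [10, 200, 10, 10],
    [200, 10, 10, 10],
    [10, 10, 10, 200],
]

def reconstruct_background(frames, tol=30, iters=3):
    n_pix = len(frames[0])
    # initial guess: per-pixel temporal median
    bg = [sorted(f[i] for f in frames)[len(frames) // 2] for i in range(n_pix)]
    for _ in range(iters):
        new_bg = []
        for i in range(n_pix):
            # pixels close to the current estimate are treated as background
            inliers = [f[i] for f in frames if abs(f[i] - bg[i]) <= tol]
            new_bg.append(sum(inliers) / len(inliers) if inliers else bg[i])
        bg = new_bg
    return bg

background = reconstruct_background(frames)
```

Here the foreground value 200 is rejected as an outlier at every pixel, and the recovered background is uniformly 10, despite every pixel being occluded in some frame.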
Subject: Mathematics & Computer Science, Other Keywords: aerial scene classification; remote-sensing image classification; few-shot learning; meta-learning
Online: 15 December 2020 (13:21:49 CET)
CNN-based methods have dominated the field of aerial scene classification in recent years. While achieving remarkable success, they suffer from excessive parameters and notoriously rely on large amounts of training data. In this work, we introduce few-shot learning to the aerial scene classification problem. Few-shot learning aims to learn a model on a base set that can quickly adapt to unseen categories in a novel set using only a few labeled samples. To this end, we propose a meta-learning method for few-shot classification of aerial scene images. First, we train a feature extractor on all base categories to learn a representation of the inputs. Then, in the meta-training stage, the classifier is optimized in the metric space by cosine distance with a learnable scale parameter. Finally, in the meta-testing stage, query samples from unseen categories are predicted by the adapted classifier given a few support samples. We conduct extensive experiments on two challenging datasets, NWPU-RESISC45 and RSD46-WHU, and the results show that our method yields state-of-the-art performance. Furthermore, several ablation experiments investigate the effects of dataset scale, different metrics, and the number of support shots; the results confirm that our model is especially effective in few-shot settings.
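The metric-space classifier can be sketched as scaled cosine logits followed by a softmax; the scale value and the class prototypes below are illustrative (in training, the scale would be a learned parameter and the prototypes would come from support embeddings):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(query, class_weights, scale=10.0):
    """Logits are scaled cosine similarities between the query embedding
    and per-class weight vectors; softmax turns them into probabilities."""
    logits = [scale * cosine(query, w) for w in class_weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return probs.index(max(probs)), probs

# Toy prototypes: class 0 along the x-axis, class 1 along the y-axis.
pred, probs = classify([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]])
```

The scale parameter matters because raw cosine similarities lie in [-1, 1], which makes softmax outputs too flat; scaling sharpens the distribution so cross-entropy gradients stay informative.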
ARTICLE | doi:10.20944/preprints201705.0214.v1
Subject: Earth Sciences, Geoinformatics Keywords: multi-spectral analysis; remote sensing images; sparse coding; generalized aggregation; scene recognition
Online: 30 May 2017 (08:54:08 CEST)
Satellite scene classification is challenging because of the high variability inherent in satellite data. Although rapid progress in remote sensing techniques has been made in recent years, the resolution of available satellite images remains limited compared with general images acquired by a common camera. On the other hand, a satellite image usually has more spectral bands than a general image, permitting multi-spectral analysis of different land materials and aiding low-resolution satellite scene recognition. This study advocates multi-spectral analysis and explores middle-level statistics of spectral information for satellite scene representation, instead of spatial analysis; such middle-level statistical approaches are widely utilized in general image and natural scene classification and have achieved promising recognition performance in various applications. The proposed multi-spectral analysis first learns multi-spectral prototypes (a codebook) for representing pixel-wise spectral data; based on the learned codebook, a sparsely coded spectral vector is then obtained with machine learning techniques. Furthermore, to combine the set of coded spectral vectors in a satellite scene image, we propose a hybrid aggregation (pooling) approach that, unlike conventional average and max pooling, retains the benefits of both existing methods while avoiding extremely noisy coded values. Experiments on three satellite datasets validate that our proposed approach is more accurate than even deep learning frameworks based on spatial analysis.
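The hybrid aggregation is described only at a high level; one plausible reading, used in the sketch below, is averaging the top-k responses per code dimension, which sits between max pooling (k = 1) and average pooling (k = all) and damps a single noisy coefficient. The paper's exact formulation may differ:

```python
def hybrid_pool(coded_vectors, k=2):
    """Pool a set of sparse codes per dimension by averaging the k largest
    responses: more robust than max (one outlier dominates) and less
    diluting than the mean (rare activations vanish)."""
    dims = len(coded_vectors[0])
    pooled = []
    for d in range(dims):
        column = sorted((v[d] for v in coded_vectors), reverse=True)
        pooled.append(sum(column[:k]) / k)
    return pooled

# Toy sparse codes for four pixels of one scene; 5.0 is an extreme,
# noisy coefficient that plain max pooling would propagate untouched.
codes = [
    [0.9, 0.0, 0.1],
    [0.8, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.1, 0.2, 0.0],
]
pooled = hybrid_pool(codes, k=2)
```

With k = 2 the noisy 5.0 is averaged with the next-largest response (0.2) rather than passed through as-is, illustrating the compromise the abstract describes.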
ARTICLE | doi:10.20944/preprints201611.0036.v1
Subject: Earth Sciences, Geoinformatics Keywords: multi-task learning; feature fusion; sparse representation; low-rank representation; scene classification
Online: 7 November 2016 (05:25:11 CET)
Scene classification plays an important role in the intelligent processing of high-resolution satellite (HRS) remotely sensed images. In HRS image classification, multiple features, e.g., shape, color, and texture, are employed to represent scenes from different perspectives, and effective integration of multiple features consistently yields better performance than single-feature methods in the interpretation of HRS images. In this paper, we introduce a multi-task joint sparse and low-rank representation model to combine the strengths of multiple features for HRS image interpretation. Specifically, a multi-task learning formulation is applied to simultaneously consider sparse and low-rank structure across multiple tasks. The proposed model is optimized as a non-smooth convex optimization problem using an accelerated proximal gradient method. Experiments on two public scene classification datasets demonstrate that the proposed method achieves remarkable performance and improves upon state-of-the-art methods in the respective applications.
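One ingredient of the accelerated proximal gradient solver can be sketched concretely: the proximal operator of the l1 (sparsity) term is element-wise soft-thresholding, and the low-rank term's proximal step would analogously threshold singular values. The threshold value and data below are illustrative only:

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x toward zero by tau,
    zeroing anything whose magnitude is below the threshold."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

# One row of representation coefficients before and after the prox step.
row = [0.9, -0.2, 0.05, -1.4]
shrunk = [soft_threshold(v, 0.3) for v in row]
```

Small coefficients (here -0.2 and 0.05) are set exactly to zero, which is how the l1 term induces sparsity in the joint representation.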
REVIEW | doi:10.20944/preprints201804.0072.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: ANN; biometric; crime-scene; fuzzy logic; gait; human footprint; Hidden Markov Model; PCA; Recognition
Online: 6 April 2018 (08:54:28 CEST)
The human footprint has a unique set of ridges unmatched by any other human being, and it can therefore be used in identity documents such as birth certificates, the Indian biometric identification (AADHAR) card, driving licenses, PAN cards, and passports. At many crime scenes an accused must walk around, leaving footwear impressions as well as barefoot prints, so recovering footprints is crucial for identifying criminals. Footprint-based biometrics is a comparatively new technique for personal identification; fingerprint, retina, iris, and face recognition are the methods most commonly used for personal attendance records. The world is currently facing the problem of global terrorism, and terrorists are challenging to identify because they live like ordinary citizens. Their soft targets include industries of special interest, such as defense, silicon and nanotechnology chip manufacturing units, and pharmaceutical companies. They may pose as religious persons, so temples and other holy places, and even markets, are among their targets; these are places where footprints can be obtained easily. Gait itself is sufficient to predict the behaviour of suspects. The present review examines the usefulness of footprint and gait as alternatives for personal identification.
ARTICLE | doi:10.20944/preprints202108.0389.v1
Subject: Mathematics & Computer Science, Other Keywords: remote-sensing classification; scene classification; few-shot learning; meta-learning; vision transformers; multi-scale feature fusion
Online: 18 August 2021 (14:29:29 CEST)
The central goal of few-shot scene classification is to learn a model that can generalize well to a novel (unseen) scene category from only one or a few labeled examples. Recent works in the remote sensing (RS) community tackle this challenge by developing algorithms in a meta-learning manner. However, most prior approaches have either focused on rapidly optimizing a meta-learner or on finding good similarity metrics, while overlooking the embedding power. Here we propose a novel Task-Adaptive Embedding Learning (TAEL) framework that complements existing methods by giving full play to the dual roles of feature embedding in few-shot scene classification: representing images and constructing classifiers in the embedding space. First, we design a lightweight network that enriches the diversity and expressive capacity of embeddings by dynamically fusing information from multiple kernels. Second, we present a task-adaptive strategy that helps generate more discriminative representations by transforming the universal embeddings into task-specific embeddings via a self-attention mechanism. We evaluate our model in the standard few-shot learning setting on two challenging datasets, NWPU-RESISC45 and RSD46-WHU. Experimental results demonstrate that our method achieves state-of-the-art performance by a significant margin on all tasks.
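The task-adaptive transform can be sketched as a minimal single-head self-attention over a task's support embeddings; the dimensions are illustrative and, unlike the paper's learned mechanism, this sketch uses no learned projection matrices:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(embeddings):
    """Re-express each embedding as an attention-weighted mixture of all
    embeddings in the task (queries = keys = values = inputs), so every
    output vector is conditioned on the whole task."""
    d = len(embeddings[0])
    out = []
    for q in embeddings:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                    for i in range(d)])
    return out

# Three toy 2-D support embeddings for one few-shot task.
task_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adapted = self_attention(task_embeddings)
```

Each output is a convex combination of the task's embeddings, which is the sense in which the universal embeddings become task-specific.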
ARTICLE | doi:10.20944/preprints201801.0235.v1
Subject: Engineering, Civil Engineering Keywords: infrastructure inspection; computer vision; structure from motion; dam inspection; 3D scene reconstruction; aerial robots; remote sensing; structural health monitoring; unmanned aerial vehicles
Online: 25 January 2018 (05:00:51 CET)
Dams are a critical infrastructure system for many communities, but they are also one of the most challenging to inspect. Dams are typically very large and complex structures, so inspections are often time-intensive and require expensive, specialized equipment and training to provide inspectors with comprehensive access to the structure. The scale and nature of dam inspections also introduce additional safety risks to the inspectors. Unmanned aerial vehicles (UAVs) have the potential to address many of these challenges, particularly when used as a data acquisition platform for photogrammetric three-dimensional (3D) reconstruction and analysis, though the nature of both UAVs and modern photogrammetric methods necessitates careful planning and coordination for integration. This paper presents a case study on one such integration at the Brighton Dam, a large-scale concrete gravity dam in Maryland, USA. A combination of multiple UAV platforms and multi-scale photogrammetry was used to create two comprehensive, high-resolution 3D point clouds of the dam and its surrounding environment at intervals. These models were then assessed for their overall quality, as well as their ability to resolve flaws and defects that were artificially applied to the structure between inspection intervals. The results indicate that the integrated process is capable of generating models that accurately render a variety of defect types with sub-millimeter accuracy. Recommendations for mission planning and imaging specifications are provided as well.