Study on Temperature (τ) Variation for SimCLR-based Activity Recognition

Human Activity Recognition (HAR) is the process of automatically detecting human activities from streaming data generated by various sensors, including inertial sensors, physiological sensors, location sensors, cameras, timestamps, and many others. Unsupervised contrastive learning has achieved excellent results, yet its contrastive loss mechanism remains less studied. In this paper, we provide a study of how temperature (τ) variation affects the loss of the SimCLR model and, ultimately, full HAR evaluation results. We focus on understanding the implications of the unsupervised contrastive loss in the context of HAR data. In this work, regulation of the temperature (τ) coefficient is also incorporated to improve HAR feature quality and overall performance on downstream tasks in a healthcare setting. A performance boost of 1.3% is observed in our experiments.


Introduction
The purpose of human activity recognition (HAR), which consists of observing and analysing human behaviour and its environment, is to determine the current behaviour and goals of the human body. HAR research has gained attention for its advantages in smart surveillance systems, healthcare systems, virtual reality, smart homes, aberrant-behaviour detection and other areas, and for its capacity to support and connect distinct disciplines. HAR [1] is one of the most widely discussed research areas among researchers from both academia and industry, whose aim is the progress of ubiquitous computing and human-computer interaction. Advances in deep learning have made the field a key component of most smart systems, covering the majority of computer vision tasks such as image classification, object detection, image segmentation and activity recognition, as well as natural language processing (NLP). Due to the intensive work required to manually annotate millions of data samples, supervised strategies that learn features from labelled data have nearly saturated. Although a plethora of data is available, the lack of annotations has urged researchers to find alternative ways of making use of it. Unsupervised learning makes it possible to learn feature representations without human supervision. Contrastive learning, recently proposed as an unsupervised approach, has reached state-of-the-art results in a variety of tasks [2][3][4][5][6][7]. Its main difference from other techniques lies in the data transformations and the contrastive loss strategy used. In short, most contrastive learning methods first construct a series of augmented data to build positive and negative pairs at the instance level; similarity between positive pairs can then be maximised by different contrastive mechanisms. Recently, SimCLR was incorporated for healthcare, and HAR in particular, for the first time [12].
Motivated by the work of Tang et al., in this paper we lay out a module that we deem beneficial for HAR systems and other healthcare-related applications. The main contributions of this work are summarised below:
-We provide a study of the behaviour of contrastive learning (with emphasis on the temperature coefficient, τ) in the sensor-data context of human activity recognition.
-We optimise the SimCLR module by regulating the temperature coefficient in order to enhance the quality of features for downstream tasks.
-Improved performance for the overall model [12].

Contrastive Learning
The purpose of contrastive methodology is to learn a function that maps the input data to features on a hypersphere. Wang et al. showed that the contrastive loss for a given unlabelled training sample set X = {x1, . . . , xN} is given as follows [11]:

\mathcal{L}(x_i) = -\log \frac{\exp\left(f(x_i)^\top f(x_i^+)/\tau\right)}{\exp\left(f(x_i)^\top f(x_i^+)/\tau\right) + \sum_{j \neq i} \exp\left(f(x_i)^\top f(x_j^-)/\tau\right)}

where f(·) is an extractor that maps images from pixel space onto the hypersphere, and x_i^+ and x_j^- denote a positive and a negative sample for the anchor x_i.

Related Work
Many studies have examined the identification of human activities from diverse points of view: by specialised approach [13], by algorithm type [14], by sensor type [1,15,16], by fusion type [17] or by device type [18], although other analyses have been carried out more generally by HAR category [19,20]. HAR encompasses five primary tasks, namely the recognition of fundamental activities [21], the recognition of everyday activities [22], uncommon events [23], biometric subjects [24], and energy expenditure prediction [25]. Different sensors such as video cameras, ambient temperature sensors, relative humidity, light, pressure and wearable sensors are used. The major forms of wearable sensors are generally integrated smartphone sensors or sensors incorporated into wearable devices. Dong and Biswas [26] introduced a wearable sensor network designed to monitor human activity. In a similar study, Curone et al. used wearable triaxial accelerometers to monitor activity [27]. Progress in deep learning has made the field a central part of smart systems. The ability to learn rich patterns from today's vast amount of data makes deep neural networks (DNNs) an important approach in HAR. Traditional supervised learning approaches, however, rely heavily on the amount of annotated training data available. Self-supervised learning methods, both generative [28] and contrastive [3], have recently been able to use unlabelled data to learn the underlying representations. In recent studies [29,7,[30][31][32][33][34] on unsupervised feature representation for images, the concept known as contrastive learning was incorporated [5]. Contrastive learning (CL) is a discriminative approach that aims to pull similar samples closer together and push dissimilar samples farther apart. The results after applying contrastive learning are astounding: for example, SimCLR [3] reduces the gap between unsupervised and supervised pre-training representations in linear classification performance.
g(·) could serve the same purpose as f [3]. τ is a temperature hyper-parameter that helps in distinguishing positive and negative samples. The contrastive loss attempts to attract the positive key samples and separate the negative key samples. This goal can also be achieved with a simpler contrastive loss function [11]:

\mathcal{L}_{\text{simple}}(x_i) = -f(x_i)^\top f(x_i^+) + \lambda \sum_{j \neq i} f(x_i)^\top f(x_j^-)

The goal of contrastive learning is to learn embeddings that are aligned across augmented views of the same data and discriminative across different samples. The contrastive loss does not restrict the distribution of negative samples. The temperature contributes to controlling the penalty strength on hard negative samples: specifically, small temperatures tend to penalise the hardest negative samples much more heavily, producing a more separated local structure around each sample and a more uniform embedding distribution [11].
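As an illustration of the role of τ in the softmax-based loss above, the following is a minimal NumPy sketch for a single anchor (the embeddings, dimensions and the helper `info_nce` are our own illustrative choices, not the paper's code):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau):
    """Softmax-based contrastive loss for one anchor (InfoNCE/NT-Xent style).

    anchor, positive: unit-norm embeddings, shape (d,)
    negatives: unit-norm embeddings, shape (n, d)
    tau: temperature hyper-parameter
    """
    pos_sim = anchor @ positive          # cosine similarity (unit vectors)
    neg_sims = negatives @ anchor        # shape (n,)
    logits = np.concatenate(([pos_sim], neg_sims)) / tau
    # negative log of the softmax probability assigned to the positive pair
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)

a = unit(rng.normal(size=8))
p = unit(a + 0.1 * rng.normal(size=8))   # positive: slightly perturbed anchor
n = np.stack([unit(rng.normal(size=8)) for _ in range(16)])

# Smaller tau sharpens the softmax, so hard negatives dominate the loss.
print(info_nce(a, p, n, tau=0.1), info_nce(a, p, n, tau=1.0))
```

Dividing the similarities by τ before the softmax is what lets a small temperature concentrate the penalty on the most similar (hardest) negatives.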

Methodology
SimCLR [3] architecture consists of these primary modules:
-A data augmentation module that randomly transforms a given data example, producing two correlated views of the same example.
-A base encoder f(·) that extracts representation vectors from the augmented examples.
-A projection head g(·) that maps the representations to the space where the contrastive loss is applied.
-A contrastive loss function (NT-Xent) defined for the contrastive prediction task.
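A minimal sketch of such an augmentation module for tri-axial sensor windows, using jitter and scaling (two transformations commonly applied to sensor data; the function names and noise scales here are our own illustrative choices, not necessarily the paper's exact pipeline):

```python
import numpy as np

rng = np.random.default_rng(42)

def jitter(x, sigma=0.05):
    """Add Gaussian noise to every sample of the window."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    """Multiply each sensor channel by a random factor close to 1."""
    return x * rng.normal(1.0, sigma, size=(1, x.shape[1]))

def two_views(window):
    """Return two randomly transformed, correlated views of one window."""
    return jitter(scale(window)), jitter(scale(window))

window = rng.normal(size=(400, 3))   # 400 timestamps, 3 accelerometer axes
v1, v2 = two_views(window)           # a positive pair for contrastive training
```

The two views form a positive pair, while views from other windows in the batch serve as negatives.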

Dataset
MotionSense [36], a publicly available dataset, was used in our assessment. It comprises data from 24 individuals who carried an iPhone 6s in the front pocket of their trousers and performed 6 different activities: walking downstairs, walking upstairs, walking, jogging, sitting and standing. In this study, 6630 windows of tri-axial accelerometer data were used, each 400 timestamps long with 50% overlap.
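The windowing described above can be sketched as follows (window length 400 with 50% overlap, i.e. a step of 200 samples; the helper and the toy signal length are illustrative):

```python
import numpy as np

def sliding_windows(signal, length=400, overlap=0.5):
    """Segment a (T, channels) signal into fixed-length overlapping windows.

    With overlap=0.5, consecutive windows share half their samples.
    """
    step = int(length * (1 - overlap))
    starts = range(0, len(signal) - length + 1, step)
    return np.stack([signal[s:s + length] for s in starts])

acc = np.zeros((2000, 3))        # toy tri-axial accelerometer stream
wins = sliding_windows(acc)
print(wins.shape)                # (9, 400, 3): windows start at 0, 200, ..., 1600
```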

Experimental Setup
Linear and fine-tuned evaluations were run on an NVIDIA TESLA V100 SXM2. During pre-training for 200 epochs with batch size 512, the SGD optimizer with a cosine decay of the learning rate is used. TPN [37] is incorporated as the base encoder for the HAR system, to suit the need for a comparatively lightweight neural network architecture. The projection head is a three-layer fully connected MLP, with NT-Xent as the loss function. The base encoder is composed of three temporal (1D) convolutional layers with 32, 64 and 96 filters and kernel sizes of 24, 16 and 8, respectively; each uses the ReLU activation function with a dropout rate of 0.1, and a global maximum pooling layer is added at the end. During pre-training, the projection head is composed of 3 fully connected layers with 256, 128 and 50 units; in the fine-tuned evaluation, the classification head is composed of two fully connected layers with 1024 and 6 units. For linear evaluation, the model is trained with the SGD optimizer for 50 epochs at a learning rate of 0.03. For fine-tuned evaluation, the model is tuned with the Adam optimizer at a learning rate of 0.001 for 50 epochs.
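The cosine decay of the learning rate used during pre-training can be written as follows (a standard formulation; the base rate 0.03 from the linear-evaluation setting is used here only as an example value):

```python
import math

def cosine_decay(step, total_steps, base_lr):
    """Cosine-annealed learning rate: base_lr at step 0, decaying to 0 at the end."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

total = 200                                # e.g. 200 epochs of pre-training
print(cosine_decay(0, total, 0.03))        # 0.03 at the start
print(cosine_decay(total // 2, total, 0.03))  # 0.015 halfway through
print(cosine_decay(total, total, 0.03))    # ~0.0 at the end
```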

Quantitative Results
In this section we conduct extensive experimentation on the temperature coefficient in order to understand its effect on the proposed network, using activity prediction precision as the assessment metric. First, we try to determine whether the temperature precisely controls the severity of the penalties on hard negative samples. Numerical results are tabulated in Table 2.
-When the temperature is 0.2 or 0.3, the model achieves the best results; with very small or very large temperatures, the model achieves inadequate performance.
-The current model shows a 1.3% increase in performance over the previous one [12].

Qualitative Results
If the temperature is extremely small, the contrastive loss function will inflict substantial penalties on the nearest neighbours, so semantically similar data instances are very likely to be pushed away from the anchor point.
Considering the depictions in the t-SNE plots in Figure 2, we observe that the embedding with τ = 0.07 is distributed more uniformly, whereas the embedding with τ = 1 is more locally clustered and globally separated.
-As τ decreases, there is a larger gap between positive samples and misleading negatives, i.e. positive and negative samples become more distinguishable.
-Indeed, as shown in Figure 2, small temperatures tend to increase the impact of the hard negative samples.
-The results demonstrate that positive samples are more aligned at increased temperature and that the model tends to develop features that are more invariant to the different transformations applied to the sensor data.
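The penalty-concentration effect behind these observations can be made concrete: the penalty each negative receives is proportional to its softmax share of the similarity distribution, so at small τ the hardest negative absorbs almost the whole penalty. A small sketch with illustrative similarity values (not taken from the paper's data):

```python
import numpy as np

def negative_weights(neg_sims, tau):
    """Relative penalty each negative receives (its softmax share)."""
    e = np.exp(np.asarray(neg_sims) / tau)
    return e / e.sum()

sims = [0.9, 0.5, 0.1, -0.3]   # one hard negative (0.9) and three easier ones

for tau in (0.07, 1.0):
    print(tau, negative_weights(sims, tau).round(3))
# At tau = 0.07, nearly all of the weight falls on the 0.9 negative;
# at tau = 1.0, the penalty is spread much more evenly.
```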

Comparative Study with Baseline Models
In this section, we compare our best model with state-of-the-art methods. Linear and fine-tuned evaluations were conducted on the MotionSense dataset to assess the impact of different temperature (τ) values for SimCLR pre-training. Results are shown in Table 1. F1 scores for the supervised and self-supervised baselines are taken directly from the work of Tang et al.

Conclusion
In this work, we have studied one of the most important tasks in digital health applications, i.e., HAR, together with the SimCLR contrastive learning framework from visual representation learning. We have examined the effect of temperature (τ) changes on the contrastive loss in connection with sensor data, in order to improve feature quality and performance on downstream tasks.