1. Introduction
In this paper, we demonstrate the inherent ability of convolutional neural network encoders to read horses’ emotions [1] without requiring labeled data. Using contrastive learning, we automatically identify well-known horse emotions and verify the results against manually labeled horse emotion images. Contrastive learning has revolutionized representation learning through a self-supervised approach that eliminates dependence on large labeled datasets. It builds representations that maximize the similarity between positive pairs (semantically similar examples) and the dissimilarity between negative pairs (dissimilar examples), so that the learned embeddings capture discriminative and invariant features. Methods such as SimCLR by Chen et al. [2], MoCo by He et al. [3], and BYOL by Grill et al. [4] have demonstrated the effectiveness of contrastive learning in image classification, object detection, natural language processing, and many other domains. These methods rely on data augmentations and memory mechanisms to enrich feature representations.
The same method extends naturally to complex problems like emotion detection, where subtle, context-dependent cues must be captured. For instance, self-supervised models like SimSiam [5], introduced by Chen and He in 2021, and supervised variants like SupCon [6], introduced by Khosla et al. in 2020, have been shown to learn robust features even on unbalanced datasets. Contrastive loss-based frameworks, such as [7], have distilled meaningful emotional features from raw audio data and significantly outperform traditional supervised approaches to speech emotion detection. More recently, temporal contrastive learning has proven effective for video-based emotion recognition, where modeling spatiotemporal dependencies is critical for understanding affective states [8]. These results demonstrate the flexibility of contrastive learning to capture both static and dynamic feature properties, making it an excellent candidate for multi-modal emotion detection tasks. Furthermore, contrastive approaches have proven robust to data scarcity, using augmentations and proxy tasks to generalize across diverse emotional expressions [2,4]. For example, for subtle affective states in noisy or uncontrolled environments, prior work has used temporal and contextual augmentations to enrich feature representations [8]. Such techniques are applicable in real-world scenarios where labeled data for complex emotions might be limited or biased. As a result, the potential of contrastive learning extends beyond basic emotion detection into the promising yet challenging area of scalable, efficient, multi-modal, and context-dependent affect recognition.
These advances make contrastive learning particularly well-suited to equine emotion recognition, where expressive signals tend to be subtle and context-dependent. Equine emotions manifest through multi-modal cues, including facial expressions, body postures, and motion patterns, necessitating a learning framework that accommodates these complexities and interdependencies. Contrastive learning is a promising solution, structuring the feature space to illuminate distinctions between emotional states while generalizing across variability in expression. This paper attempts to bridge the gap in computational methodologies for equine emotion recognition, using both self-supervised and supervised variants to provide a pathway toward improved understanding and interaction between humans and animals. This also positions contrastive learning as a superior alternative to traditional methods for equine emotion recognition, most of which rely on hand-crafted features that cannot capture the full scope of emotional variability. Contrastive frameworks draw upon large-scale, unlabeled datasets to discover latent patterns in equine behavior that are hard to model otherwise. Multi-modal data, including visual, motion-based, and contextual cues, provide richer contextual information that can support a more holistic classification of equine affective states. In particular, temporal contrastive methods can model sequential dependencies in equine behavior, which are critical for identifying transitions between emotional states. Such capabilities enhance the robustness of the system while improving the interpretability of learned representations. Ultimately, this work focuses on the potential of contrastive learning as a transformative tool for animal affective computing, with the prospect of improved welfare and interaction paradigms in human-horse relationships.
We have based our research on the seven primal emotions described by Jaak Panksepp. Panksepp was one of the most influential neuroscientists of our time, and in his research on affective consciousness [9], he suggested that all mammalian species share seven emotional behaviors, owing to the fact that all mammals share similar neurological circuitry within their limbic systems. Our primary goal here was to demonstrate that a neural network trained without labels can still differentiate between these seven emotional behaviors: ’seeking’, ’rage’, ’fear’, ’lust’, ’care’, ’grief’, and ’play’. In addition to these emotions, we have also included an additional emotional behavior, ’pain’, which aims to identify the subtle cues indicating that a horse is in pain. For this paper, we created our own dataset of 3,929 unlabeled horse images. These images were captured by our own equipment and research team, across a variety of wild horse breeds in different geographical locations. We used a modified version of the Momentum Contrast for Unsupervised Visual Representation Learning (MoCo) [3] framework, developed by Facebook Research in 2019, and implemented a custom-built downstream classification network. Although MoCo has previously been used to detect emotions in other mammals such as dogs [10], horse emotions, unlike dog emotions, are not concentrated in the face; a horse’s body posture conveys a large proportion of its emotional display. In this paper, we ascertain the effectiveness of previously unused image augmentation techniques and downstream network architectures that optimize posture segmentation and provide increased accuracy compared with methods traditionally used in contrastive learning.
The chief contributions of this paper are:
We construct a novel, high-quality, diverse, and unskewed dataset of 3,929 images of five wild horse breeds from different geographical locations worldwide. These breeds are the ‘Eriskay Pony’, found in western Scotland; the ‘Insh Konik Pony’, found in the RSPB Insh Marshes; the ‘Pottoka’, found in Piornal, Spain; the ‘Skyros Pony’, found on the island of Skyros in Greece; and the ‘Konik’, found in the Wicken Fen nature reserve in England. We construct equally sized groups of these images for validation and testing, with labels based on the seven Panksepp emotion labels “Exploring”, “Sadness”, “Playing”, “Rage”, “Fear”, “Affectionate”, and “Lust”, along with one additional emotion, “Pain”.
We leverage the contrastive learning framework MoCo (Momentum Contrast for Unsupervised Visual Representation Learning) [3] on our dataset to predict the seven Panksepp emotions and ’Pain’ using unsupervised learning. We significantly modify the MoCo framework to obtain the best possible results on our hardware, and we put forth previously unused image augmentation techniques and downstream classification networks that optimize posture segmentation in neural networks trained with contrastive learning.
We build a custom downstream classifier network that connects to a frozen CNN encoder pretrained using the MoCo [3] framework. This downstream network demonstrates excellent accuracy given its simple architecture and small number of weights.
Our method allows the encoder network to learn similarities and differences within image groups on its own, without labels. The clusters thus formed are indicative of deeper nuances and complexities within a horse’s emotions, and may hint at the existence of novel and complex equine emotions.
Our research makes significant contributions to the field of Animal–Computer Interaction (ACI) by introducing a novel method for recognizing horse emotions, fostering improved human–animal communication, adhering to the highest ethical standards and utilizing a unique contrastive learning approach. By better recognizing horse emotions, we help bridge the human–animal communication barrier. Additionally, we advance the field of applied machine learning by demonstrating the effectiveness of semi-supervised and self-supervised learning techniques for animal emotion recognition.
2. Behavioral Background
2.1. Emotions
According to Panksepp (2005), basic emotions in humans and animals can be categorized as Care, Seeking, Fear, Lust, Play, Rage, and Grief [9]. With regard to practical application, all of these basic emotions can be considered spectrums, with specific behaviors indicating emotional affect in different animals and ranges of behaviors occurring for each emotion that might indicate the strength of the emotional state. Most ethograms and behavioral research on horses fall into categories that do not necessarily align with “emotions”; rather, scientists have studied behaviors that align with physiological indicators such as cortisol (indicating stress or irritation) or contextualized behaviors according to their outcome (such as agonistic, affiliative, or curious). These same researchers have created ethograms to help others better understand these behavioral indicators of emotional affect in horses. Many of these indicators include ear and head position in addition to muzzle tension, body position, leg movement, and tail movement. Ear positions in horses, for example, often help observers determine the “emotion” of the horse based on the correlation of ear position with other behaviors and outcomes, including interactions with humans [11,12], which is further supported by research in cows linking ear position with emotional state [13].
There is little literature showing definitive measures of equine emotional state, a challenge that has been recognized by leaders in the field [14]. Researchers do, however, agree that we can use behavior to better understand emotional states in horses and potentially use these to assess welfare and modify management or handling to improve subjective equine experiences in our domestic environments [14,15,16,17]. There are some categories of emotions for which there is much research (FEAR) and some for which there is little research (CARE). Extensive ethograms exist for areas where there is significant research, while behavioral data from both free-living and feral horses help to provide additional insight into the areas for which there is little research in the domestic world.
While there is limited research on horse emotions, there is research on behaviors and the use of behaviors in horses to communicate desires or express states of being. In order to understand the context of these behaviors with regard to species-specific indicators of emotional valence, environmental variables need to be reduced or eliminated to understand baseline behaviors within a species. Because environments (and variables within domestic settings) can cause variations in behavior, basic behavioral indicators of emotions in horses have been selected from studies where horse behavior has been observed, recorded, and studied in feral and free-living groups. Social behaviors in these groups and behaviors of individuals within groups help serve as the basis for how individual and intraspecies interactions can provide additional insight into behavioral indicators of emotions [18,19,20,21,22,23,24].
2.1.1. Care
Behavioral indicators of the emotion “care” can be generally defined as those that signify nurturing, bonding, affection, or compassion. For horses, these are categorized in the literature as “affiliative” behaviors, generally defined as behaviors that are either friendly in nature or that occur exclusively between two individuals who have established a social bond, recognized by more time spent in proximity to one another as opposed to other members of the herd [22,25,26,27,28]. One of the most recognizable affiliative behaviors indicating care is allogrooming, which occurs when two horses stand next to one another, facing in opposite directions, and scratch one another at the same time. The scratching can be at any body part, but usually the horses will scratch each other’s backs or withers (where the neck meets the back) [29,30,31,32,33]. Other behaviors indicating care include moving the head and neck over or under the head or neck of another, the placement of the head on the back of another horse, or one horse touching its nose against the side of another [27]. In addition to physical interactions, other behaviors indicating care include synchrony of movement (when two or more animals engage in the same activity at the same time) [20,34,35] and shared close proximity for prolonged periods of time [22,23,24,36,37].
2.1.2. Seeking
Behavioral indicators of the emotion “seeking” include exploration and curiosity behaviors as well as searching behaviors that are expressed when horses are searching for resources and opportunities. Behaviors in this category align with known behaviors of desire, curiosity, exploration, and anticipation. With regard to seeking, there is a specific order of behaviors that indicates curiosity and exploration, starting with looking, then approaching and sniffing, and then often retreating and returning [38].
2.1.3. Fear
Behavioral indicators of the emotion “fear” are associated with behaviors indicating avoidance and panic. Behaviors in this category include everything from mild discomfort and irritation to fear and terror. Much like the other categories, this is a spectrum of behaviors indicating fear and a desire to create more distance between the horse and objects or others, or a cautionary approach that results in moving away from something or someone. Most of the behaviors indicating fear have been linked to physiological indicators of stress, mostly cortisol, taken from blood, saliva, feces, and hair (indicating long-term stress responses) [39,40,41,42,43,44,45]. Behaviors in this category usually include a high head position, wide eyes (where the white is visible), and fast or slow movement away from the fearful object [46,47].
2.1.4. Lust
Behavioral indicators of the emotion “lust” in horses can be defined as those related to sexual and reproductive behaviors or behaviors indicating a desire leading to reproductive behaviors. Reproduction and related behaviors in horses have been studied extensively for years as part of efforts to make reproduction more efficient in the domestic world. In horses, these behaviors vary between mares (reproductive females) and stallions (reproductive males) and range from mares lifting their tails and “winking” their vulvas to stallions biting the mare and mounting [48,49,50,51].
2.1.5. Play
Play has been studied extensively in horses and, as an emotional category, encompasses behaviors indicating joy and social engagement through active, fun bonding. There are two general categories of play: object play and play between conspecifics [52]. Since these behaviors often momentarily mimic rage, many behaviors categorized as play may also resemble those that indicate rage and fighting. Ear positions can help differentiate between play and rage, as can the level of arousal (stress) and the ability to regulate arousal. Common behaviors indicating play include biting each other’s legs, neck, or mouth with light or little contact, momentary pinning of the ears (when the ears are flattened back against the neck), slow running or sparring, and the ability to regulate movements (slow upward or downward progression of movement with no sudden bursts and no fear responses). Play often mimics fighting (in the emotional category of rage) but does not include any hard contact, and both animals continue to choose to remain in the proximity of the other even after contact play has ended.
2.1.6. Rage
Behavioral indicators of the emotion “rage” can be defined as those that signify frustration, anger, or irritation. Behaviors indicating rage take place when an individual feels this kind of emotion towards another living creature, an environmental condition, or an interaction (including with people and riders). Communication of these emotions is usually directed towards a conspecific, although horses will also display these behaviors towards other animals, including humans, or in domestic conditions where they are communicating mild to extreme physical or psychological discomfort [53,54]. For horses, the emotion of “rage”, like other emotions, is a spectrum ranging from irritation or dislike to full anger and rage, and can be broken into two categories: intra- and interspecies communication. With regard to intraspecies rage, we can look at pro-social behaviors that occur between equine conspecifics in the category of agonistic behaviors, which indicate when one horse does not like the physical presence of another horse. In general, these behaviors are short in duration and last until the desired space has been achieved. Like stress behaviors, agonistic behaviors have been well studied under different circumstances, with full ethograms describing the behavioral indicators [55,56]. These behaviors include pinning the ears back against the neck, wide eyes, a tight muzzle, bite threats, biting, kick threats, kicking, tail swishing, head tossing, and striking out.
2.1.7. Grief
Behaviors aligned with emotions of grief are considered indicators of sadness, social distress, or social loss. Along with sadness, we can also look at potential signs of depression in this category. Although depression is considered a human diagnosis, researchers have looked at ethological animal models of depression with a focus on horses [57]. These behaviors include a lack of movement or motivation to move, a lack of curiosity or seeking behaviors, loss of appetite, and behavioral indicators that are also in line with mild physical discomfort, such as chronic lowering of the head and loss of interest in social interactions.
2.2. Use of Emotions in Research
Scientists suggest that indicators of emotional states, including behaviors aligning with positive emotional wellbeing, should be included in assessments of psychological wellbeing and welfare in horses. Researchers have also assessed which of these behaviors they can use to assess equine welfare and wellbeing in the domestic setting.
Mutual grooming, also known as allogrooming, occurs when two individuals engage in scratching one another with their upper lip or teeth at the same time, usually at the withers or back, and is considered one of the behavioral indicators of the emotional category of care. In feral horses, this is seen only between two individuals who have demonstrated spatial preference and have shown signs of social bonding through time and proximity [25]. This kind of known affiliative interaction is often displayed more frequently in domestic horse herds than in feral herds [58,59], suggesting that increased frequency of allogrooming and play is often linked with coping behaviors associated with chronic or acute stress [31,60,61]. Recently, researchers have suggested that this correlation of high-frequency affiliative behaviors with stress means that these behaviors are not good indicators of positive welfare in domestic herds of horses and that additional indicators should be used to assess domestic horse welfare [28].
To control for unnatural physical and social settings that might influence the type, frequency, or behavioral expression of emotions, it is critical to base these kinds of behavioral studies on free-living and feral horses, where horses have the choice and freedom to express behaviors based on natural social and environmental conditions. Such studies of free-living, feral, and wild horses serve as the basis for understanding social behaviors in horses, controlling for the variables of the domestic setting and providing additional insight into the natural lives of horses [21,24,62]. The data used for this study have therefore focused on videos of free-living and wild horses to avoid potential influences of the constructed domestic world on the emotional expression of horses.
If we can accurately measure or categorize behaviors in horses, we can therefore take steps to manage conditions that either favor positive emotional states or decrease negative ones [17].
3. Related Work
Recent advancements in animal motion tracking and pose recognition have significantly impacted the study of animal behavior, enabling automated recognition of internal states such as emotions and pain to improve animal welfare. Broomé et al. [63] present a comprehensive survey of computer vision-based research on recognizing pain and emotional states in animals, analyzing both facial and bodily behaviors.
Additional recent research has assessed the use of machine learning to recognize horse expressions and behavior related to welfare, specifically using the equine pain face [64] and grimace scale [65,66]. Studies in 2018 and 2021 supported the use of machine learning to recognize facial expressions of horses, especially in the context of research on facial expressions related to pain [67,68]. Researchers have built on this preliminary work to assess the ability of automated systems to recognize emotional expressions in horse faces [69] and to develop deep-learning models that read pain expressions in equine faces [70], using AI models to recognize horse emotions from facial expressions in a controlled experiment. They tested two approaches: a deep learning pipeline using video footage and a machine learning pipeline utilizing EquiFACS annotations. The deep learning model performed better, achieving 76% accuracy in distinguishing four emotional states: baseline, positive anticipation, disappointment, and frustration. However, separating anticipation from frustration proved challenging, with only 61% accuracy. Machine learning has also expanded beyond horses to recognize facial expressions in other animals [71]. Additional research in artificial intelligence and machine learning has supported the use of these technologies to automate behavioral recognition with guided training by specialized animal behaviorists [72]. Research on using machine learning to read and recognize facial expressions in horses is growing, but there is a lack of work on using machine learning to recognize and assess the emotional expressions of horses from body position or movement.
Tendencies towards using unsupervised methods (e.g., [73]) offer promising results. With our work, we extend previous research, which mainly employed supervised ML methods for detecting emotions and well-being in horses and mainly used image rather than video data. For example, Corujo et al. [74] designed a "proof of concept" system to recognize emotions in horses, combining a fast region-based convolutional neural network (detector) for identifying horses in images and a convolutional neural network (model) for predicting their emotions. The system was trained on 400 labeled images and tested on 40 images, achieving 80% accuracy on the validation set and 65% on the test set. Four emotional states (alarmed, annoyed, curious, and relaxed) were inferred using behavioral ethograms, focusing on head, neck, ear, muzzle, and eye positions.
With respect to unsupervised emotion detection, Bhave et al. [10] demonstrate how novel canine emotions can be detected using the Momentum Contrast (MoCo) framework. They developed a system to identify dog emotions based on facial expressions and body posture. To do so, they constructed a dataset comprising 2,184 images of ten popular dog breeds, categorized into seven primal mammalian emotions as defined by Jaak Panksepp: Exploring, Sadness, Playing, Rage, Fear, Affectionate, and Lust. Using a modified version of the contrastive learning framework MoCo, they achieved an unsupervised accuracy of 43.2% on their dataset, significantly surpassing the 14% baseline. When applied to a second public dataset, the model achieved 48.46% accuracy against a 25% baseline. This research shows that an unsupervised approach can achieve promising results.
4. Method
In this experimental study, we pursue a novel approach that gives a convolutional neural network the freedom to learn on its own, reconfiguring the image augmentation layers and framework architecture. Accordingly, we prepared an original dataset of unlabeled images, developing a pipeline to extract high-quality horse images from over 3 terabytes of original horse video footage. To conduct a comprehensive representation learning experiment on this data, we used the MoCo [3] framework. The decision to use this framework was inspired by previous work on dog emotions [10], in which a detailed study of contrastive learning frameworks such as SimCLR [2] was conducted to determine the best-performing framework for dog images. Motivated by this previous successful application of MoCo and its low computing infrastructure requirements, for this study we implemented a modified version of the MoCo framework [3] developed by He et al.
4.1. Creating the Dataset
As mentioned previously, we had access to over 3 terabytes of proprietary video footage. This footage was collected exclusively from wild and untamed horses to preserve the natural element within each emotional display. Our team traveled to five different locations to obtain footage of five different wild horse breeds.
Table 1 describes the different wild horse breeds from which our data was collected.
4.1.1. Dataset Description
We created an evenly distributed dataset, collecting a similar number of data points from all five aforementioned horse breeds.
Figure 1 gives the distribution of image count across the five breeds. There are 711 Eriskay Pony images, 781 Insh Konik images, 809 Pottoka images, 796 Skyros Pony images and 832 Konik images.
4.1.2. Splitting the Dataset
In order to conduct contrastive pre-training and then fine-tune a downstream classifier network after freezing the pre-trained encoder, we needed to split the dataset into three sections, used for contrastive pre-training, downstream fine-tuning, and final testing, respectively. These splits were produced with a stratified shuffle in order to maintain the same breed distribution in each split. We partitioned 86% of our data, amounting to 3,374 images, for contrastive pre-training using the MoCo [3] framework. We partitioned 10% of our data, amounting to 396 images, for fine-tuning the downstream classifier network. Finally, the remaining 4% of the data, i.e., 159 images, were used to test the performance of the final model.
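Such a breed-stratified split can be produced, for example, with scikit-learn’s StratifiedShuffleSplit; the variable names and random seed below are illustrative, not our exact preprocessing code:

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# image_paths: all 3,929 image paths; breeds: parallel array of breed labels (assumed inputs).
image_paths, breeds = np.array(all_image_paths), np.array(all_breed_labels)

# First split: 86% for contrastive pre-training, 14% held out for labeled use.
outer = StratifiedShuffleSplit(n_splits=1, test_size=0.14, random_state=42)
pretrain_idx, labeled_idx = next(outer.split(image_paths, breeds))

# Second split: divide the held-out 14% into fine-tuning (10%) and testing (4%).
inner = StratifiedShuffleSplit(n_splits=1, test_size=4 / 14, random_state=42)
finetune_rel, test_rel = next(inner.split(image_paths[labeled_idx], breeds[labeled_idx]))
finetune_idx, test_idx = labeled_idx[finetune_rel], labeled_idx[test_rel]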
Figure 2 describes the split ratio in the stratified-shuffle split.
Figure 3 and Figure 4 describe the emotion distribution in the 10% and 4% splits, respectively. It is important to note that our team manually classified 14% of the total collected data, amounting to 555 images, into the seven Panksepp [9] emotions and the ’Pain’ emotion. Ashley et al. describe a comprehensive method of deciphering the ’Pain’ emotion in equines in [75]. Kieson et al. provide an unambiguous account of identifying affection and lust among equines in [27] and [31]. McDonnell et al. describe playful behavior in horses in [52]. Burke explains in detail the movement patterns associated with exploratory and seeking behavior among horses in [38]. Fureix et al. give insight into the display of grief in horses in [57]. Leiner gives a detailed analysis of fear and rage in horses in [76]. Our team adhered to the directives established by the above research papers while labeling these few hundred images for downstream tasks and testing. It is important to note that labeled data are only required to analyze the performance of the encoder, which was trained using unlabeled data.
4.2. Image Retrieval Pipeline
With over 3 terabytes of video footage collected on different devices across different locations, manually extracting horse images would have been a futile effort. Each frame may contain multiple horses, and since a significant proportion of the footage was collected without a tripod, some frames exhibit motion blur. As a result, we developed an automated system, powered by the YOLOv8 object detector [77], to automatically identify the horses. We then set up an image filtering phase that eliminated any images with extremely low resolution or high noise, enforcing a minimum resolution and, after normalizing pixel values, a minimum pixel intensity standard deviation.
Figure 5 describes the basic flow of data within our image retrieval pipeline. We ran this pipeline on all the videos our team had collected, extracting the best cropped horse images to constitute a dataset. To achieve this, frames are sampled from the video, initially at 30 FPS but downsampled to 0.1 FPS for more efficient processing. The heaviest version of the object detection model, ’YOLOv8x’ [77], is adopted because it is highly accurate and efficient at recognizing objects like horses and generating bounding boxes around them in those frames. The detected bounding boxes are cropped, and the crops are filtered by pixel intensity standard deviation and resolution. Finally, the filtered cropped horse images are stored in a dataset for further augmentation and training.
The pipeline starts with high-frame-rate video at 30 FPS, which is then reduced to 0.1 FPS to prevent redundant frame processing. YOLOv8x [77], with its 68.2M parameters and 257.8 GFLOPs of computation, detects horses in the frames; each detection is represented as a bounding box around an identified horse. During crop extraction, only those bounding box regions with sufficiently low noise and sufficient resolution are retained to ensure high-quality data. These crops are intended for tasks such as training ML models or for domain-specific analysis of horse images.
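A condensed sketch of this stage, using the ultralytics package, is shown below; the COCO class index for horses, the sampling stride, and the filter thresholds are illustrative assumptions rather than our exact configuration:

import cv2
from ultralytics import YOLO

model = YOLO("yolov8x.pt")                      # heaviest YOLOv8 variant
HORSE_CLASS, MIN_SIDE, MIN_STD = 17, 128, 0.05  # assumed thresholds for illustration

def extract_horse_crops(video_path, stride=300):          # 30 FPS -> 0.1 FPS
    cap, crops, frame_id = cv2.VideoCapture(video_path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_id % stride == 0:
            result = model(frame, classes=[HORSE_CLASS], verbose=False)[0]
            for box in result.boxes.xyxy.cpu().numpy().astype(int):
                x1, y1, x2, y2 = box
                crop = frame[y1:y2, x1:x2]
                # Keep only sufficiently large, sufficiently high-contrast crops.
                if min(crop.shape[:2]) >= MIN_SIDE and (crop / 255.0).std() >= MIN_STD:
                    crops.append(crop)
        frame_id += 1
    cap.release()
    return crops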
Using this pipeline, we were able to collect 3,929 high quality horse images that we used for our project. Contrastive Learning is a data-intensive learning technique, and thus we reserved 3,374 unlabeled images for pre-training the encoder.
4.3. Momentum Contrast for Unsupervised Visual Representation Learning
Contrastive learning has become a prominent paradigm in self-supervised learning, drawing on the intuition of bringing similar representations closer and pushing dissimilar ones apart in a latent space, as seen in the work of Hadsell et al. [78] and Chen et al. [2]. It is effective because it learns strong features that generalize across tasks by maximizing mutual information between augmented views of the same data, as explored by Oord et al. [79]. Momentum Contrast (MoCo) [3], introduced by He et al. in 2020, advances this framework by providing a dynamic dictionary with a momentum encoder that allows large-scale instance discrimination, using a memory bank to retain a representation space. Unlike other frameworks, such as SimCLR by Chen et al. [2] (2020), which requires large batch sizes and intense computation, or BYOL by Grill et al. [4] (2020), which abandons negative samples, MoCo decouples the batch size from the dictionary size, ensuring that negatives are always of high quality via a momentum mechanism. Its flexibility and scalability have made it a cornerstone of unsupervised representation learning, bridging the gap between self-supervised and supervised performance in vision tasks.
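Concretely, for an encoded query $q$, its positive key $k^{+}$, and a dictionary of $K$ negative keys $\{k_i\}$ drawn from the queue, MoCo minimizes an InfoNCE-style contrastive loss [3]:

$\mathcal{L}_q = -\log \dfrac{\exp(q \cdot k^{+} / \tau)}{\exp(q \cdot k^{+} / \tau) + \sum_{i=1}^{K} \exp(q \cdot k_i / \tau)},$

where $\tau$ is a temperature hyperparameter.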
4.3.1. Data Augmentation Layer of MoCo
The data augmentation layer of the Momentum Contrast (MoCo) framework is the core component for generating diverse and meaningful views of the input data, enhancing the representation learning process. Inspired by prior works like SimCLR [2], MoCo uses strong augmentations, such as random cropping, resizing, horizontal flipping, color jittering, grayscale conversion, and Gaussian blurring, to create two augmented versions of the same image. Such augmentation ensures that the model learns invariant features that survive distortion and transformation, as envisioned by contrastive learning objectives such as those introduced by Hadsell et al. [78]. The augmentation pipeline prevents quality degradation of positive pairs (views of the same instance), which in turn allows contrasting against the negative pairs (different instances) residing in the memory bank, yielding effective discriminative representations. By exposing the input data to a spectrum of variations without losing semantic content, the MoCo augmentation layer is a critical component.
The primary function of the augmentation layer is to emphasize the invariant features within an image. This implies that an augmentation layer should be kept flexible, so that we can see which components within the layer perform best. The original MoCo [3] augmentations were borrowed from SimCLR [2], which itself was trained on the ImageNet [80] dataset. Specifically, SimCLR [2] was trained on the ILSVRC-2012 version, which contains over 1.2 million labeled images across 1,000 categories; our dataset contained only 3,374 images for training. Learning from [27,31,38,57,75,76], we discovered that a horse’s posture and the relative position of its neck, head, and tail are the most significant indicators of its emotions. As a result, we developed an augmentation layer with serially organized components designed to accentuate these features within a horse image. At the same time, we made sure that our augmentations discard irrelevant features, such as the horse’s color and the surrounding terrain. Consequently, we developed the query transforms and the key transforms independently.
This pipeline is carefully designed to introduce randomness while keeping the input data structurally intact, enabling contrastive learning. The query_transforms pipeline begins with a random resized crop parameterized with a scaling range of (0.66, 1.0), diversifying the spatial representation of a given image by extracting variable-sized subregions from the original. The subsequent grayscale conversion removes color information and forces the model to work on intensity and texture patterns. A random horizontal flip with probability 0.5 introduces controlled geometric variation and enhances orientation insensitivity. Gaussian blur, applied occasionally (with probability 0.1), injects noise that emulates realistic distortions and hence enhances the robustness of the learned features. The Sobel filter distinctly highlights gradient information, facilitating accurate extraction of edge-based features. Finally, tensorization and normalization establish compatibility with GPU-accelerated computation while ensuring uniform input distributions, thereby promoting effective optimization.
The key_transforms pipeline is a complementary augmentation strategy focused on structural integrity, with a lower degree of stochastic noise. It starts with a random resized crop with a constrained scale range of (0.85, 1.0), so that more contextual spatial information is preserved in the image. Like the query branch, it applies grayscale conversion as well as the Sobel filter for the edge and texture preprocessing necessary to maintain feature consistency across views. This pipeline deliberately excludes horizontal flips and blur augmentation in order to provide a more deterministic baseline representation, aligned with the broader strategy of multi-view contrastive learning. The pipeline ends with tensor conversion and normalization to ensure consistent comparability across inputs from either augmentation stream. The two pipelines work in concert within the TwoCropsTransform framework to produce contrasting yet complementary views that facilitate the learning of robust, invariant feature embeddings, which are critical for unsupervised representation learning.
The following is the code that describes our data augmentation layer in terms of query and key transforms:
import torchvision.transforms as transforms

# Query branch: stronger stochastic augmentation emphasizing edges and posture.
query_transforms = transforms.Compose([
    transforms.RandomResizedCrop(size=(SIZE, SIZE), scale=(0.66, 1.0)),
    transforms.Grayscale(),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomApply([GaussianBlur()], p=0.1),
    SobelFilter(),
    transforms.ToTensor(),
    normalize
])

# Key branch: milder, more deterministic augmentation.
key_transforms = transforms.Compose([
    transforms.RandomResizedCrop(size=(SIZE, SIZE), scale=(0.85, 1.0)),
    transforms.Grayscale(),
    SobelFilter(),
    transforms.ToTensor(),
    normalize
])

# Produce a (query, key) pair of augmented views of each image.
train_transforms = TwoCropsTransform(query_transforms, key_transforms)
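The listing above references a few custom components (SobelFilter, GaussianBlur, TwoCropsTransform, normalize) whose definitions are not reproduced in this paper. A minimal sketch of plausible implementations follows; the blur sigma range, the normalization statistics, and the SIZE constant are illustrative placeholders, not our exact values:

import random
import numpy as np
from PIL import Image, ImageFilter
import torchvision.transforms as transforms

SIZE = 224  # assumed input resolution, matching IMG_SIZE=224 in the encoder

# Placeholder single-channel normalization statistics.
normalize = transforms.Normalize(mean=[0.5], std=[0.5])

class GaussianBlur:
    """Blur a PIL image with a randomly chosen sigma, as in SimCLR/MoCo v2."""
    def __init__(self, sigma=(0.1, 2.0)):
        self.sigma = sigma

    def __call__(self, img):
        sigma = random.uniform(self.sigma[0], self.sigma[1])
        return img.filter(ImageFilter.GaussianBlur(radius=sigma))

class SobelFilter:
    """Replace a grayscale PIL image with its Sobel gradient magnitude to accentuate posture edges."""
    def __call__(self, img):
        arr = np.asarray(img, dtype=np.float32)
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
        ky = kx.T
        # Dependency-free 2-D convolution via padding and shifted weighted sums.
        padded = np.pad(arr, 1, mode="edge")
        gx = sum(kx[i, j] * padded[i:i + arr.shape[0], j:j + arr.shape[1]]
                 for i in range(3) for j in range(3))
        gy = sum(ky[i, j] * padded[i:i + arr.shape[0], j:j + arr.shape[1]]
                 for i in range(3) for j in range(3))
        mag = np.sqrt(gx ** 2 + gy ** 2)
        mag = (255.0 * mag / (mag.max() + 1e-8)).astype(np.uint8)
        return Image.fromarray(mag)

class TwoCropsTransform:
    """Return a (query, key) pair of differently augmented views of one image."""
    def __init__(self, query_transform, key_transform):
        self.query_transform = query_transform
        self.key_transform = key_transform

    def __call__(self, img):
        return [self.query_transform(img), self.key_transform(img)]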
Figure 6 illustrates the effect of the image augmentation layer.
4.3.2. Encoder
The augmentation layer produces single-channel images, as it converts all images to grayscale. As a result, the traditional encoders used with MoCo are not the optimal choice here. The MoCo paper discussed experimentation with ResNet-50-w4x, ResNet-50-w2x, and ResNet-50 as encoders. These encoders are not only resource-intensive but also difficult to train on limited hardware; the most significant issue is that they require a large amount of GPU video memory (VRAM), primarily because they are designed to handle 3-channel (RGB) inputs. By grayscaling our inputs, we not only discarded irrelevant color data but also made training feasible with limited VRAM and computing resources. Consequently, we decided to utilize a completely new encoder architecture inspired by SOPCNN [81]. With 1,400,000 trainable parameters, MNIST-SOPCNN [82] is ranked 4th in the accuracy rankings for MNIST image classification [83], with an accuracy of 99.83%. SOPCNN, standing for Stochastic Optimization of Plain Convolutional Neural Networks, combines regularization techniques that work together to improve performance and make plain CNNs less prone to overfitting. Our SOPCNN-inspired encoder is a plain CNN dedicated to efficient image classification, based on the tenets of simplicity and efficiency found in [81]. The model consists of four convolutional layers with ReLU activations, 3×3 kernels, and 2×2 max-pooling to reduce spatial dimensions while extracting hierarchical features. The architecture ends with a densely connected layer of 2048 neurons followed by dropout regularization with parameter p=0.8, which prevents overfitting by randomly disabling neurons during training. It is designed for grayscale input and allows images of different sizes, consistent with leading performance on datasets like MNIST and SVHN. This shows the efficiency of conventional CNNs when appropriately enhanced through robust regularization methods like dropout and data augmentation. It is important to note that MoCo [3] uses identical architectures in both the query and key encoders.
The following is the code that describes our query and key encoder:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SOPCNN(nn.Module):
    def __init__(self, num_classes=128, IMG_SIZE=224):
        super(SOPCNN, self).__init__()
        self.IMG_SIZE = IMG_SIZE
        # Four convolutional layers with 3x3 kernels; input is single-channel (grayscale).
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Two pooling stages halve each spatial dimension twice, hence IMG_SIZE // 4.
        self.fc1 = nn.Linear(256 * (self.IMG_SIZE // 4) * (self.IMG_SIZE // 4), 2048)
        self.fc2 = nn.Linear(2048, num_classes)
        self.dropout = nn.Dropout(0.8)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool(x)
        x = x.view(-1, 256 * (self.IMG_SIZE // 4) * (self.IMG_SIZE // 4))
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
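As a quick sanity check of the tensor shapes (assuming the class definition above and the 64-dimensional embeddings used in the rest of our pipeline):

encoder = SOPCNN(num_classes=64, IMG_SIZE=224)
dummy_batch = torch.randn(4, 1, 224, 224)   # four single-channel 224x224 images
print(encoder(dummy_batch).shape)           # torch.Size([4, 64])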
4.4. Final MoCo Network Architecture
Figure 7 displays the final MoCo network architecture.
The SOPCNN query encoder is trained for a sufficient number of epochs and its weights are then frozen. A downstream network is subsequently attached to its output to evaluate the learned representations. During this phase, progress is tracked via a decreasing loss value.
The query encoder is optimized with stochastic gradient descent on the NT-Xent loss function, while the momentum encoder is simultaneously updated using an exponential moving average of the query encoder’s weights, following the method from MoCo [3]. The SOPCNN encoder is designed to produce output embedding vectors of size 64 for downstream usage.
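Formally, with query encoder parameters $\theta_q$, key encoder parameters $\theta_k$, and momentum coefficient $m$ close to 1 (0.999 in the original MoCo paper [3]), the momentum update is

$\theta_k \leftarrow m\,\theta_k + (1 - m)\,\theta_q.$

As a minimal sketch of one training step (variable names, queue handling, and hyperparameter values are illustrative, not our exact training code):

import torch
import torch.nn.functional as F

def moco_training_step(query_encoder, key_encoder, optimizer, im_q, im_k,
                       queue, m=0.999, tau=0.07):
    """One MoCo-style step: contrastive loss on (query, key) views, SGD on the
    query encoder, EMA update of the momentum encoder. Illustrative sketch."""
    q = F.normalize(query_encoder(im_q), dim=1)          # (N, 64) query embeddings
    with torch.no_grad():
        k = F.normalize(key_encoder(im_k), dim=1)        # (N, 64) positive keys

    l_pos = (q * k).sum(dim=1, keepdim=True)             # (N, 1) positive logits
    l_neg = q @ queue.T                                  # (N, K) logits vs. queued negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    loss = F.cross_entropy(logits, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                     # gradient step on the query encoder

    with torch.no_grad():                                # EMA update of the key encoder
        for p_q, p_k in zip(query_encoder.parameters(), key_encoder.parameters()):
            p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)
    return loss.item()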
5. Results
Throughout this project, we made extensive use of PyTorch, a deep learning and hardware acceleration library, in Python 3.9. While experimenting locally with our NVIDIA RTX 3070-Ti GPU, we used Ubuntu 22.04 LTS, an open-source Linux-based operating system built by Canonical, London, England. While training on the NVIDIA Tesla P100 GPU, we used Kaggle’s default shell for the Linux commands issued during training and analysis. Locally, we used an Intel 12800HX CPU, manufactured by Intel, Santa Clara, CA, United States, and in Kaggle’s remote environment we used their Intel Xeon CPU. It is important to note, however, that the CPU makes little difference to the outcome of our experiments.
5.1. Pre-training with Momentum Contrast
As discussed previously in the manuscript, we designed the encoder ourselves, taking inspiration from SOPCNN [81]. While training this encoder using momentum contrast [3], we used the hardware shown in Table 2.
Using the NT-Xent loss function discussed earlier, we trained the encoder and tracked the training progress. Owing to the novelty of this encoder, we had to work through different sets of hyper-parameters; after thorough experimentation, we found the hyperparameter values shown in Table 3 to be optimal.
After 2000 epochs of training, we observed the decrease in loss value shown in Figure 8.
In order to track the training progress of our encoder, after every 100 epochs we analyzed the features produced by a single forward pass of the encoder. This was done by taking the output vector embeddings from the encoder and using t-distributed Stochastic Neighbor Embedding (t-SNE) to reduce their dimensionality from 64D to 2D, with a perplexity value of 30.
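This reduction can be reproduced, for example, with scikit-learn; the array name and loading mechanism below are illustrative:

import numpy as np
from sklearn.manifold import TSNE

# embeddings: (3374, 64) array of encoder outputs collected from one forward pass.
embeddings = np.load("encoder_embeddings.npy")   # hypothetical file name

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points_2d = tsne.fit_transform(embeddings)       # (3374, 2), ready for scatter plotting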
Figure 9 shows the distribution of the 3,374 datapoints before training the encoder.
Figure 10 shows the distribution of these datapoints after 2000 epochs of training with the aforementioned hyperparameters.
We can visibly note the cluster formation that has occurred, indicating that the model was inherently able to distinguish between the emotions in an unsupervised fashion. Further visual analysis indicates that the encoder grouped this data into more than 8 clusters, meaning that the encoder has picked up on more than the 8 original emotions. This study was originally developed around the seven Panksepp emotions [9] and the ’Pain’ emotion, but this result foreshadows the possible existence of novel emotions among equines. We hypothesize that our SOPCNN encoder has detected novel emotions that are more complex and nuanced and of which academia may not yet be aware.
5.2. Downstreaming
In this section, we explore the performance of our encoder by systematically freezing its weights and linking it to a downstream classifier, as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompleteModel(nn.Module):
    def __init__(self, encoder, feature_dim=64, num_classes=8):
        super(CompleteModel, self).__init__()
        # Frozen, MoCo-pretrained encoder used purely as a feature extractor.
        self.encoder = encoder
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.fc1 = nn.Linear(feature_dim, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.dropout1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.dropout2 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(128, 64)
        self.bn3 = nn.BatchNorm1d(64)
        self.dropout3 = nn.Dropout(0.3)
        self.fc_out = nn.Linear(64, num_classes)

    def forward(self, x):
        features = self.encoder(x)
        x = F.relu(self.bn1(self.fc1(features)))
        x = self.dropout1(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout2(x)
        x = F.relu(self.bn3(self.fc3(x)))
        x = self.dropout3(x)
        x = self.fc_out(x)
        return x
After thorough experimentation, we found the above downstream architecture to be superior for accurate classification. The proposed network consists of a pretrained SOPCNN encoder combined with a custom classification head designed for efficient transfer learning and robust feature refinement. The pre-trained encoder acts as a feature extractor; its parameters are frozen to retain the representations learned from the large-scale source dataset, which reduces training time and computational cost. The downstream classifier comprises the following layers:
Fully Connected Layers (fc1, fc2, fc3): These layers progressively reduce the feature dimensions from the encoder output (feature_dim=64) to align with the classification task. This dimensionality reduction enables the network to extract higher-order representations tailored to the downstream task.
Batch Normalization (bn1, bn2, bn3): Batch normalization is applied after each fully connected layer to stabilize learning by normalizing feature distributions. This helps accelerate convergence and mitigates internal covariate shift.
ReLU Activations: The ReLU activation function in the hidden layers introduces the nonlinearity that the network uses to approximate complex interrelations in the data.
Dropout (dropout1, dropout2, dropout3): Dropout prevents overfitting by randomly disabling neurons during training; dropout1 and dropout2 use a rate of 0.5, while dropout3 uses 0.3. This encourages the model to learn patterns that generalize.
Output Layer (fc_out): A final fully connected layer maps the refined features to the required number of classes (num_classes=8), serving as the prediction layer for classification.
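A minimal fine-tuning sketch under these definitions might look as follows; the optimizer choice, learning rate, and data loader are illustrative assumptions rather than our exact training configuration:

import torch
import torch.nn as nn

model = CompleteModel(encoder)            # encoder pretrained with MoCo, frozen above
criterion = nn.CrossEntropyLoss()
# Only the classifier head's parameters require gradients.
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

model.train()
for epoch in range(500):
    for images, labels in finetune_loader:   # assumed DataLoader over the 396 labeled images
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()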
We trained this downstream classifier with the aforementioned fine-tuning and testing datasets, which contain 396 and 159 labeled images, respectively. After training the above architecture for 500 epochs, we observed the trend shown in Figure 11.
The classifier was able to achieve a classification accuracy of 55.47% across 8 labels, where the baseline accuracy would be 12.5%. We observed the following accuracy scores while evaluating individual emotions.
5.3. Generalizability of Our Results
Our results demonstrate the strong potential of contrastive learning frameworks, particularly the MoCo variants, for analyzing emotional behaviors in horses. The effectiveness of our models, despite using lighter encoders and operating within computational constraints, underscores the adaptability and robustness of these approaches. However, it is essential to recognize the limitations posed by the specific dataset and the potential variability in real-world scenarios. Future research should focus on evaluating these models across diverse datasets, including those representing different breeds and environments, to ensure broader applicability. Additionally, exploring transfer learning techniques could enhance the generalizability, allowing models trained on horse emotion datasets to be adapted to other species or even human emotional recognition tasks. This cross-species application could open new avenues for understanding and interpreting emotional behaviors using machine learning, fostering advancements in both animal behavior research and human–animal interaction studies.
The potential implications of the generalizability of our results are significant. If these models could be effectively applied across diverse datasets, it could lead to a more accurate and nuanced understanding of horse emotions in various contexts, such as different breeds and environments. This could enhance the development of applications in areas such as animal welfare, training and therapy. Moreover, exploring transfer learning techniques could enable the adaptation of these models to other species or human emotional recognition tasks. Such cross-species applications could advance our understanding of emotional behaviors more broadly, facilitating improvements in human–animal interaction studies and response in both animals and humans.
Our current model relies solely on posture. Its accuracy in distinguishing between several emotions is therefore possibly restricted, as equine emotions also depend on environmental factors and social information, such as the presence of other horses, other animals, and even humans. Future experiments with contextual integration, trained on rich datasets with diverse backgrounds encompassing a variety of interactions, should therefore increase accuracy. This could be achieved by integrating contextual cues through multimodal approaches or context-conditioned training, so that the model can better interpret and differentiate subtle emotional differences in horses.
6. Discussion
This project originally began as a continuation of previous work with canines [10], motivated by the need to investigate how self-supervised learning methods could reveal latent emotional states in animal species. Shifting our focus from canines to equines was a conscious decision to explore a domain where affective behavior is highly nuanced yet relatively underexamined in the literature. Horses communicate their emotions via a broad spectrum of behavioral cues (facial expressions, ear positions, tail carriage, and subtle body postures) that lend themselves well to data-driven feature extraction. By applying Momentum Contrast (MoCo) [3] to extensive unlabeled equine image data, we aimed to assess whether a CNN encoder could autonomously learn intricate representations that align with distinct emotional or affective states.
We used t-SNE as a visualization tool both before and after pretraining. The results exposed a stark contrast in how the network organizes the feature space: following training with MoCo, our encoder generated clusters that are significantly more compact and well-delineated than those from the untrained baseline. Our labeled dataset, built around the seven canonical Panksepp emotions (CARE, LUST, FEAR, GRIEF, PLAY, RAGE, SEEKING) augmented with PAIN [9], was relatively modest in size, but applying these features to downstream classification resulted in an accuracy of about 55.47%. This modest performance underlines the fact that features learned in an unsupervised manner do indeed extract relevant characteristics from equine emotional expressions.
One particularly compelling discovery is that the t-SNE plots exhibit more than eight clusters, suggesting the presence of emotional or behavioral nuances beyond our predefined categories. This finding resonates with longstanding proposals in animal behavior research that actual emotional repertoires are more granular than standard frameworks capture [84]. For instance, what we label as “FEAR” might comprise multiple sub-states, each characterized by distinct triggers, intensities, or physiological markers. Thus, our work not only validates the utility of self-supervised learning for uncovering hidden structures in the data, but also nudges us toward more fine-grained, ecologically valid categorizations of equine affect [76].
Algorithmic novelty notwithstanding, such an undertaking hints at something much more fundamental: a paradigm shift in animal emotion research, where large-scale data with minimal annotation can create insights that will challenge and sharpen the existing theoretical models. If these new clusters represent new or poorly recognized emotional states, their ability to be consistently identified in many different contexts would call for further empirical study—potentially changing the landscape of how we understand horse welfare, behavior management, and training practices.
6.1. Broader Impact on the Field
The broader implications of this study extend well beyond equine research. Our results emphasize the applicability of advanced machine learning and sensor data analysis for unveiling intricate facets of animal behavior and welfare. Traditional methods for studying animal affect typically rely on expert observation and subjective coding schemes, which, although valuable, can be time-consuming and prone to human bias. Self-supervised approaches address these drawbacks by scaling to large datasets without explicit labeling and by allowing the algorithm to surface latent feature representations that might not be evident through manual inspection.
As the cost of sensor and imaging technologies decreases, the potential to extend similar pipelines to other domesticated and even wild species becomes more tangible. The principles demonstrated here, contrastive learning on unlabeled images followed by clustering and minimal-labeled fine-tuning, can also provide information for wider initiatives on animal conservation, zoo management, and public policy on the welfare of animals. More systematically, pain or stress could be detected specifically through these means to facilitate timely intervention, improving treatment results in veterinary medicine.
Crucially, our findings also ignite discussion around the interpretability of “black box” neural networks in sensitive domains such as welfare assessment and medical diagnostics. We must contemplate how best to explain or justify the classification decisions made by these models, especially if their outputs influence real-world decisions regarding an animal’s well-being. Advancements in explainable AI (XAI), tailored to animal-centric applications, could greatly bolster practitioner and stakeholder trust in such systems, enabling ethically responsible deployment [85].
6.2. Further Scope of this Research
The encouraging results obtained here naturally suggest several future directions. One line of work could involve capturing a wider range of equine expressions under different environmental conditions, from calm or routine daily scenarios to highly stimulating or stressful events. A richer dataset might yield even finer partitions in the feature space, potentially uncovering sub-emotions or context-specific behaviors that remain occluded in smaller samples. Including temporal information—e.g., video data—would allow us to investigate transitions between emotional states, giving a more dynamic and realistic view of how emotions unfold over time.
Comparisons among different self-supervised learning frameworks, such as SimCLR [2], SwAV [86], or BYOL [4], could also be highly instructive. While MoCo [3] has shown strong performance on many tasks, the choice of architecture and contrastive mechanism can profoundly affect the embeddings extracted. Determining which framework best captures the subtle morphological and behavioral cues of horses could lead to domain-specific methodological improvements, potentially enhancing the recognition rates for more elusive emotional states.
In particular, integrating multimodal data stands out as a promising endeavor. Combining image-based features with audio signals (e.g., whinnies, neighs, snorts), physiological data (heart rate, cortisol levels, respiratory patterns), and contextual metadata (temperature, time of day, type of activity) could illuminate additional facets of equine affect. Such a multi-sensor approach would likely increase confidence in the discovered clusters and their interpretability, reducing the chances of conflating unrelated behaviors with specific emotions.
Finally, there is enormous value in collaborative, cross-disciplinary efforts. Incorporating the expertise of veterinarians, equine behaviorists, animal welfare scientists, and machine learning researchers could foster a mutual exchange of knowledge. Expert insights would refine the labeling of behaviors or emotional states, while robust machine learning techniques would accelerate hypothesis testing and discovery of unrecognized patterns. Ideally, such partnerships would produce validated, high-confidence datasets that better approximate real-world horse-human interactions, offering benefits for sport, therapy, and day-to-day caregiving practices.
6.3. Ethical Considerations
Studying equine emotions entails a special set of ethical responsibilities. First, we must ensure that data collection methods, from taking photographs in private stables to accessing publicly available imagery, strictly adhere to guidelines protecting animal welfare. Although we work with our own dataset of wild horse images, future expansions may include more invasive modalities, such as biometric sensors or close-up videography, which would require careful ethical review to ensure that the animals are not unduly stressed.
Second, the interpretation of discovered clusters demands caution. Although the model may autonomously group visually similar behaviors, these clusters are not automatically “new emotions.” Overinterpretation risks anthropomorphizing or mislabeling a horse’s response, potentially leading to misguided interventions or welfare policies. As such, the labeling process must remain tightly coupled with domain expertise to validate whether observed behaviors are consistent with ethologically recognized expressions (e.g., ear-pinning for agitation, lip-lifting for pain or discomfort).
Any practical application of these tools must be undertaken with an appreciation for how classification errors might translate into real-world consequences. Erroneous classification (for example, labeling a tired horse as depressed or a nervous horse as aggressive) may result in inappropriate management decisions. This is where transparency and explainability become critical: stakeholders must understand that these models provide probabilistic suggestions, not definitive diagnoses. Frameworks for risk assessment and error mitigation should be put in place, especially where an incorrect label could materially affect an animal’s treatment or living conditions.
Lastly, data privacy and ownership considerations cannot be overlooked, even in animal-focused research. While horses do not have the same legal protections as humans regarding “personal data,” the concerns of owners, caretakers, and property managers must be respected. Clear data governance policies—covering consent for image capture, usage rights, and the secure handling of proprietary biometric information—are foundational to ethical research practices. By addressing these concerns proactively, we can ensure that the pursuit of scientific discovery does not come at the expense of animal welfare or stakeholder trust.
7. Conclusions
In summary, this study shows that self-supervised learning, specifically Momentum Contrast (MoCo), is capable of revealing latent structures in equine emotional behavior from large-scale, unlabeled image datasets. Our approach extracted meaningful feature representations that enhanced downstream classification tasks and opened the possibility of identifying emotional states beyond those usually categorized. The t-SNE visualizations, which reveal additional clusters, suggest that equine emotional repertoires could be more fine-grained than current conceptualizations acknowledge, and the present study makes a prima facie case for further empirical work. This work not only demonstrates the utility of data-driven approaches in advancing animal emotion research but also underlines the need for interdisciplinary collaboration to refine our understanding of animal welfare and behavior. Future studies can build on these findings to examine context-specific nuances and to incorporate multimodal data for a fuller understanding of the emotional lives of animals.
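For readers wishing to reproduce the evaluation pattern summarized above, the sketch below shows the generic two-step recipe: a t-SNE projection of the frozen encoder's embeddings for cluster inspection, followed by a KNN probe on the labeled subset. The file names are hypothetical placeholders, and the hyperparameters are common defaults rather than the exact values used in this study.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Assumed inputs: `embeddings` (N x D array from the frozen encoder) and
# integer `labels` for the manually annotated subset; file names hypothetical.
embeddings = np.load("horse_embeddings.npy")
labels = np.load("horse_labels.npy")

# 2-D t-SNE projection for visual inspection of emergent clusters
proj = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)

# KNN probe: how well do the frozen embeddings separate labeled emotions?
X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels,
                                          test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(f"KNN probe accuracy: {knn.score(X_te, y_te):.3f}")
```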
Author Contributions
Conceptualization, P.G., A.B.; methodology, A.B., P.G.; software, A.B.; validation, E.K., A.B.; formal analysis, E.K.; data curation, E.K., A.B.; writing—original draft preparation, A.B., E.K., A.H.; writing—review and editing, A.H., P.G.; supervision, P.G. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
The animal study protocol was approved by the Institutional Review Board of Equine International Foundation.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Appendix A.1
Table A1. Detailed mapping of each acronym to its respective full form.
| Acronym | Full Form | Usage |
| --- | --- | --- |
| CNN | Convolutional Neural Networks | Refers to the neural network architecture used for image classification in emotion recognition. |
| MoCo | Momentum Contrast | Refers to the contrastive learning framework used for unsupervised learning on visual data [3]. |
| SimCLR | Simple Framework for Contrastive Learning of Visual Representations | Alternative contrastive learning framework used in this research [2]. |
| ACI | Animal-Computer Interaction | Represents the interdisciplinary field connecting animals and technology for interaction studies. |
| KNN | K-Nearest Neighbors | Refers to the classifier used for evaluating model embeddings in this study. |
| RTX | Ray Tracing Texel eXtreme | NVIDIA’s GPU model, used here to specify the hardware for model training. |
| GPU | Graphics Processing Unit | Refers to the hardware component used for model computation and training. |
| NT-Xent | Normalized Temperature-scaled Cross-Entropy | The contrastive loss function used during training, measuring agreement between positive pairs relative to negatives. |
| t-SNE | t-Distributed Stochastic Neighbor Embedding | A visualization tool for examining high-dimensional model embeddings. |
| MGBC | Mini-Batch Gradient Checkpointing | Saves GPU memory by recomputing activations on-the-fly during the backward pass instead of storing them, trading extra computation for the ability to train with larger batches. |
| EMA | Exponential Moving Average | Technique used to update the weights of the key encoder. |
References
- Li, S.; Deng, W. Deep Facial Expression Recognition: A Survey. IEEE Transactions on Affective Computing 2022, 13, 1195–1215. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G.E. A Simple Framework for Contrastive Learning of Visual Representations. CoRR 2020, abs/2002.05709. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R.B. Momentum Contrast for Unsupervised Visual Representation Learning. CoRR 2019, abs/1911.05722. [Google Scholar]
- Grill, J.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.Á.; Guo, Z.D.; Azar, M.G.; et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. CoRR 2020. [Google Scholar]
- Chen, X.; He, K. Exploring Simple Siamese Representation Learning. CoRR 2020. [Google Scholar]
- Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised Contrastive Learning. CoRR 2020. [Google Scholar]
- Li, M.; Yang, B.; Levy, J.; Stolcke, A.; Rozgic, V.; Matsoukas, S.; Papayiannis, C.; Bone, D.; Wang, C. Contrastive Unsupervised Learning for Speech Emotion Recognition. CoRR 2021. [Google Scholar]
- Sun, L.; Lian, Z.; Liu, B.; Tao, J. HiCMAE: Hierarchical Contrastive Masked Autoencoder for self-supervised Audio-Visual Emotion Recognition. Information Fusion 2024, 108, 102382. [Google Scholar] [CrossRef]
- Panksepp, J. Affective Neuroscience: The Foundations of Human and Animal Emotions; Oxford University Press, 2004.
- Bhave, A.; Hafner, A.; Bhave, A.; Gloor, P.A. Unsupervised Canine Emotion Recognition Using Momentum Contrast. Sensors 2024, 24. [Google Scholar] [CrossRef]
- Chamove, A.S.; Crawley-Hartrick, O.J.; Stafford, K.J. Horse reactions to human attitudes and behavior. Anthrozoös 2002, 15, 323–331. [Google Scholar] [CrossRef]
- Wathan, J.; McComb, K. The eyes and ears are visual indicators of attention in domestic horses. Current Biology 2014, 24, R677–R679. [Google Scholar] [CrossRef]
- Proctor, H.S.; Carder, G. Can ear postures reliably measure the positive emotional state of cows? Applied Animal Behaviour Science 2014, 161, 20–27. [Google Scholar] [CrossRef]
- Hall, C.; Randle, H.; Pearson, G.; Preshaw, L.; Waran, N. Assessing equine emotional state. Applied animal behaviour science 2018, 205, 183–193. [Google Scholar] [CrossRef]
- Stewart, M.; Stratton, R.; Beausoleil, N.; Stafford, K.; Worth, G.; Waran, N. Assessment of positive emotions in horses: Implications for welfare and performance. Journal of Veterinary Behavior: Clinical Applications and Research 2011, 5, 296. [Google Scholar] [CrossRef]
- Stratton, R.B. Assessment of positive emotion in horses: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Veterinary Science at Massey University, Manawatū, New Zealand. PhD thesis, Massey University, 2022.
- Waran, N.; Randle, H. What we can measure, we can manage: The importance of using robust welfare indicators in Equitation Science. Applied Animal Behaviour Science 2017, 190, 74–81. [Google Scholar] [CrossRef]
- Feh, C. Social behaviour and relationships of Prezewalski horses in Dutch semi-reserves. Applied Animal Behaviour Science 1988, 21, 71–87. [Google Scholar] [CrossRef]
- Feh, C. Relationships and communication in socially natural horse herds. The domestic horse, 2005; 83–93. [Google Scholar]
- Maeda, T.; Sueur, C.; Hirata, S.; Yamamoto, S. Behavioural synchronization in a multilevel society of feral horses. PLoS One 2021, 16, e0258944. [Google Scholar] [CrossRef] [PubMed]
- Maeda, T.; Ochi, S.; Ringhofer, M.; Sosa, S.; Sueur, C.; Hirata, S.; Yamamoto, S. Aerial drone observations identified a multilevel society in feral horses. Scientific Reports 2021, 11, 71. [Google Scholar] [CrossRef]
- Heitor, F.; do Mar Oom, M.; Vicente, L. Social relationships in a herd of Sorraia horses: Part II. Factors affecting affiliative relationships and sexual behaviours. Behavioural Processes 2006, 73, 231–239. [Google Scholar] [CrossRef]
- Heitor, F.; Vicente, L. Maternal care and foal social relationships in a herd of Sorraia horses: Influence of maternal rank and experience. Applied Animal Behaviour Science 2008, 113, 189–205. [Google Scholar] [CrossRef]
- Heitor, F.; do Mar Oom, M.; Vicente, L. Social relationships in a herd of Sorraia horses: Part I. Correlates of social dominance and contexts of aggression. Behavioural Processes 2006, 73, 170–177. [Google Scholar] [CrossRef] [PubMed]
- Shimada, M.; Suzuki, N. The contribution of mutual grooming to affiliative relationships in a feral misaki horse herd. Animals 2020, 10, 1564. [Google Scholar] [CrossRef] [PubMed]
- Farmer, K.; Krüger, K.; Byrne, R.W.; Marr, I. Sensory laterality in affiliative interactions in domestic horses and ponies (Equus caballus). Animal cognition 2018, 21, 631–637. [Google Scholar] [CrossRef]
- Kieson, E.; Sams, J. A preliminary investigation of preferred affiliative interactions within and between select bonded pairs of horses: A first look at equine “Love Languages”. Int. J. Zool. Anim. Biol 2021, 4, 000318. [Google Scholar]
- Zeitler-Feicht, M.H.; Hartmann, E.; Erhard, M.H.; Baumgartner, M. Which affiliative behaviour can be used as a valid, reliable and feasible indicator of positive welfare in horse husbandry? Applied Animal Behaviour Science 2024, 6236. [Google Scholar] [CrossRef]
- Bartlett, E.; Cameron, L.J.; Freeman, M.S. A preliminary comparison between proximity and interaction-based methods to construct equine (Equus caballus) social networks. Journal of Veterinary Behavior 2022, 50, 36–45. [Google Scholar] [CrossRef]
- Benhajali, H.; Richard-Yris, M.A.; Leroux, M.; Ezzaouia, M.; Charfi, F.; Hausberger, M. A note on the time budget and social behaviour of densely housed horses: A case study in Arab breeding mares. Applied Animal Behaviour Science 2008, 112, 196–200. [Google Scholar] [CrossRef]
- Kieson, E.; Goma, A.A.; Radi, M. Tend and Befriend in Horses: Partner Preferences, Lateralization, and Contextualization of Allogrooming in Two Socially Stable Herds of Quarter Horse Mares. Animals 2023, 13. [Google Scholar] [CrossRef] [PubMed]
- Lansade, L.; Lemarchand, J.; Reigner, F.; Arnould, C.; Bertin, A. Automatic brushes induce positive emotions and foster positive social interactions in group-housed horses. Applied Animal Behaviour Science 2022, 246, 105538. [Google Scholar] [CrossRef]
- Van Dierendonck, M.; Sigurjónsdóttir, H.; Colenbrander, B.; Thorhallsdóttir, A.G. Differences in social behaviour between late pregnant, post-partum and barren mares in a herd of Icelandic horses. Applied Animal Behaviour Science 2004, 89, 283–297. [Google Scholar] [CrossRef]
- Pugh, H.K.; Heatwole Shank, K.S. Multispecies Occupations Involving Equines: An Action-Oriented Inquiry to Inform Occupational Therapy Practitioners. OTJR: Occupational Therapy Journal of Research 2024, 44, 196–204. [Google Scholar] [CrossRef]
- Souris, A.C.; Kaczensky, P.; Julliard, R.; Walzer, C. Time budget-, behavioral synchrony-and body score development of a newly released Przewalski’s horse group Equus ferus przewalskii, in the Great Gobi B strictly protected area in SW Mongolia. Applied Animal Behaviour Science 2007, 107, 307–321. [Google Scholar] [CrossRef] [PubMed]
- Mendonça, R.S.; Pinto, P.; Inoue, S.; Ringhofer, M.; Godinho, R.; Hirata, S. Social determinants of affiliation and cohesion in a population of feral horses. Applied Animal Behaviour Science 2021, 245, 105496. [Google Scholar] [CrossRef]
- Wolter, R.; Stefanski, V.; Krueger, K. Parameters for the analysis of social bonds in horses. Animals 2018, 8, 191. [Google Scholar] [CrossRef] [PubMed]
- Burke, C.J.; Whishaw, I.Q. Sniff, look and loop excursions as the unit of “exploration” in the horse (Equus ferus caballis) when free or under saddle in an equestrian arena. Behavioural Processes 2020, 173, 104065. [Google Scholar] [CrossRef]
- Pawluski, J.; Jego, P.; Henry, S.; Bruchet, A.; Palme, R.; Coste, C.; Hausberger, M. Low plasma cortisol and fecal cortisol metabolite measures as indicators of compromised welfare in domestic horses (Equus caballus). PLoS One 2017, 12, e0182257. [Google Scholar] [CrossRef] [PubMed]
- Nuñez, C.M.; Adelman, J.S.; Smith, J.; Gesquiere, L.R.; Rubenstein, D.I. Linking social environment and stress physiology in feral mares (Equus caballus): Group transfers elevate fecal cortisol levels. General and Comparative Endocrinology 2014, 196, 26–33. [Google Scholar] [CrossRef] [PubMed]
- Sauveroche, M.; Henriksson, J.; Theodorsson, E.; Holm, A.C.S.; Roth, L.S. Hair cortisol in horses (Equus caballus) in relation to management regimes, personality, and breed. Journal of Veterinary Behavior 2020, 37, 1–7. [Google Scholar] [CrossRef]
- Heimbürge, S.; Kanitz, E.; Otten, W. The use of hair cortisol for the assessment of stress in animals. General and Comparative Endocrinology 2019, 270, 10–17. [Google Scholar] [CrossRef] [PubMed]
- Johnson, R.A.; Johnson, P.J.; Megarani, D.V.; Patel, S.D.; Yaglom, H.D.; Osterlind, S.; Grindler, K.; Vogelweid, C.M.; Parker, T.M.; Pascua, C.K.; et al. Horses working in therapeutic riding programs: Cortisol, adrenocorticotropic hormone, glucose, and behavior stress indicators. Journal of Equine Veterinary Science 2017, 57, 77–85. [Google Scholar] [CrossRef]
- Sgorbini, M.; Bonelli, F.; Rota, A.; Aurich, C.; Ille, N.; Camillo, F.; Panzani, D. Determination of salivary cortisol in donkey stallions. 2018.
- Sikorska, U.; Maśko, M.; Ciesielska, A.; Zdrojkowski, Ł.; Domino, M. Role of Cortisol in Horse’s Welfare and Health. Agriculture 2023, 13, 2219. [Google Scholar] [CrossRef]
- Dai, F.; Cogi, N.H.; Heinzl, E.U.L.; Dalla Costa, E.; Canali, E.; Minero, M. Validation of a fear test in sport horses using infrared thermography. Journal of veterinary behavior 2015, 10, 128–136. [Google Scholar]
- Forkman, B.; Boissy, A.; Meunier-Salaün, M.C.; Canali, E.; Jones, R.B. A critical review of fear tests used on cattle, pigs, sheep, poultry and horses. Physiology & behavior 2007, 92, 340–374. [Google Scholar]
- McDonnell, S.M. Reproductive behavior of stallions and mares: comparison of free-running and domestic in-hand breeding. Animal reproduction science 2000, 60, 211–219. [Google Scholar] [CrossRef] [PubMed]
- Crowell-Davis, S.L. Sexual behavior of mares. Hormones and behavior 2007, 52, 12–17. [Google Scholar] [CrossRef]
- Stout, T. Modulating reproductive activity in stallions: a review. Animal reproduction science 2005, 89, 93–103. [Google Scholar] [CrossRef] [PubMed]
- Boyd, L.E. Time budgets of adult Przewalski horses: effects of sex, reproductive status and enclosure. Applied Animal Behaviour Science 1988, 21, 19–39. [Google Scholar] [CrossRef]
- McDonnell, S.M.; Poulin, A. Equid play ethogram. Applied Animal Behaviour Science 2002, 78, 263–290. [Google Scholar] [CrossRef]
- Torcivia, C.; McDonnell, S. Equine discomfort ethogram. Animals 2021, 11, 1–19. [Google Scholar] [CrossRef]
- Ludewig, A.; Gauly, M.; von Borstel, U.K. Effect of shortened reins on rein tension, stress and discomfort behavior in dressage horses. Journal of Veterinary Behavior: Clinical Applications and Research 2013, 2, e15–e16. [Google Scholar] [CrossRef]
- Arnold, G.; Grassia, A. Ethogram of agonistic behaviour for thoroughbred horses. Applied Animal Ethology 1982, 8, 5–25. [Google Scholar] [CrossRef]
- McDonnell, S.; Haviland, J. Agonistic ethogram of the equid bachelor band. Applied Animal Behaviour Science 1995, 43, 147–188. [Google Scholar] [CrossRef]
- Fureix, C.; Jego, P.; Henry, S.; Lansade, L.; Hausberger, M. Towards an Ethological Animal Model of Depression? A Study on Horses. PLOS ONE 2012, 7, 1–9. [Google Scholar] [CrossRef]
- Blois-Heulin, C.; Rochais, C.; Camus, S.; Fureix, C.; Lemasson, A.; Lunel, C.; Bezard, E.; Hausberger, M. Animal welfare: could adult play be a false friend? Animal Behavior and Cognition 2015, 2, 156–185. [Google Scholar]
- Hausberger, M.; Fureix, C.; Bourjade, M.; Wessel-Robert, S.; Richard-Yris, M.A. On the significance of adult play: what does social play tell us about adult horse welfare? Naturwissenschaften 2012, 99, 291–302. [Google Scholar] [CrossRef] [PubMed]
- Sigurjónsdóttir, H.; Haraldsson, H. Significance of group composition for the welfare of pastured horses. Animals 2019, 9, 14. [Google Scholar] [CrossRef]
- VanDierendonck, M.C.; Spruijt, B.M. Coping in groups of domestic horses–Review from a social and neurobiological perspective. Applied animal behaviour science 2012, 138, 194–202. [Google Scholar] [CrossRef]
- Cameron, E.Z.; Linklater, W.L.; Stafford, K.J.; Minot, E.O. Social grouping and maternal behaviour in feral horses (Equus caballus): the influence of males on maternal protectiveness. Behavioral Ecology and Sociobiology 2003, 53, 92–101. [Google Scholar] [CrossRef]
- Broomé, S.; Feighelstein, M.; Zamansky, A.; Carreira Lencioni, G.; Haubro Andersen, P.; Pessanha, F.; Mahmoud, M.; Kjellström, H.; Salah, A.A. Going deeper than tracking: A survey of computer-vision based recognition of animal pain and emotions. International Journal of Computer Vision 2023, 131, 572–590. [Google Scholar] [CrossRef]
- Gleerup, K.B.; Forkman, B.; Lindegaard, C.; Andersen, P.H. An equine pain face. Veterinary anaesthesia and analgesia 2015, 42, 103–114. [Google Scholar] [CrossRef] [PubMed]
- Dalla Costa, E.; Minero, M.; Lebelt, D.; Stucke, D.; Canali, E.; Leach, M.C. Development of the Horse Grimace Scale (HGS) as a pain assessment tool in horses undergoing routine castration. PLoS one 2014, 9, e92281. [Google Scholar] [CrossRef]
- Coneglian, M.M.; Borges, T.D.; Weber, S.H.; Bertagnon, H.G.; Michelotto, P.V. Use of the horse grimace scale to identify and quantify pain due to dental disorders in horses. Applied Animal Behaviour Science 2020, 225, 104970. [Google Scholar]
- Andersen, P.H.; Broomé, S.; Rashid, M.; Lundblad, J.; Ask, K.; Li, Z.; Hernlund, E.; Rhodin, M.; Kjellström, H. Towards machine recognition of facial expressions of pain in horses. Animals 2021, 11, 1643. [Google Scholar] [CrossRef] [PubMed]
- Andersen, P.H.; Gleerup, K.; Wathan, J.; Coles, B.; Kjellström, H.; Broomé, S.; Lee, Y.; Rashid, M.; Sonder, C.; Resenberg, E.; et al. Can a machine learn to see horse pain? An interdisciplinary approach towards automated decoding of facial expressions of pain in the horse. Proc Meas Behav 2018, 6–8. [Google Scholar]
- Feighelstein, M.; Riccie-Bonot, C.; Hasan, H.; Weinberg, H.; Rettig, T.; Segal, M.; Distelfeld, T.; Shimshoni, I.; Mills, D.S.; Zamansky, A. Automated recognition of emotional states of horses from facial expressions. Plos one 2024, 19, e0302893. [Google Scholar] [CrossRef] [PubMed]
- Lencioni, G.C.; de Sousa, R.V.; de Souza Sardinha, E.J.; Corrêa, R.R.; Zanella, A.J. Pain assessment in horses using automatic facial expression recognition through deep learning-based modeling. PloS one 2021, 16, e0258672. [Google Scholar] [CrossRef]
- Chiavaccini, L.; Gupta, A.; Chiavaccini, G. From facial expressions to algorithms: a narrative review of animal pain recognition technologies. Frontiers in Veterinary Science 2024, 11, 1436795. [Google Scholar] [CrossRef]
- Lund, S.M.; Nielsen, J.; Gammelgård, F.; Nielsen, M.G.; Jensen, T.H.; Pertoldi, C. Behavioral Coding of Captive African Elephants (Loxodonta africana): Utilizing DeepLabCut and Create ML for Nocturnal Activity Tracking. Animals 2024, 14, 2820. [Google Scholar] [CrossRef] [PubMed]
- Blumrosen, G.; Hawellek, D.; Pesaran, B. Towards automated recognition of facial expressions in animal models. In Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017; pp. 2810–2819.
- Corujo, L.A.; Kieson, E.; Schloesser, T.; Gloor, P.A. Emotion recognition in horses with convolutional neural networks. Future Internet 2021, 13, 250. [Google Scholar] [CrossRef]
- Ashley, F.; Waterman-Pearson, A.; Whay, H. Behavioural assessment of pain in horses and donkeys: application to clinical practice and future studies. Equine veterinary journal 2005, 37, 565–575. [Google Scholar]
- Leiner, L.; Fendt, M. Behavioural fear and heart rate responses of horses after exposure to novel objects: Effects of habituation. Applied Animal Behaviour Science 2011, 131, 104–109. [Google Scholar] [CrossRef]
- Yaseen, M. What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:cs.CV/2408.15857. [Google Scholar]
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality Reduction by Learning an Invariant Mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2; 2006; pp. 1735–1742. [Google Scholar] [CrossRef]
- van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. CoRR 2018. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE, 2009; pp. 248–255. [Google Scholar]
- Assiri, Y. Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods. CoRR 2020. [Google Scholar]
- Ali, J. MNIST-SOPCNN. https://github.com/junaidaliop/MNIST-SOPCNN, 2024. Accessed: 2024-12-01.
- Image Classification on MNIST. https://paperswithcode.com/sota/image-classification-on-mnist, 2024. Accessed: 2024-12-01.
- Hoemann, K.; Lee, Y.; Kuppens, P.; Gendron, M.; Boyd, R.L. Emotional granularity is associated with daily experiential diversity. Affective Science 2023, 4, 291–306. [Google Scholar] [CrossRef]
- Hoxhallari, K.; Purcell, W.; Neubauer, T. The Potential of Explainable Artificial Intelligence in Precision Livestock Farming; 2022; p. 710.
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. Advances in neural information processing systems 2020, 33, 9912–9924. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).