2. Materials and Methods
Sensors continuously monitor the cows' primary activities (resting, eating, and ruminating), and an hourly activity level is calculated from the time each cow spends on these activities [22]. The Fast Fourier Transform (FFT) is then applied to extract the circadian cycles, and two thresholds on the deviations between cycles establish three stress levels. Finally, a Long Short-Term Memory (LSTM) neural network model is implemented to predict stress levels from the cows' basic activities, as illustrated in Figure 1.
Table I lists the materials and tools used in this project, which are essential for data collection. Accurate classification of behaviors such as rumination and resting through various sensor types is fundamental for these systems [11].
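These hourly records form the raw input time series (Equation 1):
$$X = (x_1, x_2, \ldots, x_T)$$
where each vector $x_i \in \mathbb{R}^n$ contains the hourly values [feeding, resting, ruminating, time] (so $n = 4$) and $T$ is the number of hourly records.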
2.1. Data Collection
The Nedap CowControl system [12] was deployed in conjunction with Nedap SmartTag collars, wearable devices equipped with ultra-wideband (UWB) signal-emitting tags. These signals were captured by ceiling-mounted antennas in the barn, enabling continuous, real-time monitoring of individual cattle activity [23,24]. The system accurately recorded three core behavioral states: feeding, resting, and rumination periods (see Figure 2).
2.2. Activity Level Calculation
The collected data include the time each cow spent on each of the three activities per hour: eating, resting, and ruminating. Following the approach proposed by [11], a period labeled "other" was also added for infrequent miscellaneous activities such as trotting and walking. However, in this study, correspondence factor analysis (CFA) [25] was replaced by principal component analysis (PCA) [26].
The weights assigned to each activity were determined through a preliminary analysis of the data from the first 14 days of the observation period, which established the relative influence of each activity on the activity level. The application of expression (1) revealed a characteristic 24-hour pattern, with low activity levels during nighttime hours and peaks of high activity during the day. These results align with typical bovine behavioral patterns, which show increased activity during daylight hours and a notable reduction at night, when activities related to rest predominate (see Figure 3).
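Expression (1) and the exact way PCA yields the weights are not spelled out here. A minimal sketch, assuming the activity level is a weighted sum of the hourly minutes spent eating, resting, ruminating, and in "other" activities, with the weights taken from the first-principal-component loadings of the first 14 days. The column order, the sign convention, the helper names, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Assumed column order of the hourly activity matrix (minutes per hour).
ACTIVITIES = ["eating", "resting", "ruminating", "other"]


def pca_weights(minutes: np.ndarray) -> np.ndarray:
    """First-principal-component loadings, assumed here to act as activity weights."""
    centered = minutes - minutes.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    w = vt[0]
    # The sign of a principal axis is arbitrary; orient it so eating has a non-negative weight.
    return w if w[0] >= 0 else -w


def activity_level(minutes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hourly activity level as a weighted sum of activity minutes (assumed form)."""
    return minutes @ weights


# Synthetic example: 30 days of hourly minutes, each row summing to 60.
rng = np.random.default_rng(0)
minutes = rng.dirichlet(np.ones(len(ACTIVITIES)), size=30 * 24) * 60
weights = pca_weights(minutes[: 14 * 24])  # preliminary 14-day period
level = activity_level(minutes, weights)
```

Because the weights come only from the first 14 days, the same fixed weighting is then applied to every later hour.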
2.3. Circadian Rhythm
For data analysis, we worked with 36-hour time series extracted using a sliding window with a 1-hour shift, which allowed a continuous and detailed traversal of the data. Following the methodology of [27], the data obtained from a single cow over a 30-day period generated 685 time series of 36 hours each, calculated as:
$$\text{Number of series} = 30 \text{ days} \times 24 \text{ h/day} - 35 \text{ h} = 685$$
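The count of 685 follows from sliding a 36-hour window one hour at a time over 720 hourly values: 720 − 36 + 1 = 685, which equals 720 − 35. A minimal NumPy sketch, with a placeholder series standing in for real activity levels; `sliding_windows` is a hypothetical helper.

```python
import numpy as np


def sliding_windows(series: np.ndarray, window: int = 36, step: int = 1) -> np.ndarray:
    """All windows of `window` consecutive hours, each shifted by `step` hours."""
    n = (len(series) - window) // step + 1
    return np.stack([series[i * step : i * step + window] for i in range(n)])


hourly = np.arange(30 * 24, dtype=float)  # placeholder: 30 days of hourly activity levels
windows = sliding_windows(hourly)
print(windows.shape)  # (685, 36)
```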
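As shown in Figure 4, each of these 36-hour time series was further divided into two 24-hour subseries (A and B), offset by 12 hours from each other, so that A covers hours 0 to 23 and B covers hours 12 to 35 of the window. This division captures day-night activity fluctuations in more detail. Subsequently, the Fast Fourier Transform (FFT) was applied to convert the temporal data into the frequency domain, as defined by equation (2):
$$X(k) = \sum_{t=0}^{N-1} x(t)\, e^{-2\pi i k t / N}, \quad k = 0, 1, \ldots, N-1 \quad (2)$$
where $x(t)$ is the activity level at hour $t$ of a subseries and $N = 24$ is the subseries length.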
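A short sketch of the A/B split, assuming hourly sampling so that subseries A is hours 0 to 23 and subseries B is hours 12 to 35 of each 36-hour window. `split_cycles` is a hypothetical helper; `np.fft.fft` computes the discrete Fourier transform of each subseries.

```python
import numpy as np


def split_cycles(window: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a 36-hour window into two 24-hour subseries offset by 12 hours."""
    if len(window) != 36:
        raise ValueError("expected a 36-hour window")
    a = window[:24]    # subseries A: hours 0-23
    b = window[12:36]  # subseries B: hours 12-35
    return a, b


window = np.arange(36, dtype=float)  # placeholder activity levels
a, b = split_cycles(window)
spectrum_a, spectrum_b = np.fft.fft(a), np.fft.fft(b)  # frequency-domain representation
```

Hours 12 to 23 of the window appear in both subseries; this shared stretch is what allows the two cycles to be compared directly.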
Figure 4.
Example of cycles A and B within a 36-hour time series. Solid line: Original activity level. Dashed lines: Reconstructed cycles after applying the Fast Fourier Transform (FFT).
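The dominant frequency $f^*$ (in cycles per hour) is identified through (3):
$$f^* = \frac{k^*}{N}, \qquad k^* = \arg\max_{1 \le k \le N/2} |X(k)| \quad (3)$$
The inverse Fast Fourier Transform (IFFT) reconstructs the denoised signal through (4):
$$\hat{x}(t) = \frac{1}{N} \sum_{k \in K} X(k)\, e^{2\pi i k t / N}, \quad t = 0, 1, \ldots, N-1 \quad (4)$$
where $K$ is the set of retained harmonics, each together with its conjugate-symmetric counterpart $N-k$, so that $\hat{x}(t)$ is real.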
The processing of these series is performed by applying the FFT in three steps:
Transformation to the frequency domain: The original signal is decomposed into its spectral components, facilitating the identification of harmonics associated with the activity level.
Harmonic filtering: Specific harmonics corresponding to the circadian range were selected, excluding irrelevant frequencies such as noise or signals outside the study’s scope.
Signal reconstruction: The Inverse Fast Fourier Transform (IFFT) was applied to reconstruct a refined version of the signal in the time domain, preserving only essential circadian characteristics.
This approach allows for an accurate reconstruction of the cows’ circadian cycle, capturing fundamental patterns necessary to evaluate behavioral dynamics and variations. These patterns are essential for analyzing potential correlations with stress levels and other factors of interest in cattle.
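The three FFT steps can be sketched with NumPy's real-input FFT. This illustration rests on several assumptions and is not the authors' implementation. The window is 24 hourly samples, and harmonic k has a period of 24/k hours, with k = 1 as the circadian component. Only the harmonics listed in `harmonics` are kept, and the mean (k = 0) is retained so the reconstruction stays on the original scale. `circadian_reconstruction` is a hypothetical name.

```python
import numpy as np


def circadian_reconstruction(x: np.ndarray, harmonics=(1,)) -> tuple[float, np.ndarray]:
    """Return the dominant frequency (cycles/hour) and a reconstruction from selected harmonics."""
    n = len(x)
    spectrum = np.fft.rfft(x)                      # step 1: frequency domain (k = 0..n/2)
    freqs = np.fft.rfftfreq(n, d=1.0)              # cycles per hour for hourly sampling
    k_star = 1 + np.argmax(np.abs(spectrum[1:]))   # dominant non-constant component
    f_star = float(freqs[k_star])
    kept = np.zeros_like(spectrum)
    kept[0] = spectrum[0]                          # keep the mean level (assumption)
    for k in harmonics:
        kept[k] = spectrum[k]                      # step 2: harmonic filtering
    x_hat = np.fft.irfft(kept, n=n)                # step 3: inverse FFT to the time domain
    return f_star, x_hat


# A 24-hour rhythm plus a faster 4.8-hour component treated as noise.
t = np.arange(24)
x = 10 + 5 * np.cos(2 * np.pi * t / 24) + 2 * np.cos(2 * np.pi * 5 * t / 24)
f_star, x_hat = circadian_reconstruction(x)
```

Here the dominant frequency is 1/24 cycles per hour, and the reconstruction recovers the pure 24-hour rhythm with the 4.8-hour component removed.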
2.4. Detection of Changes and Stress Level Labeling
To detect deviations in the circadian cycle, the cycles extracted from the two 24-hour subseries (A and B) of each 36-hour series were synchronized, and the Euclidean distance was computed over their overlapping section, following [11]. An increase in the Euclidean distance reflects disruptions of the activity pattern, which are associated with stress or physiological alterations [25].
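Thresholds for classifying stress into three levels (normal, mild, high) were defined using a statistical approach based on the distribution of Euclidean distances:
The first threshold (normal to mild) was set at the mean plus one standard deviation, capturing moderate deviations.
The second threshold (mild to high) was set at the mean plus two standard deviations, identifying extreme deviations.
This method, adapted from [13], segments stress levels in line with the observed variability in the circadian data.
If A and B represent two 24-hour temporal windows, their dissimilarity can be quantified using the Euclidean distance (Equation 5):
$$d(A, B) = \sqrt{\sum_{t \in O} \left(\hat{x}_A(t) - \hat{x}_B(t)\right)^2} \quad (5)$$
where $O$ denotes the hours shared by A and B, and $\hat{x}_A(t)$ and $\hat{x}_B(t)$ are the reconstructed cycles at hour $t$ of the 36-hour series. With $\mu$ and $\sigma$ the mean and standard deviation of these distances, the thresholds are:
$$\text{Threshold}_1 = \mu + \sigma$$
$$\text{Threshold}_2 = \mu + 2\sigma$$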
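A minimal sketch of the distance and labeling step, assuming A spans hours 0 to 23 and B hours 12 to 35 of the 36-hour window, so that `a_hat[12:24]` and `b_hat[0:12]` cover the same hours. σ is taken as the population standard deviation (NumPy's default), which the text does not specify. `np.digitize` with `right=True` assigns 0 (normal) to distances up to Threshold 1, 1 (mild) to distances up to Threshold 2, and 2 (high) above it. Helper names are hypothetical.

```python
import numpy as np


def overlap_distance(a_hat: np.ndarray, b_hat: np.ndarray) -> float:
    """Euclidean distance between reconstructed cycles A and B over their 12 shared hours."""
    return float(np.linalg.norm(a_hat[12:24] - b_hat[0:12]))


def label_stress(distances: np.ndarray) -> tuple[np.ndarray, tuple[float, float]]:
    """Label distances as 0 = normal, 1 = mild, 2 = high using mu + sigma and mu + 2*sigma."""
    mu, sigma = distances.mean(), distances.std()  # population std (ddof=0), an assumption
    t1, t2 = mu + sigma, mu + 2 * sigma
    labels = np.digitize(distances, [t1, t2], right=True)
    return labels, (float(t1), float(t2))
```

Because the thresholds are derived from the distance distribution itself, roughly the largest deviations of each cow's own record end up in the mild and high classes.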
2.5. Training and Testing
From data collection to circadian cycle extraction and stress level labeling, all data were processed as a unified pipeline to ensure continuous and sequential computation of activity levels and detection of temporal changes. The dataset spanned a 12-month period (January–December 2024). To structure the analysis:
Training set: First eight months (January–August) were used for model training.
Test set: Remaining four months (September–December) formed an independent evaluation period.
This temporal partitioning ensures that model performance is assessed on data strictly excluded from training. Data from each cow were collected and analyzed separately to account for individual behavioral variability.
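The chronological split can be sketched with pandas, applied to each cow separately. The column names `cow_id` and `timestamp`, the helper `temporal_split`, and the synthetic hourly index are assumptions; 2024 is a leap year, so January to August contains 244 days and September to December 122 days.

```python
import pandas as pd


def temporal_split(df: pd.DataFrame, cutoff: str = "2024-09-01", time_col: str = "timestamp"):
    """Train on records before `cutoff` (January-August), test on the rest (September-December)."""
    ts = pd.to_datetime(df[time_col])
    boundary = pd.Timestamp(cutoff)
    return df[ts < boundary], df[ts >= boundary]


# Hypothetical hourly records for one cow over 2024; each cow is split independently.
hours = pd.date_range("2024-01-01", periods=366 * 24, freq=pd.offsets.Hour())
df = pd.DataFrame({"cow_id": 1, "timestamp": hours})
splits = {cow: temporal_split(group) for cow, group in df.groupby("cow_id")}
train, test = splits[1]
```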
The daily activity signal, recorded at regular intervals via accelerometers embedded in the collars, was transformed using the Fast Fourier Transform (FFT). This signal captures 24-hour movement patterns, and the transform identifies the frequency components linked to circadian periodicity. The FFT-derived amplitude and phase of the activity rhythms serve as input features for the machine learning models.
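For one day of hourly data, the amplitude and phase of the 24-hour rhythm can be read from the first harmonic of the DFT. A sketch assuming 24 evenly spaced hourly samples per day; the helper name is hypothetical, and the text does not state which harmonics were used as features. For a signal A·cos(2πt/24 + φ) plus a constant, the first coefficient equals (24·A/2)·e^{iφ}, hence the scaling by 2/n.

```python
import numpy as np


def circadian_amplitude_phase(day: np.ndarray) -> tuple[float, float]:
    """Amplitude and phase (radians) of the one-cycle-per-day component."""
    n = len(day)
    c1 = np.fft.rfft(day)[1]         # harmonic k = 1: one cycle per window
    amplitude = 2 * np.abs(c1) / n   # rescale the DFT magnitude to signal units
    phase = np.angle(c1)             # phase of the cosine, in [-pi, pi]
    return float(amplitude), float(phase)


t = np.arange(24)
day = 7 + 3 * np.cos(2 * np.pi * t / 24 + 0.5)
amplitude, phase = circadian_amplitude_phase(day)
```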
2.6. Algorithm Selection
A total of 52 articles on artificial intelligence applications in livestock farming were analyzed [28], including studies on:
Thermal stress prediction [19];
Classification and monitoring of stress factors [6,29];
Machine learning for disease detection [14].
Other relevant work leveraged algorithms such as support vector machines (SVM) and tree ensembles to predict general livestock activities [13]. These findings were applied in a computer engineering thesis [30], which conducted a comparative evaluation of eight algorithms based on performance and feasibility, including:
CNN, DNN, LSTM, Random Forest, KNN, SVM, and XGBoost.
The top-performing models were CNN (accuracy: 0.8986), LSTM (0.8985), and DNN (0.8985).
Although CNN achieved a marginally higher accuracy, the LSTM architecture was ultimately selected for its ability to model temporal sequences and capture long-term patterns in the data, a critical requirement for predicting cattle stress levels. Unlike CNN (which prioritizes spatial feature extraction) or DNN (which may lose relevant temporal information), LSTM excels at detecting gradual changes and variations in the circadian cycle. This advantage makes it particularly suited to identifying fluctuations in livestock behavior, enabling more accurate and robust stress predictions based on historical activity patterns and trends [5].
2.7. Model Construction
The model construction involved several fundamental steps to ensure robust and efficient performance. First, the working environment was configured: TensorFlow was used for model construction and training, pandas and NumPy for data handling, and scikit-learn for preprocessing and class-imbalance treatment. A random seed was set to ensure the reproducibility of results. Data were loaded from CSV files, and stress level labels were mapped to numerical values.
Subsequently, features and labels were separated for the training and test datasets. The features, representing the time periods dedicated to the cows' basic activities (eating, resting, ruminating, and other activities), were stored in separate matrices (X_train and X_test), and the labels, corresponding to stress levels, were stored in y_train and y_test. This step prepared the data for modeling and analysis. Input features were also standardized to improve model efficiency and convergence during training.
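A sketch of the label mapping, feature/label separation, and standardization steps. The column names, the label strings, the seed value, and the inline frame are hypothetical; in practice the frames would come from `pd.read_csv(...)`. Standardization statistics are computed on the training set only, which mirrors what scikit-learn's `StandardScaler` does when fitted on the training data (including leaving zero-variance columns unscaled).

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # hypothetical seed; TensorFlow would also need its own seed

FEATURES = ["eating", "resting", "ruminating", "other"]  # assumed column names
LABEL_MAP = {"normal": 0, "mild": 1, "high": 2}          # assumed label strings


def split_xy(df: pd.DataFrame) -> tuple[np.ndarray, np.ndarray]:
    """Separate activity features from numerically mapped stress labels."""
    X = df[FEATURES].to_numpy(dtype=float)
    y = df["stress_level"].map(LABEL_MAP).to_numpy(dtype=int)
    return X, y


def standardize(X_train: np.ndarray, X_test: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Z-score both sets using statistics from the training set only."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X_train - mean) / std, (X_test - mean) / std


train_df = pd.DataFrame({"eating": [10, 20, 30], "resting": [40, 30, 20],
                         "ruminating": [5, 5, 5], "other": [5, 5, 5],
                         "stress_level": ["normal", "mild", "high"]})
X_train, y_train = split_xy(train_df)
X_train_s, _ = standardize(X_train, X_train)
```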
To capture complete daily behavior, the data were reshaped into 24-hour windows to match the input format expected by the LSTM model.
To address class imbalance in the training set, class weights were calculated to adjust the importance of each category, reducing bias toward majority classes. Stress level labels were also converted to one-hot (categorical) format, so that the model processes them as binary vectors in the multiclass classification task.
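A NumPy sketch of the windowing, class-weight, and label-encoding steps. It assumes each window holds 24 consecutive hourly rows and takes the label of its last hour; the text does not say which hour's label is used. The weight formula n_samples / (n_classes × count) is the one scikit-learn uses for `class_weight="balanced"`, and the one-hot step mirrors `keras.utils.to_categorical`. Helper names are hypothetical.

```python
import numpy as np


def make_windows(X: np.ndarray, y: np.ndarray, length: int = 24):
    """Stack consecutive hourly rows into (samples, length, features) windows."""
    Xw = np.stack([X[i - length + 1 : i + 1] for i in range(length - 1, len(X))])
    yw = y[length - 1 :]  # label of the last hour in each window (assumption)
    return Xw, yw


def balanced_class_weights(y: np.ndarray, n_classes: int = 3) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = np.bincount(y, minlength=n_classes)
    return {c: len(y) / (n_classes * counts[c]) for c in range(n_classes) if counts[c] > 0}


def one_hot(y: np.ndarray, n_classes: int = 3) -> np.ndarray:
    """Convert integer labels to binary indicator vectors."""
    return np.eye(n_classes)[y]


X = np.arange(100 * 4, dtype=float).reshape(100, 4)
y = np.array([0] * 80 + [1] * 15 + [2] * 5)
Xw, yw = make_windows(X, y)
weights = balanced_class_weights(yw)
Y = one_hot(yw)
```

In this toy example the rarest class (2) receives the largest weight, so each of its samples contributes more to the loss during training.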
The LSTM model was designed to analyze temporal sequences and extract complex behavioral patterns from the data [15]. In this study, it was implemented as a sequential network consisting of:
An LSTM layer with 144 units, using the ReLU activation function to handle nonlinear relationships. This layer processes temporal dependencies in the activity sequences.
An intermediate dense layer with 36 units, which reduced dimensionality and refined the learned representations.
An output layer with 3 units applying the SoftMax activation, suitable for multiclass classification, generating probabilities associated with each stress level: normal, mild, and high.
The model was trained using the training dataset for 100 epochs with a batch size of 36. These values were determined through experimental testing, balancing model convergence and overfitting. The 100-epoch limit was chosen because it allowed effective learning without performance degradation on validation data. Similarly, a batch size of 36 was selected to ensure gradient stability and computational efficiency during training. Test data were used for validation during training to monitor overfitting across iterations. Additionally, class weights were incorporated to counteract stress class imbalances. This approach enabled the model to learn patterns reflecting temporal dynamics in the data without significant bias toward any specific category.
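The architecture and training settings above can be sketched with the Keras API bundled with TensorFlow. Several choices are assumptions not stated in the text: the input shape (24 hourly steps × 4 activity features), the Adam optimizer, the categorical cross-entropy loss, and ReLU in the 36-unit dense layer. `build_model` and `train` are hypothetical helpers.

```python
import numpy as np
from tensorflow import keras


def build_model(timesteps: int = 24, n_features: int = 4, n_classes: int = 3) -> keras.Model:
    model = keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        keras.layers.LSTM(144, activation="relu"),            # temporal dependencies
        keras.layers.Dense(36, activation="relu"),            # activation assumed
        keras.layers.Dense(n_classes, activation="softmax"),  # normal / mild / high
    ])
    model.compile(optimizer="adam",                           # optimizer assumed
                  loss="categorical_crossentropy",            # matches one-hot labels
                  metrics=["accuracy", keras.metrics.AUC(name="auc")])
    return model


def train(model, X_train, Y_train, X_test, Y_test, class_weight):
    # As described: 100 epochs, batch size 36, test data monitored during training.
    return model.fit(X_train, Y_train, epochs=100, batch_size=36,
                     validation_data=(X_test, Y_test),
                     class_weight=class_weight, verbose=0)


model = build_model()
probs = np.asarray(model(np.zeros((2, 24, 4), dtype="float32")))
```

The model has 91,155 trainable parameters, most of them (85,824) in the 144-unit LSTM layer.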
This study focuses on identifying physiological and behavioral stress arising primarily from intensive management practices and environmental mismatches, such as handling stress and thermal stress. To validate stress states, a triangulation of evidence was used:
Expert veterinary observations of anomalous behaviors,
Environmental records (temperature and relative humidity) to calculate the Temperature-Humidity Index (THI),
Comparison with expected circadian activity patterns.
The combination of these data sources helped establish reliable labels for training and validating machine learning-based prediction models.
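The formula used for the THI is not given. One widely cited form (NRC, 1971) is sketched below as an assumption, with temperature in °C and relative humidity in percent; the helper name `thi` is hypothetical.

```python
def thi(temp_c: float, rh_percent: float) -> float:
    """Temperature-Humidity Index, NRC (1971) form (assumed); temp in degC, RH in %."""
    temp_f = 1.8 * temp_c + 32  # dry-bulb temperature in degF
    return temp_f - (0.55 - 0.0055 * rh_percent) * (1.8 * temp_c - 26)


print(round(thi(25.0, 50.0), 3))  # 71.775
```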
Input sequence: $X_t$
LSTM output: $h_t = \mathrm{LSTM}(X_t)$
Combination with additional features (FFT): $u_t = \mathrm{concat}(h_t, z_t)$, where $z_t$ denotes the FFT-derived features.
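The concatenation of the LSTM output with FFT-derived features requires a second model input, which a plain sequential stack cannot take. One way to express it is the Keras functional API, sketched below. It assumes two FFT features per sample (for instance amplitude and phase) and reuses the layer sizes listed earlier. Input names, `n_fft`, and `build_fused_model` are hypothetical.

```python
import numpy as np
from tensorflow import keras


def build_fused_model(timesteps: int = 24, n_features: int = 4,
                      n_fft: int = 2, n_classes: int = 3) -> keras.Model:
    seq_in = keras.Input(shape=(timesteps, n_features), name="activity_sequence")  # X_t
    fft_in = keras.Input(shape=(n_fft,), name="fft_features")                       # z_t
    h = keras.layers.LSTM(144, activation="relu")(seq_in)                           # h_t
    u = keras.layers.Concatenate()([h, fft_in])                                     # u_t
    u = keras.layers.Dense(36, activation="relu")(u)
    out = keras.layers.Dense(n_classes, activation="softmax")(u)
    return keras.Model(inputs=[seq_in, fft_in], outputs=out)


fused = build_fused_model()
fused_probs = np.asarray(fused([np.zeros((2, 24, 4), "float32"), np.zeros((2, 2), "float32")]))
```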
2.8. Algorithm Performance
To evaluate the model, predictions were generated on the test set using the model.predict method, which returns estimated probabilities for each stress level category. The most likely category for each instance was then determined with the np.argmax function. Subsequently, the occurrences of each actual and predicted category were counted, showing the distribution of both true and predicted labels and providing a preliminary view of the model's behavior in classifying stress levels.
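The prediction summary can be sketched as follows. `probs` stands in for the array returned by `model.predict(X_test)`, with one row of class probabilities per instance. The label order (normal, mild, high) and the helper name are assumptions.

```python
import numpy as np

LABEL_NAMES = ("normal", "mild", "high")  # assumed label order


def summarize_predictions(probs: np.ndarray, y_true: np.ndarray):
    """Most likely class per instance plus counts of true and predicted labels."""
    y_pred = np.argmax(probs, axis=1)
    true_counts = dict(zip(LABEL_NAMES, np.bincount(y_true, minlength=3).tolist()))
    pred_counts = dict(zip(LABEL_NAMES, np.bincount(y_pred, minlength=3).tolist()))
    return y_pred, true_counts, pred_counts


probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]])
y_true = np.array([0, 1, 2, 1])
y_pred, true_counts, pred_counts = summarize_predictions(probs, y_true)
```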
To assess whether a model as complex as an LSTM is necessary, its performance was compared against a baseline logistic regression model. The logistic regression, trained on the same features (eating, resting, and ruminating activities), achieved an accuracy of 65%, substantially lower than the LSTM model's 80%. This difference highlights the LSTM's ability to capture temporal dependencies in circadian data, justifying its use over simpler models that do not model the temporal dynamics of bovine behavior.
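A baseline of this kind can be sketched with scikit-learn's `LogisticRegression`, fitted on per-hour activity features with no temporal context. `class_weight="balanced"`, `max_iter=1000`, the helper name, and the synthetic data are assumptions for illustration. The reported 65% comes from the study's data, not from this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def logistic_baseline(X_train, y_train, X_test, y_test) -> float:
    """Test accuracy of a logistic regression on non-sequential (per-hour) features."""
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


# Synthetic three-class data: labels depend on the first feature only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.digitize(X[:, 0], [-0.5, 0.5])
baseline_acc = logistic_baseline(X[:200], y[:200], X[200:], y[200:])
```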
Model evaluation was conducted using key performance metrics:
Accuracy measures the model’s ability to correctly classify stress levels (normal, mild, and high) from cattle activity data, achieving values above 80%.
Area Under the Curve (AUC) provided additional insight, with values approaching 0.84, giving a more detailed view of performance that accounts for class imbalance and the model's capacity to distinguish between stress levels.
Together with the comparison against the logistic regression baseline, this evaluation indicates that the LSTM model handles the temporal patterns inherent in behavioral stress assessment better than simpler, non-temporal models.
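Both metrics can be computed from the predicted probabilities. The sketch below assumes a macro-averaged one-vs-rest AUC, since the averaging scheme for the multiclass AUC is not stated; scikit-learn's `roc_auc_score` requires `multi_class` to be set when given a probability matrix. The helper name and toy data are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def evaluate(y_true: np.ndarray, probs: np.ndarray) -> tuple[float, float]:
    """Accuracy of argmax predictions and macro one-vs-rest AUC."""
    accuracy = float(np.mean(np.argmax(probs, axis=1) == y_true))
    auc = float(roc_auc_score(y_true, probs, multi_class="ovr", average="macro"))
    return accuracy, auc


y_true = np.array([0, 1, 2, 0, 1, 2])
probs = np.eye(3)[y_true] * 0.7 + 0.1  # rows sum to 1; highest score on the true class
accuracy, auc = evaluate(y_true, probs)
```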