Causal-Aware Time Series Regression for IoT Energy Consumption Using Structured Attention and LSTM Networks

Chang Liu; Qi Wang; Litong Song; Xin Hu

doi:10.20944/preprints202509.1134.v1

Submitted: 12 September 2025

Posted: 15 September 2025


Abstract

This study focuses on energy consumption modeling for IoT devices and proposes a regression prediction method that integrates causal features with Long Short-Term Memory (LSTM) networks. The method introduces a structured causal attention mechanism to model the influence of external factors on energy consumption. It combines this with LSTM to dynamically learn temporal dependencies, forming a structure-enhanced prediction framework. At the input layer, the model integrates multiple features such as device operating states, transmission parameters, and task identifiers. A causal mask guides the computation of attention weights to enhance the model’s sensitivity to key variables. At the prediction layer, a fusion mechanism combines the LSTM outputs with the weighted causal features as input to the regression module, producing the final energy consumption predictions. The experiments are conducted on a real industrial IoT dataset. Performance is evaluated under different network depths, training data proportions, and environmental fluctuations through sensitivity analysis. Results show that the proposed model outperforms traditional baseline models on metrics including RMSE, MAE, and MAPE. It demonstrates higher accuracy and robustness. The study also shows that the causal modeling mechanism plays a key role in improving model stability and enhancing the ability to capture complex dependencies. The method is suitable for predicting energy consumption patterns across diverse devices and nonlinear usage behaviors.

Keywords: causal modeling; LSTM network; energy consumption prediction; time series regression

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

I. Introduction

The rapid development of Internet of Things (IoT) technologies is reshaping operational paradigms across various industries. From smart manufacturing and smart cities to home automation, billions of intelligent devices are interconnected through sensing, communication, and computing capabilities. As the number of devices continues to grow, managing and predicting their energy consumption has become an urgent challenge[1]. Effective energy management not only relates to economic efficiency and environmental sustainability but also directly impacts the scalability and long-term viability of IoT systems. In resource-constrained edge environments in particular, accurate energy consumption forecasting supports system design, task scheduling, and power allocation. It is a fundamental step toward realizing the vision of “green IoT.”

Traditional energy modeling approaches mainly rely on physical models or empirical formulas[2]. While these methods are effective in specific domains, they often fail to adapt to the complexity and variability of IoT applications. Energy consumption in IoT devices is influenced by multiple factors, including task load, operating status, transmission frequency, external environmental conditions, and inter-device coordination. These factors often exhibit nonlinear and temporal dependencies, making modeling highly challenging. Additionally, the heterogeneous and distributed nature of IoT systems results in data that is multimodal, high-dimensional, and time-dependent. These characteristics significantly increase the difficulty of predictive model design. Modeling from a single perspective cannot effectively capture the dynamic interactions among these complex features[3].

In recent years, data-driven methods have shown growing advantages in energy consumption modeling. Among them, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network (RNN), have demonstrated strong performance in handling time series data, making them a popular tool for IoT energy modeling. LSTM networks can capture long-term dependencies in historical states, enabling dynamic modeling of device energy use[4,5]. However, most existing LSTM-based methods primarily focus on intrinsic temporal patterns, often overlooking the integration and modeling of external causal features[6]. In IoT scenarios, many variables affecting energy use have explicit causal structures. For example, task type directly influences CPU load, and network conditions affect transmission power. Failure to extract and model these causal factors can limit a model’s generalization ability and interpretability, reducing its practical value in deployment.

Incorporating causal features into modeling can enhance the model’s structural understanding of input variables and improve the robustness and controllability of energy forecasting. In complex systems, causal relationships provide reasoning paths that help models maintain effectiveness under anomalies, interventions, or deployment shifts. For IoT energy forecasting, causal features act as a structural enhancement mechanism. They help reduce data redundancy, improve sensitivity to key variables, and mitigate issues such as overfitting or excessive reliance on historical data. Additionally, causal modeling improves the interpretability of predictions. This allows system developers to better understand the model’s decision-making process and design more effective energy control strategies[7].

Against this background, exploring energy forecasting methods that combine LSTM architectures with causal features is a promising direction. It complements existing time series modeling approaches and supports the development of intelligent energy management systems in IoT. This approach integrates the modeling capacity of deep learning with the structural advantages of causal reasoning. It holds strong theoretical significance and practical potential. As edge computing, low-power communication protocols, and intelligent sensing devices continue to evolve, energy forecasting models will play an increasingly critical role. They will support decision-making in smart cities, industrial automation, and smart homes. Therefore, research into energy prediction based on LSTM and causal features not only aligns with the trend of AI-IoT integration but also lays a solid foundation for efficient, reliable, and interpretable energy management.

II. Related Work and Foundation

Methodological innovations in time series modeling and feature integration have substantially advanced energy prediction in complex systems. The use of bidirectional LSTM networks combined with multi-scale attention mechanisms offers a robust foundation for capturing temporal dependencies and selectively focusing on key input features—an approach that directly motivates our own structure-enhanced regression framework [7]. Multiscale temporal modeling techniques, especially those employing deep learning for anomaly detection, further illustrate how hierarchical temporal representations can boost model robustness and sensitivity [8]. Similarly, reinforcement learning-driven sequence optimization strategies underscore the importance of dynamic and adaptive modeling in sequential decision tasks [9].

Advances in deep representation learning, particularly the fusion of transformer and CNN architectures, have enabled more expressive and discriminative modeling by integrating diverse feature representations [10]. Cross-modal feature fusion using CNN-transformer designs presents additional pathways for uniting different input modalities and improving regression performance in high-dimensional settings [11]. Furthermore, integrating graph neural networks with transformers has facilitated unsupervised structure discovery and advanced attention-based modeling, which are highly relevant for capturing both sequential and relational dependencies in IoT data [12]. Context-guided dynamic retrieval mechanisms, driven by attention and retrieval cues, provide methodological support for flexible feature selection and dynamic model input, aligning with our fusion of causal and temporal features [13].

Interpretability and structural reasoning are increasingly emphasized in deep regression tasks. Semantic and structural analysis methods based on interpretable models offer technical strategies for integrating causal reasoning and enhancing model transparency [14]. The selective injection of knowledge through adapter modules demonstrates how dynamic feature selection and robust module integration can enhance adaptation, echoing our approach to causal feature integration [15]. Structural priors, including contrastive alignment and structural guidance, have proven valuable for guiding deep models towards more stable and generalizable representations [16].

The development of adaptive scheduling and orchestration methods, particularly through multi-agent and reinforcement learning frameworks, further expands the methodological toolbox for robust, distributed prediction in resource-constrained environments [17,18]. Finally, structural reconfiguration and parameter-efficient fine-tuning offer valuable approaches for optimizing network structure and model adaptation without excessive parameter overhead [19].

Together, these methodological developments across temporal modeling, attention and fusion, causal and structural reasoning, and adaptive optimization form the basis for our causal-attention LSTM framework, enabling robust, accurate, and interpretable energy consumption prediction for IoT devices.

III. Proposed Approach

The proposed method integrates a long short-term memory (LSTM) network with structurally informative causal features to improve energy consumption prediction for IoT devices. First, an input sequence $X = \{x_{1}, x_{2}, \ldots, x_{T}\}$ is assumed, where each item $x_{t} \in \mathbb{R}^{d}$ represents a multidimensional feature vector collected at time step $t$. This feature vector consists of two components: traditional historical state variables, such as CPU usage, temperature, humidity, and channel quality; and external factors with causal inference value, such as task type and resource scheduling flags. The overall model architecture is shown in Figure 1.

To model temporal dependencies, the input sequence is first fed into the LSTM unit, whose internal recursive structure can be expressed as follows:

$$f_{t} = \sigma(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}) \tag{1}$$

$$i_{t} = \sigma(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}) \tag{2}$$

$$\tilde{c}_{t} = \tanh(W_{c} \cdot [h_{t-1}, x_{t}] + b_{c}) \tag{3}$$

$$c_{t} = f_{t} \otimes c_{t-1} + i_{t} \otimes \tilde{c}_{t} \tag{4}$$

$$h_{t} = o_{t} \otimes \tanh(c_{t}), \quad o_{t} = \sigma(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}) \tag{5}$$

In the above equations, $f_{t}$, $i_{t}$, and $o_{t}$ denote the activation vectors of the forget gate, input gate, and output gate, respectively, where $o_{t}$ is the sigmoid term in Equation (5); $\tilde{c}_{t}$ denotes the candidate cell state; $c_{t}$ denotes the memory cell state; $h_{t}$ denotes the hidden state output at the current time step; $\sigma(\cdot)$ denotes the sigmoid function; and $\otimes$ denotes the Hadamard (element-wise) product. This structure extracts both short-term and long-term dependencies in the time series, establishing an information foundation for energy consumption prediction.
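As an illustration of Equations (1)-(5), the following NumPy sketch runs a single LSTM layer over a synthetic sequence. The function names (`lstm_step`, `init_params`, `run_lstm`), the hidden size, the random initialization, and the synthetic input are assumptions made for this example; they are not the authors' implementation or trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (1)-(5)."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # Eq. (1), forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # Eq. (2), input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # Eq. (3), candidate state
    c_t = f_t * c_prev + i_t * c_tilde                     # Eq. (4), cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # Eq. (5), hidden state
    return h_t, c_t

def init_params(d, n_hidden, seed=0):
    """Small random weights (illustrative only, not trained values)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, size=(n_hidden, n_hidden + d))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params

def run_lstm(X, params, n_hidden):
    """Apply lstm_step over a (T, d) sequence and return all hidden states."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    hidden_states = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, params)
        hidden_states.append(h)
    return np.stack(hidden_states)                         # shape (T, n_hidden)

# Synthetic input: T = 10 time steps, d = 6 features (hypothetical sizes).
d, n_hidden, T = 6, 4, 10
X = np.random.default_rng(1).normal(size=(T, d))
H = run_lstm(X, init_params(d, n_hidden), n_hidden)
```

Each row of `H` is the hidden state $h_{t}$ for one time step; the last row would be the temporal summary passed on to the prediction layer.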

To further integrate causal information, this study introduces a structure-guided causal attention mechanism to perform weighted modeling of different causal features [20]. Specifically, a causal mask matrix $M \in \{0, 1\}^{d \times d}$ is constructed to indicate which input dimensions have explicit causal relationships. Based on this, a causal weight vector $\alpha_{t} \in \mathbb{R}^{d}$ is defined, which is calculated as follows:

$$\alpha_{t} = \mathrm{softmax}(M \cdot \tanh(W_{\alpha} \cdot x_{t} + b_{\alpha})) \tag{6}$$
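Equation (6) can be sketched in NumPy as follows. The mask pattern (only the last two features marked as causal), the random `W_alpha`, and the feature dimension are hypothetical choices for illustration; how the mask is actually built from domain knowledge is not specified here.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def causal_attention(x_t, M, W_alpha, b_alpha):
    """Eq. (6): alpha_t = softmax(M . tanh(W_alpha . x_t + b_alpha))."""
    scores = M @ np.tanh(W_alpha @ x_t + b_alpha)
    return softmax(scores)

d = 5
rng = np.random.default_rng(0)

# Hypothetical mask: only features 3 and 4 (for example, a task-type code
# and a scheduling flag) are treated as having explicit causal relevance.
M = np.zeros((d, d))
M[3, 3] = 1.0
M[4, 4] = 1.0

W_alpha = rng.normal(0.0, 0.5, size=(d, d))   # illustrative, untrained weights
b_alpha = np.zeros(d)
x_t = rng.normal(size=d)

alpha_t = causal_attention(x_t, M, W_alpha, b_alpha)
```

Note that, because the mask multiplies the scores instead of setting them to negative infinity, the masked-out dimensions in this sketch receive a score of zero and therefore still obtain equal, nonzero softmax weights.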

This mechanism can guide the model to prioritize feature dimensions with causal relationships, enhancing the rationality and interpretability of feature selection. The final regression prediction is obtained by fusing the weighted causal features with the LSTM output:

$$\hat{y}_{t} = W_{y} \cdot [h_{t} \,\|\, (\alpha_{t} \otimes x_{t})] + b_{y} \tag{7}$$

Here, $\|$ represents the feature concatenation operation, and $\hat{y}_{t} \in \mathbb{R}$ is the predicted energy consumption value at time step $t$. This fusion strategy gives the model both temporal memory and causal perception, enabling more stable and accurate regression output in a multi-source heterogeneous IoT data environment.
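The fusion step in Equation (7) amounts to a concatenation followed by a linear map. The sketch below uses small hand-picked vectors so the arithmetic can be followed by eye; the function name `fuse_and_regress` and all numeric values are assumptions for illustration only.

```python
import numpy as np

def fuse_and_regress(h_t, x_t, alpha_t, W_y, b_y):
    """Eq. (7): y_hat_t = W_y . [h_t || (alpha_t * x_t)] + b_y."""
    fused = np.concatenate([h_t, alpha_t * x_t])   # concatenation of both branches
    return float(W_y @ fused + b_y)

# Hand-picked illustrative values (not learned parameters).
h_t = np.array([0.5, -0.5])            # LSTM hidden state, 2 hidden units
x_t = np.array([1.0, 2.0, 3.0])        # input features, d = 3
alpha_t = np.array([0.2, 0.3, 0.5])    # causal attention weights, sum to 1
W_y = np.ones(h_t.size + x_t.size)     # regression weights of length 2 + 3
b_y = 0.1

# fused = [0.5, -0.5, 0.2, 0.6, 1.5], so the sum is 2.3 and y_hat is about 2.4
y_hat = fuse_and_regress(h_t, x_t, alpha_t, W_y, b_y)
```

The regression weight vector has length equal to the hidden size plus the feature dimension, which is the only shape constraint Equation (7) imposes.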

To optimize the model, the objective function is defined as the mean squared error (MSE) loss, which measures the difference between the predicted and actual energy consumption values:

$$L_{MSE} = \frac{1}{T} \sum_{t=1}^{T} (y_{t} - \hat{y}_{t})^{2} \tag{8}$$

where $y_{t}$ is the true value and $\hat{y}_{t}$ is the predicted value. This loss function guides the model to minimize prediction error and improve accuracy. The overall framework is scalable and modular, allowing flexible adaptation to different types of IoT devices and task scenarios and providing theoretical and model support for practical deployment.
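A minimal sketch of Equation (8) and its gradient with respect to the predictions is given below. The helper names `mse_loss` and `mse_grad` and the sample values are illustrative assumptions; in practice a deep learning framework would compute the gradient automatically.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Eq. (8): mean of squared differences over the T time steps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def mse_grad(y_true, y_pred):
    """Gradient of Eq. (8) w.r.t. each prediction: (2 / T) * (y_hat_t - y_t)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 2.0 * (y_pred - y_true) / y_true.size

# Made-up energy readings and predictions, for illustration only.
y_true = [10.0, 12.0, 9.0, 11.0]
y_pred = [9.0, 12.0, 10.0, 13.0]

loss = mse_loss(y_true, y_pred)   # squared errors 1, 0, 1, 4 -> mean 1.5
grad = mse_grad(y_true, y_pred)   # [-0.5, 0.0, 0.5, 1.0]
```

Because the errors are squared, the largest single error (here the last sample) dominates both the loss and the gradient.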

IV. Performance Evaluation

A. Dataset

This study employs the Edge-IIoTset dataset, which is derived from a diverse range of edge IoT devices deployed in industrial environments. The dataset encompasses comprehensive information on device states and operational characteristics, offering a representative view of energy usage patterns in real-world industrial contexts. The data sources span several typical industrial IoT scenarios, including temperature regulation systems, motor drivers, sensor gateways, and networked devices. Each record within the dataset consists of multiple feature variables, forming a multivariate time series structure. The features include CPU utilization, memory usage, device workload, voltage and current states, sensing frequency, network communication frequency, and timestamps, among others. This multidimensional configuration provides a robust basis for developing predictive models of energy consumption. Furthermore, each time slice is annotated with an actual measured energy consumption value, which serves as the target variable for regression tasks.

The dataset exhibits temporal continuity, device heterogeneity, and diverse feature types. These characteristics align with the modeling complexity and generalization requirements in IoT scenarios. Modeling analysis based on this dataset can help validate the prediction capability of algorithms in multi-device and multi-task environments. It also supports future applications in intelligent edge scheduling and energy efficiency optimization.
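To feed a multivariate time series of this kind into a sequence model, the records are typically sliced into fixed-length windows, each paired with the energy value at its last time step. The sketch below shows one common way to do this on synthetic arrays; the window length, the helper name `make_windows`, and the alignment of the target with the final step of each window are assumptions, since the preprocessing pipeline is not described in detail.

```python
import numpy as np

def make_windows(features, target, window):
    """Slice a (N, d) feature matrix into overlapping windows.

    Returns X with shape (N - window + 1, window, d) and y with shape
    (N - window + 1,), where y[k] is the target at the last step of X[k].
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = features.shape[0]
    if window < 1 or window > n:
        raise ValueError("window must be between 1 and the number of records")
    X = np.stack([features[s:s + window] for s in range(n - window + 1)])
    y = target[window - 1:]
    return X, y

# Synthetic stand-in for device records: 8 time steps, 3 features each.
features = np.arange(24).reshape(8, 3)
target = np.arange(8)

X, y = make_windows(features, target, window=4)
```

With 8 records and a window of 4, this yields 5 training sequences of shape (4, 3), each labeled with the energy value of its final time step.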

B. Experimental Results

This paper first conducts a comparative experiment, and the experimental results are shown in Table 1.

Overall, the proposed energy consumption prediction model, which integrates causal features with the LSTM architecture, outperforms existing time-series modeling methods across all evaluation metrics. Compared with Attention-LSTM, the proposed model reduces RMSE and MAE by 1.41 and 1.16, respectively. This indicates a clear advantage in error control. The results suggest that the introduction of causal features helps alleviate the problem of feature redundancy often associated with traditional attention mechanisms. This leads to improved accuracy and stability in energy consumption modeling.

Further comparisons were made with BI-LSTM and CNN-LSTM, two widely used deep learning architectures for time-series modeling. BI-LSTM focuses on capturing bidirectional dependencies, while CNN-LSTM emphasizes local pattern extraction. However, neither model explicitly represents external causal information. This limitation affects their performance when dealing with heterogeneous inputs and complex dependencies in IoT energy prediction tasks. In contrast, the proposed model reduces RMSE by 1.17 relative to BI-LSTM and by 0.96 relative to CNN-LSTM, and reduces MAPE by 2.85 and 2.27 percentage points, respectively (Table 1). These results support the effectiveness of the structure-enhanced design.
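The improvements quoted in this subsection follow directly from Table 1. The short script below recomputes the absolute and relative reductions for each baseline; the dictionary layout and the function name `reductions` are illustrative, while the numbers are copied from Table 1.

```python
# Values copied from Table 1 (MAPE in percent).
table1 = {
    "Attention-LSTM": {"RMSE": 6.32, "MAE": 5.14, "MAPE": 12.85},
    "BI-LSTM":        {"RMSE": 6.08, "MAE": 4.97, "MAPE": 11.92},
    "CNN-LSTM":       {"RMSE": 5.87, "MAE": 4.76, "MAPE": 11.34},
    "Ours":           {"RMSE": 4.91, "MAE": 3.98, "MAPE": 9.07},
}

def reductions(table, ours="Ours"):
    """Absolute and relative (percent) error reduction of `ours` per baseline."""
    result = {}
    for model, scores in table.items():
        if model == ours:
            continue
        result[model] = {}
        for metric, baseline in scores.items():
            absolute = baseline - table[ours][metric]
            relative = 100.0 * absolute / baseline
            result[model][metric] = (round(absolute, 2), round(relative, 1))
    return result

improvement = reductions(table1)
for model, metrics in improvement.items():
    print(model, metrics)
```

For example, the RMSE reduction against Attention-LSTM is 6.32 - 4.91 = 1.41, and the MAPE reduction against CNN-LSTM is 11.34 - 9.07 = 2.27 percentage points.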

It is worth noting that the improvement in the MAPE metric is particularly significant. The proposed model reduces MAPE from 11.92 percent in BI-LSTM and 11.34 percent in CNN-LSTM to 9.07 percent. Since MAPE is sensitive to abnormal fluctuations and small values, this result demonstrates that the model maintains strong robustness under low-power device conditions and sudden task scenarios in IoT environments. Such robustness is critical for energy management in practical deployment and can support more efficient decisions in resource allocation and task scheduling.
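The sensitivity of MAPE to small target values can be seen in a toy comparison. The sketch below defines the three metrics in plain Python and applies the same absolute error of 0.5 to a high-power and a low-power series; the readings are made up for illustration and the MAPE definition assumes no true value is zero.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; requires nonzero y_true."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)

# Made-up readings: the same absolute error (0.5) on two power scales.
high_true = [100.0, 120.0, 80.0]
low_true = [1.0, 1.2, 0.8]
high_pred = [v + 0.5 for v in high_true]
low_pred = [v + 0.5 for v in low_true]

rmse_high, rmse_low = rmse(high_true, high_pred), rmse(low_true, low_pred)
mae_high, mae_low = mae(high_true, high_pred), mae(low_true, low_pred)
mape_high, mape_low = mape(high_true, high_pred), mape(low_true, low_pred)
```

RMSE and MAE are identical for both series, while MAPE is a hundred times larger for the low-power series, which is why a low MAPE is informative for low-power devices.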

In essence, by integrating a causal attention mechanism with the LSTM structure, the proposed model significantly enhances its modeling capabilities for time-series data. Moreover, it improves the identification and utilization of crucial causal features, resulting in more interpretable and generalizable energy consumption predictions. The proposed fusion strategy exemplifies the power of combining deep learning with structural knowledge. It presents a novel paradigm for energy modeling in intricate IoT systems, offering both theoretical and practical insights for future model development.

This paper analyzes the impact of different numbers of LSTM layers on prediction performance, and the experimental results are shown in Figure 2.

The results in Figure 2 demonstrate that increasing the number of LSTM layers initially improves the model's prediction performance, but excessive depth leads to diminishing returns. The model achieves its best accuracy with three LSTM layers, obtaining RMSE, MAE, and MAPE values of 4.91, 3.98, and 9.07%, respectively, indicating a well-balanced capacity to capture temporal dependencies and nonlinear patterns. Shallower architectures with one or two layers show higher prediction errors due to limited modeling depth, while the deeper four-layer structure shows a slight decline in accuracy, likely caused by overfitting or vanishing gradients. This is particularly evident in the increased MAPE value, reflecting reduced robustness to outliers and small fluctuations. These findings underscore the importance of appropriate model depth in causal-enhanced time-series forecasting and indicate that a three-layer LSTM strikes the best balance between accuracy and stability for IoT energy consumption prediction. This provides useful guidance for model selection and deployment in similar applications.

The results in Figure 3 indicate that increasing the proportion of training samples substantially improves the model's prediction performance: RMSE, MAE, and MAPE decrease consistently as the training set ratio rises from 40% to 80%. This demonstrates that a larger training set enhances both the model's fitting ability and its generalization. However, once the training sample ratio exceeds 80%, error reduction becomes marginal, with RMSE and MAE stabilizing around 4.89 and 3.95 and MAPE remaining below 9.01%, suggesting that the model has reached a learning saturation point. Notably, the sharp decline in MAPE, from 12.82% to approximately 9%, reflects the model's increased stability and improved sensitivity to key structural features as more data become available. Overall, these findings support the robustness and scalability of the proposed model, showing that it can deliver accurate and stable predictions even with limited data, which makes it well suited for practical IoT energy management scenarios.
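A training-ratio sweep of this kind can be set up with a simple split helper. The sketch below assumes a chronological split (earliest records for training, latest for testing), which is a common choice for time series but is an assumption here, since the splitting procedure is not stated; the sample count and the helper name `chronological_split` are also hypothetical.

```python
def chronological_split(n_samples, train_ratio):
    """Return index ranges (train, test) with all training samples first in time."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    cut = int(round(n_samples * train_ratio))
    return range(0, cut), range(cut, n_samples)

# Sweep over a set of ratios spanning the 40% to 80% range and beyond.
n_samples = 1000  # hypothetical number of windows
splits = {}
for ratio in (0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    train_idx, test_idx = chronological_split(n_samples, ratio)
    splits[ratio] = (len(train_idx), len(test_idx))
```

Keeping the test indices strictly after the training indices avoids leaking future observations into training, which matters when consecutive windows overlap.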

V. Conclusion

This paper focuses on energy consumption prediction for Internet of Things (IoT) devices and proposes a deep learning model that integrates causal features with an LSTM architecture. The method combines the dynamic nature of time-series modeling with the structural guidance of causal attention mechanisms. It demonstrates strong predictive performance in complex and dynamic IoT scenarios. By constructing an end-to-end learning framework, the model captures deep interactions between historical states and external causal variables. This enables accurate modeling and efficient regression of device energy consumption. The proposed approach addresses limitations of traditional time-series models in feature selection and interpretability. In a series of comparative experiments and sensitivity analyses, the proposed method outperforms existing approaches on key regression metrics including RMSE, MAE, and MAPE. These results indicate that integrating causal features improves model robustness and generalization. The model also shows good adaptability to variations in training data size, structural parameters, and environmental conditions. It maintains high prediction accuracy even under data scarcity or heterogeneous device distributions. This highlights the practical value of the method in industrial and edge computing scenarios.

From an application perspective, the results of this study can benefit several key areas, including smart manufacturing, edge deployment, and green energy management. In resource-constrained and task-distributed IoT environments, accurate energy consumption prediction supports dynamic scheduling and energy optimization. Moreover, causal feature modeling enhances system interpretability, making the decision-making process more transparent. This helps improve deployment security and system trustworthiness. The integration of deep learning with structural modeling provides a new direction for IoT energy prediction. It also offers solid support for intelligent sensing and energy-driven computing in future smart IoT systems.

Future research can extend this work in several directions. First, more advanced causal discovery mechanisms can be explored to improve the model’s ability to detect latent causal relationships. Second, the model can be integrated with federated learning or edge intelligence frameworks. This would enable cross-device energy modeling and shared optimization under data privacy constraints. Third, incorporating multimodal sensing data such as images, audio, and environmental states can further enrich the input and enhance both accuracy and robustness. In conclusion, this study presents an innovative solution for energy modeling in IoT systems and provides technical insights for the deep integration of artificial intelligence and the Internet of Things.

References

[1] Alıoghlı, A.A.; Okay, F.Y. IoT-Based Energy Consumption Prediction Using Transformers. Gazi University Journal of Science Part A: Engineering and Innovation 2024, 11, 304–323.
[2] Abdel-Basset, M.; Hawash, H.; Chakrabortty, R.K.; et al. Energy-net: a deep learning approach for smart energy management in IoT-based smart cities. IEEE Internet of Things Journal 2021, 8, 12422–12435.
[3] Natarajan, Y.; K. R., S.P.; Wadhwa, G.; Choi, Y.; Chen, Z.; Lee, D.-E.; Mi, Y. Enhancing building energy efficiency with IoT-driven hybrid deep learning models for accurate energy consumption prediction. Sustainability 2024, 16, 1925.
[4] Binbusayyis, A.; Sha, M. Energy consumption prediction using modified deep CNN-Bi LSTM with attention mechanism. Heliyon 2025, 11.
[5] Somu, N.; M. R., G.R.; Ramamritham, K. A deep learning framework for building energy consumption forecast. Renewable and Sustainable Energy Reviews 2021, 137, 110591.
[6] Ma, C.; Pan, S.; Cui, T.; et al. Energy consumption prediction for office buildings: Performance evaluation and application of ensemble machine learning techniques. Journal of Building Engineering 2025, 102, 112021.
[7] Yang, T.; Cheng, Y.; Ren, Y.; Lou, Y.; Wei, M.; Xin, H. A Deep Learning Framework for Sequence Mining with Bidirectional LSTM and Multi-Scale Attention. Proceedings of the 2025 2nd International Conference on Innovation Management and Information System, pp. 472–476, 2025.
[8] Lian, L.; Li, Y.; Han, S.; Meng, R.; Wang, S.; Wang, M. Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services. arXiv e-prints 2025, arXiv:2508.14503.
[9] Zhang, X.; Wang, X.; Wang, X. A Reinforcement Learning-Driven Task Scheduling Algorithm for Multi-Tenant Distributed Systems. arXiv e-prints 2025, arXiv:2508.08525.
[10] Wang, X.; Zhang, X.; Wang, X. Deep Skin Lesion Segmentation with Transformer-CNN Fusion: Toward Intelligent Skin Cancer Analysis. arXiv e-prints 2025, arXiv:2508.14509.
[11] Li, M.; Hao, R.; Shi, S.; Yu, Z.; He, Q.; Zhan, J. A CNN-Transformer Approach for Image-Text Multimodal Classification with Cross-Modal Feature Fusion. Proceedings of the 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE), pp. 1182–1186, 2025.
[12] Zi, Y.; Gong, M.; Xue, Z.; Zou, Y.; Qi, N.; Deng, Y. Graph Neural Network and Transformer Integration for Unsupervised System Anomaly Discovery. arXiv e-prints 2025, arXiv:2508.09401.
[13] He, J.; Liu, G.; Zhu, B.; Zhang, H.; Zheng, H.; Wang, X. Context-Guided Dynamic Retrieval for Improving Generation Quality in RAG Models. arXiv e-prints 2025, arXiv:2504.19436.
[14] Zhang, R.; Lian, L.; Qi, Z.; Liu, G. Semantic and Structural Analysis of Implicit Biases in Large Language Models: An Interpretable Approach. arXiv e-prints 2025, arXiv:2508.06155.
[15] Zheng, H.; Zhu, L.; Cui, W.; Pan, R.; Yan, X.; Xing, Y. Selective Knowledge Injection via Adapter Modules in Large-Scale Language Models. 2025.
[16] Gao, D. High Fidelity Text to Image Generation with Contrastive Alignment and Structural Guidance. arXiv e-prints 2025, arXiv:2508.10280.
[17] Zhang, R. AI-Driven Multi-Agent Scheduling and Service Quality Optimization in Microservice Systems. Transactions on Computational and Scientific Methods 2025, 5.
[18] Yao, G.; Liu, H.; Dai, L. Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters. arXiv e-prints 2025, arXiv:2508.10253.
[19] Wu, Q. Task-Aware Structural Reconfiguration for Parameter-Efficient Fine-Tuning of LLMs. Journal of Computer Technology and Software 2024, 3.
[20] Hoendarto, G.; Saikhu, A.; Hari Ginardi, R. V. Bridging IoT devices and machine learning for predicting power consumption: case study Universitas Widya Dharma Pontianak. Energy Informatics 2025, 8, 87.
[21] Kim, S.; Kang, M. Financial series prediction using Attention LSTM. arXiv e-prints 2019, arXiv:1902.10877.
[22] Suebsombut, P.; Sekhari, A.; Sureephong, P.; et al. Field data forecasting using LSTM and Bi-LSTM approaches. Applied Sciences 2021, 11, 11820.
[23] Lu, W.; Li, J.; Li, Y.; et al. A CNN-LSTM-based model to forecast stock prices. Complexity 2020, 2020, 6622927.

Figure 1. Overall model architecture.

Figure 2. Analysis of the impact of different LSTM layers on prediction performance.

Figure 3. Sensitivity experiment of changes in the training sample ratio on model performance.

Table 1. Comparative experimental results.

Model	RMSE	MAE	MAPE (%)
Attention-LSTM [21]	6.32	5.14	12.85
BI-LSTM [22]	6.08	4.97	11.92
CNN-LSTM [23]	5.87	4.76	11.34
Ours	4.91	3.98	9.07

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.