Submitted: 26 July 2024
Posted: 30 July 2024
Abstract
Keywords:
1. Introduction
2. Literature Review
2.1. Future Research Directions
- Integration with IoT: The integration of RL with IoT devices can enhance real-time data acquisition and decision-making processes in greenhouses. Future research should focus on developing seamless IoT-RL integration frameworks.
- Scalability: Research on scaling RL solutions to larger, more complex greenhouse systems is necessary to ensure widespread adoption. Studies should address computational challenges and the ability to handle large datasets.
- Interdisciplinary Approaches: Combining RL with other AI techniques, such as genetic algorithms and fuzzy logic, could yield more robust energy management solutions. Exploration of hybrid models that leverage the strengths of different AI paradigms is essential.
- Environmental Adaptability: Developing RL algorithms capable of adapting to diverse environmental conditions will be crucial for global applications. This includes designing algorithms that can learn and adapt to changing weather patterns, pest infestations, and other environmental variables.
- Economic Viability: Studies on the cost-effectiveness of RL implementations in greenhouses can drive commercial interest and investment. Future research should focus on performing cost-benefit analyses and developing business models that highlight the economic advantages of RL-based energy management systems.
- User-Friendly Interfaces: Developing user-friendly interfaces and control systems for greenhouse operators is vital for the practical implementation of RL. Research should focus on creating intuitive dashboards and control panels that allow operators to easily interact with and oversee RL systems.
- Sustainability Metrics: Future work should also explore the development of sustainability metrics that RL systems can optimize. This includes not only energy efficiency but also water usage, pesticide application, and overall environmental impact.
- Policy and Regulatory Compliance: Research should address how RL systems can be designed to comply with local and international policies and regulations concerning energy usage and environmental protection.
- Data Privacy and Security: With the increasing use of IoT and RL, ensuring data privacy and security is essential. Future research should develop robust security protocols to protect sensitive data in greenhouse management systems.
- Real-World Case Studies: Conducting real-world case studies and pilot projects can provide valuable insights into the practical challenges and benefits of implementing RL in greenhouses. These studies can help refine RL models and identify best practices for successful adoption.
2.2. Literature review conclusion
3. Proposed model
3.1. Development in an Automated Greenhouse
3.2. Methodology
4. Phase 1. Control Strategies (set point in the environmental regulation loop)
5. Phase 2. IoT infrastructure
- Climate energy data is captured and stored in the system.
- Sensors collect environmental conditions.
- The data is sent to the gateway, which then transmits it to the local server and cloud services.
- The data is stored on the local server, and the datasets are created with the main variables.
- Local and cloud services allow remote monitoring and control of the system.
- A prediction of the environmental conditions for the next few hours is obtained.
- With the data and predictions, the learning algorithm is executed.
- The algorithm proposes a modification of the set point, increasing or decreasing its value based on the reinforcement (Q) values.
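The cycle above can be sketched in a few lines; all function names and values here are illustrative placeholders, not the paper's implementation:

```python
# One sampling cycle of the IoT control loop described above.
# All names and values are illustrative placeholders.

def read_sensors():
    """Collect current environmental conditions (stub values)."""
    return {"T_in": 24.5, "T_out": 18.0, "radiation": 350.0}

def forecast_conditions(history):
    """Predict conditions for the next few hours (naive persistence forecast)."""
    last = history[-1]
    return {"T_out_next": last["T_out"], "radiation_next": last["radiation"]}

def choose_setpoint_delta(state, q_values, actions=(0.0, -1.0, 1.0)):
    """Pick the set-point change with the highest learned Q value
    (ties resolve to the first action, i.e. no change)."""
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))

history = []
setpoint = 22.0
sample = read_sensors()                    # sensors -> gateway -> local server
history.append(sample)                     # dataset row stored locally / in the cloud
forecast = forecast_conditions(history)    # stands in for the web-service prediction
state = (round(sample["T_in"]), round(forecast["T_out_next"]))
setpoint += choose_setpoint_delta(state, q_values={})  # RL proposes the new set point
```

With an empty Q table the loop leaves the set point unchanged; in operation the table is filled during the training phase described later.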
| Number | Component Description |
|---|---|
| 1 | IoT Gateway or Router: This device acts as a central communication point between various sensors and devices and the cloud server. It uses IoT protocols to transmit data. |
| 2 | Sensors and Meters: These devices collect data from different sources: |
| 2a | Energy meter (A-Wh) that measures the amount of energy consumed. |
| 2b | Temperature sensor (thermostat) that measures the ambient temperature. |
| 3 | Solar Energy Controller: This component receives data from the sensors and manages the distribution of energy. |
| 3a | Inverter that converts solar energy from direct current (DC) to alternating current (AC). |
| 3b | Batteries for energy storage. |
| 3c | Switches and fuses for protection. |
| 4 | Local Server or Database: Stores and processes data locally. It is where all the data is collected for processing before being sent to the cloud. |
| 5 | Cloud Services: The data is sent to the cloud for additional storage and analysis. Cloud services can provide interfaces to monitor and control the system. |
| M | Monitoring Computer: Allows users to interact with the system, probably through a graphical interface for real-time monitoring and control of the system. |
6. Phase 3. Data set generation
- Sensor Placement. Strategically place sensors to capture a representative sample of environmental conditions within the greenhouse. This includes placing sensors at various depths and locations throughout the greenhouse to monitor microclimates.
- Data Logging and Transmission. Utilise data loggers and wireless networks to ensure continuous data capture and transmission. This includes setting up a reliable network infrastructure that can handle the data volume and frequency required for RL applications.
- Data storage. Implement a centralized data storage solution, preferably cloud-based, to store the large volumes of data generated by the sensors. Ensure the storage system supports efficient data retrieval and processing.
- Data preprocessing: cleaning and normalisation. Address issues related to missing values, sensor malfunctions, and noise in the data. Techniques such as interpolation and filtering should be applied to ensure data quality and consistency. Normalise sensor data to a common scale to facilitate accurate analysis and model training. This step is crucial for integrating diverse data types into a cohesive dataset.
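As a minimal, dependency-free sketch of the cleaning and normalisation steps (a real pipeline would typically use a dataframe library):

```python
# Preprocessing sketch: linear interpolation of missing samples,
# then min-max normalisation to a common [0, 1] scale.

def interpolate_gaps(series):
    """Fill None entries by linear interpolation between known neighbours."""
    out = list(series)
    for i, v in enumerate(out):
        if v is None:
            lo = next(j for j in range(i - 1, -1, -1) if out[j] is not None)
            hi = next(j for j in range(i + 1, len(out)) if out[j] is not None)
            frac = (i - lo) / (hi - lo)
            out[i] = out[lo] + frac * (out[hi] - out[lo])
    return out

def min_max_normalise(series):
    """Scale a list of numbers to [0, 1]."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

temps = [21.0, None, 23.0, 24.0]    # one missing sensor reading
clean = interpolate_gaps(temps)      # gap filled from its neighbours
norm = min_max_normalise(clean)      # common scale for model training
```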
6.1. Materials and methods
- Humidifier with osmosis water mist.
- Air conditioners to heat and cool the modules: Mitsubishi PUHZ-P200YKA three-phase Classic Inverter (twin/triple configuration); nominal cooling capacity 19.00 kW, nominal heating capacity 22.40 kW.
- Thermal shading screen.
- Extractor fan and zenithal opening windows with anti-thrips mesh.
- Artificial light lamps to increase net assimilation.
- Micro-sprinkler and drip or flood irrigation system.
- Temperature and humidity probes.
- Electrical, compressed air, mains water and osmosed water connections.
- Embedded device (Raspberry Pi 4) that deploys an intranet (WiFi, Bluetooth Low Energy) for communication, monitoring, and control.
- Electric energy meter in three-phase and single-phase circuits (Shelly 3EM). This consumption meter communicates with the embedded system over WiFi using the IP protocol.
- Communication to Web servers to obtain the temperature prediction in the greenhouse area.

7. Phase 4. Digital model and RL algorithm
7.1. Greenhouse model based on differential equations
The energy balance of the greenhouse air can be written in discrete time as
C · (T_in(t+Δt) − T_in(t)) / Δt = U · A · (T_out(t) − T_in(t)) + S(t) + P_heat(t) − P_cool(t)
where:
- C is the heat capacity of the greenhouse (J/°C),
- T_in(t) is the inside temperature of the greenhouse at time t (°C),
- T_in(t+Δt) is the inside temperature of the greenhouse at time t+Δt (°C),
- T_out(t) is the outside temperature at time t (°C),
- S(t) is the solar radiation entering the greenhouse at time t (W),
- P_heat(t) is the heating power applied to the greenhouse at time t (W),
- P_cool(t) is the cooling power applied to the greenhouse at time t (W),
- U is the overall heat transfer coefficient (W/(m²·°C)),
- A is the surface area of the greenhouse (m²),
- Δt is the time step (s).
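A forward-Euler step of this thermal balance can be sketched as follows; all parameter values are illustrative, not the greenhouse's measured constants:

```python
# Forward-Euler step of the greenhouse energy balance:
#   C * dT_in/dt = U*A*(T_out - T_in) + S + P_heat - P_cool
# Parameter values below are illustrative placeholders.

def step_temperature(T_in, T_out, S, P_heat, P_cool,
                     C=2.0e6,    # heat capacity (J/°C)
                     U=5.0,      # heat transfer coefficient (W/(m^2*°C))
                     A=300.0,    # surface area (m^2)
                     dt=60.0):   # time step (s)
    """Return the inside temperature after one time step dt."""
    dT = (U * A * (T_out - T_in) + S + P_heat - P_cool) * dt / C
    return T_in + dT

T = 20.0
for _ in range(60):  # one hour of 60 s steps: cold outside, heater on, mild sun
    T = step_temperature(T, T_out=10.0, S=5000.0, P_heat=8000.0, P_cool=0.0)
```

With these numbers the interior cools toward the equilibrium where losses balance solar and heating gains, which is the behaviour the RL agent later exploits.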
7.2. Greenhouse model based on predictions
- TE: Exterior Temperature
- HRE: Exterior Relative Humidity
- RGE: Exterior Global Radiation
- VV: Wind Speed
- DV: Wind Direction
- LL: Rainfall
- TI: Interior Temperature
- HRI: Interior Relative Humidity
7.3. Reinforcement Learning deployment
- State (s): Represents the current situation of the environment. In our case, the state can include the internal temperature of the greenhouse, the external temperature, and solar radiation.
- Action (a): The decision taken by the agent in each state. In our case, the actions are turning the heater on (a = 1) or off (a = 0).
- Reward (r): The feedback received by the agent after taking an action in a state. The reward can be a function of the internal temperature and energy consumption.
- Policy (π): The strategy followed by the agent to make decisions. The policy maps states to actions.
- Value (V^π(s)): The expected value of the cumulative reward starting from state s and following policy π.
- Q-Value (Q^π(s, a)): The expected value of the cumulative reward starting from state s, taking action a, and following policy π.
RL Model Formulas
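As a reference sketch, the textbook forms consistent with the definitions above are the following (symbols as in Section 7.3; the paper's exact notation is assumed to match):

```latex
% Value and action-value functions under policy \pi (discount factor \gamma)
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \mid s_t = s \right]

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \mid s_t = s,\; a_t = a \right]

% Q-Learning update with learning rate \alpha
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```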
8. Phase 5. Training
8.1. RL algorithm application in the greenhouse model based on differential equations
- Observe Current State: The agent observes the current state s_t, which includes the internal temperature (T_in), the external temperature (T_out), and the solar radiation (S).
- Select Action: Based on its policy π, the agent chooses an action a_t (turn the heater on or off).
- Apply Action: The chosen action a_t is applied to the environment.
- Observe Reward and Next State: The agent receives a reward r_t and observes the next state s_{t+1}.
- Update Policy: The agent updates its policy using the learning algorithm, such as Q-Learning.
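A minimal tabular Q-Learning sketch of the loop above, with discretised states and ε-greedy action selection; the hyperparameters and discretisation are illustrative assumptions, not the paper's values:

```python
import random

# Tabular Q-Learning for the observe/act/reward/update loop above.
# States are discretised (rounded) readings; actions: 0 = heater off, 1 = on.
# Hyperparameters (ALPHA, GAMMA, EPSILON) are illustrative.

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = (0, 1)
Q = {}  # (state, action) -> learned value

def discretise(T_in, T_out, S):
    """Map continuous sensor readings to a small discrete state."""
    return (round(T_in), round(T_out), round(S / 100.0))

def select_action(state):
    """Epsilon-greedy policy over the Q table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    """One Q-Learning update: Q <- Q + alpha*(r + gamma*max_a' Q' - Q)."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```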
- w_E, w_D, w_L, and w_S are weight coefficients that can be tuned.
- E is the energy consumption penalty.
- T_in is the current inside temperature.
- T_set is the set-point temperature.
- D = |T_in − T_set| is the penalty for temperature deviation.
- L_max and L_min are the limit penalties for exceeding the maximum or minimum temperature. The limit penalty is applied when the inside temperature exceeds the maximum or minimum temperature limits.
- S is the set-point change penalty, which discourages frequent and large adjustments to the set-point temperature.
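A sketch of this refined reward; the weight values, temperature limits, and symbol names are assumptions for illustration:

```python
# Refined reward: penalises energy use, deviation from the set point,
# temperature-limit violations, and large set-point changes.
# Weights and limits below are illustrative, not the paper's values.

def reward(T_in, T_set, energy_Wh, setpoint_delta,
           T_min=15.0, T_max=30.0,
           w_E=0.01, w_D=1.0, w_L=10.0, w_S=0.5):
    E = energy_Wh                                        # energy penalty
    D = abs(T_in - T_set)                                # deviation penalty
    L = 1.0 if (T_in > T_max or T_in < T_min) else 0.0   # limit penalty
    S = abs(setpoint_delta)                              # set-point change penalty
    return -(w_E * E + w_D * D + w_L * L + w_S * S)
```

A less negative value is better: the agent is rewarded for keeping the interior near the set point and inside the limits while spending little energy and changing the set point rarely.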
Analysis of the reward functions for greenhouse control
1a) Cooling temperature control analysis using the differential equation model (Figure 7)
General Improvement with RL Control
Temperature Control Effectiveness
Conclusion in cooling process using the differential equation model
1b) Heating temperature control analysis using the differential equation model (Figure 8)
General Improvement with RL Control
Temperature Control Effectiveness
Conclusion in heating process using the differential equation model
8.2. RL greenhouse control based on the temperature prediction model
Prediction model
- External Temperature (TE)
- External Relative Humidity (HRE)
- Wind Direction (DV)
- Wind Speed (VV)
- External Global Radiation (RGE)
- Internal Relative Humidity (HRI)
2a) Cooling control analysis using the temperature prediction model (Figure 10)
2b) Heating control using the temperature prediction model (Figure 11)
| Action 1 | = |
|---|---|
| Action 2 | = |
| Action 3 | = |
| Action 4 | = |
| Action 5 | = |
Analysis of Energy Consumption and Savings for Different Set-Point Values
Left graph: energy consumption under RL and fixed set-point control.
Key observations:
- Energy consumption patterns:
  - Energy consumption increases over time for both RL and fixed set-point control.
  - For lower set-point values the energy consumption is generally lower; as the set point increases, energy consumption increases.
- Comparison between RL and fixed set point:
  - RL control consistently consumes less energy than fixed set-point control for all set points.
  - The difference in energy consumption between RL and fixed set-point control is more pronounced at higher set points.
Right graph: energy savings for different set-point values.
Key observations:
- Energy savings trend:
  - Energy savings increase with higher set-point values.
  - Savings range from approximately 36% to nearly 50% across the tested set points.
- Efficiency of RL control:
  - RL control becomes more efficient, in terms of energy savings, as the set point increases.
  - This indicates that RL control is particularly beneficial in scenarios where higher set points are required, resulting in significant energy savings.
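The savings percentages above come from comparing cumulative consumption under the two controllers; as a one-line sketch (the totals here are made up for illustration, not measured values):

```python
# Percentage energy savings of RL control relative to a fixed set point.
def savings_percent(energy_fixed_Wh, energy_rl_Wh):
    return 100.0 * (energy_fixed_Wh - energy_rl_Wh) / energy_fixed_Wh

# Illustrative totals: a 36% saving at the low end of the reported range.
print(savings_percent(1000.0, 640.0))  # -> 36.0
```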
9. Conclusion
Author Contributions
Funding
| Phase | Description |
|---|---|
| Phase 1 | Control strategies. Analysis of environmental and control variables with the agronomic expert. Examination of all possible strategies that can be part of optimization. Strategies for choosing set-points. |
| Phase 2 | IoT infrastructure. Design of the IoT infrastructure needed for the greenhouse to carry out the control and data set generation. Deployment of various embedded systems interconnected with the required IoT technologies. |
| Phase 3 | Data-set generation. Sensors generate data that are analysed to determine greenhouse behaviour models. Each greenhouse has specific characteristics that must be taken into account when applying the model. Data sets are captured in a normalised format (CSV, JSON). |
| Phase 4 | Digital model and RL algorithm. The objective is to keep the greenhouse temperature between the minimum and maximum limits by optimising the connection and disconnection of the air-conditioning system. A digital model of the greenhouse is first created, on which the RL algorithm performs its calculations; this model is then adjusted to the actual behaviour of the greenhouse installation. Theoretical analysis validates the reward strategies and policies before they are applied in practice, comparing all strategies against those proposed in the model to determine their effectiveness. |
| Phase 5 | Training and evaluation. Train the RL agent using the constructed data set and the results of the analysis, iteratively updating the policy based on observed rewards and state transitions. From the data set obtained through IoT data capture and the results of the theoretical simulations, the policies, reward functions, and actions best suited to the type of greenhouse and installation are selected. |
| Strategy | Description | IoT sensors |
|---|---|---|
| Set-point selection (standard control) | This is the simplest and most commonly used strategy. The technician selects the temperature set point and its maximum and minimum values. | Temperature sensors. |
| Set-point adjustment with RL algorithm (RL actions) to optimize energy consumption | The values assigned at the set point are adjusted and modified by predicting expected conditions in the greenhouse. These changes are made at scheduled sampling times | Temperature, energy consumption, weather forecast, and temperature prediction inside the greenhouse. |
| Strategy | Reward Function Proposal |
|---|---|
| Reward function penalizes the agent for using the heating system | (formula image not recovered) |
| Reward function penalizes the use of the heating system, taking into account the efficiency and the actual deviation from the desired temperature range | (formula image not recovered) |
| Refined reward function: energy consumption penalty, temperature stability, exceeding maximum and minimum temperatures, and a penalty for frequent changes in the set point | (formula image not recovered) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


