Integrating Deep Reinforcement Learning Networks with Health System Simulations

Background and motivation: Combining Deep Reinforcement Learning (Deep RL) and Health Systems Simulations has significant potential, both for research into improving Deep RL performance and safety and for operational practice. While individual toolkits exist for Deep RL and Health Systems Simulations, no framework to integrate the two has been established. Aim: To provide a framework for integrating Deep RL Networks with Health System Simulations, and to ensure this framework is compatible with Deep RL agents that have been developed and tested using OpenAI Gym. Methods: We developed our framework based on the OpenAI Gym framework, and demonstrate its use on a simple hospital bed capacity model. We built the Deep RL agents using PyTorch, and the Hospital Simulation using SimPy. Results: We demonstrate example models using a Double Deep Q Network or a Duelling Double Deep Q Network as the Deep RL agent. Conclusion: SimPy may be used to create Health System Simulations that are compatible with agents developed and tested on OpenAI Gym environments. GitHub repository of code: https://github.com/MichaelAllen1966/learninghospital


Introduction
Deep Reinforcement Learning and Health System Simulations are two complementary and parallel methods that have the potential to improve the delivery of health systems.
Deep Reinforcement Learning (Deep RL) is a rapidly developing area of research, finding application in areas as diverse as game playing, robotics, natural language processing, computer vision, and systems control 1. Deep RL involves an agent that interacts with an environment with the aim of developing a policy that maximises the long term return of rewards. Deep RL has a framework that allows for generic problem solving that is not dependent on pre-existing domain knowledge, making these techniques applicable to a wide range of problems.
Health Systems Simulation seeks to mimic the behaviour of real systems. Such simulations may be used to optimise services such as emergency departments 2, hospital ward operation and capacity 3, and community hospital capacity 4. These examples of health service simulations are used for off-line planning and optimisation of service configuration.
Health Systems simulations are usually used for planning changes to service delivery. There is potential for these types of simulation to be used to test, develop and train Deep RL agents. The motivation for this integration includes:
• To perform research on the relative performance of different Deep RL methods (e.g. comparison of techniques such as Deep Q Learning and Actor-Critic methods).
• To perform research on the effect of differing
• Ultimately, to be able to pre-train Deep RL agents which would then be transferred to, and used in, real world settings.
In order to test, train, and develop Deep RL agents, we need a standardised structure that we can use across different types of health systems. One such standardised structure, used across many differing domains, already exists: OpenAI Gym (gym.openai.com) 5. Gym provides a common interface to a range of problems, from control systems through to video games. The common interface allows the easy transfer of agents from one problem-solving environment to another. Gym is structured around an episodic approach to learning. The agent is exposed to multiple episodes. In each episode the environment is reset to a fixed or random state, and the agent then interacts with the environment through a series of steps until some terminal state is reached, indicating the end of the episode. With each step the agent passes an action to the environment. The environment returns an updated set of observations about the environment state, a reward, whether the terminal state has been reached, and any extra information available. Agents are designed to maximise the long term return of rewards.
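The episodic interaction described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than code from Gym or from our repository: the toy environment CountdownEnv, the run_episode helper, and the two policies are hypothetical names introduced only for this example. The step method returns the four values described in the text (observations, reward, terminal flag, info).

```python
import random


class CountdownEnv:
    """Toy environment (hypothetical) exposing reset/step in the Gym style."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.actions = [0, 1]
        self.steps_taken = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps_taken = 0
        return [self.steps_taken]

    def step(self, action):
        """Apply an action; return (observation, reward, terminal, info)."""
        self.steps_taken += 1
        reward = 1.0 if action == 1 else 0.0
        terminal = self.steps_taken >= self.episode_length
        return [self.steps_taken], reward, terminal, {}


def run_episode(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    observation = env.reset()
    total_reward, terminal = 0.0, False
    while not terminal:
        action = policy(observation)
        observation, reward, terminal, info = env.step(action)
        total_reward += reward
    return total_reward


def random_policy(observation):
    """Ignore the observation and pick an action at random."""
    return random.choice([0, 1])


def always_one(observation):
    """Always choose action 1, which earns the reward in this toy example."""
    return 1


print(run_episode(CountdownEnv(), always_one))  # 5.0
```

Because the agent only sees reset and step, any environment exposing the same two methods can be swapped in without changing the loop.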
In this paper we present a framework for coding Health Systems simulations using SimPy 6, a commonly used Python discrete event simulation package, in a form that allows interaction with a Deep RL agent.

Generic simulation properties
All simulations will share some common structure, methods, and attributes.

Generic structure
Algorithm 1 shows a high level structure of the code. This will be common to all interactions of Deep RL agents and SimPy simulations with only RL-specific alterations (such as the use of target networks and memory).

Generic simulation methods
The simulation is set up with three methods that interface the Deep RL agent and the simulation:
• reset: resets the simulation to a starting state and returns the first set of state observations.
• step: takes a step in the simulation. Passes an action to the simulation, runs the simulation until the end of the next time step, and returns a tuple of next state, reward, terminal, info. The step method uses the SimPy method env.run(until=target_time), with target_time being incremented in the desired time steps. When the simulation time reaches the desired maximum simulation duration, the simulation returns terminal=True.
• render: displays the current state of the simulation.
Other internal methods in the simulation (not accessed by the Deep RL agent) that will be common to all simulations are:
• calculate reward: calculates the reward to pass back to the Deep RL agent.
• get observations: creates a list of observations from the state.
• islegal: checks whether an action from the Deep RL agent is legal. If the action is not legal, this method will raise an exception.

Generic simulation attributes
All simulations will contain the following attributes.
• actions: A list of possible actions.
• action size: The number of possible actions.
• observation size: The number of features in the observation.
• state: An object containing the state of the simulation. This may be a simple object, such as a list or dictionary, or may be a custom Python object.

Hospital bed simulation overview
The hospital bed simulation is a very simplified model of a real hospital. Patients arrive at a hospital, stay for a given length-of-stay, and leave. The inter-arrival time of patients is sampled from an exponential distribution, the mean of which depends on the day of week (with average arrival numbers being higher on weekdays than weekends). The length-of-stay is also sampled from an exponential distribution, the mean of which does not depend on day of week. The hospital has a certain number of beds at any time. The Deep RL agent can request a change to the number of staffed beds, but this change is only enacted after 2 days. The simulation runs for 365 days by default, and the hospital is loaded initially with the expected average number of patients.

Hospital bed simulation state
The state in the simulation is held in a dictionary. This dictionary contains:
• weekday: The current day of week (0-6).
• beds: The total number of staffed beds in the hospital (free or occupied).
• patients: The total number of patients in the hospital.
• spare beds: The number of unoccupied beds. If the number of patients exceeds the number of staffed beds then this number becomes negative and indicates the number of patients without a bed.
• pending bed change: The changes in staffed bed numbers requested by the Deep RL agent, but which have not yet been enacted.
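As an illustration of this state dictionary, the sketch below builds the five items and flattens them into an observation list. The function names make_state and get_observations, the snake_case keys, and the ordering of the observation list are assumptions for this example. The source only specifies the items themselves.

```python
STATE_KEYS = ('weekday', 'beds', 'patients', 'spare_beds',
              'pending_bed_change')


def make_state(weekday, beds, patients, pending_bed_change=0):
    """Build a state dictionary; spare beds is derived from the others."""
    return {
        'weekday': weekday,                 # day of week, 0-6
        'beds': beds,                       # staffed beds, free or occupied
        'patients': patients,               # patients currently in hospital
        'spare_beds': beds - patients,      # negative if patients lack a bed
        'pending_bed_change': pending_bed_change,
    }


def get_observations(state):
    """Flatten the state dictionary into a list (assumed key order)."""
    return [state[key] for key in STATE_KEYS]


state = make_state(weekday=2, beds=100, patients=104, pending_bed_change=10)
print(get_observations(state))  # [2, 100, 104, -4, 10]
```

In this example four patients are without a staffed bed, so spare beds is -4, while a request for ten extra beds is still pending.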

Hospital bed simulation reward
The simulation has a target number of free staffed beds. By default this is set at 5% of the number of patients in the hospital at any given time. The reward is always zero or negative, and is the negative absolute difference between the number of spare beds and the target number of spare beds (equation 1).
$$\text{reward} = -\left|\,\text{spare beds} - \text{target spare beds}\,\right| \tag{1}$$
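Equation 1 can be written as a short function. In this sketch the function name calculate_reward and its arguments are illustrative, and the target is kept as a fraction rather than rounded to a whole number of beds. Whether the repository code rounds the target is not stated here, so that choice is an assumption.

```python
def calculate_reward(patients, beds, target_reserve=0.05):
    """Negative absolute gap between spare beds and target spare beds."""
    spare_beds = beds - patients
    target_spare_beds = target_reserve * patients   # assumed: not rounded
    return -abs(spare_beds - target_spare_beds)


# 100 patients imply a target of 5 spare beds.
print(calculate_reward(patients=100, beds=110))  # 10 spare, 5 too many: -5.0
print(calculate_reward(patients=100, beds=98))   # 2 short of beds: -7.0
```

The penalty is symmetric: holding five beds too many costs the same as holding five too few, and the best achievable reward is zero when spare beds exactly match the target.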

Hospital bed simulation methods
Methods that are specific to the hospital bed simulation are:
• adjust bed numbers: Adjusts the staffed bed numbers after a delay (SimPy timeout). Prior to the delay, the adjust pending bed change method is called to track the requested changes in staffed bed numbers. The delay is the simulation time between the Deep RL agent requesting a change to the number of staffed beds, and the change being made. The delay is stored in the simulation attribute delay to change beds, and may be set when initializing the simulation. When the number of staffed beds changes, the state dictionary items beds and pending bed change are adjusted accordingly.
• adjust pending bed change: Adjusts the state dictionary item pending bed change when the Deep RL agent requests a change to the number of staffed beds.
• load patients: Loads new patients at the start of the simulation, such that the initial number of patients in the hospital equals the calculated long term average (arrivals per day * average length of stay). This method calls the patient spell method. This method increments the number of patients and staffed beds by 1 for each patient loaded into the simulation.
• new admission: A continuous loop of new patients. This method/process is initiated on simulation reset. A new patient arrival is initiated by calling the patient spell method. The number of patients in the hospital is incremented by 1. There is then a delay (SimPy timeout) before the next iteration of the loop. The delay is the inter-arrival time of patients. This is sampled from an exponential distribution, the mean of which depends on both the average arrival rate (set using the arrivals per day attribute, which may be set when initializing the simulation) and the day of week. Mean arrivals per day are increased by 20% on weekdays (days 0-4), and reduced by 50% on weekends (days 5 & 6).
• patient spell: The patient spell in the hospital. Length of stay is sampled from an exponential distribution based on a mean length of stay. If the patient is part of the initial load of the hospital, the length of stay is multiplied by a random number between 0 and 1 to account for the fraction of the length of stay already completed. After the spell in hospital is complete the number of patients is reduced by 1, and the number of spare beds is recalculated.
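The sampling used by the new admission and patient spell methods can be sketched with the Python standard library alone. The function names below are hypothetical, and the sketch leaves out the SimPy timeouts that would consume these sampled values. Note that the weekday and weekend adjustments balance over a week (5 × 1.2 + 2 × 0.5 = 7), so the weekly mean arrival rate equals the arrivals per day attribute.

```python
import random


def arrivals_per_day_for(weekday, arrivals_per_day):
    """Mean arrivals for a given day: +20% on days 0-4, -50% on days 5-6."""
    factor = 1.2 if weekday < 5 else 0.5
    return arrivals_per_day * factor


def sample_inter_arrival_time(weekday, arrivals_per_day, rng=random):
    """Sample time (days) to next arrival from an exponential distribution."""
    rate = arrivals_per_day_for(weekday, arrivals_per_day)
    return rng.expovariate(rate)        # mean is 1 / rate days


def sample_length_of_stay(mean_los, initial_load=False, rng=random):
    """Sample length of stay (days); shorten it for initially loaded patients."""
    los = rng.expovariate(1.0 / mean_los)
    if initial_load:
        # Patient is part-way through their stay at the simulation start
        los *= rng.random()
    return los


def initial_patient_count(arrivals_per_day, mean_los):
    """Long term average occupancy: arrivals per day * average length of stay."""
    return round(arrivals_per_day * mean_los)


rng = random.Random(42)
print(sample_inter_arrival_time(weekday=0, arrivals_per_day=50, rng=rng))
print(sample_length_of_stay(mean_los=7, initial_load=True, rng=rng))
print(initial_patient_count(arrivals_per_day=50, mean_los=7))  # 350
```

Passing a seeded random.Random instance as rng makes runs repeatable, which is useful when comparing agents on identical arrival patterns.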

Hospital bed simulation reset method
The actions in the simulation reset method (required in all simulations for interaction with the Deep RL agents) are:
1. Create new hospital simulation environment.
2. Initialise simulation processes (new admission method).
3. Set starting state values for state dictionary.
5. Get and return first set of state observations.

Hospital bed simulation step method
The actions in the simulation step method (required in all simulations for interaction with the Deep RL agents) are:
1. Check requested action is legal.
3. Call bed change process.
4. Make a step in the simulation. Use: env.run(until=self.next_time_stop).
5. Get new observations.
6. Check whether terminal state reached (based on simulation time).
8. Create an empty information dictionary (this dictionary is required to be compatible with OpenAI Gym step method).
10. Return tuple of next state, reward, terminal, info.

Hospital bed simulation attributes
Attributes that are specific to the hospital bed simulation are:
• arrivals per day: Average arrivals per day.
• delay to change beds: Time between requesting change in beds, and change in beds happening (days).
• los: Average patient length of stay (days).
• sim duration: Length of simulation run (days).
• target reserve: Target free staffed beds as a proportion of the number of patients present.
• time step: Time between action steps (days).
The output of the Bagging D3QN is shown in figure 1. It is not the intention of this paper to present a fully optimised Deep RL agent, but it can be seen that the example network improves in performance over time (repeated model runs) and manages the modelled bed stock appropriately.

Discussion
Combining Deep RL and Health Systems Simulations has significant potential, for both research into improving Deep RL performance and safety, and in operational practice. Our aim in this paper is not to present an optimised Deep RL model, or a detailed hospital simulation, but to provide a framework that is compatible with OpenAI Gym environments, enabling easy transfer of the many methods that have been developed and tested in such environments.
The potential for combining Deep Learning and Health Systems Simulations goes beyond the framework provided here. For example, we have demonstrated that a machine learning model can be used to simulate patient-level clinical decision making 12 as part of a broader clinical pathway simulation study.
The combination of Deep Learning and Health Systems Simulation is an area of research that will hopefully bear much fruit in the coming years.