1. Introduction
Over the last ten years, machine learning algorithms have been used in almost all fields where problems of data classification, pattern recognition, non-linear regression, etc., have to be solved. The application of such algorithms has also intensified in the field of queueing theory. While the first steps in the successful application of machine learning to the evaluation of performance characteristics of simple and complex queueing systems have already been taken, the total number of works on this topic remains modest. As for reviews, we can only refer to the recent paper by Vishnevskiy and Gorbunova [32], which proposes a systematic introduction to the use of machine learning in the study of queueing systems and networks. Thus, we would also like to make a small contribution to the popularisation of the topic by briefly describing existing works. In Stintzing and Norrman [29], an artificial neural network was used to predict the number of busy servers in the queueing system. The papers of Nii et al. [23] and Sherzer et al. [26] answered positively the question of whether machines could be useful for solving problems in general queueing systems. They used a neural network to estimate the mean performance measures of multi-server queues based on the first two moments of the inter-arrival and service time distributions. A machine learning approach was used in Kyritsis and Deriaz [20] to predict the waiting time in queueing scenarios. The combination of simulation and machine learning techniques for assessing performance characteristics was illustrated in Vishnevsky et al. [31] on a queueing system with K priority classes. Markovian queues were simulated using artificial neural networks in Sivakami et al. [27]. Neural networks were also used in Efrosinin and Stepanova [9] to estimate the optimal threshold policy in a heterogeneous queueing system. The combination of the Markov decision problem and a neural network for a heterogeneous queueing model with process sharing was studied by Efrosinin et al. [10]. The performance parameters of a closed queueing network were evaluated by means of a neural network in Gorbunova and Vishnevskiy [13]. The main conclusion to be drawn from the results already obtained by applying machine learning to queueing models is that neural networks cannot be treated as a replacement for classical methods of system performance analysis; rather, they complement the capabilities of such analysis.
This paper proposes a fairly universal method for solving the problem of optimal dynamic scheduling or allocation in queueing systems of the general type, i.e. where the times between events are arbitrarily distributed, and in queueing systems with correlated inter-arrival and service times. It can also provide the performance analysis of complex controlled systems described by multidimensional random processes, for which finding analytical, approximate or heuristic solutions is a difficult task. The method is exemplified by a version of a well-known queueing model consisting of several parallel queues and one server which serves the queues according to some control policy. The system is assumed to have heterogeneous arrival and service attributes, i.e. unequal arrival and service rates, as well as holding and switching costs. Such systems are also known as polling systems and have found wide application in various fields such as computer networks, telecommunication systems, manufacturing control and road traffic. For analytic and numerical results on various types of polling systems with applications to broadband wireless Wi-Fi and WiMAX networks, we refer interested readers to the textbook by Vishnevsky and Semenova [33] and the references therein. The same authors in [34] extended their research on polling systems to systems with correlated arrival flows such as
,
, and the group Poisson arrivals. In Vishnevskiy et al. [35], it was shown that the results obtained by a neural network are close enough to the results of analytical or simulation calculations for the
and
-type polling systems with cyclic polling.
A Markovian analog of such a model has been investigated by a number of authors. The two-queue homogeneous model with equal service rates and holding costs was studied in Hofri and Ross [15], where it was shown that, under the optimal policy, the queues must be served exhaustively. In Liu et al. [21], it was shown that the policy routing the server according to the LQF (Longest Queue First) rule is optimal when all queue lengths are known, and that the cyclic scheduling policy is optimal when the only information available is the previous decisions. Systems with multiple heterogeneous queues, also known as asymmetric polling systems, have been studied intensively in different settings for the case of no switching costs in Buyukkoc et al. [4] and Cox and Smith [5], where the optimality of the static $c\mu$-rule was proved. This policy schedules the server first to the queue $i$ with the maximum weight $c_i\mu_i$, the product of the holding cost $c_i$ and the service rate $\mu_i$. In Koole [19], the problem of optimal control in a two-queue system was analysed by means of a continuous-time Markov decision process and a dynamic programming approach. The author found numerically that the optimal policy minimizing the average cost per unit of time can be quite complex if there are both holding and switching costs. A threshold-based policy for such a queueing system was applied by Avram and Gómez-Corral [3], where expressions for the long-run expected average cost of holding units and of the switching actions of the server were given. The queueing system with general service times and set-up costs, which are incurred at an instantaneous switch from one queue to another, was studied in Duenyas and Van Oyen [6]. The authors proposed a simple heuristic scheduling policy for the system with multiple queues. A rather similar model is described in Matsumoto [22], where the optimal scheduling problem is solved in a system with arbitrary time distributions. Here, instead of switching costs, the corresponding set-up time intervals required for switching are used. The system is controlled by a Learning Vector Quantization (LVQ) network, for details see Kohonen [18], which classifies the system state by the closest codebook vector of a certain class in terms of the Euclidean metric. The problem with this approach is the large number of parameters associated with the codebook vectors (normally several vectors per class are required), which must be estimated for a given control policy using a computationally quite expensive recurrent algorithm.
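The two classical index policies mentioned above, LQF and the static $c\mu$-rule, can be illustrated with a minimal Python sketch. The function names, the argument layout and the tie-breaking by lowest index are assumptions made for illustration only and are not taken from the cited works.

```python
from typing import Optional, Sequence


def lqf_policy(queue_lengths: Sequence[int]) -> int:
    """Longest Queue First: return the index of the longest queue.

    Ties are broken in favour of the lowest index (an assumption of this sketch).
    """
    return max(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def c_mu_policy(
    queue_lengths: Sequence[int],
    holding_costs: Sequence[float],
    service_rates: Sequence[float],
) -> Optional[int]:
    """Static c-mu rule: among non-empty queues, pick the largest c_i * mu_i.

    Returns None when every queue is empty, i.e. there is nothing to serve.
    """
    candidates = [i for i, n in enumerate(queue_lengths) if n > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: holding_costs[i] * service_rates[i])
```

Note that the $c\mu$-rule ignores the actual queue lengths except for emptiness, which is why it is called static, whereas LQF depends only on the lengths.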
In our model, it is assumed that the queue currently being served is serviced exhaustively. The next queue to be served is selected according to a dynamic scheduling policy based on the queue state information, i.e. on the number of customers waiting in each of the parallel queues. Changing the serviced queue incurs switching costs, and holding a customer in the system incurs a corresponding holding cost. Even with a fixed scheduling control policy, calculating any characteristics of the proposed queueing system with arbitrary inter-arrival and service time distributions in explicit form is not a trivial task. It is also difficult to specify the dynamic control policy defining the scheduling in large systems in a standard way, e.g. through a control matrix containing the corresponding control action for every possible state of the system. Therefore, we consider it justified in such a case to solve the problem of finding the optimal scheduling policy, with the aim of minimizing the average cost per unit of time, by combining simulation as a tool for calculating the performance characteristics of the system with a machine learning paradigm, where a neural network is responsible for the dynamic control. By training a neural network for some initial control policy, we obtain the characteristics of the network in the form of a matrix of weights and a vector of biases. The process of solving the optimal scheduling problem is then reduced to a discrete parametric optimization. The parameters of the neural network must be optimized in such a way that the network, by generating control actions at decision epochs, guarantees the minimal value of the average cost functional. For this purpose we have chosen one of the random search methods, namely simulated annealing, see e.g. Aarts and Korst [1] and Ahmed [2]. It is a heuristic method based on the concept of heating and controlled cooling in metallurgy, and it is normally used for global optimization problems in a large search space without any assumption on the form of the objective function. For the probabilistic scheduling problem specifically, this algorithm was implemented by Gallo and Capozzi [12]. Here, the algorithm is adapted for a non-explicitly defined parametric function with a large number of variables defined on a discrete domain. To verify the quality of the calculated optimal parameters of the neural network, the values of the average cost functional for the Markovian version of the queueing system are compared with the results obtained by solving the Markov decision problem (MDP). The general theory of MDP models is discussed in Puterman [25] and Tijms [30]. Details on the application of MDP to controlled queueing systems with heterogeneous servers can be found in Efrosinin [8]. In this paper, the optimal control policy and the corresponding objective function are calculated by the policy-iteration algorithm proposed in Howard [16] for an arbitrary finite-state Markov decision process. According to the MDP, the router in our system has to find an optimal control action in the state visited at a decision epoch with the aim of minimizing the long-run average cost. Note that for our queueing model under general assumptions a semi-Markov decision problem (SMDP) can be formulated. The SMDP is a more powerful model than the MDP, since the calculation of the objective function takes into account the time spent by the system in each state before a transition. Here, too, the objective function must be calculated by means of simulation. In this case a reinforcement learning algorithm, e.g. Q-P-Learning, can be applied. The main problem of this approach is that, for a deterministic control policy, many state-action pairs can remain unobserved, and as a result the control actions in such states cannot be optimized. However, in our opinion, neural networks could also be used to solve this problem, which is a potential task for further research. The SMDP topic is outside the scope of this article, but we refer readers to the book by Gosavi [14], which offers a very interesting overview of reinforcement learning and a well-designed classification of simulation-based optimization algorithms.
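The idea of simulated annealing over a discrete parameter vector with a black-box objective can be sketched in Python as follows. This is only a generic illustration under stated assumptions, not the adapted algorithm of this paper: the geometric cooling schedule, the single-coordinate neighbourhood, all default parameter values and the stand-in objective `toy_cost` (which replaces the simulation-based average cost) are hypothetical choices.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def toy_cost(x: Sequence[int]) -> float:
    """Hypothetical stand-in for a simulated average cost (assumption)."""
    target = (3, -2, 5)
    return float(sum((xi - ti) ** 2 for xi, ti in zip(x, target)))


def simulated_annealing(
    cost: Callable[[Sequence[int]], float],
    x0: Sequence[int],
    step: int = 1,
    t0: float = 10.0,
    alpha: float = 0.995,
    iters: int = 2000,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Minimize a black-box cost over integer vectors by simulated annealing."""
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(iters):
        # Neighbour: perturb one randomly chosen coordinate by +/- step.
        y = list(x)
        k = rng.randrange(len(y))
        y[k] += rng.choice((-step, step))
        fy = cost(y)
        delta = fy - fx
        # Always accept improvements; accept deteriorations with
        # probability exp(-delta / t) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = list(x), fx
        # Geometric cooling, bounded away from zero to avoid division by zero.
        t = max(t * alpha, 1e-12)
    return best, fbest
```

A call such as `simulated_annealing(toy_cost, [0, 0, 0])` returns the best vector found and its cost. In the setting described above, the cost would instead be estimated by simulation, so each evaluation would be noisy and considerably more expensive.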
Summarising our research, we can highlight the following main contributions of this paper: (a) we propose a new controlled single-server system with parallel queues, where the router uses a trained multi-layer neural network to perform scheduling control; (b) a simulated annealing method is adapted to optimize the weights and biases of the neural network with the aim of minimizing the average cost function, which can be calculated only by simulation; (c) the quality of the resulting optimal scheduling policy is verified by solving a Markov decision problem for the Markovian analog of the queueing system; (d) we provide a detailed numerical analysis of the optimal scheduling policy and discuss its sensitivity to the shape of the inter-arrival and service time distributions; (e) a distinctive feature of the paper is that the algorithms used are presented in the form of pseudocode with detailed descriptions of the relevant steps.
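To make contribution (a) more concrete, the following Python sketch shows how a feed-forward network could map the vector of queue lengths to the index of the next queue to serve. The architecture, the tanh activation, the argmax decision rule and the names `forward` and `next_queue` are assumptions for illustration; the network actually used is specified in Section 5.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def forward(
    x: Sequence[float],
    weights: Sequence[Matrix],
    biases: Sequence[Sequence[float]],
) -> List[float]:
    """Feed-forward pass: tanh on hidden layers, linear output layer."""
    a = [float(v) for v in x]
    last = len(weights) - 1
    for layer, (w, b) in enumerate(zip(weights, biases)):
        # Affine map z = W a + b for the current layer.
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi
             for row, bi in zip(w, b)]
        a = z if layer == last else [math.tanh(v) for v in z]
    return a


def next_queue(
    queue_lengths: Sequence[int],
    weights: Sequence[Matrix],
    biases: Sequence[Sequence[float]],
) -> int:
    """Control action: index of the output neuron with the largest score."""
    scores = forward(queue_lengths, weights, biases)
    return max(range(len(scores)), key=lambda i: scores[i])
```

For example, a single linear layer with identity weights and zero biases reproduces the LQF rule, since the scores then coincide with the queue lengths. Optimizing the weights and biases changes the decision regions of the policy without storing an explicit control matrix.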
The rest of the paper is organized as follows. Section 2 presents the formal description of the queueing system and the optimization problem. Section 3 describes the Markov decision problem and the policy-iteration algorithm used to calculate the optimal scheduling policy. In Section 4, the event-based simulation procedure for the proposed queueing system is discussed. The neural network architecture, parametrization and training algorithm are summarized in Section 5. Section 6 presents the simulated annealing optimization algorithm. Numerical analysis is presented in Section 7, and we conclude the paper in Section 8.
The following notations are used in the sequel. Let $\mathbf{e}_j$ denote the vector of appropriate dimension with 1 in the $j$th position (counting from the 0th) and 0 elsewhere, and let $1_{\{A\}}$ denote the indicator function, which takes the value 1 if the event $A$ occurs and 0 otherwise. The notations $\min\{a\}$ and $\max\{a\}$ mean the minimum and maximum of the values that $a$ can assume, and $\arg\min\{a\}$ and $\arg\max\{a\}$ denote the element index associated with the minimum and maximum value, respectively.