DKPRG or how to succeed in the Kolkata Paise Restaurant game via TSP

The Kolkata Paise Restaurant Problem is a challenging game, in which $n$ agents must decide where to have lunch during their lunch break. The game is very interesting because there are exactly $n$ restaurants and each restaurant can accommodate only one agent. If two or more agents happen to choose the same restaurant, only one gets served and the others have to return to work hungry. In this paper we tackle this problem from an entirely new angle. We abolish certain implicit assumptions, which allows us to propose a novel strategy that results in greater utilization for the restaurants. We emphasize the spatially distributed nature of our approach, which, for the first time, perceives the locations of the restaurants as uniformly distributed in the entire city area. This critical change in perspective has profound ramifications for the topological layout of the restaurants, which now makes it completely realistic to assume that every agent has a second chance. In case of failure, every agent may now visit more than one restaurant within the predefined time constraints.


The Kolkata Paise Restaurant Problem
The El Farol Bar problem is a well-established problem in Game Theory. It was William Brian Arthur who introduced the El Farol Bar problem in Inductive Reasoning and Bounded Rationality [1]. It can be described as follows: N people, the players, need to decide simultaneously but independently whether they will visit tonight a bar that offers live music. In order to have an enjoyable night the bar must not be too crowded. A potential visitor does not know in advance how many people will attend on a given night, so she must predict the attendance and decide whether she wants to go to the bar or stay home. Although the players decide using previous knowledge, their choice is not affected by previous visits and they cannot communicate with each other [2]. In the El Farol Bar problem, the number of choices n is equal to 2, so the players have to choose between staying home or going out.
The Kolkata Paise Restaurant Problem, as well as the Minority Game, are variants of the El Farol Bar problem. The Minority Game was first introduced in 1997 by Damien Challet and Yi-Cheng Zhang [3]. They developed the mathematical formulation of the El Farol Bar which they named Minority Game. This game has an odd number N of agents and at each stage of the game they decide whether they will go to the bar or stay home. The minority wins and the majority loses. Agents have to decide whether they want to go to the bar or not, regardless of the predictions for the attendance size. The Minority Game is a binary symmetric version of the El Farol Bar problem, with the symmetry relying on the fact that the bar can contain half of the players.
The Kolkata Paise Restaurant Problem (KPRP for short) is a repeated game that was named after the city of Kolkata in India. In KPRP there are n cheap restaurants (Paise Restaurants) and N laborers who choose among these places for their quick lunch break. If the restaurant they go to is crowded, they have to return to work hungry, since they do not have time to visit another restaurant, or lack the resources needed to travel to another area. This generalization of the El Farol Bar is described as follows: each of a large number N of laborers has to choose among a large number n of restaurants, where usually N = n. In order for every player to win, that is to eat lunch, only one player should go to each restaurant. If more than one player attends the same restaurant at the same time, an agent is chosen randomly and only this agent is served. The player who gets to eat has a payoff equal to 1, whereas all others who also chose this restaurant have a payoff equal to 0. Each agent prefers to go to an unoccupied restaurant rather than visit a restaurant where there are other agents as well. This realization in turn implies that the pure strategy Nash equilibria of the stage game are Pareto efficient. Consequently, there are exactly n! pure strategy Nash equilibria for the stage game. This, combined with the rationality of the players, leads to the conclusion that it is possible to sustain a pure strategy Nash equilibrium of the stage game as a sub-game perfect equilibrium of the KPRP.
In [4] each agent has a rational preference over the restaurants and, despite the fact that the first restaurant is the most preferred, all agents prefer being served even at their least preferred restaurant to not being served at all. The prices are considered to be identical and each restaurant is allowed to serve only one agent. If more than one laborer attends the same restaurant, one laborer is chosen randomly, while the others go hungry for that day. The Kolkata Paise Restaurant problem is symmetric, given the preferences of the agents over the set of restaurants. The game is non-trivial because there is a hierarchy among the restaurants, with the first being the most preferable. Another approach stipulates that if multiple agents choose the same restaurant they have to share the same meal and, as a result, none of them is happy. The choice of each player is secret and they have to choose simultaneously. The players choose their strategy based on the payoffs. It is assumed that the restaurants charge the same price for their meals. There is even a version where some restaurants offer much tastier meals than others. This game is a repeated game with a period of one day, and the choices of each player are known to the other players at the end of the day. Each agent has a personal strategy as to where she intends to have lunch. In order to attain the optimal solution, the agents would have to communicate and coordinate their actions, something which is forbidden. As a result, some agents may end up hungry and, at the same time, some restaurants may waste their food.
Some authors study the case where the number of restaurants n is small and the agents take coordinated actions. Then, they analyze the game as a sub-game of KPRP and estimate the possibility of preserving the cyclically fair norm. As a result, punishment schemes need to be designed in this case. Every evening the agent makes up her mind based on her past experiences and the available information about each restaurant, which is supposed to be known to every agent. Each agent decides on her own, with no interaction with the other players. If more than one customer arrives at the same restaurant, an agent is randomly chosen to eat and the rest have to starve. There is a ranking system among the restaurants shared by the customers. The n! Pareto efficient states are those in which all customers get served. The probability of reaching such a state is very low, due to the absence of cooperation and disclosure among the agents.

The Travelling Salesman Problem
In discrete optimization problems, the variables take discrete values and, usually, the objective is to find an object such as a graph, or another similar structure, from a finite or infinite set [5]. The Travelling Salesman Problem (TSP) is a famous optimization problem described as follows: a salesman has to visit all the nearby cities starting from a specific city to which the salesman must return [6]. The only constraint is that the salesman must start and finish at the same specific city and visit every other city exactly once. The visiting order is to be determined by the salesman each time the problem arises. The cities are connected through railways or roads and the cost of each trip is modeled by the difficulty in traversing the edges of the graph. The salesman has just one purpose and that is to visit all the cities with the minimum possible travel cost. In this problem, the optimum solution is the fastest, shortest and cheapest solution. TSP is easily expressed as a mathematical problem that typically assumes the form of a graph, whose nodes are the cities that the salesman has to visit. TSP was formulated during the 1800s by Sir William Rowan Hamilton and Thomas Kirkman and it was first studied by Karl Menger during the 1930s at Harvard and Vienna [6]. Some of the typical applications of TSP are network optimization and hardware identification problems. It has kept researchers busy for decades and many solutions have emerged. TSP is an NP-hard problem and the results of the practical, heuristic solutions are not always optimal, but approximate [6]. The simplest "naive" solution to this problem is, of course, to try all possibilities and explore all paths, but the cost in time and complexity is so huge that it is practically impossible. In order to overcome that, when solving a TSP the pragmatic focus is on a near-optimal route, instead of always the best one.
For the graph depicted in Figure 1 the optimal tour is 1 → 3 → 4 → 2 → 1 with cost 7 + 12 + 19 + 11 = 49.
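The "naive" exhaustive approach mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four edge costs 7, 12, 19 and 11 are the ones quoted for the tour 1 → 3 → 4 → 2 → 1, whereas the costs of the two remaining edges, (1, 4) and (2, 3), are not given in the text and are assumed here purely so that the example is complete. The function names are hypothetical.

```python
from itertools import permutations

# Symmetric edge costs for a 4-city instance. The costs of edges
# (1,3), (3,4), (2,4) and (1,2) are the ones quoted in the text;
# the costs of (1,4) and (2,3) are ASSUMED for illustration only.
COSTS = {
    (1, 2): 11, (1, 3): 7, (1, 4): 20,   # (1, 4) is an assumed value
    (2, 3): 15, (2, 4): 19, (3, 4): 12,  # (2, 3) is an assumed value
}

def cost(i, j):
    """Cost of the undirected edge between cities i and j."""
    return COSTS[(min(i, j), max(i, j))]

def tour_cost(tour):
    """Total cost of a closed tour given as a list that starts and ends at the same city."""
    return sum(cost(a, b) for a, b in zip(tour, tour[1:]))

def brute_force_tsp(cities, start):
    """Enumerate every tour from `start` and return (best_cost, best_tour)."""
    others = [c for c in cities if c != start]
    best = None
    for perm in permutations(others):
        tour = [start, *perm, start]
        c = tour_cost(tour)
        if best is None or c < best[0]:
            best = (c, tour)
    return best

best_cost, best_tour = brute_force_tsp([1, 2, 3, 4], start=1)
print(best_cost, best_tour)
```

Under these assumed costs the cheapest tour has cost 49. Since the instance is symmetric, the enumeration may report the tour in the opposite direction of traversal. The loop inspects (n − 1)! permutations, which is why this approach is hopeless beyond very small n.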

Games and the Kolkata Paise Restaurant Problem
The Kolkata Paise Restaurant Problem (KPRP) was initially introduced in an earlier form in 2007 [7]. Its current formulation appeared in 2009 in [8] and [9]. Subsequently, many creative ideas and different lines of thought have been published and even a quantum version of the game has arisen. In [8], the importance of diversity is emphasized while herd behavior is penalized. Furthermore, the differences between the KPRP and the Minority Game are highlighted. One major difference is that in the KPRP the emphasis is placed on the simultaneous move many choice problem, in contrast to the Minority Game, which studies a simultaneous move two choice problem. Another important difference is the existence of a ranking system in the KPRP, but not in the Minority Game. Some of the strategies developed for the KPRP are discussed in [10], which also discusses problems where these strategies can be successfully applied. Ghosh et al. in [11] present a dictator's, or a social planner's as they call it, solution. In this solution the agents form a queue and the planner assigns each of them to a ranked restaurant depending on the queue of the first evening. The following evening the agents go to the next ranked restaurant and the last in the queue goes to the first ranked restaurant. This solution is called the fair social norm. In real life, each agent decides in parallel or democratically every evening, so this solution may be considered somewhat unrealistic. However, the parallel or democratic decision strategy is not as efficient as the dictated one, with the latter leading to one of the best solutions to this problem. Banerjee et al. in [4] offer a generalization of the problem in such a way that the cyclically fair norm is sustained. Each strategy is viewed as a sub-game perfect equilibrium of the KPRP. In 2013, Ghosh et al. published an article about stochastic optimization strategies in the Minority Game and the KPRP [12].
There, they point out that a stochastic crowd avoiding strategy results in an efficient utilization in the KPRP. Reinforcement learning was first introduced in the KPRP by [13], together with six revision protocols aiming at efficient resource utilization. These protocols combine local information with reinforcement learning. Each revision protocol has two variants depending on whether or not customers who were once served by a restaurant remain loyal to that restaurant in all subsequent periods. Some of these protocols were experimentally tested and shown to improve the utilization rate. Another generalization was introduced by Yang et al. in [14], aiming at dynamic markets this time. They studied what happens when agents can either divert to another district or stay in the current one. Each agent may replace another agent with no prior knowledge of the game, following a Poisson distribution. Agarwal et al. in [15] showed that the KPRP can be reduced to a Majority Game. In the latter, capacity is not restricted and agents aim at choosing with the herd. If more than one agent chooses the same option, the utility decreases (see also [16] and [17]). Abergel et al. in [18] applied the KPRP to hospitals and beds. The local patients choose among the local hospitals those with the best ranking and compete with the other patients. If the patients are not treated in time it is a clear case of social waste of service for the rest of the hospitals. A brief presentation of the KPRP was given by Sharma et al. in [19], which included the origin and an overview of the game, strategies that may arise, several extensions and its applications in a variety of phenomena. The authors also presented an experimental analysis. Park et al. in [20] introduced the KPRP in the Internet of Things (IoT) and IoT devices.
They used a KPRP approach to develop a scheme for these devices, because it allowed them to model situations where multiple resources are shared among multiple users, each with individual preferences. In [21], Sinha et al. propose a phase transition behavior, where if two or more agents visit the same restaurant, one is randomly picked to eat. The agents evolve their strategy based on the publicly available information about past choices in order for each of them to reach the best minority choice. In the same paper, they also develop two strategies for crowd-avoiding.
A significant trend, which has been quite evident in the last two decades, is to enhance classical games using unconventional means. The most prominent direction is to cast a classical game in a quantum setting. Since the pioneering works of Meyer [22] and Eisert et al. [23], quantum versions for a plethora of well-known classical games have been studied in the literature. Starting from the most famous of all games, the Prisoners' Dilemma [23], [24], [25], [26], many researchers have sought to achieve better solutions by employing quantumness (see the recent [27] and [26] and references therein), or other tools, such as automata [28]. It is not surprising that unconventional approaches to classical games are undertaken, because they promise clear advantages over the classical ones. Another line of research is to turn to biological systems for inspiration. The Prisoners' Dilemma features prominently in this setting also (see [29] for a brief survey), but in reality most game situations can easily find analogues in biological and bio-inspired processes [30] and [31]. A quantum version of the KPRP was proposed in [32], where the quantum Minority Game was expanded to a multiple choice version. The agents cannot communicate with each other and have to choose among m choices, and an agent wins only if she makes a unique choice. Higher payoffs than in the classical version were observed due to shared entanglement and quantum operations. In Sharif's [33] review, quantum protocols for quantum games were introduced, including a protocol for a three-player quantum version of the KPRP. In [34] the authors study the effect of quantum decoherence in a three-player quantum KPRP using tripartite entangled qutrit states. They observe that in the case of maximum decoherence the influence of the amplitude damping channel dominates over depolarizing and flipping channels. Furthermore, the Nash equilibrium of the problem does not change under decoherence.

The Travelling Salesman Problem
The Travelling Salesman Problem is a well-known combinatorial optimization problem. In this problem a salesman must compute a route that begins from a particular node (the starting location), passes through all other nodes exactly once before returning to the starting location, and has the minimum cost. The first appearance of the term "Travelling Salesman Problem" probably occurred between 1931 and 1932. The core of the TSP, however, was first mentioned a century before, in an 1832 German book [35]. The mathematical formulation was introduced by Hamilton and Kirkman [35] and is typically expressed as follows. A cycle in a graph is a closed path that begins and ends at the same node and visits no other node more than once. A Hamiltonian cycle is a cycle that contains all the vertices of the graph. The Travelling Salesman Problem amounts to finding the cheapest way to visit every city and return to the starting point. Research efforts on TSP and closely related problems include Ascheuer et al. [36], who addressed the asymmetric TSP-TW using more than three alternative integer programming formulations and more than ten neighborhood structures. Gutin and Punnen [37] studied the effect of sorting-based initialization procedures. The authors claimed that understanding the algorithmic behavior is the best way to find solutions, since this would help in determining the best solution out of those available. Jones and Adamatzky [38] showed experimentally that using a sorting function within their algorithm was not functional and failed to return a feasible solution in some cases.
The difficulty in tackling the TSP motivated researchers to explore other avenues. One such notable and particularly promising approach is based on metaheuristics. A metaheuristic is a high-level heuristic that is designed to recognize, build, or select a lower-level heuristic (such as a local search algorithm) that can provide a fairly good solution, particularly with missing or incomplete information or with limited computing capacity [39]. The term "metaheuristics" was coined by Glover. Metaheuristics can be used for a wide range of problems. Of course, it must be noted that metaheuristic procedures, in contrast to exact methods, do not guarantee a global optimal solution [40]. Papalitsas et al. [41] designed a metaheuristic based on VNS for the TSP with emphasis on Time Windows. Another quantum-inspired method, based on the original General Variable Neighborhood Search (GVNS), was proposed in order to solve the standard TSP [42]. This quantum-inspired procedure was also applied successfully to the solution of real-life problems that can be modeled as TSP instances [43]. A quantum-inspired procedure for solving the TSP with Time Windows was also presented in [44]. More recently, [45] applied a quantum-inspired metaheuristic for tackling the practical problem of garbage collection with time windows that produced particularly promising experimental results, as further comparative analysis demonstrated in [46]. A thorough statistical and computational analysis on asymmetric, symmetric, and national TSP benchmarks from the well known TSPLIB benchmark library, was conducted in [47]. Very recently, Papalitsas et al. parameterized the TSPTW into the QUBO (Quadratic Unconstrained Binary Optimization) model [48]. The QUBO formulation enables TSPTW to run on a Quantum Annealer and is a critical step towards the ultimate goal of running the TSPTW with pure quantum optimization methods. Stochastic optimization can be implemented through several metaheuristic processes. 
The solution generated depends on the set of created random variables [39]. Metaheuristic processes may find successful solutions with less computational effort than exact algorithms, iterative methods or basic heuristic procedures by exploring a wide variety of feasible solutions [40]. Hence, metaheuristic procedures are extremely useful and practical approaches for optimization because, although they offer no guarantee of optimality, they typically deliver good solutions in a small amount of time. For example, a problem instance with thousands of nodes can be run for 30 to 40 seconds and produce a solution with 3 to 5% deviation from the optimal. This deviation depends on the local search procedures implemented inside the main part of the algorithm. An efficient design and choice of those improvement heuristics will determine the deviation from the optimal solution. In view of the small amount of time they require and of the good quality of the solution they produce, we advocate their functional use in the Distributed Kolkata Paise Restaurant game.
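To make the role of the improvement heuristics more concrete, the following Python sketch combines a greedy nearest neighbour construction with a plain 2-opt local search on random points of the unit square. It is not the VNS or GVNS procedure of the cited works; it is a deliberately simple stand-in, and all function names are hypothetical.

```python
import math
import random

def tour_length(points, tour):
    """Length of the closed tour visiting `points` in the order given by `tour`."""
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
        for k in range(len(tour))
    )

def nearest_neighbour_tour(points, start=0):
    """Greedy construction: always move to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(points, tour):
    """Local search: reverse a segment whenever doing so shortens the tour."""
    best = tour[:]
    best_len = tour_length(points, best)
    improving = True
    while improving:
        improving = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improving = True
    return best

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(30)]
initial = nearest_neighbour_tour(points)
refined = two_opt(points, initial)
print(tour_length(points, initial), tour_length(points, refined))
```

The refined tour is never longer than the initial one, but, as with any heuristic of this kind, nothing guarantees that it is optimal.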

Contribution
Let us now briefly summarize the contributions of this paper.
• We study the Kolkata Paise Restaurant Problem from an entirely new perspective. We identify and state explicitly certain implicit assumptions that are inherent in the standard formulation of the game. We then take the unconventional step of abolishing them entirely. This provides the opportunity for an entirely new setting and the adoption of a novel approach that leads to a new and more efficient strategy and, ultimately, to greater utilization for the restaurants.
• For the first time, to the best of our knowledge, we focus on the spatial setting of the game and we propose a more realistic and plausible topological layout for the restaurants. We perceive the restaurants to be uniformly distributed in the entire city area. This situation, which is rather pragmatic and more probable in reality, has profound ramifications for the topological layout of the game: the restaurants now get closer and, as their number n increases, a standard assumption in the literature, the distances between nearby restaurants decrease. Due to the distribution of the restaurants, the resulting version of the game is aptly named the Distributed Kolkata Paise Restaurant Game.
• Thus, now it is realistic to assume that every agent has a second, a third, maybe even a fourth, chance. Every agent may visit, within the predefined time constraints, more than one restaurant. The agent is no longer a single destination and back traveller. The agent now resembles the iconic travelling salesman, who must pass through a network of cities, visiting every city once, coming back to the starting point, and all the time following the optimal route. This leads to the completely novel idea that each agent faces her own personalized TSP. We emphasize that the situation is specific to each agent, since the resulting network will vary. This is because each agent may have a different starting position and a different preference ranking of the restaurants. Of course, it is practically impossible to compute exact solutions for large instances of the TSP, as TSP is a famous NP-hard problem. However, this is a very small setback, as we may use metaheuristics. Metaheuristics can produce near-optimal solutions in a very short amount of time and this makes them indispensable tools of great practical value.
• This entirely new setting is formalized and then rigorously analyzed via probabilistic tools. We derive general formulas that mathematically confirm the advantages of this policy and the increase in utilization. Detailed examples of typical instances of the game are given in a series of Tables and the derived equations are graphically depicted in order to demonstrate their qualitative and quantitative characteristics. Our scheme demonstrably achieves utilization ranging from 0.85 to 0.95 and even beyond from the first day. The steady state utilization, to which the game rapidly converges, is, as expected, 1.0.
• Finally, let us point out that the equations we derive generalize formulas that were previously presented in the literature, showing that the latter are actually special cases of our results.

Organization of the paper
The structure of this paper is as follows. In section 1 we provide a comprehensive description of the KPRP and the TSP. In subsection 1.3 we mention some important works that deal with the KPRP and the TSP. The rigorous formulations of the KPRP and the TSP are presented in section 2. In section 3 we give a thorough explanation and presentation of the distributed version of the game, which we call Distributed Kolkata Paise Restaurant Game. We analyze mathematically the topological situation regarding the restaurants in section 4, where the profound ramifications of the hypothesis that they follow the uniform probability distribution are developed. In section 5 we formally prove the main results of the paper, which showcase the advantages of the distributed framework in a definitive manner. Finally, in section 6 we summarize our results and discuss future extensions of this work.

Formulation of the standard Kolkata Paise Restaurant Problem
In its most usual formulation, the Kolkata Paise Restaurant Problem is a repeated game with infinitely many rounds. There is a set of players, typically called agents or customers, that is denoted by $A = \{a_1, \dots, a_n\}$, a set of restaurants that is denoted by $R = \{r_1, \dots, r_n\}$, and a utility vector $u = (u_1, \dots, u_n) \in \mathbb{R}^n$, which is associated with the restaurants and is common to every agent. On any given day, every agent decides to go to one of the $n$ restaurants for lunch. If it happens that just one agent arrives at a specific restaurant, then she will have lunch and she will be happy. If, however, two or more agents choose the same restaurant for lunch, then it is generally assumed that just one of them will eat. The one to eat is chosen randomly. So, in such a case all but one will not be happy. Each agent receives utility one if she has lunch and zero otherwise. In Chakrabarti et al. [8] the KPRP is modeled as a general one-shot restaurant game, where the set of agents is considered to be finite and the utilities are ranked as follows: $0 < u_n \le \dots \le u_2 \le u_1$. The set of agents $A$ and the ranking of the utilities can be used to define the game. The latter can be represented as $G(u) = (A, S, \Pi)$, where $A$ is the set of agents, $S$ is the set of strategies available to all agents, and $\Pi = (\Pi_1, \dots, \Pi_n)$ stands for the payoff vector. Every day each agent decides which of the $n$ restaurants she will go to in order to eat. If the $i$th agent $a_i$ decides to go to the $j$th restaurant $r_j$, then the corresponding strategy is $s_i = j$. Given any strategy combination $s = (s_1, \dots, s_n) \in S^n$, the associated payoff vector is defined as $\Pi(s) = (\Pi_1(s), \dots, \Pi_n(s))$, where the payoff of player $a_i$ is $\Pi_i(s) = \frac{u_{s_i}}{N_i(s)}$ and $N_i(s)$ is the total number of players that have made the same choice as player $a_i$, i.e., restaurant $r_{s_i}$, including $a_i$.
The strategy combination is in fact the list of restaurants at which the agents chose to eat, and their payoff depends on their decision and the number of other agents that have made the same choice. In the literature, a game like KPRP, where there are potentially infinite rounds and in each round the same stage game is played, is referred to as a supergame [49]. A supergame is a situation where the same one-shot game is played repeatedly and the agents count the payoff in the long run of the game. This makes the payoff function more complex due to the repetitions.
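As a small illustration of the one-shot stage game, the Python sketch below computes the payoff $u_{s_i}/N_i(s)$ of every agent for a given strategy combination and also plays out one day by serving a randomly chosen visitor at each visited restaurant. The function names and the 0-based indexing of agents and restaurants are conventions of this sketch only.

```python
import random
from collections import Counter

def expected_payoffs(strategies, utilities):
    """Payoff u_{s_i} / N_i(s) of every agent for the strategy profile s.

    strategies[i] is the 0-based index of the restaurant chosen by agent i;
    utilities[j] is the utility of restaurant j.
    """
    crowd = Counter(strategies)  # number of agents at each visited restaurant
    return [utilities[j] / crowd[j] for j in strategies]

def play_one_day(strategies, rng):
    """Serve exactly one randomly chosen agent at every visited restaurant.

    Returns the set of agents that get to eat.
    """
    visitors = {}
    for agent, restaurant in enumerate(strategies):
        visitors.setdefault(restaurant, []).append(agent)
    return {rng.choice(agents) for agents in visitors.values()}

utilities = [1.0, 1.0, 1.0, 1.0]
s = [0, 0, 2, 3]  # agents 0 and 1 collide at restaurant 0
print(expected_payoffs(s, utilities))  # [0.5, 0.5, 1.0, 1.0]
served = play_one_day(s, random.Random(1))
print(sorted(served))
```

In this profile restaurant 1 stays empty and one of the two colliding agents goes hungry, so only three of the four restaurants are utilized.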

Formulation of the TSP
The problem of finding the shortest Hamiltonian cycle is closely related to the TSP. The Hamiltonian graph problem, i.e., determining if a graph has a Hamiltonian cycle, is reducible to the traveling salesman problem. The trick is to assign zero length to the graph edges and, at the same time, create a new edge of length one for each missing edge. If the TSP solution for the resulting graph is zero, then there is a Hamiltonian cycle in the original graph; if the TSP solution is a positive number, then there is no Hamiltonian cycle in the original graph (see [50]). In different fields, such as operational research and theoretical computer science, TSP, which is NP-hard, is of great significance. Usually TSP is represented by a graph. The fact that TSP is NP-hard implies that there is no known polynomial-time algorithm for finding an optimal solution regardless of the size of the problem instance [51]. There are two types of models for the TSP, symmetric and asymmetric. The former is represented by a complete undirected graph $G = (V, E)$ and the latter by a complete directed graph $G = (V, A)$. Assuming that $n$ denotes the number of cities (nodes), $V = \{1, 2, 3, \dots, n\}$ is the set of vertices, $E = \{(i, j) : i, j \in V,\ i < j\}$ is the set of edges, and $A = \{(i, j) : i, j \in V,\ i \neq j\}$ is the set of arcs. A cost matrix $C = [c_{i,j}]$, which satisfies the triangle inequality $c_{i,j} \le c_{i,k} + c_{k,j}$ for every $i, j, k$, is defined on the edges or arcs. If $c_{i,j}$ is equal to $c_{j,i}$ for all $i, j$, the TSP is symmetric (sTSP), otherwise it is called asymmetric (aTSP). The triangle inequality holds, in particular, for problems where the vertices are points $P_i = (X_i, Y_i)$ of the Euclidean plane and $c_{i,j} = \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2}$ is the Euclidean distance. The triangle inequality also holds if the quantity $c_{i,j}$ represents the length of the shortest path from $i$ to $j$ in the graph $G$ [52]. In the case of the symmetric TSP, the number of all possible routes covering all cities and corresponding to all feasible solutions is given by $\frac{(n-1)!}{2}$ (recall that the number of cities is $n$). The cost of a route is the sum of the costs of the edges followed.
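The reduction from the Hamiltonian cycle problem to the TSP described above can be sketched as follows. The code builds the 0/1 cost matrix of the completed graph and finds the optimal tour by enumeration, so it is meant for tiny graphs only; the function names are hypothetical. Note that these 0/1 costs need not satisfy the triangle inequality, so the reduction targets the general TSP.

```python
from itertools import permutations

def reduction_costs(n, edges):
    """Cost matrix of the complete graph used in the reduction.

    Edges of the original graph get cost 0, missing edges get cost 1.
    Vertices are 0, ..., n - 1 and `edges` is a collection of pairs.
    """
    present = {frozenset(e) for e in edges}
    return [
        [0 if frozenset((i, j)) in present else 1 for j in range(n)]
        for i in range(n)
    ]

def optimal_tour_cost(costs):
    """Exact TSP optimum by enumeration (only feasible for tiny n)."""
    n = len(costs)
    best = None
    for perm in permutations(range(1, n)):
        tour = (0, *perm, 0)
        c = sum(costs[a][b] for a, b in zip(tour, tour[1:]))
        best = c if best is None else min(best, c)
    return best

def has_hamiltonian_cycle(n, edges):
    """The original graph is Hamiltonian iff the optimal tour costs 0."""
    return optimal_tour_cost(reduction_costs(n, edges)) == 0

square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle: Hamiltonian
star = [(0, 1), (0, 2), (0, 3)]            # a star: not Hamiltonian
print(has_hamiltonian_cycle(4, square), has_hamiltonian_cycle(4, star))  # True False
```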

Formulation of the DKPRG
The Kolkata Paise Restaurant problem (KPRP) is considered an extension of the minority game, as it involves multiple players (n) each having multiple choices (N). In its most general form it is possible that n ≠ N. In this paper we follow the standard approach that the number of agents is equal to the number of restaurants, i.e., n = N. The novelty of our work lies in the fact that we advocate a spatially distributed and, in our view, more realistic version of the KPRP by taking into account the topology of the restaurants and by allowing the agents to begin their routes from different starting points. We call our version the Distributed Kolkata Paise Restaurant Game, or DKPRG from now on.
In the original formulation of the KPRP one may readily point out the following important underlying assumptions.
(A1) All agents start from the same location.
(A2) All restaurants are near enough to the point of origin of every customer, so that each customer can, in principle, go to any restaurant, eat there and return to work in time, that is within the time window of the lunch break.
(A3) Every restaurant is sufficiently far away from every other restaurant, so as to make prohibitive in terms of time constraints the possibility of any customer trying a second restaurant, in case her first choice proved fruitless.
In the two dimensional setting of Kolkata, or, as a matter of fact, of any city, the above assumptions taken together imply something very close to the situation depicted in Figure 2. There, one can see that the agents are concentrated within a very narrow region, which can be viewed as the center of a conceptual "circle." The restaurants are located on this "circle" and since no two of them are allowed to be close they form something that resembles a "regular polygon." This last remark is significant because it disallows a situation such as the one shown in Figure 3: the proximity of two, three or more restaurants would contradict the impossibility of a second chance, and in the standard KPRP no agent is allowed a second chance. We write "circle" and "regular polygon" inside quotation marks because we are obviously not dealing with a perfect geometric circle or a perfect regular polygon, but with two dimensional approximations resembling the aforementioned symmetric shapes. Clearly, this is a very special topological layout, one that is highly unlikely to be observed in practice. There is no compelling reason for the restaurants to exhibit this regularity or for the agents to be confined to approximately the same location. On the contrary, it would seem far more reasonable to assume that at least the restaurants and perhaps even the agents are uniformly distributed within a given area. Finally, the usual assumption that the preference ranking of the restaurants is common to all customers seems a bit too special and probably too restrictive.
Figure 3: The spatial layout depicted in this Figure is strictly forbidden. The proximity of two, three or more restaurants would contradict the impossibility of a second chance. In the standard KPRP no agent is allowed a second chance.
With that motivation in mind, in this work we propose to abolish all these assumptions. The resulting game is spatially distributed in terms of restaurants and as such is called the Distributed Kolkata Paise Restaurant Game (DKPRG). In our setting, each customer may have her own starting point, which is, in general, different from the starting locations of the other customers. The starting locations can either be concentrated in a small region of the Kolkata city area, precisely as in the standard KPRP, or they may be assumed to follow a random distribution. The fundamental difference with prior approaches is that now the restaurants are viewed as being uniformly distributed over the city of Kolkata. This uniform randomness in the placement of the restaurants implies that there must be clusters of restaurants sufficiently near each other. This conclusion becomes inescapable, particularly in the case where the number of restaurants is large (n → ∞). As will be shown in the following sections, the expected distance between "adjacent" restaurants will be relatively short and will only decrease as the number n of restaurants increases.
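The qualitative claim that uniformly placed restaurants get closer as n grows can be checked empirically with a small Monte Carlo experiment. The Python sketch below assumes, purely for illustration, that the city is the unit square and that "adjacent" means nearest neighbour; neither choice is taken from the formal analysis of the paper, and the function names are hypothetical.

```python
import math
import random

def mean_nearest_neighbour_distance(n, rng):
    """Average distance from each of n uniform points to its closest neighbour."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    total = 0.0
    for i, p in enumerate(pts):
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n

def estimate(n, trials=10, seed=0):
    """Monte Carlo estimate averaged over several random layouts."""
    rng = random.Random(seed)
    return sum(mean_nearest_neighbour_distance(n, rng) for _ in range(trials)) / trials

for n in (25, 100, 400):
    print(n, round(estimate(n), 4))
```

Under these assumptions one would expect the printed estimate to shrink as n increases, roughly in proportion to 1/√n.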
The assumption of the random placement of restaurants leads to a personalized situation for each individual agent: each agent is in effect faced with a personalized Travelling Salesman Problem. To every agent corresponds an individual graph, which is assumed to be complete. The completeness assumption is not absolutely essential for the TSP, but, in any case, seems reasonable in the sense that one can go from any given restaurant to any other. This graph has $n + 1$ nodes, which are the locations of the $n$ restaurants plus the location of the starting point of the customer. The costs assigned to the edges of the graph are also personalized; each agent combines an objective factor, the spatial distances between the restaurants, with a subjective factor, her personal preferences. Recall that in the DKPRG we forego the common preference restriction and we let every customer have a distinct preference, i.e., she may prefer a particular restaurant and dislike another. Let us clarify, however, that getting served, even at the least preferable restaurant, is more desirable than not getting served at all! This in turn will lead to a possibly unique ordering of the restaurants from the most preferable to the least for each agent. For instance, if two restaurants $r$ and $r'$ are equidistant from the starting point $s$ of a certain customer, something that is obviously an objective fact, but the customer in question has a clear preference for $r$ over $r'$, then she adjusts the costs $c_{s,r}$ and $c_{s,r'}$ corresponding to the edges $(s, r)$ and $(s, r')$, respectively, so that $c_{s,r} < c_{s,r'}$.
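The blending of distances and preferences described above can be sketched in a few lines of Python. The paper does not prescribe a formula, so everything specific here is an assumption made for illustration: the function name `build_cost_matrix`, the weights `alpha` and `beta`, and the rule that adds to each edge the mean rank penalty of its two endpoints.

```python
import math

def build_cost_matrix(start, restaurants, preference, alpha=1.0, beta=0.5):
    """Return a symmetric (n+1) x (n+1) cost matrix for one agent.

    Vertex 0 is the agent's starting point and vertex j is restaurant j.
    `preference` lists restaurant indices (1-based) from most to least liked.
    `alpha` and `beta` are hypothetical weights for distance and preference.
    """
    points = [start] + list(restaurants)
    n = len(restaurants)
    # rank 0 for the favourite restaurant, n - 1 for the least liked one
    rank = {j: pos for pos, j in enumerate(preference)}
    # normalised penalty in [0, 1]; the starting point carries no penalty
    penalty = [0.0] + [rank[j] / max(n - 1, 1) for j in range(1, n + 1)]
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for u in range(n + 1):
        for v in range(u + 1, n + 1):
            distance = math.dist(points[u], points[v])
            # symmetric blend: distance plus mean penalty of both endpoints
            c = alpha * distance + beta * (penalty[u] + penalty[v]) / 2
            cost[u][v] = cost[v][u] = c
    return cost

# Two restaurants equidistant from the start; the agent prefers restaurant 2.
C = build_cost_matrix((0, 0), [(1, 0), (-1, 0)], preference=[2, 1])
```

With these illustrative weights the two restaurants are at the same distance from the start, yet the edge to the preferred one is cheaper, which is the behaviour the text asks for. Other blending rules would serve equally well as long as the matrix stays symmetric.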
Hence, every agent is faced with a distinctive network topology, which is the combined result of the inherent randomness of the spatial locations and the subjectiveness of her preferences. The topology of the restaurants has a further implication of the utmost importance: a customer whose first choice is a particular restaurant will now have, with very high probability, the opportunity to visit a second, a third, or even a fourth restaurant in the same area if need be. For each agent the time cost is dominated by the time taken to visit the first restaurant; the trip to other nearby restaurants in the same region incurs a relatively negligible time cost due to their spatial proximity. The customer has a second (even a third) chance to be served within the time window of the lunch break. Thus, an efficient, if not optimal, method for every customer to make well-informed decisions regarding her first, second, third, etc. choice is to solve the Travelling Salesman Problem for her personalized graph. Because the TSP is NP-hard, computing exact solutions is impractical for large $n$. Nonetheless, near-optimal solutions of great practical value can be obtained in very short time by employing metaheuristics, as we have pointed out in subsection 1.3.
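To make the idea of a heuristic tour concrete, the following sketch builds a tour with the nearest-neighbour rule. This is a deliberately simple constructive heuristic used as a stand-in; it is not one of the metaheuristics referred to above, and the function names are hypothetical.

```python
def nearest_neighbour_tour(cost):
    """Greedy tour over vertices 0..n, starting and ending at vertex 0.

    `cost` is a square matrix (list of lists); vertex 0 is the starting point.
    Ties are broken in favour of the smaller vertex index.
    """
    n = len(cost) - 1
    tour = [0]
    unvisited = set(range(1, n + 1))
    while unvisited:
        here = tour[-1]
        nxt = min(unvisited, key=lambda v: (cost[here][v], v))
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(0)  # return to the starting point
    return tour

def tour_cost(cost, tour):
    """Total cost of consecutive edges along the tour."""
    return sum(cost[u][v] for u, v in zip(tour, tour[1:]))

# Four locations on a line at coordinates 0, 1, 2, 3 (vertex 0 is the start).
line = [[abs(u - v) for v in range(4)] for u in range(4)]
tour = nearest_neighbour_tour(line)
```

The greedy rule gives no optimality guarantee; a local search step such as 2-opt or a metaheuristic would normally be applied on top of it.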
From this perspective, we proceed now to propose an effective distributed strategy that, if adopted by every agent, will lead to an efficient global solution. All agents will use a common high-level strategy that is tailored and fine-tuned according to their individual preferences. To enhance clarity, we explicitly state below the hypotheses that define and characterize the DKPRG variant.
(H2) There are two main protagonists in the game. First, the $n$ agents (also referred to as customers) with, in general, different starting locations. The set of agents is denoted by $A = \{a_1, \dots, a_n\}$. Second, the $n$ restaurants that are uniformly distributed within the same area. The set of restaurants is denoted by $R = \{r_1, \dots, r_n\}$. All agents know the locations of the restaurants, but no agent needs to know the starting locations of the other agents.
(H3) To each agent $a \in A$ corresponds a distinct personal preference ordering $P_a = (r_{j_1}, r_{j_2}, \dots, r_{j_n})$ such that restaurant $r_{j_1}$ is her first preference, $r_{j_2}$ is her second preference, and so on, with $r_{j_n}$ being the least preferable restaurant for $a$.
(H4) We adopt the standard convention that each restaurant can accommodate only one customer at a time. The immediate ramification of this convention is that if two or more customers arrive at a restaurant, only one can be served. The one to be served is chosen randomly.
(H5) The aforementioned hypotheses immediately bring to the front the novelty and contribution of our approach. The positions of the restaurants relative to the starting point of each customer create for every customer a distinct topology, a distinct network of restaurants. Specifically, each agent $a \in A$ perceives a personalized graph $G_a = (V_a, E_a)$. $G_a$ is a complete undirected graph having $n + 1$ vertices $v_0, v_1, \dots, v_n$, where $v_0$ is the starting location of $a$ and $v_j$ is the location of restaurant $j$, $1 \le j \le n$.
For each pair of distinct vertices $u, v \in V_a$ there exists an undirected edge $(u, v) \in E_a$. The graph $G_a$ is complemented with the (symmetric) cost matrix $C_a$, which assigns to each edge $(u, v)$ a cost $c_{u,v}$. We may surmise that the costs are computed by a function $f_a$ that incorporates geographical data, i.e., the distances between the restaurants, and the preference ordering $P_a$. The topological layout of the restaurants is an objective and global reality that is common to all customers and is undeniably crucial to a rational computation of the travel costs. On the other hand, it would be illogical if an agent did not take into account her preferences. The weight assigned to the spatial distances need not be equal to the weight assigned to the preferences. A conservative approach could assign a far greater weight to the distances compared to the preferences. A more idiosyncratic approach would deal with both on an equal footing by assigning equal weights to distances and preferences. It is plausible that for customer $a$ the personal preferences may play a more prominent role than for customer $a'$, in which case we may allow for the possibility that, in the process of computing the costs, each customer assigns completely different weights. In any event, we regard each cost matrix $C_a$ as distinct, which, along with the uniqueness of each $V_a$, explains why the resulting networks $G_a$ are all considered different; that is, every customer is confronted with her own unique and personalized TSP.
(H6) Each customer $a \in A$ solves the corresponding TSP using an efficient metaheuristic that outputs a near-optimal solution. In that manner, $a$ computes a (near-optimal) tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$. The tour is represented by this ordered list, where $l_0 = l_{n+1}$ is the starting point of $a$ and $l_k$, $1 \le k \le n$, is the index of the restaurant in the $k$th position of the tour. Endowed with their individual route $T_a$, which is an integral part of their strategy, all customers follow a simple common strategy. From their starting location $l_0$ they first travel to the restaurant $r_{l_1}$. Once there, those that get served conclude their route successfully. Those that do not get served proceed to the restaurant $r_{l_2}$. If their attempt at getting lunch also fails at $r_{l_2}$, then they proceed to $r_{l_3}$, and so on. Obviously, the time constraints, that is, the fact that the agent must have returned to her starting point by the time the lunch break is over, mean that the agent will not have the opportunity to exhaust the entire tour. The customer must interrupt the tour at some point in order to return. This may happen after travelling unsuccessfully to two, three, or more restaurants, depending on the topology of the network.
(H7) The Revision Strategy. We adopt the standard assumption that the agents operate independently and no communication takes place between any two of them. Therefore, each customer is completely unaware of the routes of the other customers. They revise their strategy every evening, taking into account only what happened during the present day. This means that they decide using only information from the last day, and no prior information or history needs to be kept. We assume that all agents follow the same policy. If they got served at a specific restaurant today, then tomorrow they go straight to the same restaurant. This applies even if this restaurant is not in the first place of their tour. For example, those agents that failed to get lunch at their first choice, but managed to do so at their second or third choice, tomorrow go straight to the restaurant that served them, despite the fact that this particular restaurant is not their most preferable. Those that failed to get lunch only know which restaurants were left vacant, i.e., not visited by any agent today. Further or more elaborate information, such as the choices of other players or whether they got served and at which restaurant, seems unnecessary. The unserved agents construct and solve their new personalized TSP, this time using as vertices only the vacant restaurants (plus, of course, their starting location).
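Hypotheses (H4), (H6) and (H7) together describe one day of play followed by an evening revision. The sketch below is one possible reading of them, under stated assumptions: tours are given as plain lists of restaurant indices (starting point omitted), all agents move stop by stop in lockstep, and ties are broken with a seeded random generator. The names `play_one_day` and `revise` are hypothetical.

```python
import random

def play_one_day(tours, m, rng):
    """Resolve one day of the game; return a dict agent -> serving restaurant.

    `tours` maps each agent to her ordered list of restaurants and `m` is
    the number of stops every agent can afford.
    """
    served = {}        # agent -> restaurant that served her
    occupied = set()   # restaurants already serving someone today
    for stop in range(m):
        arrivals = {}
        for agent, tour in tours.items():
            if agent not in served and stop < len(tour):
                arrivals.setdefault(tour[stop], []).append(agent)
        for restaurant, queue in arrivals.items():
            if restaurant in occupied:
                continue  # taken at an earlier stop, everyone is turned away
            # (H4): exactly one of the simultaneous arrivals is served
            served[rng.choice(queue)] = restaurant
            occupied.add(restaurant)
    return served

def revise(served, agents, restaurants):
    """Evening revision (H7): who returns where, who replans, what is vacant."""
    fixed = dict(served)                                  # go straight back
    unserved = sorted(set(agents) - set(served))          # must replan
    vacant = sorted(set(restaurants) - set(served.values()))
    return fixed, unserved, vacant

tours = {"a": [1, 2], "b": [1, 3], "c": [2, 1]}
served = play_one_day(tours, m=2, rng=random.Random(1))
fixed, unserved, vacant = revise(served, tours.keys(), [1, 2, 3])
```

In this small instance agents a and b collide at restaurant 1. Whoever loses moves on to her second stop, which succeeds only if that restaurant is still free. The unserved agents would then solve a new TSP restricted to `vacant`.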
Having explained the details of the DKPRG, we shall proceed to analyze the mathematical characteristics and evaluate the resulting utilization of this policy in the following sections.

Topological considerations
We begin this section by fixing the notation and giving some definitions to clarify the most important concepts of our exposition.
• The one-shot DKPRG takes place every day. We use the parameter $t = 1, 2, \dots$ to designate the day under consideration.
• To each agent $a \in A$ corresponds the personalized network $G_a = (V_a, E_a)$ together with the personalized cost matrix $C_a$, which are constructed in the way we outlined in the previous section. Agent $a$ follows the tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$, which is the solution to her personalized TSP. As we have emphasized, by using metaheuristics it is possible to obtain near-optimal solutions in a very short amount of time.
• The quality and efficiency of the strategy is measured by the utilization ratio f . This is of course the fraction of agents being served in a day, or, equivalently, the fraction of restaurants serving customers in a day. The equivalence is obvious because there are n customers and n restaurants.
In section 5 we shall revisit the concept of utilization and we shall be more precise by asserting the expected utilization per day as a function of the game parameters.
An agent $a$ who has opted to follow tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$ will initially try to get lunch at restaurant $r_{l_1}$. If she succeeds, she will eat and then return to her starting point. If she fails, she will visit the next restaurant in the tour, i.e., $r_{l_2}$. If she gets lunch there, she will subsequently go back to work. This process will go on until either she gets served or runs out of time, in which case she must interrupt the tour and return to work. If the time constraints allow her to pass through the first $m$ restaurants in her tour, in the worst-case scenario of $m - 1$ consecutive failures, then we say that $T_a$ is an $m$-stop tour.
To facilitate our mathematical analysis we take for granted that all customers follow $m$-stop tours. We have already explained why, in our view, $m$ must be $\ge 2$. The case where $m = 1$ reduces to the standard treatment of the KPRP, which has already been analyzed extensively in the literature. In the rest of this work we study the case where $m \ge 2$. All these considerations motivate the next definition.
• The tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$ associated with agent $a$ is an $m$-stop tour, $m \ge 2$, if, in the worst case, agent $a$ can visit restaurants $l_1, l_2, \dots, l_m$ in this order without violating her time constraints.
In such a tour, $l_1$ is the first stop, $l_2$ is the second stop, and so on, with $l_m$ being the final $m$th stop.
• If $\forall a \in A$, $T_a$ is an $m$-stop tour, then the resulting game is the $m$-stop DKPRG.
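The value of $m$ for a given tour can be computed directly once travel times and a time budget are fixed. The sketch below assumes that the entries of the matrix are travel times, that service time is ignored, and that the lunch break is a single number `time_budget`; these simplifications and the name `max_stops` are illustrative assumptions.

```python
def max_stops(travel_time, tour, time_budget):
    """Largest m such that start -> l_1 -> ... -> l_m -> start fits the budget.

    `travel_time` is a square matrix and `tour` is [0, l_1, ..., l_n, 0].
    The scan stops at the first stop that would make the agent late.
    """
    m = 0
    elapsed = 0.0
    for k in range(1, len(tour) - 1):
        elapsed += travel_time[tour[k - 1]][tour[k]]
        # the agent must still be able to get back from stop k in time
        if elapsed + travel_time[tour[k]][0] > time_budget:
            break
        m = k
    return m

# Start at 0; a cluster of three restaurants sits far away at 10, 10.5 and 11.
coords = [0.0, 10.0, 10.5, 11.0]
T = [[abs(x - y) for y in coords] for x in coords]
```

In this example the first restaurant alone costs 20 time units for the round trip, while each further restaurant in the same cluster adds only 1 unit, which illustrates why a second or third stop is cheap when restaurants are close together.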
Let us now explore the spatial ramifications of our assumption that the restaurants are uniformly distributed within the overall city area. We now give the formal definition of uniform distribution.
Definition 4.3. Given a region $B$ on the plane, a random variable $L$ has uniform distribution on $B$ if, given any subregion $C$ of $B$, the following holds:
$$P(L \in C) = \frac{\mathrm{area}(C)}{\mathrm{area}(B)} . \tag{1}$$
We assume of course that $L$ takes values in $B$.
The above definition is adapted from [53]. For a more general and sophisticated definition in terms of measures we refer the interested reader to [54].
Proposition 4.1. Assuming that the $n$ restaurants are uniformly distributed on the whole city area, then if the city area is partitioned into $n$ regions of equal area, the expected number $E[N_p]$ of restaurants in each region is exactly 1:
$$E[N_p] = 1, \quad 1 \le p \le n . \tag{2}$$
Proof. Let $B$ stand for the whole city area and let $B_1, \dots, B_n$ be the $n$ regions. The hypotheses assert that:
$$\mathrm{area}(B_1) = \dots = \mathrm{area}(B_n) = \frac{\mathrm{area}(B)}{n} .$$
Invoking the fact that the $n$ restaurants are uniformly distributed on the whole city, we deduce from (1) that for every restaurant $r_j$, $1 \le j \le n$, and for every region $B_p$, $1 \le p \le n$,
$$P(r_j \text{ is located in } B_p) = \frac{\mathrm{area}(B_p)}{\mathrm{area}(B)} = \frac{1}{n} .$$
We may now define the following collection of auxiliary random variables $N_{pj}$, where $1 \le p, j \le n$:
$$N_{pj} = \begin{cases} 1 & \text{if restaurant } r_j \text{ is located in region } B_p \\ 0 & \text{otherwise} \end{cases} .$$
Then, the random variable
$$N_p = \sum_{j=1}^{n} N_{pj}$$
gives the number of restaurants in region $B_p$, $1 \le p \le n$. We are not interested in the actual value of the random variable $N_p$ per se, but in its expected value $E[N_p]$. The latter can be easily computed if we use the above results and the linearity of the expected value operator:
$$E[N_p] = \sum_{j=1}^{n} E[N_{pj}] = \sum_{j=1}^{n} P(r_j \text{ is located in } B_p) = n \cdot \frac{1}{n} = 1 .$$
This establishes that the expected number of restaurants in each region is precisely 1 and proves formula (2).
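Proposition 4.1 can be checked numerically with a small Monte Carlo experiment. The sketch assumes the city is the unit square partitioned into $k \times k$ equal square cells, so that $n = k^2$; the function name and the choice of squares are illustrative only, since the proposition holds for regions of any shape.

```python
import random

def mean_count_in_cell(k, cell, trials, seed=0):
    """Estimate E[N_p] for one fixed cell of a k x k grid on the unit square.

    Each trial drops n = k * k restaurants uniformly at random and counts
    how many land in `cell`, given as (column, row) with 0-based indices.
    """
    rng = random.Random(seed)
    n = k * k
    total = 0
    for _ in range(trials):
        for _ in range(n):
            x, y = rng.random(), rng.random()
            if (int(x * k), int(y * k)) == cell:
                total += 1
    return total / trials

estimate = mean_count_in_cell(k=4, cell=(1, 2), trials=2000)
```

The estimate fluctuates around 1 from trial to trial. Individual cells are frequently empty or hold several restaurants, which is exactly the clustering the argument relies on.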
Partitioning a city area into $n$ disjoint regions of equal area might not be an easy task. The point is that for large values of $n$, as is the standard assumption in the literature, it is certainly doable. We stress the fact that the shape of the regions need not be the same. Indeed, Proposition 4.1 holds irrespective of whether the regions have the same shape, or any particular shape for that matter.
This topological layout of the restaurants is shown in Figures 4 and 5. In these Figures, the regions are drawn as squares, but this is just for convenience and to facilitate their graphic depiction. As we have explained, the regions are not required to have the same shape, nor does their shape need to resemble a regular two-dimensional figure. For very large values of $n$, partitioning a city into very small identical squares is a good approximation, as we know from the field of image representation.
It is useful to contrast the two Figures. The latter depicts the situation where the number of restaurants is much larger than in the former. This demonstrates clearly what happens when $n$ increases significantly, i.e., when $n \to \infty$. Irrespective of the magnitude of $n$, the expected number of restaurants in each of the $n$ regions (recall that they are pairwise disjoint and of equal area) remains 1. What does change, however, is the area of each region, which decreases with $n$, and, as a consequence, the expected distance between restaurants located in adjacent regions.
Let us make the rather obvious observation that there is a meaningful notion of distance defined between any two points, or locations if you prefer, in the entire city area. In reality, this can be the geographical distance between any two locations, expressed in meters or kilometers or in some other unit of length. For instance, let us consider two points $x$ and $y$ with spatial coordinates $(x_1, x_2)$ and $(y_1, y_2)$, respectively. A typical manifestation of the notion of distance is the Euclidean distance $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$ between $x$ and $y$. In any event, we take for granted the existence of such a distance function defined on every pair of points $(x, y)$ of the city, which is denoted by $d(x, y)$.
• The distance between two regions $B_p$ and $B_q$ is defined as
$$d(B_p, B_q) = \inf\{d(x, y) : x \in B_p \text{ and } y \in B_q\} . \tag{3}$$
• Two regions $B_p$ and $B_q$ are adjacent if
$$d(B_p, B_q) = 0 . \tag{4}$$
• We define the concept of diameter (see [55] for details) for the regions $B_p$, $1 \le p \le n$. In particular, we define
$$\operatorname{diam} B_p = \sup\{d(x, y) : x, y \in B_p\} . \tag{5}$$
Proposition 4.2. Let the $n$ restaurants be uniformly distributed on the city area and assume that the whole area is partitioned into $n$ regions of equal area. If $r_p$ and $r_q$ are the restaurants located at adjacent regions $B_p$ and $B_q$ respectively, where $1 \le p \ne q \le n$, then the distance $d(r_p, r_q)$ between them is bounded above by $\operatorname{diam} B_p + \operatorname{diam} B_q$:
$$d(r_p, r_q) \le \operatorname{diam} B_p + \operatorname{diam} B_q . \tag{6}$$
Proof. Consider two adjacent regions $B_p$ and $B_q$. By (4), this means that $d(B_p, B_q) = 0$, which in turn implies that $\forall \varepsilon > 0$ $\exists x \in B_p$ $\exists y \in B_q$ such that $d(x, y) \le \varepsilon$ (⋆). In view of Proposition 4.1, one expects to find exactly one restaurant in $B_p$ and exactly one restaurant in $B_q$. So, let $r_p$ and $r_q$ be the restaurants located at regions $B_p$ and $B_q$, respectively, and consider the distance $d(r_p, r_q)$ between them. By the triangle inequality, which is a fundamental property of every distance function, we may write that $d(r_p, r_q) \le d(r_p, x) + d(x, y) + d(y, r_q)$, $\forall x \in B_p$ $\forall y \in B_q$ (⋆⋆). From (⋆) and (⋆⋆) we conclude that $\forall \varepsilon > 0$ $\exists x \in B_p$ $\exists y \in B_q$ such that $d(r_p, r_q) \le d(r_p, x) + d(y, r_q) + \varepsilon$ (⋆⋆⋆). Now, according to (5), $d(r_p, x) \le \operatorname{diam} B_p$ and $d(y, r_q) \le \operatorname{diam} B_q$. These last two relations combined with (⋆⋆⋆) give $d(r_p, r_q) \le \operatorname{diam} B_p + \operatorname{diam} B_q + \varepsilon$ for every $\varepsilon > 0$. Since $\varepsilon$ is arbitrary, $d(r_p, r_q) \le \operatorname{diam} B_p + \operatorname{diam} B_q$, as desired.
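The bound of Proposition 4.2 is easy to probe numerically. The sketch below assumes the unit square partitioned into $k \times k$ square cells, places one random point in a cell and one in its right-hand neighbour, and records the largest distance seen. Only horizontally adjacent cells are sampled, which is an illustrative restriction; the function name is hypothetical.

```python
import math
import random

def worst_adjacent_distance(k, trials, seed=0):
    """Return (largest sampled distance, diam B_p + diam B_q) for a k x k grid.

    Requires k >= 2 so that every sampled cell has a right-hand neighbour.
    """
    rng = random.Random(seed)
    side = 1.0 / k
    diam = side * math.sqrt(2)          # diagonal of one square cell
    worst = 0.0
    for _ in range(trials):
        i, j = rng.randrange(k - 1), rng.randrange(k)
        p = ((i + rng.random()) * side, (j + rng.random()) * side)
        q = ((i + 1 + rng.random()) * side, (j + rng.random()) * side)
        worst = max(worst, math.dist(p, q))
    return worst, 2 * diam

worst, bound = worst_adjacent_distance(k=5, trials=1000)
```

Both numbers shrink like $1/k$, that is like $1/\sqrt{n}$, as the grid is refined.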
The above upper bound can be simplified if we further assume that all regions have the same geometric shape. This regularity does not impose any serious restriction on the overall setting of the game and allows us to assert that $\operatorname{diam} B_1 = \dots = \operatorname{diam} B_n = D$, in which case inequality (6) becomes:
$$d(r_p, r_q) \le 2D . \tag{7}$$
In the special case where the regions are squares, as depicted in Figures 6 and 7, each square has area $\mathrm{area}(B)/n$ and hence diagonal $\sqrt{2\,\mathrm{area}(B)/n}$, so one can easily see that the diameter $D$ is proportional to $\sqrt{2/n}$:
$$D \propto \sqrt{\frac{2}{n}} . \tag{8}$$
A comparison between Figures 6 and 7 demonstrates that the expected distance between restaurants which lie in adjacent regions is quite short, as it is bounded above by the sum of the diameters of the corresponding regions. The diameter of the regions decreases as $n$ increases, and in the special case shown in these two Figures, the diameter decreases in proportion to $1/\sqrt{n}$. In layman's terms, this means that adjacent restaurants get very close to each other as $n \to \infty$. Once the agent arrives at a restaurant, then, with high probability, visiting an adjacent restaurant will only incur a negligible extra cost that will not violate her time constraints. We clarify that we are not making any assumption about the probabilistic distribution of the agents. One possibility is that the agents might be concentrated in the "center," or in another specific location of the city area, as is tacitly assumed by the original KPRP. Another possibility is that the agents follow a random distribution over the area; for instance, they might also follow the uniform distribution. The former case is depicted in Figure 8 and the latter in Figure 9. The crucial observation is that in both cases any of the $n$ agents can, potentially, have lunch in any of the $n$ restaurants and return in time. This fact implies that, assuming each agent follows the (near-optimal) tour produced as a solution to her individual TSP, she may visit a second, or even a third, restaurant if her previous choices proved fruitless.
To see why this is indeed so, one may consider for instance agent $a_1$ in both Figures 8 and 9 and the restaurant that is furthest away from her. Without loss of generality let us say that in both cases this is restaurant $r_n$. Being able to visit $r_n$ while adhering to her time constraints implies being also able to pass through adjacent restaurants within the same time window.

Mathematical analysis of the utilization
The current section is devoted to the analytic estimation of the evolution of the game parameters and the daily utilization of the proposed strategy scheme. Let us briefly summarize the policy that regulates the m-DKPRG.
• At the beginning of day 1 all $n$ agents are in the same position, in that they have not got lunch yet, and they are in a precarious state, not knowing if they will manage to eat eventually. So, at this point in time they are all unsatisfied. The situation with the restaurants is symmetrical. All $n$ restaurants face uncertainty in that it is yet unknown whether they will be chosen by at least one customer. Therefore, at this point they are all vacant.
• The situation is quite different at the end of day 1. A significant percentage of the $n$ agents, as will be shown in this section, managed to get lunch. An equal percentage of the $n$ restaurants was utilized. The common strategy followed by all agents ensures that the same agents will get lunch the next day, the day after that, etc. These agents are satisfied, since they have effectively "won" the game. Symmetrically, the same restaurants will be utilized every day from now on. They will be permanently reserved.
• At the beginning of day 2, only those agents that failed to eat yesterday will essentially play the game. These will be the active players of day 2. The active players will strive to get lunch exclusively at the restaurants that did not serve any customer yesterday. The rest of the agents are already satisfied and will certainly have lunch today, each one at the specific restaurant that (eventually) served her yesterday.
• By the end of day 2, a significant percentage of the active agents will have succeeded in getting lunch. Thus, the total number of satisfied agents will increase by the amount of today's gains. Of course, an equal percentage of yesterday's vacant restaurants will also be utilized for the first time today.
• This process will continue ad infinitum.
The next concepts will prove useful in our analysis.
• The expected number of agents that managed to eat lunch during day 1 is denoted by $A^s_1$ and the expected number of agents that failed to eat lunch during day 1 is denoted by $A^u_1$.
• The expected number of agents that got lunch for the first time during day $t$, $t = 2, 3, \dots$, is denoted by $A^s_t$. The expected number of agents that failed to get lunch during day $t$, $t = 2, 3, \dots$, is denoted by $A^u_t$.
• Symmetrically, the expected number of restaurants that served lunch during day 1 is denoted by $R^r_1$ and the expected number of restaurants that did not serve lunch during day 1 is denoted by $R^v_1$.
• The expected number of restaurants that served a customer for the first time during day $t$, $t = 2, 3, \dots$, is denoted by $R^r_t$. The expected number of restaurants that failed to serve lunch during day $t$, $t = 2, 3, \dots$, is denoted by $R^v_t$.
• The vacancy probability of day 1 is the probability that a restaurant did not accommodate any customer during day 1 and is designated by $VP_1$.
• The vacancy probability of day $t$, $t = 2, 3, \dots$, designated by $VP_t$, is the probability that a restaurant that has not served any customer before day $t$ did not serve a customer during day $t$ either.
• In the $m$-stop DKPRG, only the customers that have yet to get lunch participate actively in today's game. The agents that actually play the game at the beginning of day $t$, seeking a restaurant to get lunch, are called active players and their expected number is denoted by $n_t$.
• The expected utilization of day $t$, $t = 1, 2, \dots$, denoted by $f_t$, is the fraction of the expected number of agents that were served during day $t$. The steady state utilization is defined as $f_\infty = \sup\{f_t : t \in \mathbb{N}\}$.
Equiprobability of tours. The following analysis is based on the premise that all n! tours are equiprobable. In the rest of this paper we shall refer to this assumption as the equiprobability of tours assumption (EPT for short). In view of the discussion in the previous sections, this premise is well justified.
An immediate consequence of the EPT assumption is the equiprobability of each restaurant appearing in any position. In particular, let us recall that in the tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$, corresponding to agent $a$, $l_0 = l_{n+1}$ is the starting point of $a$ and $l_k$, $1 \le k \le n$, is the index of the restaurant in the $k$th position of the tour. We may easily calculate the probability that a restaurant is in a specific position of the tour, as well as the probability of the complementary event. For easy reference, these facts are collected in the next Proposition, whose proof is trivial and thus omitted.
Proposition 5.1. Assuming the equiprobability of tours, the following hold.
$$\forall a \in A\ \forall r \in R\ \forall k,\ 1 \le k \le n, \quad P(r \text{ is in position } k \text{ of } T_a) = \frac{1}{n} \tag{9}$$
$$\forall a \in A\ \forall r \in R\ \forall k,\ 1 \le k \le n, \quad P(r \text{ not in position } k \text{ of } T_a) = \frac{n-1}{n} \tag{10}$$
The above can be generalized to handle the case of a restaurant $r$ appearing in one of $w$ distinct positions $k_1, k_2, \dots, k_w$, where $1 < w \le n$.
$$\forall a \in A\ \forall r \in R \quad P(r \text{ is in one of positions } k_1, \dots, k_w \text{ of } T_a) = \frac{w}{n} \tag{11}$$
$$\forall a \in A\ \forall r \in R \quad P(r \text{ not in any of positions } k_1, \dots, k_w \text{ of } T_a) = \frac{n-w}{n} \tag{12}$$
We only mention that the above hold for every restaurant, every position, and, of course, for every tour. Since the probability that restaurant $r \in R$ is in the $k$th position of the tour of agent $a$ is $\frac{1}{n}$, the probability of the complementary event, i.e., that restaurant $r$ is not in the $k$th position of $T_a$, is $\frac{n-1}{n}$. If we deem as "success" the case where $r$ is indeed in the $k$th position of $T_a$ and as "failure" the case where $r$ is not, then this situation is a typical example of a Bernoulli trial, having probability of success $\frac{1}{n}$ (also referred to as parameter, see [56]) and probability of failure $\frac{n-1}{n}$. In view of (9) we denote this as
$$P(r \text{ is in position } k \text{ of } T_a) \sim \mathrm{Ber}\!\left(\tfrac{1}{n}\right), \quad \forall a \in A\ \forall r \in R\ \forall k,\ 1 \le k \le n . \tag{13}$$
Analogously, the probability that restaurant $r \in R$ appears in one of $w$, $1 < w \le n$, distinct positions of the tour of agent $a$ is $\frac{w}{n}$. The probability of the complementary event, i.e., that restaurant $r$ is not in any one of these $w$ positions of $T_a$, is $\frac{n-w}{n}$. This time, one may view as "success" the case where $r$ is indeed in one of the designated $w$ positions of $T_a$ and as "failure" the case where $r$ is not. So, once again we are faced with a Bernoulli trial, this time with parameter $\frac{w}{n}$.
$$P(r \text{ is in one of positions } k_1, \dots, k_w \text{ of } T_a) \sim \mathrm{Ber}\!\left(\tfrac{w}{n}\right), \quad \forall a \in A\ \forall r \in R . \tag{14}$$
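For small $n$ the probabilities in Proposition 5.1 can be confirmed by brute-force enumeration of all $n!$ tours, which is exactly what the EPT assumption weighs equally. The helper name `position_probability` is hypothetical, and the enumeration is only feasible for small $n$.

```python
from fractions import Fraction
from itertools import permutations

def position_probability(n, restaurant, positions):
    """Exact probability that `restaurant` occupies one of `positions`.

    All n! orderings of restaurants 1..n are taken to be equally likely.
    `positions` holds 1-based tour positions.
    """
    hits = 0
    total = 0
    for tour in permutations(range(1, n + 1)):
        total += 1
        if any(tour[k - 1] == restaurant for k in positions):
            hits += 1
    return Fraction(hits, total)

single = position_probability(5, restaurant=3, positions=[2])
several = position_probability(5, restaurant=3, positions=[1, 2, 4])
```

Using `Fraction` keeps the result exact, so the values can be compared with $1/n$ and $w/n$ without rounding concerns.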
The fact that the $n$ agents calculate their tours independently implies that $n$ independent Bernoulli trials take place simultaneously, all with the same success and failure probabilities. This situation is described by the binomial distribution with parameters $(n, p)$, denoted by $\mathrm{Bin}(n, p)$, where $p = \frac{1}{n}$ in the simple case of one position and $p = \frac{w}{n}$ in the general case of $w$ positions. By employing well-known formulas from probability textbooks we may assert the following Proposition, whose proof is also trivial.
Proposition 5.2. Given a restaurant $r$, if its appearance in a specified position $k$ in one tour counts as one success, whereas its failure to appear in the specified position $k$ in one tour counts as one failure, then the probability of exactly $l$ appearances in position $k$ in total is given by
$$\forall r \in R\ \forall k,\ 1 \le k \le n, \quad P(r \text{ appears } l \text{ times in position } k \text{ in } n \text{ tours}) = \binom{n}{l} \left(\frac{1}{n}\right)^{l} \left(\frac{n-1}{n}\right)^{n-l} . \tag{15}$$
In the special case where $r$ never appears, that is, it appears 0 times, in the specified position $k$, the above formula becomes:
$$\forall r \in R\ \forall k,\ 1 \le k \le n, \quad P(r \text{ never appears in position } k \text{ in } n \text{ tours}) = \left(\frac{n-1}{n}\right)^{n} . \tag{16}$$
More generally, the probability that restaurant $r$ appears exactly $l$ times in total in one of the $w$ distinct positions $k_1, \dots, k_w$, $1 < w \le n$, is given by
$$\forall r \in R \quad P(r \text{ appears } l \text{ times in positions } k_1, \dots, k_w \text{ in } n \text{ tours}) = \binom{n}{l} \left(\frac{w}{n}\right)^{l} \left(\frac{n-w}{n}\right)^{n-l} . \tag{17}$$
If $r$ never appears, that is, it appears 0 times, in any one of the $w$ designated positions $k_1, \dots, k_w$, $1 < w \le n$, the previous formula reduces to:
$$\forall r \in R \quad P(r \text{ never appears in any of positions } k_1, \dots, k_w \text{ in } n \text{ tours}) = \left(\frac{n-w}{n}\right)^{n} . \tag{18}$$
We must emphasize that the above hold for every restaurant $r \in R$, for every position $k$, $1 \le k \le n$, and for every set of positions $\{k_1, \dots, k_w\}$, $1 < w \le n$. In other words, for every restaurant, the probability that it does not appear in one specific position in any of the $n$ tours is $\left(\frac{n-1}{n}\right)^{n}$, and the probability that it does not appear in any of $w$ distinct positions in any of the $n$ tours is $\left(\frac{n-w}{n}\right)^{n}$.
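The binomial probabilities of Proposition 5.2 are straightforward to evaluate. The sketch below uses hypothetical helper names and also illustrates the standard limit $\left(\frac{n-w}{n}\right)^n \to e^{-w}$ as $n \to \infty$, a textbook fact that is added here only for orientation.

```python
from math import comb, exp

def appearance_pmf(n, w, l):
    """P(r appears exactly l times in w designated positions across n tours)."""
    p = w / n
    return comb(n, l) * p**l * (1 - p) ** (n - l)

def never_appears(n, w):
    """P(r appears in none of w designated positions in any of n tours)."""
    return ((n - w) / n) ** n

total = sum(appearance_pmf(10, 2, l) for l in range(11))
small = never_appears(4, 1)          # (3/4)^4
large = never_appears(10**6, 2)      # close to exp(-2)
```

Setting `w = 1` recovers the single-position case, and `appearance_pmf(n, w, 0)` coincides with `never_appears(n, w)`.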
According to the strategy scheme employed in the $m$-stop DKPRG, at the start of the second (third, etc.) day, the satisfied customers always go straight to the restaurant that eventually served them the previous day. We stress the word eventually because an agent may have failed to get lunch during stop 1 of the previous day, but she may have succeeded during the second, third, or $m$th stop. This strategy is followed by all agents, something that guarantees that those customers that were satisfied on the previous day will remain satisfied today. Effectively, this strategy implies that the satisfied agents have "won" the game and from now on they do not need to solve their personalized TSP. The game will be played competitively by the unsatisfied agents of the previous day. We assume that they are aware of the unoccupied restaurants and, therefore, each one of them will once again solve her personalized TSP to compute her near-optimal tour. Of course, today the network of restaurants will consist of only the unoccupied restaurants, i.e., it will be significantly smaller than yesterday. The one-shot $m$-stop DKPRG of today will differ from the one-shot game of the previous day in a critical factor: the number of "actively competing" players will be significantly smaller. By the nature of the game, the number of active players at the beginning of stop 1 of the present day is equal to the number of unsatisfied customers at the end of the previous day. The way the expected number of active players varies with each passing day is captured by the following Theorem 5.1.
Theorem 5.1. The daily progression of the m-DKPRG is described by the following formulas, where t stands for the day in question.
Proof. The proof of the above formulas goes as follows.
1. We first prove the auxiliary result that the vacancy probability at the beginning of stop $z$, $1 \le z \le m$, of day $t$ is
$$VP_{t,z} = \left(\frac{n_t - (z-1)}{n_t}\right)^{n_t} . \tag{27}$$
• Indeed, at the beginning of stop 1 of day $t$, the expected number of restaurants that have not served any customer yet is equal to the expected number $n_t$ of active agents. On day $t$, the game is all about the active agents and the restaurants that have never been utilized up to now. At this moment in time all these restaurants are still unutilized, so vacancy is a certainty. Thus, indeed $VP_{t,1} = 1$, which is in agreement with (27) when $z = 1$.
• We recall that, according to our strategy, at the beginning of day $t$ the expected number of restaurants that have not served any customer yet is equal to the expected number $n_t$ of active players. At the beginning of stop 2 of day $t$, the probability that one of these restaurants $r$ has not served lunch yet is precisely the probability that $r$ never appears in position 1 in any tour of the active players. This last probability is given by (16), where of course $n$ must now be replaced by $n_t$. Hence,
$$VP_{t,2} = \left(\frac{n_t - 1}{n_t}\right)^{n_t} ,$$
which is also in agreement with (27) when $z = 2$.
• Let us now carefully examine what happens during stop 2 of day $t$ of the game. According to our scheme, those customers who have failed to get lunch at their first destination will immediately proceed to their second destination. For example, if customer $a$, who follows tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$, was not served at restaurant $r_{l_1}$, she will try restaurant $r_{l_2}$. However, an added complication arises now. It may well be the case that $r_{l_2}$ is already occupied from stop 1. In such a case $r_{l_2}$ is completely unavailable, i.e., it is now serving another active agent. In view of this fact, we may conclude that the restaurants that are vacant at the beginning of stop 3 must satisfy two properties: (P1) they must be vacant at the beginning of stop 2, which means that they must never appear in position 1 in any tour of the active players, and (P2) they must never appear in position 2 in any tour of the active players.
The above are summarized more succinctly in the following rule.
(C) The restaurants that have not served any customer up to day t and are still vacant at the beginning of stop 3 of day t, never appear in position 1 or position 2 in any tour of the active players.
Therefore,
$$VP_{t,3} = P(r \text{ never appears in positions 1 or 2 in } n_t \text{ tours}) = \left(\frac{n_t - 2}{n_t}\right)^{n_t},$$
which is again in agreement with (27) when $z = 3$.
• The same reasoning can be employed to show that the vacancy probability $VP_{t,z}$ at the beginning of stop $z$ of day $t$ is
$$VP_{t,z} = P(r \text{ never appears in positions } 1, \dots, z - 1) = \left(\frac{n_t - z + 1}{n_t}\right)^{n_t}.$$
Hence, we have proved the validity of (27).
• Finally, to calculate the probability that one of the restaurants $r$ that have not served any customer up to day $t$ is still vacant at the end of stop $m$ of day $t$, which in effect means at the end of day $t$, we must determine the probability that $r$ never appears in positions $1, 2, \dots, m$ in any tour of the active players. Thus,
$$VP_t = P(r \text{ never appears in positions } 1, \dots, m) = \left(\frac{n_t - m}{n_t}\right)^{n_t},$$
which is (19). For (19) to be valid, it must hold that $n_t \geq m$; otherwise it cannot be regarded as a probability. The physical meaning of this restriction is that (19) is meaningful and correct as long as there are at least as many active players as stops $m$. If on some day $t$ we have $n_t \leq m$, then the strategy we adhere to ensures that all $n_t$ active players manage to get lunch during day $t$.
2. Let us clarify that our sample space consists precisely of the restaurants that have not served any customer up to day $t$. The expected number of restaurants in our sample space that remain vacant at the end of day $t$ is denoted by $R^v_t$. First, we express probabilistically those restaurants of our sample space that remain vacant after all agents visit their first $m$ destinations. We define the family of random variables $R^v_{tj}$, $1 \leq j \leq n_t$. The random variable $R^v_{tj}$ indicates whether restaurant $r_j$ is vacant or not at the end of day $t$. Specifically, if $R^v_{tj}$ has the value 1, then restaurant $r_j$ is vacant at the end of day $t$, whereas if $R^v_{tj}$ is 0, then $r_j$ is occupied.
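$$R^v_{tj} = \begin{cases} 1 & \text{if restaurant } r_j \text{ is vacant at the end of day } t \\ 0 & \text{otherwise} \end{cases}, \quad 1 \leq j \leq n_t.$$
Having done that, we define the random variable $R^v_t$, which counts the number of restaurants that are vacant at the end of day $t$:
$$R^v_t = \sum_{j=1}^{n_t} R^v_{tj}. \tag{5.1.vii}$$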
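As always in this probabilistic setting, we are interested not in the actual value of the random variable $R^v_t$, but in its expected value $E[R^v_t]$. Each indicator $R^v_{tj}$ has expected value $VP_t$, which is given by (19). In view of definition (5.1.vii) and the linearity of the expected value operator, we derive that
$$E[R^v_t] = \sum_{j=1}^{n_t} E[R^v_{tj}] = n_t \left(\frac{n_t - m}{n_t}\right)^{n_t},$$
which verifies (20).
3. Recall that our sample space contains exactly those restaurants that have not served any customer up to day $t$. $R^r_t$ denotes the expected number of the restaurants of the sample space that were visited by an agent by the end of day $t$. Now, we define the family of random variables $R^r_{tj}$, $1 \leq j \leq n_t$, which indicate whether restaurant $r_j$ is occupied or not at the end of day $t$. Specifically, if $R^r_{tj}$ has the value 1, then restaurant $r_j$ is occupied at the end of day $t$, whereas if $R^r_{tj}$ is 0, then $r_j$ is vacant.
$$R^r_{tj} = \begin{cases} 1 & \text{if restaurant } r_j \text{ is occupied at the end of day } t \\ 0 & \text{otherwise} \end{cases}, \quad 1 \leq j \leq n_t. \tag{5.1.ix}$$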
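By combining definition (5.1.ix) and equation (19), we deduce that
$$E[R^r_{tj}] = 1 - VP_t = 1 - \left(\frac{n_t - m}{n_t}\right)^{n_t}.$$
Having done that, we define the random variable $R^r_t$, which counts the number of restaurants that are occupied at the end of day $t$:
$$R^r_t = \sum_{j=1}^{n_t} R^r_{tj}. \tag{5.1.xi}$$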
We are not interested in the actual value of the random variable $R^r_t$, but in its expected value $E[R^r_t]$. In view of definition (5.1.xi) and the linearity of the expected value operator, we derive that
$$E[R^r_t] = \sum_{j=1}^{n_t} E[R^r_{tj}] = n_t \left(1 - \left(\frac{n_t - m}{n_t}\right)^{n_t}\right),$$
which verifies (21).
4. The rules of the game stipulate that the number of customers that have not managed to eat lunch at the end of day $t$ is equal to the number of restaurants that have not served any customer at the end of day $t$. Hence, their expected values are also equal, which means that $A^u_t = R^v_t$ and (22) is proved.
5. Likewise, the adopted strategy ensures that the number of the active players that succeeded in getting lunch at the end of day $t$ is equal to the number of restaurants that had not served any agent up to day $t$ but managed to accommodate a customer by the end of day $t$. Hence, their expected values are also equal, which means that $A^s_t = R^r_t$ and (23) is proved.
6. We are now in a position to determine the expected number of active players.
• At the beginning of the first day, the number of active players is exactly $n$, i.e., $n_1 = n$. This trivial observation confirms the initial condition (24).
• As we have previously explained, the adopted strategy in the m-DKPRG ensures that the number of agents that have not got lunch at the end of day $t$ is always equal to the number of active players on day $t + 1$. Thus, the expected number of active agents on day $t + 1$ is equal to the expected number of unsatisfied agents at the end of day $t$:
$$n_{t+1} = A^u_t,$$
which establishes the validity of (25), as desired.
7. The expected utilization $f_t$ for day $t = 1, 2, \dots$ is the ratio of the expected number of agents that have been served by the end of day $t$ to the total number of agents $n$. The former number is equal to the expected number of customers that got lunch on day 1, plus the expected number of the additional customers that got lunch on day 2, and so on. The additional agents of day $t$ are precisely those agents that had failed to get lunch prior to day $t$, but succeeded in eating on day $t$. Their expected number is $A^s_t$, which is given by equation (23). Hence, the total expected number of agents that have eaten lunch up to and including day $t$ is given by
$$\sum_{i=1}^{t} A^s_i.$$
An equivalent way to compute this number is by subtracting from the total number of agents $n$ the expected number of agents that failed to get lunch on day $t$, which is $A^u_t$ and is given by equation (22). Thus,
$$f_t = \frac{1}{n} \sum_{i=1}^{t} A^s_i = \frac{n - A^u_t}{n},$$
which establishes the validity of (26), as desired.
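The vacancy probability used throughout the proof can be sanity-checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes, as the proof does, that every active agent follows an independent, uniformly random tour, and it only tests the combinatorial event "restaurant $r$ never appears in positions $1, \dots, m$ of any tour". The function names `estimate_vacancy` and `exact_vacancy` are hypothetical and introduced here for illustration only.

```python
import random


def exact_vacancy(n_t: int, m: int) -> float:
    """Closed form ((n_t - m) / n_t) ** n_t, valid for n_t >= m."""
    return ((n_t - m) / n_t) ** n_t


def estimate_vacancy(n_t: int, m: int, trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of restaurants that never
    appear in the first m positions of any of the n_t random tours."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    vacant_total = 0
    for _ in range(trials):
        visited = set()
        for _ in range(n_t):
            # The first m stops of a uniformly random tour are a uniformly
            # random ordered selection of m distinct restaurants.
            visited.update(rng.sample(range(n_t), m))
        vacant_total += n_t - len(visited)
    return vacant_total / (trials * n_t)


if __name__ == "__main__":
    n_t, m = 50, 2
    print("exact    :", exact_vacancy(n_t, m))
    print("estimated:", estimate_vacancy(n_t, m))
```

With enough trials the two printed values should be close, which is what the counting argument of step 1 predicts under the stated independence assumption.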
Let us now make an important observation: formula (20) that we derived above, and which gives the expected number of vacant restaurants at the end of day $t$, is completely general and subsumes more special formulas found in the literature. Take for example the special case where $t = 1$ and $m = 1$. For these values, (20) computes the expected number of vacant restaurants at the end of day 1 for the standard one-stop KPRP. By subtracting this quantity from $n$, the number of initially available restaurants, and then dividing by $n$, we derive the expected utilization ratio for day 1. Indeed,
$$f_1 = \frac{n - n\left(\frac{n - 1}{n}\right)^{n}}{n} = 1 - \left(1 - \frac{1}{n}\right)^{n}.$$
One assumption that is taken for granted in the literature is that the number of agents $n$ tends to infinity. It is straightforward to see how the above formula simplifies when $n \to \infty$. We recall a very useful fact from calculus (see for instance [57]), namely that
$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^{n} = e^{x}.$$
Under this premise, we see that $\lim_{n \to \infty} f_1 = 1 - e^{-1}$, which is in complete agreement with a well-known result of the literature.
Corollary 5.1. If we assume that n → ∞, then the following approximations hold, where t is the day in question.
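As a hedged sketch of where such approximations come from (this is not a reproduction of the corollary's numbered formulas), one can substitute the calculus limit $\left(1 - \frac{m}{n_t}\right)^{n_t} \to e^{-m}$ into the exact relations (19), (25) and (26). The sketch assumes that $n_t$ is still large compared to $m$, so it is expected to lose accuracy in the last days of the game, when only a handful of agents remain active.

$$
\begin{aligned}
VP_t &= \left(1 - \frac{m}{n_t}\right)^{n_t} \approx e^{-m}, \\
n_{t+1} &= n_t \, VP_t \approx n_t \, e^{-m} \quad \Longrightarrow \quad n_t \approx n \, e^{-m(t-1)}, \\
f_t &= \frac{n - n_{t+1}}{n} \approx 1 - e^{-mt}.
\end{aligned}
$$

For instance, for $m = 2$ this gives $1 - e^{-2} \approx 0.865$ on day 1 and $1 - e^{-4} \approx 0.982$ on day 2, while for $m = 3$ it gives $1 - e^{-3} \approx 0.950$ on day 1.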
To demonstrate how the exact formulas (19) - (26) reflect the daily evolution of the m-DKPRG we study five typical instances of the game. The first four are instances of 2-DKPRG games with substantially different numbers of players. In these four games, the number of stops $m$ is 2, meaning that each agent may visit two restaurants if the need arises. In the first example the number of agents $n$ is 100, a relatively small number, and its detailed progression is shown in Table 1. The steady state utilization is, as expected, 1 and it is achieved by the end of day 3.
Table 1: Number of stops $m = 2$, number of agents $n = 100$.

In the second example the number of agents $n$ is 1000 and its progression is shown in Table 2. The steady state utilization is, as expected, 1 and it is achieved by the end of day 5, i.e., 2 days later compared to the previous example.
Table 2: Number of stops $m = 2$, number of agents $n = 1000$. This table demonstrates the progression of the m-DKPRG for $m = 2$ and $n = 1000$. One may ascertain that all customers eat lunch by the end of day 5.
The third example is more meaningful and interesting because in this case the number $n$ of agents is $10^6$, which may be thought of as representing the average case. Its progression is shown in Table 3. The steady state utilization 1 is rapidly achieved by the end of day 8. Although it takes longer to reach that stage, utilization upwards of 0.98 is established from day 2.
The fourth example is instructive about the behavior of our strategy when a large number of agents is involved. In this case the number $n$ of agents is $10^9$ and, unsurprisingly, it takes 11 days to reach the steady state utilization 1. All the details of the progression of this game are given in Table 4. Careful observation of the data confirms a major characteristic of our distributed game: for $m = 2$ stops the first day utilization is at least 0.86 and it goes over 0.98 from day 2.
It is quite straightforward to convince ourselves that playing a 3-stop game is better than playing a 2-stop game. A precise quantitative analysis of the resulting advantages can be performed by considering the exact formulas (19) - (26). Nonetheless, we believe it is expedient to showcase the difference with the following example. The present example resembles the previous one in that the number of agents is the same, namely $10^9$. However, this time each agent may visit up to three restaurants if need be. Such an instance, with a large number of agents, can serve as the best demonstration of the dramatic improvement that can be obtained by an increase in the number of stops. Indeed, the data in Table 5 corroborate this expectation. All restaurants are now utilized by the end of day 8, compared to day 11 before. The utilization at the end of the first day is already up to an impressive 0.95 and becomes 1, for all practical purposes, at the end of day 6. This last example can be considered as a compelling argument for the importance of topological analysis of the network of restaurants.
The above examples were studied using the exact formulas (19) - (26). Tables 1 - 5 reflect the daily evolution of the above five instances of the m-DKPRG according to the rigorous mathematical description provided by formulas (19) - (26). Figure 10 is a graphical representation of the exact utilization $f_t$ from all the previous examples, as shown in Tables 1 - 5. In this figure, the generally excellent behavior of this scheme can be easily verified. We point out the rapid convergence to the steady state in a matter of a few days and, especially, the superiority of the three-stop policy. The latter achieves 0.95 utilization from the first day and above 0.99 from the second day.
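The daily evolution described by the exact formulas can be reproduced with a few lines of code. The following is a minimal sketch under stated assumptions, not the authors' implementation: it iterates the recurrence $n_1 = n$, $A^u_t = n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}$, $n_{t+1} = A^u_t$, $f_t = (n - A^u_t)/n$, treats the expected values as real numbers, and sets $A^u_t = 0$ as soon as $n_t \leq m$, in line with the remark that all remaining active players are then served. The names `vacancy_probability` and `expected_evolution` are hypothetical.

```python
import math


def vacancy_probability(n_t: float, m: int) -> float:
    """((n_t - m) / n_t) ** n_t, computed via log1p for numerical stability.
    Returns 0 when n_t <= m, where every remaining active agent is served."""
    if n_t <= m:
        return 0.0
    return math.exp(n_t * math.log1p(-m / n_t))


def expected_evolution(n: int, m: int, max_days: int = 50):
    """Return a list of (day, active, served_today, unserved, utilization)."""
    rows = []
    n_t = float(n)  # expected number of active agents on the current day
    for day in range(1, max_days + 1):
        unserved = n_t * vacancy_probability(n_t, m)  # A^u_t
        served_today = n_t - unserved                 # A^s_t
        utilization = (n - unserved) / n              # f_t
        rows.append((day, n_t, served_today, unserved, utilization))
        if unserved == 0.0:
            break  # steady state: every agent has been served
        n_t = unserved  # n_{t+1} = A^u_t
    return rows


if __name__ == "__main__":
    for day, active, served, unserved, f_t in expected_evolution(100, 2):
        print(f"day {day}: active={active:.3f} served={served:.3f} "
              f"unserved={unserved:.3f} f_t={f_t:.4f}")
```

For $n = 100$ and $m = 2$ the loop stops on day 3, which is consistent with the first example above. Changing the arguments to other values of $n$ and $m$ gives the corresponding progression for the remaining instances.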
The above remarks must not diminish the value of the approximate formulas (30) - (36). Their value lies in the fact that they provide easy to compute and particularly good approximations for large $n$. A simple comparison of Figure 10 to the approximations shown in Figure 11, which corresponds to the case $m = 2$, and in Figure 12, which depicts the case where $m = 3$, confirms their accuracy.

Conclusion
This work explored a completely new angle of the Kolkata Paise Restaurant Problem. The topological layout of the restaurants takes center stage in this new paradigm. Initially, we explicitly stated certain assumptions that are implicitly present in the standard formulation of the game. Having done that, we undertook the radical step to go past them and create an entirely new setting. The critical examination of the topological setting of the game unavoidably enhanced our perception regarding the locations of the restaurants and suggested a more realistic topological layout. We argued that their uniform distribution in the entire city area is the most logical, fair, and probable situation. As a result, we defined a new version of the game that is spatially distributed and, for this reason, is aptly named the Distributed Kolkata Paise Restaurant Game (DKPRG).
The uniform probabilistic distribution of the restaurants enabled us to rigorously prove that, as their number $n$ increases, the restaurants get closer and the distance between adjacent restaurants decreases. In such a network, every customer has the opportunity to pass through more than one restaurant within the allowed time window. The agents now become travelling salesmen and this led us to suggest the innovative idea that TSP can be used to increase the chances of success in this game. We propose that each agent should use metaheuristics to solve her personalized TSP because metaheuristics produce near-optimal solutions very fast and as such can be easily used in practice. This culminated in the development of a new and more efficient strategy that achieves greater utilization.
After rigorously formulating DKPRG, we proved completely general formulas that quantify the increase in utilization of our scheme. We established that utilization ranging from 0.85 to 0.95 is achievable. This was shown in great detail in Tables 1 - 5, which depict the daily progress of characteristic instances of the DKPRG according to the rigorous mathematical description provided by the exact formulas (19) - (26). Apart from the exact formulas, we also derived the approximate formulas (30) - (36). They can be quite useful because they are considerably easier to compute and are exceedingly good approximations for large $n$. This fact is easily corroborated by comparing Figure 10 to the approximations shown in Figures 11 and 12. Let us remark that the derived equations generalize previously presented formulas in the literature.
It is worth mentioning that the very rapid convergence of our strategy to the steady state of utilization 1.0 can potentially be used to address the following situation. An issue that remains and is common to almost all works in the literature is the simple matter that a near optimal utilization may not, in general, be optimal for every agent individually. A socially efficient outcome, where every agent eats lunch and every restaurant gets a customer to serve, is not necessarily optimal for the individual customer, in the sense that an agent may get served in a restaurant of low preference. A possible solution to this might be to reset the game periodically. We expect that adopting a reset period, i.e., setting a specific period of days after which the system is reset and the game starts from scratch, may alleviate this drawback. In any event, this idea for future work will require further study and experimental evaluation of its usefulness. Finally, another possible direction for future work could include extensive experimental tests and further investigation of other versions of TSP. For instance, there exists a more restrictive version of the TSP, the Travelling Salesman Problem with Time Windows (TSP-TW). TSP-TW is a constrained version of TSP in which the salesman must visit the cities within a specific time window. This version is even more complicated and difficult to solve. However, the inherent time constraints built into the TSP-TW may provide for an even more realistic modeling of the DKPRG, so it is a research avenue that we believe is worth pursuing.