Information Theory Optimization Algorithm for Efficient Service Orchestration in Distributed Systems

Distributed systems architectures are becoming the standard computational model for the processing and transportation of information, especially in cloud computing environments. The increase in demand for application processing and data management from enterprise and end-user workloads continues to move from single-node client-server architectures to distributed multitier designs where data processing and transmission are segregated. Software development must consider the orchestration required to provision its core components in order to deploy services efficiently across many independent, loosely coupled, physically and virtually interconnected data centers spread geographically across the globe. This network routing challenge can be modeled as a variation of the Travelling Salesman Problem (TSP). This paper proposes a new optimization algorithm for optimum route selection using Algorithmic Information Theory. The Kelly criterion for a Shannon-Bernoulli process is used to build a reliable quantitative algorithm that finds a near-optimal solution tour. The algorithm is verified by comparing its results with heuristic solutions in 3 test cases. A statistical analysis is designed to measure the significance of the differences between the algorithms, and the entropy function can be derived from the resulting distribution. The test results show an improvement in solution quality by producing routes with smaller length and time requirements. The quality of the results demonstrates the flexibility of the proposed algorithm for problems of different complexities without relying on nature-inspired models such as Genetic Algorithms and Simulated Annealing. The algorithm can be used by orchestration applications to deploy services across large clusters of nodes by making better decisions in route design.


# INTRODUCTION
Distributed Information Systems (DS) are growing in popularity across the software industry because they provide more computational and data transmission capacity for applications, and they have become an essential infrastructure for addressing the increase in demand for data processing.
DS are used as a cost-efficient way to obtain higher levels of performance by using a cluster of low-capacity machines instead of a single large node that constitutes a single point of failure. A DS is more tolerant to individual machine failures and provides more reliability than a monolithic system.
There are many algorithms proposed in the literature to solve the routing problem, such as 2opt, ant colony, the greedy algorithm, genetic algorithms (GA) and simulated annealing (SA), but very little work uses Algorithmic Information Theory to find the boundaries of decision problems on Turing machines. In this paper we propose a variation of the TSP by defining the decision problem for the candidate solution as a Shannon-Bernoulli process that follows a lognormal distribution for the cost-distance variable (i.e. created by a utility function for the TSP).
An orchestration job has to efficiently deploy S types of services (or tasks) on many different computing resources, such as a cluster of containers or a pool of (physical or virtual) machine nodes connected over a distributed network with different weights (or costs) between each pair of nodes. This job is a process that needs to run on all M (unique) resource points. The cost to use each node can be defined as the (round-trip) network latency between the nodes in the network, or the financial cost associated with the proportional utilization rate for each resource in a given time period. As more computational capacity is added, choosing the shortest route to multiple target nodes becomes more computationally complex. Figure 2 demonstrates the deployment of services {A, B, C} on nodes {1, 2, 3, 4, 5} in a cluster of machines connected over a common network.

Figure 2 Computing providers distributed over a network of machines with the respective published services and available nodes
The Traveling Salesman Problem was introduced by William Hamilton and Thomas Kirkman. It is also known as the messenger problem. The problem asks: given a list of cities and the distance between each pair of them, what is the shortest route that visits all cities exactly once and returns to the original city at the end? Several lines of research address this routing problem, and it has applications in mathematics, computer science, statistics and logistics.
The computational complexity of an algorithm shows how many resources are required to run it, such as how much time and memory a Turing machine needs to complete execution, and can be interpreted as a measure of the difficulty of computing functions. A measurement of computational complexity is the big O notation. It can be defined as follows: let f and g be two functions; f(n) is O(g(n)) if there are positive numbers c and N such that f(n) <= c * g(n) for all n >= N. It is used to estimate the growth rate of a function (asymptotic complexity).
The TSP is an important combinatorial optimization problem. Like most such decision problems, it is in the class of NP-hard problems.
Consider a salesman traveling from city to city, where some cities are connected. His goal is to visit each city exactly once and go back to the first city when he finishes. The salesman can choose any path as long as it is valid (i.e. it visits each city once and finishes at the city where the tour started), and he also wants to minimize his cost by taking the shortest route. This problem can be described as a weighted graph G where each city is a node (or vertex) and is connected by a weighted edge only if the two cities are connected by some kind of road that does not cross any other city. The utility function in the TSP is the Euclidean distance. Figure 3 demonstrates the cost matrix between each pair of nodes of a set defined by valid (non-repeating) permutations in a language L with symbols {A, B, C, D}. The cost/weight between points is calculated as a Euclidean distance in a 2D graph. Table 1 demonstrates a sample of valid and invalid strings created from L. The graph can be represented as a matrix where each cell value is the respective cost (distance) w between nodes v and u. For N nodes the distance matrix is defined as D = w(v, u) for all (unique) pairs of N. The goal of the TSP is to find a permutation π that minimizes the distance between nodes. For symmetric instances the distance between two nodes in the graph is the same in each direction, forming an undirected graph. For asymmetric instances the weights of the edges between nodes can be dynamic or non-existent.
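As a minimal sketch of the representation described above (the node coordinates and helper names are illustrative, not from the paper), the symmetric cost matrix and the length of a closed tour can be computed as:

```python
import math

def distance_matrix(coords):
    """Build the symmetric cost matrix D[v][u] = Euclidean distance w(v, u)
    between every pair of nodes given as 2D coordinates."""
    n = len(coords)
    d = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(v + 1, n):
            w = math.dist(coords[v], coords[u])
            d[v][u] = d[u][v] = w  # symmetric instance: same weight in both directions
    return d

def tour_length(d, tour):
    """Total weight of a tour that returns to its starting node at the end."""
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))
```

For example, for a 3-4-5 right triangle the matrix holds the familiar distances and the single possible tour has length 12.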
The weight value of an edge is defined as the distance of the tour (roads) between cities. For the symmetric TSP, as the number of nodes (or cities) in the graph G increases, the number of possible tours grows exponentially. For N nodes, the number of distinct tours is (N - 1)!/2. This is the number of elements (states) an algorithm must evaluate to decide (to halt) the problem, and it is very large, requiring considerable time and computational resources even for small instances of the problem.
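A short sketch of this combinatorial growth (the helper name is illustrative): fixing the start city and discounting travel direction gives (N - 1)!/2 distinct tours for the symmetric case.

```python
from math import factorial

def symmetric_tour_count(n):
    """Number of distinct Hamiltonian tours in a complete symmetric graph
    with n nodes: fix the start city, then halve for the two directions."""
    return factorial(n - 1) // 2

# Even modest instances explode: 10 nodes already yield 181,440 distinct tours.
```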
A strategy to address this limitation is to accept near-optimal solutions by setting constraints on the problem using heuristic methods (approximations). Algorithms such as 2opt, GA and SA encode prior knowledge about the distribution of the solution space and then repetitively try to improve the solution quality. They work by following some heuristic (approximation) function schema while trying to avoid local minima. As the heuristics (for the TSP and NP problems in general) are a best-effort strategy to find a (good) near-optimal solution (by enforcing space and time boundaries), they do not guarantee that the solution found is the best candidate for the problem; therefore a program can never be sure whether running longer could improve the overall solution cost, unless the entire solution space of the problem is evaluated. This limitation is set by the definition of the NP-hard class.
Problems in the NP class of Computational Complexity Theory can be solved by a nondeterministic polynomial algorithm. Any given class of algorithms, such as P, NP, coNP, regular, etc., must have a lower bound that indexes the best performance any problem in the class can have. This bound can be described as the total amount of input items (or symbols) a machine must process (before halting), and the respective output items produced, following a probability distribution function and a given finite alphabet. A strategy to find a solution to the decision problem is to find a function that reduces or transforms a problem from a domain in which there is no known solution to a constrained domain with a known solution.
This allows the algorithm to search the solution space and decide whether a solution is valid (a yes-instance) or invalid (a no-instance), and this is computable by a polynomial-time algorithm. This strategy allows us to map instances of the Hamiltonian cycle problem to a decision version of the Traveling Salesman Problem: the decision problem of determining whether a Hamiltonian circuit whose length is not greater than a given positive integer m exists in a given complete graph with positive integer weights. Each valid instance (yes-instance) of the TSP is mapped to a valid instance in the Hamiltonian problem space, and this transformation can be done in polynomial time.
In Figure 4, reproduced from (Raskhodnikova, 2016), we have a visual representation of computational complexities.

Figure 4 Diagram Representation for the many categories of Computational Complexity
Hamiltonian Graph. A Hamiltonian cycle (or circuit) can be described as a "path" that contains all nodes, in which no element is repeated, with the exception of the final vertex. This means that a Hamiltonian cycle in G with start node v visits all other nodes exactly once and then finishes at node v. A graph G is Hamiltonian if it has a Hamiltonian cycle. A Hamiltonian cycle with minimum weight is an optimal circuit and therefore is the shortest tour in the TSP.
Figure 5 provides an example of a Hamiltonian circuit for a graph G with 5 nodes {A, B, C, D, E}. Table 2 shows the cost matrix for the super graph G.

Figure 5 Hamiltonian Cycle from a Super Graph
Although heuristic methods define special cases of the TSP and produce near-optimal solutions with short-length (low-weight) Hamiltonian cycles, they do not guarantee that the results are the shortest circuit possible. The algorithms to solve the TSP are grouped in 2 categories: exact algorithms (brute-force, greedy) and approximation algorithms (heuristics such as Simulated Annealing and Genetic Algorithms).

Nature-inspired models.
Researchers have proposed algorithms inspired by natural events and structures, like the heating of metals and the growth behavior of biological organisms. Those methods do not iterate over the entire solution space but rather over a portion of it in order to find a local minimum. They start with an initial random solution and try to improve its quality at each iteration until some input threshold parameter T is reached, such as: a maximum number of iterations; a maximum number of candidate solutions; no further improvements found after several iterations; the rate of decay of a temperature-dependent probabilistic function; or a minimum quality threshold being achieved.
Therefore heuristic (approximation) methods can be interpreted as a non-deterministic way to address the error rate between the known solutions and the unknown solutions in polynomial time (i.e. entropy reduction methods). Although such algorithms do not have to traverse the entire solution space, they must decide -or "bet" -when a random candidate solution with negative gain should be accepted (i.e. a candidate with worse solution quality than the current best-known solution) in the hope that it will eventually lead to the shortest distance (i.e. a better solution quality).
Nature-inspired models such as Genetic Algorithms (GA) and Simulated Annealing (SA) use prior information to improve the solution results and thus are biased towards this encoding. Alternatively, by modeling the TSP as a communication channel with a probability density function associated with the stochastic process that generates the solutions at random (following a Bernoulli process), we can bound the limits of the search space to a log-normal distribution.
The advantage of this method is that, by relying on the statistical analysis of the solution space instead of the computational complexity of the problem, we can achieve equal or better quality than the traditional algorithms without relying on computationally complex implementations that have higher time and space constraints.
Therefore, this paper provides an algorithm to solve the TSP that uses as its decision rule the entropy H(X) measured for the solution cost distribution, maximizing the expected value of the logarithm of the cost/weight/distance variable defined by the utility function g(X). This is equivalent to maximizing the expected geometric growth rate.
## CURRENT APPROACH

# Literature Review

2opt, k-opt. Croes proposed the 2-opt algorithm (Croes, 1958), a simple local-search heuristic, to solve the optimization problem for the TSP. It works by removing two edges from the tour and reconnecting the two paths created. The new path is a valid tour, since there is only one way to reconnect the paths. The algorithm continues removing and reconnecting until no further improvements can be found. k-opt implementations are instances of the 2-opt function with k > 2 and can lead to small improvements in solution quality. However, as k increases, so does the time to complete execution.
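The 2-opt move and its improvement loop can be sketched as follows (a minimal illustration; the segment-reversal formulation is one standard way to express the reconnection, and the function names are ours):

```python
def tour_length(d, tour):
    """Total weight of a closed tour over distance matrix d."""
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_move(tour, i, k):
    """Remove two edges and reconnect by reversing the segment tour[i..k];
    the reversal is the only way to rejoin the two paths into a valid tour."""
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]

def two_opt(tour, d):
    """Keep applying improving 2-opt moves until none remains."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for k in range(i + 1, len(tour)):
                cand = two_opt_move(tour, i, k)
                if tour_length(d, cand) < tour_length(d, tour):
                    tour, improved = cand, True
    return tour
```

On a small instance with a crossing tour, a single move already uncrosses the edges; the loop then halts when no move improves the length.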
In his work, (Glover, 1998) proposed the Tabu Search method, which can be used to improve the performance of several local-search heuristics such as 2opt. As neighborhood-search algorithms like 2opt can sometimes converge to a local optimum, Tabu Search keeps a list of illegal moves to prevent solutions that provide negative gain from being chosen frequently. In 2opt the two edges removed are inserted in the Tabu list. If the same pair of edges is created again by a 2opt move, they are considered Tabu. The pair is kept in the list until it is pruned or it improves the best tour. However, using Tabu search increases the computational complexity to O(n^3), as additional computation is required to insert and evaluate the elements in the list.
Figure 6 shows the 2-opt moves from (Emir Zunic, 2017). (Nilsson, 2003) compared several heuristic strategies for the TSP, such as Greedy, Insertion, SA, GA, etc., and investigated the performance tradeoff between solution quality and computational time. He classifies the heuristics in two classes: tour construction algorithms and tour improvement algorithms. Algorithms in the first group, such as brute-force and the Greedy Algorithm, stop when a solution is found. Those in the second group, after a solution is found by some heuristic, try to improve that solution (up to certain computation and/or time constraints), as implemented by 2opt, Genetic Algorithms and Simulated Annealing. He concluded by showing that the computational time required is proportional to the desired solution quality.
Simulated Annealing (SA). Simulated Annealing is a heuristic with explicit rules to avoid local minima. It can be described as a local random search that temporarily accepts moves with negative gain (i.e. moves produced by solutions with worse quality than the current one). The method simulates the behavior of the cooling process of metals into a minimum-energy crystalline structure.
This concept is analogous to the search for a global maximum or minimum. The probability of accepting a solution is set by a probability function of a temperature parameter t. As the temperature decreases over time, the probability changes accordingly. Figure 7 demonstrates the simulated decay of the temperature function over the number of iterations of the algorithm. The acceptance probability is defined as p(x) = 1 if f(y) <= f(x), and p(x) = exp(-(f(y) - f(x)) / t) otherwise. The SA algorithm specifies the neighborhood structure and the cooling function. Figure 8 from (Zhan, 2016) represents the SA algorithm flowchart.

Figure 8 Simulated Annealing Algorithm Flowchart
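The acceptance rule above can be sketched in a few lines (a minimal illustration; the `rng` hook is ours, added so the behavior is deterministic in tests):

```python
import math
import random

def accept(delta, t, rng=random.random):
    """SA acceptance rule: always accept an improving move (delta <= 0);
    accept a worsening move with probability exp(-delta / t), which
    shrinks toward zero as the temperature t decays."""
    return delta <= 0 or rng() < math.exp(-delta / t)
```

Here `delta = f(y) - f(x)` is the change in tour cost; at high temperature nearly any move passes, while at low temperature only improvements survive.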
Metropolis Algorithm. Let f(X) be a function with output proportional to a given target distribution function r. The function r is the proposal density. At each iteration, the algorithm attempts to move around the sample space. For each move it decides either to accept a given random solution or to stay in place. The acceptance probability of the new proposed candidate is taken with respect to the current best-known solution. If the proposed solution is more probable than the known existing point, we automatically accept the new move. If the new proposed solution is less probable, we will sometimes reject the move, and the lower its probability, the less likely we are to accept it. Most of the values returned will be around P(X), but eventually solutions with lower probability will be accepted. This behavior can be interpreted as a generalization of the methods used by Simulated Annealing and Genetic Algorithms.
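One Metropolis iteration can be written compactly (a sketch; `propose` and `p` stand in for the proposal density and target probability described above, and the `rng` hook is ours):

```python
import random

def metropolis_step(x, propose, p, rng=random.random):
    """One Metropolis iteration: draw a candidate y = propose(x) and
    accept it with probability min(1, p(y) / p(x)); otherwise stay at x."""
    y = propose(x)
    if p(y) >= p(x) or rng() < p(y) / p(x):
        return y
    return x
```

A more probable candidate is always taken; a less probable one is taken only when a uniform draw falls below the probability ratio, which is exactly the "sometimes reject" behavior described in the text.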
Other heuristics, such as 2-opt, 3-opt, inverse and swap methods, can be used to generate candidate solutions. Several studies, such as (Nilsson, 2003), have examined the performance of different SA operators for solving the TSP. (Zhan, 2016) proposed a list-based SA algorithm using a list-based cooling method to dynamically adjust the temperature-decreasing rate. This adaptive approach is more robust to changes in the input parameters. In their work, (Kah Huo Leong, 2016) proposed a biologically inspired bee system to optimize routing in railway systems. They conclude that the average solution results are better than or equivalent to the traditional SA and GA methods alone, and that the quality of the solution can be improved by allowing more time for the algorithm to run. (Steiglitz, 1968) observed that the performance of 2-opt and 3-opt algorithms can be improved by keeping an ordered list of the closest neighbors of each city node, thus reducing the number of solutions to search.
Genetic Algorithm (GA). Genetic Algorithms were first introduced by (Holland, 1975), based on natural selection theory, as a stochastic optimization method for random searches of good (near-optimal) solutions. This approach is analogous to the "survival of the fittest" principle presented by Darwin: individuals that are fitter for the environment are more likely to survive and pass their genetic features to the next generation.
In the TSP, the chromosome that models a solution is represented by a "path" in the graph between cities. GA has three basic operations: Selection, Crossover and Mutation. In the Selection step, candidate individuals are chosen for the production of the next generation according to some fitness function; in the TSP this function can be defined as the length (weight) of the candidate solution's tour. Figure 9 shows a representation of genes and chromosomes. Figure 10 demonstrates an example of two parents under the Mutation and Crossover operators generating new offspring. Next, those individuals are chosen to mate (reproduction) and produce the new offspring. Individuals that produce better solutions are fitter and therefore have more chances of having offspring. However, individuals that produce worse solutions should not be discarded, since they have a probability of improving the solution in the future. In other words, the heuristic accepts solutions with negative gain hoping that they may eventually lead to a better solution.
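As a sketch of the Mutation and Crossover operators on tour chromosomes (the paper does not specify which operators it assumes; swap mutation and order crossover (OX) are common choices for permutations, shown here as illustrations):

```python
import random

def swap_mutation(tour, rng=random):
    """Mutation: exchange two randomly chosen cities in the tour."""
    child = tour[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def order_crossover(p1, p2, i, k):
    """Order crossover (OX): copy the slice p1[i:k], then fill the rest
    with p2's cities in order, skipping duplicates, so the child remains
    a valid permutation (each city appears exactly once)."""
    kept = p1[i:k]
    rest = [c for c in p2 if c not in kept]
    return rest[:i] + kept + rest[i:]
```

Both operators preserve the permutation property, which is what keeps every offspring a valid TSP tour.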
Several studies have examined the performance tradeoffs of selection strategies and how the input parameters affect the solution quality and the computational time. (Razali, 2011) explores different selection strategies to solve the TSP and compares the solution quality and the number of generations required, concluding that tournament selection is more appropriate for small problem instances while rank-based roulette wheel selection can be used to solve large instances. (Goldberg, 1991) compared the solution quality and the convergence time of many selection methods, such as proportional, tournament and ranking. They conclude that ranking and tournament selection produced better results than proportional selection, under certain convergence conditions. In his work, (Zhong, 2005) explored proportional roulette wheel and tournament methods and concluded that tournament selection is more efficient than proportional roulette selection.
Figure 11 contains the pseudo-code for a Genetic Algorithm from (K.P. Ferentinos, 2002).
Figure 11 Basic genetic algorithm.

## ALTERNATIVE APPROACH
Information Theory (IT) quantifies the amount of information in a noisy communication channel and is measured in bits of entropy. IT is based on probability theory and statistical distributions. Entropy quantifies the amount of uncertainty in a random Bernoulli variable created by a Bernoulli process; thus information can be interpreted as a reduction in the overall uncertainty about a set of finite states. Mutual information is a measure of the common information between two random variables, and it can be used to maximize the amount of information shared between encoded (sent) and decoded (received) signals. Table 3 shows the relationship between information and entropy. As we increase our knowledge about the states following a probabilistic distribution function, we reduce entropy, as there is less uncertainty about the possible state outcomes.

Information Theory as an approximation method. Information Theory has applications in a range of fields and is used as a mathematical framework for the encoding and decoding of information, such as in adaptive systems, artificial intelligence, complex systems, network theory, coding theory, etc. IT quantifies the number of bits required to describe given data using a statistical distribution function for the input data.
Entropy of a random sequence. Entropy is a measure of the uncertainty of a random variable. It is the average rate at which information is produced by a stochastic process. (Shannon, 1948) defined the entropy H of a discrete random variable X whose possible values are outcomes drawn from a probability density function P(X). Figure 12 demonstrates the variation of the entropy H(X) for a Bernoulli distribution. In Equation 1 the entropy function is defined as H(X) = -Σ_x P(x) log2 P(x).

Figure 12 Entropy H(X) vs probability Pr(X=1)
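Equation 1 and the binary-entropy curve of Figure 12 can be computed directly (a minimal sketch; the function names are ours):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), in bits.
    Zero-probability outcomes contribute nothing, by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable, as plotted in Figure 12:
    maximal (1 bit) at p = 0.5, zero at p = 0 or p = 1."""
    return entropy([p, 1 - p])
```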
Random variables and utility function. Let X be an independent random variable with alphabet L: {001, 010, 100, . . .}. A utility function g is used to model worth or value; it is defined by g: X -> R (the reals) and represents a preference relation between states. The utility function Y = g(X) of a random variable X expresses a preference over the possible values of X. This order can be a logical evaluation of the value against a given threshold or constant parameter. g(X) is defined by a normal distribution with given mean and variance under some degrees of freedom (i.e. confidence level).
As an example, if g(X1 = 001 = 1) = c1 and g(X2 = 010 = 2) = c2 are the costs of two routes between a set of nodes in a super-graph G*, we can use this function to determine the arithmetical and logical relationship between them and decide whether c1 is worse, better, less than, greater than or equal to c2; here g(X1) < g(X2). The probability density function pdf(Y) can be used to calculate the entropy of the distribution of the cost values. An exponential utility is a special case used to model uncertainty (or risk) in the outcome between binary states; in this case the expected utility function is maximized depending on the degree of risk preference. Figure 13 shows the histogram for g(X). Table 4 demonstrates the calculation of the mean, standard deviation and variance for g(X).

Kolmogorov Complexity. The Kolmogorov complexity (Kolmogorov, 1968) of a string w from language L, denoted by Kc_L(w), is the length of the shortest program over alphabet L which produces w as output and halts. The conditional Kolmogorov complexity of string w relative to string x, denoted Kc_L(w|x), is the length of the shortest program that receives x as input and produces w as output.
Complexity of a string and shortest description length. Let U be a universal computer (universal Turing machine). The Kolmogorov complexity Kc(x) of a string x on computer U is Kc(x) = min { length(p) : U(p) = x }, the length of the minimal program p that outputs x and halts — the smallest possible program. Let C be another computer. Because this complexity is universal, U can simulate C, so for any string x we have Kc_U(x) <= Kc_C(x) + c, where the constant c depends only on C.

Measuring the randomness of a string. Let Kc(x|y) be the conditional Kolmogorov complexity of x given y. Consider, for example, that we want to find the binary string with the highest complexity among the variables X1 (010101010101010), X2 (0111011000101110) and Y (01110110001011). In Table 5 we have the representation and minimal encoding of the Xn using Y. We can see that X1 and X2 can be encoded as combinations of Y, and thus Kc(X1|Y) > Kc(X2|Y). In Figure 15, from (Maier, 2014), we can see a comparison between a series of strings and the corresponding automaton state machines and regular expression patterns (i.e. regex).

Figure 15 Examples of representations of a given input string set using regex and an automaton.
The expected value of the Kolmogorov complexity of a random sequence is close to the Shannon entropy. This relationship between complexity and entropy holds for a stochastic process drawn i.i.d. from a variable X following a probability mass function pdf(x), where the symbols x of X come from a finite alphabet. The expectation satisfies

E[(1/n) Kc(X^n | n)] -> H(X) as n -> infinity
Kelly criterion and the uncertainty in random outcomes. The Kelly strategy is a function for the optimal size of an allocation in a channel. It calculates the percentage of a resource that should be allocated to a given random process. It was created by John Kelly (Kelly, 1956) to measure signal noise in a network. The bit can be interpreted as the amount of entropy in an expected event with two possible (binary) outcomes and even odds. This model maximizes the expectation of the logarithm of the total resource value rather than the expected improvement of the utility function from each trial (at each clock-unit iteration in a Turing machine). The Kelly criterion has applications in gambling and in investments in the securities market (Thorp, 1997). In those special cases, the resource (communication) channel is the gambler's financial capital (wealth) and the fraction is the optimal bet size. The gambler wants to reduce the risk of ruin and maximize the growth rate of his capital. This value is found by maximizing the expected value of the logarithm of wealth, which is equal to maximizing the expected geometric growth rate.
Similarly, the salesman can improve his strategy in the long run by quantifying the total available inside information in the channel (or on a tape of the Turing machine) and maximizing the expected value of the logarithm of the value function (defined by the traveled Euclidean distance) at each execution clock. Using this approach, he can reduce his uncertainty (entropy H(X)) while optimizing his rate of distance reduction (solution quality improvement) at each execution step.
Figure 16 demonstrates the Kelly criterion value versus the expected growth rate.

Figure 16 Maximization of entropy in random events
Kelly uncertainty distribution. Let E(Y) be the expected value of the random variable Y, let H(Y) be the measured entropy for the pdf(Y) distribution, and let K*(E(Y), H(Y)) = f* be the maximization of the expected value of the logarithm of the entropy of the utility function Y = g(X). This fraction is known as the Kelly criterion and can be understood as the level of uncertainty about a given data distribution of the random variable X relative to a probability density function pdf(Y) of the measured cost distribution found in the sample. It is a measurement of the amount of useful encoded information.
The value of f* is a fraction of the cost value of g(X) on an outcome that occurs with probability p and odds b. Let the probability of finding a value which improves g(X) be p; in this case the resulting value is 1 cost unit plus (1+) the fraction f. The probability of decreasing the quality of Y is (1 - p). Therefore, the expected value of the logarithm, E, is defined in Equation 2 as:

E = p log(1 + fb) + (1 - p) log(1 - f)

Equation 2 Expected value for the cost variable
The maximization of the expected value, f*, is given by the Kelly criterion formula in Equation 3:

f* = (bp - q) / b

Equation 3 Kelly criterion formula. Here f* is the optimal fraction, b is the net odds, p is the probability of improving the quality (a win) of Y = g(X), and q = (1 - p) is the probability of decreasing the quality (a loss).
For example, consider a program with a 60% chance of improving the utility function g(X), so p = 0.6 (win) and q = 0.4 (loss). Suppose the program has 1-to-1 odds of finding a sequence which improves g(X), so b = 1 (1 quality-unit increase divided by 1 quality-unit decrease). For these parameters the program obtains f* = 0.20, a 20% fraction reflecting the certainty that the outcomes will improve the expected value of g(X) over many trials.
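The worked example above follows directly from Equation 3 (a minimal sketch; the function name is ours):

```python
def kelly_fraction(p, b):
    """Kelly criterion f* = (b*p - q) / b with q = 1 - p: the fraction
    that maximizes the expected logarithm of the value function."""
    q = 1 - p
    return (b * p - q) / b
```

With p = 0.6 and b = 1 this yields f* = (1 x 0.6 - 0.4) / 1 = 0.20, matching the 20% figure in the text.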

# METHOD
Quantitative Algorithm Theory. The algorithm is designed to find the near-optimal route through multiple service nodes before returning to the original point. This problem is a variation of the TSP.
Tour improvement heuristics, such as 2opt and Simulated Annealing (SA), are used as benchmarks for the proposed Quantitative Algorithm (QA). 3 test cases are used to analyze the solutions generated by each algorithm. The 2opt algorithm produces solutions with smaller total distance but requires more time units as the number of nodes increases. SA and QA have a maximum number of allowed iterations, but QA produces better solution quality than SA for the same time period.
The test samples are grouped in 10, 30 and 50 nodes. Each point represents a machine in a data center (i.e. a computing and network provider) that can deploy a given service S. The distance cost in this case is the illustrative round-trip network and processing delay. This weight is the length of time to send a signal t(s*) plus the time to receive the reply acknowledging that the same signal t(s*) was received.
To avoid bias and misinterpretation in the research, the first tour loaded into the computer memory is randomly shuffled using a statistical function in the Python programming language. The function swaps all elements (using a normal distribution) of the initial tour list, created after reading the list of input nodes.
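The paper does not show the shuffling function itself; a minimal sketch of an unbiased initial tour, using Python's built-in Fisher-Yates shuffle (a uniform shuffle rather than the normal-distribution swap described above, and with an optional seed added by us for reproducibility):

```python
import random

def initial_tour(nodes, seed=None):
    """Return a random permutation of the input node list, so the starting
    tour does not bias the comparison between the benchmarked algorithms."""
    tour = list(nodes)
    random.Random(seed).shuffle(tour)
    return tour
```

Seeding makes the same "random" starting tour reproducible across the trials of an experiment.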
The solutions found by the SA and QA heuristic algorithms were analyzed for the accuracy and reliability of the output. We compared the required time and the solution improvement of each program. Each algorithm was measured in a trial with sample size N = 60.

# RESULTS -PROPOSED MODEL
In Figure 17 we have the flowchart design for the Quantitative TSP Algorithm (QA). The constraints for the Kelly criterion and the Bernoulli trials are presented in Figure 18 and Figure 20. Table 7 and Table 8 demonstrate the mathematical model and the pseudo-code for the proposed Quantitative Algorithm (QA).

Algorithmic Information method. The two major components are the simulated Kelly fraction f* (describing the overall uncertainty spread) and the Bernoulli process distribution of the underlying random event between states (estimated as the mean and standard variance of the weight function for each solution). The combination of those factors is evaluated to decide whether to start a neighborhood search (following a probability density function) when the new alternative solution has a negative gain (i.e. the new proposed solution is worse than the current best-known encoded candidate).

## MATHEMATICAL MODEL
Solutions to the TSP routing problem are explored by algorithms such as 2-opt, Simulated Annealing (SA), Greedy and the Genetic Algorithm (GA). In this paper we propose a Quantitative Algorithm (QA) that does not rely on nature-inspired schemas, but rather provides a statistical interpretation as a distribution of signals produced by a stochastic (log-normal) process. This stochastic process is defined as an ordered list of random variables {X_n} for a given trial of length N, where the index n ranges over a set of non-negative integers and X_n is a target measurement at a specific instant of time.
The utility function is used to find the near-optimal route that has the minimum traveling distance across multiple target node destinations while returning to the starting node at the end. There are two constraints to be considered in the model presented in this paper: simulated entropy uncertainty and the Bernoulli process.

## CONSTRAINT 1
Simulated Uncertainty. The first constraint is limited by the entropy. The input parameters for the Kelly function f* are the winning probability P_W and the expected net odds b for the Bernoulli trial B. The value of P_W is decreased by a fixed rate of 1% (0.01) at each iteration. The value b is measured as the average improvement of the positive iterations i+ divided by the average reduction of the negative iterations i-. The result of this function is the percentage of useful side information available in a noisy channel.
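The computation above can be sketched with the standard Kelly formula f* = p - (1 - p)/b; this is a minimal sketch, and the `net_odds` averaging of i+ and i- follows our reading of the text.

```python
def net_odds(pos_improvements, neg_reductions):
    # b: average improvement over positive iterations i+ divided by
    # the average reduction over negative iterations i-.
    avg_pos = sum(pos_improvements) / len(pos_improvements)
    avg_neg = sum(neg_reductions) / len(neg_reductions)
    return avg_pos / avg_neg

def kelly_fraction(p_win, b):
    # Standard Kelly criterion: f* = p - (1 - p) / b, interpreted here
    # as the fraction of useful side information in the noisy channel.
    return p_win - (1.0 - p_win) / b
```

For example, with a winning probability of 0.6 and net odds of 2, the formula gives f* = 0.6 - 0.4/2 = 0.4.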
In Table 9 we have an example of the Kelly criterion calculation. In Figure 18 we have a circuit representation of the first constraint:

## CONSTRAINT 2
Bernoulli Process. The second constraint is defined by a Bernoulli process: a finite sequence of independent, identically distributed (i.i.d.) Bernoulli random variables. This module outputs the value "True" 1% of the time over 100 iterations (N=100), and it accepts unlikely (risky) solutions with negative gain in order to eventually provide improvement bets for the solution quality under some degree of freedom. The process is defined as a trial with two binary states, "True/Success" (1) or "False/Failure" (0), with domain 0 <= p <= 1:

P(X_i = 1) = p
P(X_i = 0) = 1 - p
E(X) = p
Var(X) = p - p^2 = p(1 - p)

In Figure 19 we can see a Bernoulli distribution with an 80% probability of outputting the Failure state:

P(X = 0) = 0.8
P(X = 1) = 0.2
P(X) = 0 otherwise

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 30 July 2020 doi:10.20944/preprints202007.0710.v1

Examples of Bernoulli trials are shown in Table 10. In Figure 20 we have the circuit representation of the second constraint:
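The second constraint can be sketched as a seedable Bernoulli trial generator. This is a minimal sketch; the choice p = 0.2 below matches the Figure 19 example, not the module's running value.

```python
import random

def bernoulli_trials(p, n, seed=None):
    # Draw n i.i.d. Bernoulli variables: 1 ("True/Success") with
    # probability p, 0 ("False/Failure") with probability 1 - p.
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def bernoulli_moments(p):
    # E(X) = p and Var(X) = p - p^2 = p(1 - p).
    return p, p - p * p
```

For p = 0.2 the moments are E(X) = 0.2 and Var(X) = 0.16, and a long run of trials has an empirical success rate near 0.2.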

# DISCUSSION - COMPUTATIONAL RESULTS
The research evaluates the performance of the proposed algorithm through a series of test cases and a statistical analysis. The new method is tested against the traditional Simulated Annealing method. Each test case was run as a trial with a population size of N=60. In the statistical analysis section, the t-test is used to compare the means between the sample groups. The null hypothesis is that there is no difference between the means of the two populations.
In the Appendix we have Table 18, Table 19 and Table 20 for the 3 test cases with nodes n={20, 30, 50}, with the trial input variables for best final tour cost, total execution time and initial tour cost for a sample size of N=60. In Table 14, Table 15 and Table 16 we have the t-test p-value for each test case for the cost and execution-time variables. Table 14 demonstrates the t-test calculation for 2 independent means for n=20, and Table 16 demonstrates the t-test calculation for 2 independent means for n=50. These findings can be expanded to other complex problems, and the method scales linearly with the search-space sample. The proposed method is also robust to time and space constraints and has a constant maximum number of iterations. In this paper we have introduced a new interpretation of the entropy rate for a binary program that implements a given NP problem.
The results can be used by many real-world applications, such as the optimization of message routing over a network and the orchestration of services across a distributed system, as provided by Cloud Computing environments and microservice-oriented architectures. It is also a future reference on the subjects of Information Theory, Computational Complexity Theory and logarithmic utility in routing optimization, deployment, scheduling and planning. The research demonstrated that the proposed concepts achieve statistically significant results with better solution quality in tour planning. The model provides a new interpretation of entropy in problems encoded in Turing Machines and has the potential to change the traditional interpretation of the limits of Computing Theory.
The results are statistically significant (with p-value < 0.05), and we can conclude that the proposed algorithm has better solution quality, with reduced computational requirements and better cost improvement.

## STATISTICAL CASE STUDY
In order to test the performance in solving the TSP, we created trials with sample size N=60 for each test case with different numbers of nodes n={10, 30, 50}, and then compared the results obtained from a traditional heuristic (SA) and the proposed algorithm (QA). Table 17 demonstrates the two-tailed t-test for two independent samples of costs with a significance level of 0.05. This is a two-sided test of the null hypothesis that two independent means have identical expected values. The test measures whether the average expected cost differs significantly across the measured samples. If the p-value is smaller than the significance level of 0.05 (5%), then we can reject the null hypothesis of equal means. Table 17 shows that the results are statistically significant (with p-value < 0.05) for the test cases n={20, 30, 50}, and we can conclude that the proposed Quantitative Algorithm (QA) has better solution quality, with reduced computational requirements and better cost improvement, than the Simulated Annealing (SA) heuristic.
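The two-sample comparison above can be sketched with a pooled-variance Student t statistic. This is a minimal stdlib sketch under the equal-variance assumption; in practice the p-value would then be looked up from the Student t distribution (e.g. via scipy.stats).

```python
from statistics import mean, variance

def two_sample_t(a, b):
    # Student's two-sample t statistic with pooled sample variance,
    # assuming equal variances; df = len(a) + len(b) - 2.
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, na + nb - 2
```

For two small samples shifted by one unit, e.g. [1..5] and [2..6], this gives t = -1 on 8 degrees of freedom.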

# CONCLUSION
Service scheduling and network routing have many applications and are related to the optimization problem modeled by the Traveling Salesman Problem. It is possible to improve performance by reducing the cost of transmitting information across many distributed-system locations. A new, verified optimization algorithm and statistical model based on Information Theory is presented in this paper, and we demonstrated how it can be used to solve the TSP. The results support the idea that the proposed method can be used reliably to generate solutions under a given degree of freedom. The algorithm can be expanded to large-scale problems without the high computational-resource requirements imposed by brute force and by traditional heuristic methods such as Simulated Annealing and Genetic Algorithms. The algorithm can be adapted to any variant of the routing problem. The research can be used as a framework for future work, extending the implications of Information Theory and Kolmogorov complexity in solving the TSP and NP problems in general.
The advantage of this approach is that it is independent of the computer encoding the problem, and its time and space complexities are additive up to a limit that is linearly proportional to the input size. Other heuristic methods assume predefined knowledge about the data structure and are thus biased towards that encoded schema. The implication of this interpretation is that, by reducing NP problems to a matter of modularization of encoded and decoded random signals in a noisy communication channel, we can find near-optimal solutions that are statistically significant and are guaranteed to produce the best rate of improvement in the long run over many trials (i.e. simulation iterations), even though the problem itself is computationally complex and the alternative sequential brute-force algorithm would require exponential time to solve. This mathematical model can be interpreted as a generalization of heuristic methods.
The findings in this paper unify critical areas of Computer Science, Mathematics and Statistics that many researchers have not explored, and provide a new interpretation that advances the understanding of the role of entropy in decision problems encoded in Turing Machines.