The entropy function for non-polynomial problems and its applications for Turing machines

We present a general process for the halting problem, valid regardless of the time and space computational complexity of the decision problem. It can be interpreted as the maximization of entropy for the utility function of a given Shannon-Kolmogorov-Bernoulli process. Applications to non-polynomial problems are given. The new interpretation of information rate proposed in this work is a method that models the solution space boundaries of any decision problem (and non-polynomial problems in general) as a communication channel by means of Information Theory. The limits of the search space are defined by the Kolmogorov-Chaitin complexity of the sequences encoded as Bernoulli strings. We conclude with a discussion of the implications for general decision problems in Turing machines.


INTRODUCTION
Consider a Shannon-Bernoulli process P defined as a sequence of independent binary random variables X1, X2, X3, ..., Xn. Each element takes the value 0 or 1, and the value 1 occurs with a given probability p. Thus the process is a sequence of independent and identically distributed Bernoulli trials. Let H(P) be the binary entropy function, defined as the entropy of a Bernoulli process with probability p of one of two outcomes. A state is the representation (i.e. encoding) of a choice of the possible values for p. For any given trial there is no information about previous executions and future outcomes.
Each execution of the process is stateless [21] [3]. Let g(X) be an indexing function. This function is a utility function used to represent a preference ordering: it is used to decide whether alternative state A is preferable to alternative state B. If the probability density function for g(X) is a normal density function, then the average of a given set of observations of a random variable (with finite mean and variance) is also a random variable whose distribution converges to a normal distribution as the number of samples increases. The entropy H(g(X)) of a function g(X) of a random variable X cannot be greater than the entropy of X itself. Let Y = g(X).
If the covariance of X and Y is not equal to zero, then X and Y are correlated.
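The bound H(g(X)) <= H(X) stated above can be checked numerically on a small discrete example. The sketch below is only an illustration: the uniform distribution over 3-bit strings and the choice of g (counting the ones in the string) are assumptions made for the example, not part of the original construction.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pushforward(pmf, g):
    """Distribution of Y = g(X): outcomes with the same g(x) are merged."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

# Hypothetical example: X is uniform over the eight 3-bit strings and
# g counts the ones in the string (a many-to-one function).
pmf_x = {format(i, "03b"): 1 / 8 for i in range(8)}
pmf_y = pushforward(pmf_x, lambda s: s.count("1"))

h_x = entropy(pmf_x)
h_y = entropy(pmf_y)
print(f"H(X) = {h_x:.4f} bits, H(g(X)) = {h_y:.4f} bits")
```

Because g maps several strings to the same value, some information about X is lost and H(g(X)) comes out strictly smaller than H(X) in this case; equality would hold only for a one-to-one g.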
A Bernoulli scheme is a special case of a Markov chain and is a generalization of the Bernoulli process to more than two possible outcomes from an independent random variable. A Markov source is defined as an information source created by a stationary finite Markov chain. The Kolmogorov complexity of a string is the length of the shortest computer program executed by a Turing Machine that produces the sequence as output and halts. It is a measurement of the computational resources required to define this string. It can be proved that the Kolmogorov complexity of any string cannot be larger than the length of the Bernoulli sequence itself, up to an additive constant. The entropy of the Markov information source is related to the Kolmogorov complexity. From "Kolmogorov complexity" in Wikipedia [27] [14]: "the Kolmogorov complexity of the output of a Markov information source, normalized by the length of the output, converges almost surely (as the length of the output goes to infinity) to the entropy of the source." Let E(Y) be the expectation of H(Y). A bit of information is the amount of entropy encoded in an event with two possible outcomes and even odds. The mutual information between two random variables X and Y can be used to optimize the growth of the utility function g(X) over many trials/executions. This is the "information gain" of a probability distribution for X given the value of Y, relative to a predefined a priori distribution (like the normal distribution), or stated probabilities, on random variable X. Let K(E(Y), H(Y)) be the maximization of the expected value of the logarithm of the entropy of the utility function Y, which is equivalent to maximizing the expected geometric growth [26]. This fraction is the level of uncertainty between current and future states of X relative to E(Y). In other words this value represents the intensity of the "side information" measured from each element of the Bernoulli sequence.
A program executing in any Universal Turing Machine can use this information to reduce entropy by adjusting the error rate in the long run. At each iteration the program must evaluate the level of uncertainty and decide whether to halt. This implication contradicts the standard formulation of the halting problem, in which it is proved that there is no general algorithm that solves the halting problem for all possible program-input pairs. The problem lies in determining, from an arbitrary description of an arbitrary program and an input, whether the program will halt or run forever.
By analyzing the entropy of the logarithm of the utility function g(X) relative to the probability density function of a random variable X we can extract useful information about the overall distribution of possible output values in a given sample. Any program p_n can use this knowledge to determine whether programs halt for a given subset of program-input pairs.

THESIS STATEMENT
The statistical analysis of the entropy function H(Y = g(X)) and its correlation with the random variable X can be used by any program p_n to decide the limits within which near-optimal values satisfy the halting condition. This decision must have statistical significance (under a given degree of freedom) and reduce the uncertainty about the possible states of values of the random variable Y. In section 3.1 we describe the normal distribution function and the Shapiro-Wilk test of normality. In section 3.2 we describe the Shannon entropy function for Bernoulli sequences. In section 3.3 we discuss the optimization of the logarithm of the utility function of a random independent variable using the Kelly criterion and the Chi-square distribution. In section 3.4 we present the indexing binary function for the classification of states in decision problems. In section 3.5 we present an information theory interpretation applied to non-polynomial problems such as the Traveling Salesman Problem (TSP). In section 4 we discuss previous works and background information on the TSP. In section 4.1 we present the TSP modeling using Information Theory and the stochastic algorithm. The conclusion summarizes the discussion and the implications for the computational limits of Turing Machines for a finite alphabet with a given mean and variance.

Sufficient Statistic
3.1.1 Normal distribution. The normal distribution (or the normal density function) is a continuous probability distribution for a random variable. Its parameters are the mean (or expected value), a given real number, and the standard deviation. The domain is (-∞,+∞). It is defined by the formula shown in figure 1.
Consider the function g(X_n) = c_n. The outputs of this function are the cost values c calculated from the input random sequences in {X_n}, with n finite and positive. The sequences are the semantically valid programs encoded by Bernoulli strings using a Shannon-Fano code. A Turing Machine executing these programs must decide whether to halt by interpreting the random signals read from the tape. If the signal is statistically significant and the entropy is reduced, then the program can stop and return the decision in binary representation. The computational complexity to decide the problem is additive to the machine implementing the program up to a limiting point.
As an example let {p1, p2, p3} be a given set of programs with costs g(X = 001 = p1) = c1, g(X = 111 = p2) = c2, g(X = 101 = p3) = c3. Let the acceptance threshold T for the machine M return False when g(X) is off by more than 1%. Assume that the cost values {c1, c2, c3} come from a normal random variable with mean 50 and standard deviation 0.5. The accepted range for the programs in this domain is 49.5 <= g(X) <= 50.5. The probability that a program will be accepted is P(49.5 <= g(X) <= 50.5). Let f = g(X) be a normal random variable with mean (and median) m = 50 and standard deviation SD = 0.5. The accepted range is then exactly one standard deviation on either side of the mean, so from [19] we can see that 68.27% of the programs will be accepted and the remaining 31.73% will be rejected. The calculations are shown in figure 3. The normal distribution for Y = g(X) with mean = 50 and SD = 0.5 is shown in figure 4.
Since the p-value of the normality test is greater than α, we accept H0 and we can assume the data is normally distributed.
The 2x2 contingency table for the chi-square test can be used to compare two groups of dichotomous dependent variables.
Manuscript submitted to ACM Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 5 March 2020 doi:10.20944/preprints202001.0360.v3
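The acceptance probability in the example above can be reproduced directly from the normal cumulative distribution function. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not taken from the paper.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def acceptance_probability(low, high, mean, sd):
    """P(low <= g(X) <= high) when g(X) is normal with the given mean and SD."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)

# Values from the example: mean 50, SD 0.5, accepted range 49.5 to 50.5.
p_accept = acceptance_probability(49.5, 50.5, mean=50.0, sd=0.5)
print(f"accepted: {p_accept:.4f}, rejected: {1 - p_accept:.4f}")
```

Since the accepted interval is one standard deviation wide on each side of the mean, the result agrees with the 68.27% and 31.73% figures quoted in the text.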
The formula for the Chi-Square test is defined in figure 9.
Fig. 9. Chi-square test formula
As an example consider two programs pA and pB that encode 600 Bernoulli strings. Each program has a set of Bernoulli sequences that were classified into two groups: 0 and 1. Group 0 is the set of sequences with better costs than a given threshold (i.e. they improve the expected quality) and group 1 is the set of sequences with worse or equal quality (i.e. they reduce or do not change the expected quality of g(X)). A chi-square test can be used to determine whether there is a significant difference between the proportions of sequences in programs pA and pB. The contingency table is shown in figure 10.
Table for the expected value of g(X) relative to random variable X.
The chi-square statistic is 6.8343. The p-value is .008943. There is enough evidence to reject the null hypothesis, as the p-value is less than α = 0.05.
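The test described above can be sketched with the standard library alone. The contingency table of figure 10 is not reproduced in the text, so the counts used below are hypothetical; they were picked only because, without a continuity correction, they happen to give the statistic 13.3333 that is reported later in the paper. For one degree of freedom the p-value is computed with the complementary error function, which is a property of the chi-square distribution rather than something taken from the paper.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_one_dof(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: rows are programs pA and pB, columns are groups 0 and 1.
hypothetical_table = [[40, 20], [20, 40]]
stat = chi_square_2x2(hypothetical_table)
p_value = p_value_one_dof(stat)
reject_null = p_value < 0.05
print(f"chi-square = {stat:.4f}, p-value = {p_value:.6f}, reject H0: {reject_null}")

# The p-value quoted in the text for the statistic 6.8343 can be checked the same way.
print(f"p-value for 6.8343: {p_value_one_dof(6.8343):.6f}")
```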

Information Theory
3.2.1 Entropy of a random sequence. Entropy is a measure of the uncertainty of a random variable. In other words it is the average rate at which information is produced by a stochastic process. Shannon defined the entropy H of a discrete random variable X with possible values as outcomes drawn from a probability mass function P(X) [21]. The entropy function is defined in figure 11.
The conditional entropy of two events X = x and Y = y, with X = x_i and Y = y_j for (i, j) and joint probability p(x_i, y_j), is the amount of randomness in the random variable X given Y. This relationship is defined in figure 12.
Consider for example a Bernoulli trial with size n for random variable X. Let g(X) be the utility function for X and p be the probability of a program p_n producing as output a string which optimizes g(X). Let K_c(X) be the length of the shortest program that outputs X and halts. Let H(X) and H(g(X)) be the entropy of the variable X and of the function g(X) respectively. The only possible results are the event of interest F (i.e. finding the optimal solution to a given decision problem and thus maximizing the utility function) and the event !F (i.e. not finding the optimal solution), with probabilities p = 0.003 and q = 1 - p = 0.997. The probability function for this Bernoulli trial is therefore P(F) = p and P(!F) = q.
The utility function Y = g(X) of a random variable X expresses the preference for a given order of possible values of X. This order can be a logical evaluation of the value against a given threshold or constant. As an example, if g(X_1 = 001) = c1 and g(X_2 = 010) = c2 are the costs of two routes between a set of nodes in a super-graph G*, we can use this function to determine the arithmetical relationship between them and decide if c1 is less than, greater than or equal to c2.
The probability density function pdf(Y) can be used to calculate the entropy of the distribution of the cost values.
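As a small illustration of the entropy calculation, the sketch below evaluates the binary entropy function for the Bernoulli trial described above (p = 0.003, q = 0.997) and compares it with a fair trial. The function name is illustrative.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli trial with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

h_rare = binary_entropy(0.003)  # event F from the example above
h_fair = binary_entropy(0.5)    # even odds, the maximum-entropy case
print(f"H(0.003) = {h_rare:.5f} bits, H(0.5) = {h_fair:.1f} bit")
```

The rare event carries roughly 0.03 bits per trial, far less than the 1 bit of a fair trial, which is consistent with the remark that a bit is the entropy of an event with two outcomes and even odds.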

Entropy of the utility function.
Let g(X) be a utility function of a random independent variable X with entropy H(g(X)). We need to determine the condition under which the entropy of the function g(X) is not greater than the entropy of X. This relationship is defined by H(g(X)) <= H(X). The proof is given by Wiegand and Schwarz [25].
3.2.5 Kolmogorov Complexity. The Kolmogorov complexity of a string w from language L, denoted by K_L(w), is the length of the shortest program over the alphabet of L which produces w as output and halts [14]. The conditional Kolmogorov complexity of string w relative to string x is denoted by K_L(w|x) and is the length of the shortest program that receives x as input and produces w as output.
3.2.6 Complexity of a string and shortest description length. Let U be a universal computer. The Kolmogorov complexity K_c(x) of a string x with respect to U is K_c(x) = min { l(p) : U(p) = x }, where l(p) is the length of program p. It is the length of the shortest program p that outputs x and halts, that is, the smallest possible program. Let C be another computer. If this complexity is general, there is a universal computer U that simulates C for any string x at the cost of a constant that does not depend on x. Therefore there are few sequences with low complexity in the set. The lower bound states that there are not many short programs in the set. If X is a Bernoulli sequence with probability p(x) = 1/2, this means there are no more than 2^k strings with complexity K_c(x) < k.
Considering that each program can produce only one possible output, the number of sequences with complexity K_c(x) < k is less than 2^k. Thus the complexity will depend on the computer up to an additive constant. Let us consider fractals for example. They can produce complex 2D images with self-similar patterns at many different scales. However the rules that describe them have a very low Kolmogorov complexity [3]. The expected value of the Kolmogorov complexity of a random sequence is close to the Shannon entropy. This relationship between complexity and entropy can be described with a stochastic process drawn i.i.d. on variable X following a probability mass function p(x). The symbol x in variable X is drawn from a finite alphabet. This expectation converges to the entropy. Therefore most of the sequences in the set do not have a simple description. However there are some simple sequences.
The probability that a random sequence can be compressed by more than k bits is no greater than 2^(-k). Thus most Bernoulli sequences have a complexity close to their length. Therefore we can bound the number of sequences with complexity significantly lower than the entropy and classify each element into one of two sets (the typical set and the non-typical set).
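Kolmogorov complexity itself is not computable, but the length of a compressed encoding gives an upper bound that illustrates the statement above. The sketch below uses zlib as a stand-in compressor, which is an assumption made purely for illustration: a pseudo-random byte string barely compresses, while a highly regular string of the same length compresses to a small fraction of its size.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib encoding, an upper-bound proxy for complexity."""
    return len(zlib.compress(data, 9))

n = 4096
rng = random.Random(0)  # fixed seed so the run is repeatable

# A pseudo-random sequence and a regular sequence of the same length.
random_bytes = bytes(rng.getrandbits(8) for _ in range(n))
patterned_bytes = b"01" * (n // 2)

random_len = compressed_length(random_bytes)
patterned_len = compressed_length(patterned_bytes)
print(f"random: {random_len} bytes, patterned: {patterned_len} bytes (original {n})")
```

The compressor's output length depends on the chosen algorithm, so the numbers are only indicative; the qualitative gap between the two cases is the point of the example.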

3.2.7 Kelly criterion and the uncertainty in random outcomes. Let K(E(Y), H(Y)) = f be the maximization of the expected value of the logarithm of the entropy of the utility function Y = g(X) [13]. This fraction is known as the Kelly criterion and can be understood as the level of uncertainty about a given data distribution of the random variable X relative to the probability density function pdf(Y) of the respective cost distribution measured in the sample. The value of f is a fraction of the cost-value of g(X) on an outcome that occurs with probability p and odds b. Let the probability of finding a value which improves g(X) be p; in this case the resulting improvement is equal to 1 cost-unit plus the fraction f. The probability of decreasing quality for Y is 1 - p. The expected value of the log cost (E) is defined in figure 15. From "Kelly criterion" in Wikipedia [26], "To find the value of f for which the expectation value is maximized, denoted as f*, we differentiate the above expression and set this equal to zero.". Figure 16 shows the maximization of the expected value, where f* is the optimal fraction, b is the net odds, p is the probability of improving quality in Y = g(X) and q is the probability of decreasing quality (q = 1 - p).
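The maximization described above can be sketched numerically. The formulas of figures 15 and 16 are not reproduced in the text, so the sketch assumes the usual form of the Kelly criterion, E(f) = p log(1 + b f) + q log(1 - f) with maximizer f* = (b p - q) / b; the function names and the grid search are illustrative only.

```python
import math

def kelly_fraction(p, b):
    """Closed-form maximizer f* = (b*p - q) / b, with q = 1 - p."""
    q = 1.0 - p
    return (b * p - q) / b

def expected_log_growth(f, p, b):
    """Expected logarithmic growth for fraction f, win probability p, net odds b."""
    q = 1.0 - p
    return p * math.log(1.0 + b * f) + q * math.log(1.0 - f)

# Parameters assumed for illustration: 60% chance of improvement at 1-to-1 odds.
p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)

# Cross-check the closed form with a coarse grid search over [0, 0.998].
grid = [i / 1000 for i in range(0, 999)]
f_best = max(grid, key=lambda f: expected_log_growth(f, p, b))
print(f"closed form f* = {f_star:.3f}, grid maximum at f = {f_best:.3f}")
```

A negative f* (for example when p is below 0.5 at even odds) indicates that no positive fraction increases the expected logarithmic growth.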
For example consider a program with a 60% chance of improving the utility function g(X), thus p = 0.6 and q = 0.4. The program has 1-to-1 odds of finding a sequence which improves g(X) and thus b = 1. For these parameters the program has a 20% (f* = 0.20) confidence that the outcomes produce values that improve the expected value of g(X) over many trials.

General Decision Machine
In the Turing Machine model, b ∈ Γ is the blank symbol, Σ ⊆ Γ − {b} is the set of input symbols, q0 ∈ Q is the initial state and F ⊆ Q is the set of final accepted states. Let X be a random symbol from the alphabet L0. We can expand the standard Turing Machine model to a general computation under a given known distribution with probability p. Consider the utility function g(X) related to the machine TM. The output of g(X) depends on the changes in the value of x in X. Assume that the values of g(X) follow a normal distribution. Let the probability density function of g(X) be pdf(g(X)). The entropy H of g(X) relative to X can be calculated. Let K(E(Y), H(Y)) = f* be the maximization of the expected value of the logarithm of the entropy of the utility function Y = g(X).

Indexing binary function
Let C_p(X, g(X)) be the p-value from the Chi-Square Test for random samples. Assume the samples are normally distributed. These functions evaluate whether the sample means are generated from the same distribution. Each element of the sample is a random independent variable X that is compared to the output of the utility function Y = g(X). This utility function is a conditional statement that returns (i.e. produces as output) True or False by evaluating a given value against a threshold. The range of the function C_p(X, g(X)) is from 0 (no change) to 1 (high confidence). The value output by this function is the significance level. For example a value of 0.05 represents a 5% chance. The range of the function K is (−∞, +∞). The K value represents the intensity of the uncertainty of randomly finding state outcomes that improve (or reduce) the utility function Y = g(X). If the K value is positive then it shows the amount of useful information encoded in the sample. For example if the value is 0.05 (5%) then the uncertainty is reduced (i.e. entropy decreases) and there is additional side information transmitted by the information source. If the edge is negative the K value is also negative, indicating that finding useful information from the elements in the random sample is unlikely. By comparing the p-value and the Kelly fraction any program p_n can decide whether to halt based on "side information" about which symbols can lead, in the long run, to the best possible rate of quality improvement for the utility function g(X) over finitely many iterations. Therefore the halting problem is reduced to a sorting problem using the p-value and the Kelly fraction as the primary and secondary keys for an unsorted computation list. To find the best near-optimal sequence any program can sort the keys in decreasing order.
Additionally it can sort the sub-list by the number of bits required to encode the binary near-optimal string in a Shannon-Fano code following the probability density function pdf(g(X)). The relationship between the p-value and the Kelly fraction is shown in figure 18.
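The ordering step described above can be sketched as a multi-key sort. Everything in the example is hypothetical: the Candidate record, its field names and the sample values are invented for illustration. The sort direction follows the text literally (decreasing p-value, then decreasing Kelly fraction), and the tie-break on fewer code bits first is an assumption, since the text does not state a direction for that key.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    sequence: str   # binary string encoding a candidate solution
    p_value: float  # primary key, output of C_p(X, g(X))
    kelly: float    # secondary key, Kelly fraction K
    code_bits: int  # bits needed to encode the sequence (hypothetical field)

def rank_candidates(candidates):
    """Sort by decreasing p-value, then decreasing Kelly fraction, then fewer bits."""
    return sorted(candidates, key=lambda c: (-c.p_value, -c.kelly, c.code_bits))

# Hypothetical values chosen only to exercise all three keys.
candidates = [
    Candidate("001", 0.90, 0.05, 3),
    Candidate("111", 0.95, 0.20, 5),
    Candidate("101", 0.95, 0.20, 2),
    Candidate("010", 0.95, -0.10, 2),
]
ranked = rank_candidates(candidates)
for c in ranked:
    print(c)
```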

Applications
The indexing binary function M can be used as a key to decide if a given decision problem should halt or not. For example the Traveling Salesman Problem (TSP) seeks the shortest route in a graph of connected nodes. Another example is the game of Sudoku, where a candidate with a better partial solution has a better score than the alternatives. A Sudoku puzzle is a grid of partially completed rows and columns, with cells partitioned into regions. A solution fills the grid using distinct symbols from alphabet L such that each row, column and region contains exactly one of each element of the set.
The general problem of Sudoku is NP-complete.
The function M can label each output and decide whether to halt if the intensity of M is above a given threshold T.
In other words the state of any Bernoulli sequence has a proportional real value that ranges according to the variation in entropy of the utility function of the candidate strings. Both generalized Sudoku and the decision version of the TSP belong to the class NP (nondeterministic polynomial time) [28].

In this paper we have proposed a new approach for uncertainty modeling based on the Kelly-Shannon-Thorp [23] [21] [13] criterion to measure the optimal channel allocation that a program should place at each iteration of an algorithm. The candidate sequences fall into group 0 (lower cost, better quality) and group 1 (greater or equal cost, worse or unchanged quality). This distribution reveals the amount of inside information available to the program, and the entropy can be calculated. The Turing Machine can then use this information to reduce entropy at each execution clock unit while maximizing the expected improvement in solution quality (i.e. finding a shortest Euclidean distance tour that visits all nodes, as described in the TSP). In the TSP, the salesman receives a list of cities and paths from a "wire" or "tape" and must decide between accepting any given new route and keeping the current known best route. If he is lucky and chooses the right path, he may very well end up with a smaller distance to travel across all cities before returning home. If he chooses the wrong path he may end up thinking the current total distance is the best one even though it is not. For the classical salesman there is no way to know how good or bad his path is, because he has no inside information about the overall expected distance of the routes. At each execution clock he must decide using only his current knowledge for that iteration: the best distance so far and the new alternative solution.
Alternatively, the log-normal salesman can improve his strategy in the long run by quantifying the total available inside information in the channel (or on the tape of the Turing machine) and maximizing the expected value of the logarithm of the value function (defined by the traveled Euclidean distance) for each execution clock. Using this approach, he can reduce his uncertainty while optimizing his rate of distance reduction (quality improvement) at each execution time. He can do so by building the probability density function of the distance cost variable from two sampled sets (current and new) of random tour distances.
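The sampling step described above can be sketched as follows. The city coordinates, the sample size of 200 and the summary by mean and standard deviation of the log-distances are all assumptions made for illustration; the sketch only shows how two samples (current and new) of random tour distances could be collected and summarized.

```python
import math
import random
import statistics

def tour_length(points, order):
    """Total Euclidean length of the closed tour that visits points in this order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))

def sample_log_lengths(points, n_samples, rng):
    """Logarithms of the lengths of n_samples uniformly random tours."""
    order = list(range(len(points)))
    logs = []
    for _ in range(n_samples):
        rng.shuffle(order)
        logs.append(math.log(tour_length(points, order)))
    return logs

rng = random.Random(42)  # fixed seed so the run is repeatable
cities = [(rng.random(), rng.random()) for _ in range(12)]  # hypothetical instance

current_logs = sample_log_lengths(cities, 200, rng)
new_logs = sample_log_lengths(cities, 200, rng)

current_mean, current_sd = statistics.mean(current_logs), statistics.stdev(current_logs)
new_mean, new_sd = statistics.mean(new_logs), statistics.stdev(new_logs)
print(f"current: mean={current_mean:.3f} sd={current_sd:.3f}")
print(f"new:     mean={new_mean:.3f} sd={new_sd:.3f}")
```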

PREVIOUS WORK & PRELIMINARIES
This section contains an overview of the most important heuristics to solve the TSP. Other algorithms such as Nearest Neighbor, Greedy, Insertion, Christofides, Lin-Kernighan, Branch & Bound and Ant Colony Optimization are not discussed in this paper. Those methods and their complexity considerations are studied in detail by Nilson [20].

Heuristics for the TSP & Graph Theory
An algorithm that is searching for the smallest distance (cost or weight) between nodes v and u in a graph G must record the information about the intermediary distances to w. This information can be encoded as a label attached to the nodes, whose value is defined as the distance between v and w [5]. For n nodes we can use a distance matrix where each element represents the distance between a pair of city-nodes. Let Π be the set that contains all possible permutations of nodes 1 to n. Zhan et al. [29] describe the goal of the TSP as finding a permutation π that minimizes the total distance between nodes. For symmetric instances the distance between two nodes in the graph is the same in each direction, forming an undirected graph. For asymmetric instances the weights of the edges between nodes can be dynamic or non-existent. Glover [8] proposed the Tabu Search method, which can be used to improve the performance of several local-search heuristics such as 2-opt. As neighborhood search algorithms like 2-opt can sometimes converge to a local optimum, Tabu Search keeps a list of illegal moves to prevent solutions that provide negative gain from being chosen frequently. In 2-opt the two removed edges are inserted in the tabu list. If the same pair of edges is created again by a 2-opt move, it is considered tabu. The pair is kept in the list until it is pruned or it improves the best tour [20]. However using tabu search increases the computational complexity to O(n^3), as additional computation is required to insert and evaluate the elements in the list.
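A plain 2-opt local search, without the tabu list, can be sketched as below. This is a deliberately naive version that recomputes the full tour length for every candidate move; it is meant to show the move itself (reversing a segment to replace two edges), not an efficient implementation, and the four-city instance is a made-up example.

```python
import math

def tour_length(points, tour):
    """Total Euclidean length of the closed tour."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))

def two_opt(points, tour):
    """Apply improving 2-opt moves until no move shortens the tour."""
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] removes two edges and reconnects the tour.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len

# Hypothetical instance: the corners of a unit square, visited in a crossing order.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
crossing = [0, 2, 1, 3]
tour, length = two_opt(square, crossing)
print(tour, round(length, 6))
```

On this instance a single move removes the crossing and the tour length drops from about 4.83 to 4, the perimeter of the square.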
Figure 21 shows the 2-opt moves from Zunic et al. [31].
Genetic Algorithms (GA) are used as a stochastic optimization method in random searches for good (near-optimal) solutions. This approach is analogous to the "survival of the fittest" principle presented by Darwin. This means that individuals that are fitter for the environment are more likely to survive and pass their genetic information to the next generation.
In the TSP the chromosome that models a solution is represented by a "path" in the graph between cities. GA has three basic operations: Selection, Crossover and Mutation. In the Selection step the candidate individuals are chosen for the production of the next generation according to some fitness function. In the TSP this function can be defined as the length (weight) of the candidate solution's tour. Figure 22 shows a representation of genes and chromosomes. Next those individuals are chosen to mate (reproduction) to produce the new offspring. Individuals that produce better solutions are more fit and therefore have more chances of having offspring. However individuals that produce worse solutions should not be discarded, since they have a probability of improving the solution in the future. In other words, the heuristic accepts solutions with negative gain hoping that they may eventually lead to a better solution.
Several researchers have studied the performance trade-off of selection strategies and how the input parameters affect the quality of the solution and the computational time. Julston studied the performance of rank-based selection and concluded that tournament selection presents better results than rank-based selection [18]. Razali et al. [18] explore different selection strategies to solve the TSP and compare the solution quality and the number of generations required. They conclude that tournament selection is more appropriate for small instances and that rank-based roulette wheel can be used to solve large problems.
Goldenberg & Deb [6] compared the quality of the solution and the convergence time of many selection methods such as proportional, tournament and ranking. They conclude that ranking and tournament produce better results than proportional selection, under certain conditions for convergence [18]. Zhong et al. [30] explored proportional roulette wheel and tournament selection and concluded that tournament selection is more efficient than proportional roulette wheel selection.
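Tournament selection, the strategy compared in the studies above, can be sketched in a few lines. The population labels and cost values are hypothetical, and lower cost (shorter tour) is assumed to mean higher fitness.

```python
import random

def tournament_select(population, cost, k, rng):
    """Draw k distinct individuals at random and return the one with lowest cost."""
    contestants = rng.sample(population, k)
    return min(contestants, key=cost)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical tour costs; in the TSP the cost would be the tour length.
costs = {"tour_a": 12.5, "tour_b": 9.0, "tour_c": 15.2, "tour_d": 10.1}
population = list(costs)

winner = tournament_select(population, costs.get, k=2, rng=rng)
champion = tournament_select(population, costs.get, k=len(population), rng=rng)
print(winner, champion)
```

The tournament size k controls the selection pressure: with k equal to the population size the best individual always wins, while small k leaves room for worse individuals to be selected, which matches the remark that weaker solutions should not be discarded outright.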
Figure 23 contains the pseudo-code for a Genetic Algorithm from Ferentinos et al. [7].
In the Simulated Annealing (SA) algorithm, tmp is the input temperature. The SA algorithm specifies the neighborhood structure and the cooling function.
From Chien et al. [2] we can see the flow chart in figure 24 for the Simulated Annealing algorithm.
Zhan et al. [29] proposed a list-based SA algorithm using a list-based cooling method to dynamically adjust the rate at which the temperature decreases. This adaptive approach is more robust to changes in the input parameters. Chien et al. [2] proposed a genetic simulated annealing method inspired by ant colony systems. They conclude that the average solution results are better than those of the traditional SA and GA methods alone.
The quality of the solution can be improved by allowing more time for the algorithm to run. Steiglitz and Weiner [22] observed that the performance of the 2-opt and 3-opt algorithms can be improved by keeping an ordered list of the closest neighbors of each city-node, thus reducing the number of solutions to search.

Nature inspired models. Researchers have proposed algorithms inspired by natural events and structures like the heating of metals and the growing behavior of biological organisms. Those methods do not iterate over the entire solution space but rather over a portion of it in order to find a local minimum. They start with an initial random solution and try to improve the solution quality at each iteration until some stopping condition is reached, such as a maximum number of iterations, a maximum number of candidate solutions, a rate of decay in temperature or a minimum quality threshold. This can be interpreted as a non-deterministic way to address the error rate between the known solutions and the unknown solutions in polynomial time. Although such algorithms do not have to traverse the entire solution space, they must decide when a random candidate solution with negative gain will be accepted (i.e. a candidate with worse solution quality than the current known best solution) in the hope that it will eventually lead to the shortest distance (i.e. a better solution quality).
What those methods have in common is that they ignore the data distribution of the input and output strings of the candidate solution sequences. The algorithm does not care in which city it starts and ends the tour. As long as the solution sequence is semantically valid, it must decide whether to return true or false using some function that defines the decision rules based on the current best known total Euclidean tour distance. At each iteration those heuristics try to optimize the value function, set as the total tour distance for that specific trial, with no information about the other possible candidate solution strings in the solution space. This assumption introduces a bias in the algorithm, since it implicitly assumes prior knowledge about the data structure in order to reduce the resources required to complete the computation. Put simply, the algorithm is never sure whether the solution at hand is the optimal one and may very well discard it, hoping to find a better one in the future based on some arbitrary criteria.
In the TSP every non-deterministic algorithm has to work with random input and output sequences, and for each one it must decide whether to halt or not. From the perspective of the Turing machine, the candidate tour sequences are random symbols being read by the machine's head. All the available inside information consists of the symbols on the tape. The tape is the communication channel used to transmit the messages (i.e. candidate solutions). Information Theory and Kolmogorov Complexity are therefore the natural language for modelling the TSP, as the number of possible candidate solutions is very large even for small instances of the problem. This means that all cities are assumed to have the same probability of being chosen and the total tour distance output would be proportional to this choice over many trials.
Nature-inspired models such as GA and SA use prior information to improve the solution results and thus are biased towards this encoding. Alternatively, by modeling the TSP as a communication channel with a probability density function associated with the stochastic process that generates the solutions at random, we can bound the limits of the search space to a log-normal distribution. The advantage of this method is that, by relying on the statistical analysis of the solution space instead of the computational complexity of the problem, we can obtain solution quality equal to or better than that of the traditional algorithms without relying on computationally complex implementations that have high time and space constraints.

ALGORITHMIC-INFORMATION LANGUAGE
Let L be a finite language with N symbols in a set such as L = {NODE_A, NODE_B, NODE_C, ..., NODE_N}. Each item is an identification label for the elements in the set under some probability distribution function pdf(x). This is the likelihood of choosing a given symbol x in L. This can be interpreted as a random variable x in L that can assume any item in the set by following the known distribution pdf(x). Each symbol represents a city with a 1-1 mapping to a respective node in a super-graph G*. Therefore, each node is a point with X_n and Y_n coordinates in a 2D Euclidean space. Let the function g(X) be the utility function for a given language L. This function outputs the cost or weight of X relative to a given threshold T defined as the expected value E(g(X)). Assume g(X) follows a Bernoulli distribution.
Thus, each element is a node that has two variables storing its x-axis and y-axis coordinates. A valid sequence s* can be produced by combining elements of L. Permutations with duplicate elements are invalid. S* is the set of all possible permutations of this sequence. The neighbor of each node at position i in the sequence is the next connected city (i.e. the item at position i+1). This sequence describes a path between nodes and is a tour that starts from the first node, continues until the last, and returns to the first node when finished [15]. This is shown in figures 25 and 26. Figure 26 gives an example of valid and invalid tour sequences.
This is the error rate of transmission of information in a noisy communication channel. If the fraction is negative, the salesman must switch strategy, stay with the currently known best path sample, and wait for the next alternative sample, which will be created at the next execution iteration clock [16].
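The validity rule for tour sequences stated above (no duplicate elements, with the last city linking back to the first) can be sketched as follows. The sketch assumes that a valid sequence must use every symbol of L exactly once, as the definition of S* as a set of permutations suggests; the helper names are illustrative.

```python
def is_valid_tour(seq, language):
    """True if seq uses every symbol of the language exactly once."""
    # Equal length plus equal symbol sets rules out duplicates and omissions.
    return len(seq) == len(language) and set(seq) == set(language)


def neighbor(seq, i):
    """Next connected city after position i; the last city links to the first."""
    return seq[(i + 1) % len(seq)]


if __name__ == "__main__":
    L = {"NODE_A", "NODE_B", "NODE_C"}
    print(is_valid_tour(["NODE_A", "NODE_C", "NODE_B"], L))  # a permutation of L
    print(is_valid_tour(["NODE_A", "NODE_A", "NODE_B"], L))  # duplicate symbol
    print(neighbor(["NODE_A", "NODE_C", "NODE_B"], 2))       # wraps to the start
```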
The chi-square statistic is 13.3333 and the p-value is .000261, so the result is significant at p < .05. At a significance level of 0.05 there is strong evidence against the null hypothesis; this level means that we accept a 5% risk of rejecting the null hypothesis when it is actually true. We can therefore reject the null hypothesis and accept the alternative hypothesis that Group B has solutions of better quality than those in Group A.
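As a check on the reported figures, the p-value can be recomputed from the chi-square statistic. The sketch below assumes one degree of freedom (for example a 2x2 comparison of Group A and Group B), which the text does not state; under that assumption the upper-tail probability has a closed form in terms of the complementary error function, and the result appears consistent with the reported p-value.

```python
import math


def chi2_p_value_df1(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom.

    For one degree of freedom the survival function reduces to
    erfc(sqrt(statistic / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    p = chi2_p_value_df1(13.3333)  # statistic reported in the text
    print(f"p = {p:.6f}")
    print("reject null at 0.05:", p < 0.05)
```

With more degrees of freedom the closed form no longer applies and a general chi-square survival function would be needed instead.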

CONCLUSION
In this paper we have demonstrated the statistical analysis of a stochastic process created by a set of Shannon-Bernoulli sequences. Within this framework it is possible to analyze the distribution of information for a given random variable and the correlation between this variable and a utility function that compares the differences between state sequences. This function shares information with the random variable itself, and this side information can be measured. Knowledge of this side information can be used by any program executing on a Turing machine to decide whether or not to halt.
This model can be used to optimize the performance of heuristic algorithms such as Simulated Annealing, Genetic Algorithms and 2-opt applied to the Traveling Salesman Problem (TSP). The advantage of this approach is that it is independent of the computer encoding of the problem, and the time and space complexities are additive up to a limit that is linearly proportional to the input size. Other heuristic methods assume predefined knowledge about the data structure and are thus biased towards this encoded knowledge. The implication of this interpretation is that, by reducing NP problems to a matter of modularization of encoded and decoded random signals in an alphabet, we can find near-optimal solutions that are statistically significant and are guaranteed to produce the best rate of improvement in the long run over many trials (i.e. simulation iterations), even though the problem itself is computationally complex and the alternative sequential brute-force algorithm would require exponential time to solve.
From Fekete et al. [17], the classical interpretation of hard decision problems argues that unless P=NP there is no polynomial-time algorithm for NP-hard problems such as the TSP: "Assuming triangle inequality, the best polynomial heuristic known to date uses the computation of an optimal weighted matching: Christofides method combines a [...]" [...] in this paper, the decision problem itself can be quantified in bits, and the distribution of random sequences that are syntactically and semantically valid is not negligible. In other words, although not all sequences produce outputs that optimize the quality for any utility function g(X), there are some infrequent random sequences that improve g(X) and are statistically significant under some given degrees of freedom. This new interpretation of information rate can be used to define the limitations imposed by the computational complexity of Turing machines when solving non-polynomial problems, regardless of time and space constraints.