3. Derivation and Results of the Polynomial-Time Algorithm for Sudoku Problems in the Second Case
In the second category of problems, the number at a given grid position in the unsolved Sudoku often differs from the number at the same position in a known Sudoku solution. If the two numbers are identical, the difference is recorded as 9, so every difference falls within the range 1 to 9. Taking the differences in descending order, a sequence of cumulative probabilities can be calculated: the first term is the probability of a difference of 9, the second is the sum of the probabilities of differences of 9 and 8, and so forth, until the sum over all nine differences (9 down to 1) reaches 1. Arranged in this order, the nine values follow the pattern of a cumulative distribution function [6].
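The descending cumulative construction can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the subtraction direction (known minus unsolved, taken modulo 9), the function names, and the sample pairs are all assumptions made for the example.

```python
from collections import Counter


def cell_difference(known_digit, puzzle_digit):
    """Difference in 1..9; identical digits give 9 (assumed cyclic rule)."""
    d = (known_digit - puzzle_digit) % 9
    return 9 if d == 0 else d


def descending_cumulative(differences):
    """Return [P(9), P(9)+P(8), ..., P(9)+...+P(1)] for a list of differences."""
    counts = Counter(differences)
    total = len(differences)
    running = 0
    cumulative = []
    for d in range(9, 0, -1):  # 9 first, then 8, ..., down to 1
        running += counts.get(d, 0)
        cumulative.append(running / total)
    return cumulative


# Hypothetical (known digit, unsolved digit) pairs at five grid positions.
pairs = [(5, 5), (3, 1), (1, 3), (9, 8), (4, 4)]
sample = [cell_difference(k, p) for k, p in pairs]
cumulative = descending_cumulative(sample)
```

The last entry of `cumulative` is always 1, which is the property the text relies on when it compares the sequence to a cumulative distribution function.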
Therefore, these nine sets of data can serve as the independent variables of a discrete Gaussian distribution [7]. To align with the probability mass function (PMF) [8], the symmetry of the probability distribution is used: eight additional sets of data are generated in the Cartesian coordinate system, symmetric to the original data about the Y-axis. The resulting 16 sets of data fully conform to the characteristics of the PMF, so the probability of each set can be calculated. The differences between the hint numbers in the unsolved Sudoku and the corresponding numbers in the known Sudoku are taken as the sample. From the mean and variance of this sample, the frequency of occurrence of the nine difference values across the whole Sudoku can be estimated. Because the nine sets of data are cumulative, only the probability of the first difference value, 9, can be calculated accurately. However, incrementing or decrementing the numbers of the known Sudoku (with a step size of 1) generates nine mutually exclusive Sudokus, including the initial known Sudoku. The difference between the numbers at the same position in any two of these nine Sudokus is the same constant for every cell. Thus, by calculating in turn the probability of a difference of 9 between each of these nine Sudokus and the unsolved Sudoku, a complete probability distribution over the differences 1 to 9, relative to the initial known Sudoku, can be obtained.
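A minimal Python sketch of the shifting step and the sample statistics follows. It assumes that the increment wraps cyclically from 9 back to 1, that the difference is (known minus unsolved) modulo 9 with 0 written as 9, and that 0 marks an empty cell. The example grid, the three clues, and the use of the population variance are hypothetical choices for illustration.

```python
from statistics import mean, pvariance


def shift_grid(grid, step):
    """Cyclically add `step` to every digit, keeping values in 1..9."""
    return [[(v - 1 + step) % 9 + 1 for v in row] for row in grid]


def nine_shifted_grids(known):
    """The known grid (step 0) plus its eight cyclic shifts."""
    return [shift_grid(known, s) for s in range(9)]


def clue_differences(known, puzzle):
    """Differences in 1..9 at the clue cells of `puzzle` (0 = empty cell)."""
    diffs = []
    for known_row, puzzle_row in zip(known, puzzle):
        for k, p in zip(known_row, puzzle_row):
            if p != 0:
                d = (k - p) % 9
                diffs.append(9 if d == 0 else d)
    return diffs


# A valid completed grid built from a standard shifting pattern (illustrative).
known = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Hypothetical unsolved puzzle with three clues.
puzzle = [[0] * 9 for _ in range(9)]
puzzle[0][0], puzzle[0][4], puzzle[1][0] = 1, 3, 8

grids = nine_shifted_grids(known)
diffs_per_grid = [clue_differences(g, puzzle) for g in grids]
mu = mean(diffs_per_grid[0])
variance = pvariance(diffs_per_grid[0])
```

Under these assumptions, a cell whose difference to the initial known Sudoku is d has difference 9 to exactly one of the nine shifted grids, which is why counting the difference-of-9 cells grid by grid covers all nine difference values.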
Throughout the calculation, the differences between the clues of the unsolved Sudoku and the known Sudoku must follow one consistent rule, and the same addition or subtraction operation must be used for all nine known Sudokus. Because the nine sets of data are cumulative and the goal is only the probability that the difference is 9, the probability mass function does not need to be normalized; it suffices to read off its value at a difference of 9. Since the nine sets of data form only the positive half of the probability mass function, symmetry requires that this value be divided by 2 to obtain the true probability. Thus, the final formula is as follows:
where K is an integer (in this paper, K = 9), μ is the sample mean (a positive real number), and σ is the standard deviation (a positive real number).
By inferring the population from the sample, we obtain the overall probability that the difference between the known Sudoku and the target Sudoku is 9. Multiplying this probability by the total number of cells in a 9x9 Sudoku (81) and rounding the result to the nearest whole number gives the number of cells in the entire 9x9 grid at which the known Sudoku satisfies the difference-of-9 condition.
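The formula itself is not reproduced in the text, so the Python sketch below is only one reading of the description: an unnormalized Gaussian evaluated at K and halved. The exact functional form, the function names, and the numeric inputs are assumptions, not the paper's formula.

```python
import math


def half_gaussian_weight(k, mu, sigma):
    """Assumed form: unnormalized Gaussian value at k, divided by 2."""
    return 0.5 * math.exp(-((k - mu) ** 2) / (2 * sigma ** 2))


def expected_cell_count(probability, cells=81):
    """Scale a probability to the 9x9 grid and round half up."""
    return int(math.floor(probability * cells + 0.5))


# Hypothetical sample statistics, for illustration only.
p_nine = half_gaussian_weight(9, mu=7.5, sigma=2.0)
count_nine = expected_cell_count(p_nine)
```

Rounding is done with `floor(x + 0.5)` because Python's built-in `round` rounds exact halves to the nearest even integer, which may not be what "nearest whole number" is meant to imply here.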
From the candidate numbers of each cell, it can be determined whether the difference between the known number and the number in that cell can take a specific value. However, the number of cells identified in this way is usually higher than the actual count. Based on the candidate numbers and the number of numbers that satisfy each difference condition in the unsolved Sudoku, 18 equations can be established as follows:
Here A, B, C, D, E, F, G, H, and I denote the number of digits that match a difference of 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively. A1 to I9 are the unknowns.
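The candidate-based check described above can be sketched in Python as follows. The sketch only computes, for each difference value, how many cells could attain it given the standard row, column, and box candidates; it does not reproduce the 18 equations. The difference convention ((known minus candidate) modulo 9, with 0 written as 9), the use of 0 for empty cells, and all names are assumptions.

```python
def candidates(puzzle, r, c):
    """Digits allowed at (r, c) by the row, column, and 3x3 box (0 = empty)."""
    if puzzle[r][c] != 0:
        return {puzzle[r][c]}
    used = set(puzzle[r]) | {puzzle[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {puzzle[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def possible_difference_counts(known, puzzle):
    """counts[d] = number of cells where some candidate gives difference d."""
    counts = {d: 0 for d in range(1, 10)}
    for r in range(9):
        for c in range(9):
            diffs = set()
            for v in candidates(puzzle, r, c):
                d = (known[r][c] - v) % 9
                diffs.add(9 if d == 0 else d)
            for d in diffs:
                counts[d] += 1
    return counts


# Illustrative known grid and two extreme puzzles.
known = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
empty = [[0] * 9 for _ in range(9)]
upper_bounds_empty = possible_difference_counts(known, empty)
upper_bounds_solved = possible_difference_counts(known, known)
```

With no clues every difference is possible in every cell, so each count is 81; as clues are added the counts shrink toward the true values, which is consistent with the remark that the candidate-based count is usually an overestimate.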
Clearly, these 18 equations together form a system of linear equations in several variables, containing up to 81 unknowns. Because constructing a Sudoku instance without given clues is comparatively straightforward, new known Sudoku instances can be generated and the proposed algorithm applied to each of them to derive a further 18 equations. The numerical differences between each newly constructed Sudoku instance and the original one can be calculated exactly, so the newly derived equations are linearly independent of the original set. Building on this approach, additional equations can be obtained, for example eight systems of 18 equations each from eight new Sudoku instances, and the combined system can be solved by Gaussian elimination. The problem belongs to the category of bounded-variable sparse integer linear systems (requiring a unique positive integer solution). Experimental verification confirms that once the system contains as many linearly independent equations as unknowns, a unique positive integer solution exists [9]. Solving such systems by Gaussian elimination takes polynomial time [10].
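As an illustration of the elimination step, the following Python sketch solves a small overdetermined system exactly with rational arithmetic and then checks that the solution consists of positive integers. The three-unknown system is a hypothetical stand-in for the Sudoku equations, which are not reproduced here, and the function name is an assumption.

```python
from fractions import Fraction


def solve_linear_system(a, b):
    """Gauss-Jordan elimination over exact fractions.

    Returns the unique solution of a x = b, or None if the system is
    inconsistent or does not determine every unknown.
    """
    rows, cols = len(a), len(a[0])
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    pivot_row = 0
    for col in range(cols):
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            return None  # free variable: no unique solution
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        pv = m[pivot_row][col]
        m[pivot_row] = [v / pv for v in m[pivot_row]]
        for r in range(rows):
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
    # Any remaining row now reads 0 = rhs; a non-zero rhs means inconsistency.
    if any(row[-1] != 0 for row in m[pivot_row:]):
        return None
    return [m[i][-1] for i in range(cols)]


# Hypothetical system: four equations, three unknowns, solution (1, 2, 3).
a = [[1, 1, 1], [1, -1, 0], [0, 1, -1], [2, 0, 1]]
b = [6, -1, -1, 5]
solution = solve_linear_system(a, b)
is_positive_integer = solution is not None and all(
    x.denominator == 1 and x > 0 for x in solution
)
```

Exact fractions are used so that the integrality check is not affected by floating-point error. The number of arithmetic operations is cubic in the system size, in line with the polynomial-time statement above.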
After obtaining the values of A1 to I9, assign unit values to the independent variables of each equation according to the differences. Specifically, assign the values 1 to 9 to A, B, C, D, E, F, G, H, and I, respectively, and multiply them by A1 to I9 in sequence. This yields a set of values that match the number of difference cells in each equation. Next, label the numbers in the unsolved Sudoku as variables X1 to X81. Based on formulas (2) to (10), establish 9 equations for the cell positions at which the known Sudoku matches each specific difference. Similarly, the other 8 Sudokus yield 72 further equations. Gaussian elimination is then used to solve for X1 to X81 in sequence. Finally, the difference results between the unsolved Sudoku and the known Sudoku are used to solve the unsolved Sudoku.
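The final recovery step can be sketched in Python under the assumption that the difference is defined as (known minus unsolved) modulo 9 with 0 written as 9; if the opposite direction is used, the sign in `recover_digit` flips. The function names are hypothetical.

```python
def cell_difference(known_digit, unsolved_digit):
    """Difference in 1..9; identical digits give 9 (assumed convention)."""
    d = (known_digit - unsolved_digit) % 9
    return 9 if d == 0 else d


def recover_digit(known_digit, difference):
    """Invert cell_difference: return the unsolved digit in 1..9."""
    return (known_digit - difference - 1) % 9 + 1


def recover_grid(known, differences):
    """Apply recover_digit cell by cell to rebuild the unsolved Sudoku."""
    return [
        [recover_digit(k, d) for k, d in zip(known_row, diff_row)]
        for known_row, diff_row in zip(known, differences)
    ]


# Round-trip check on every (known, unsolved) digit pair.
round_trip_ok = all(
    recover_digit(k, cell_difference(k, x)) == x
    for k in range(1, 10)
    for x in range(1, 10)
)
```

Once every cell's difference is known, this inversion is a single pass over the 81 cells.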