3. Derivation and Results of Polynomial Time Algorithm for Sudoku Problems in the Second Case
In the second type of problem, the digits of the unsolved Sudoku and the known Sudoku usually differ at a given grid position; when the two digits are the same, the difference is taken to be 9. The difference between the two digits therefore always lies between 1 and 9. Accumulating the probabilities of the differences in descending order (the first term is the probability of a difference of 9, the second is the sum of the probabilities of differences of 9 and 8, and so on, until the cumulative probability over differences 9 to 1 reaches 1), we observe that the distribution of these nine differences closely resembles the area under a discrete Gaussian distribution curve restricted to positive values of the independent variable, with a total probability of 1. The probability of each specific difference can therefore be calculated using the standardization formula for the normal distribution [6]. Taking the differences between the hints in the unsolved Sudoku and the corresponding digits of the known Sudoku as the sample, we can estimate the frequency of each difference across the whole Sudoku.
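The sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `hints` dictionary layout, and the cyclic convention (a difference counted upward from the known digit, with equal digits mapped to 9) are assumptions made for the example.

```python
from collections import Counter

def cyclic_difference(unsolved_digit, known_digit):
    """Difference in 1..9; equal digits map to 9 (assumed cyclic convention)."""
    return (unsolved_digit - known_digit - 1) % 9 + 1

def difference_frequencies(hints, known):
    """Relative frequency of each difference 1..9 over the given hints.

    hints: dict mapping (row, col) to the hint digit of the unsolved Sudoku.
    known: 9x9 list of lists holding the known (complete) Sudoku.
    """
    counts = Counter(
        cyclic_difference(digit, known[r][c]) for (r, c), digit in hints.items()
    )
    total = sum(counts.values())
    return {k: counts.get(k, 0) / total for k in range(1, 10)}

def cumulative_descending(freqs):
    """Running sums taken from difference 9 down to difference 1."""
    running, out = 0.0, {}
    for k in range(9, 0, -1):
        running += freqs[k]
        out[k] = running
    return out
```

The last entry of the cumulative table (difference 1) equals 1 whenever at least one hint is supplied, which matches the accumulation described in the text.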
By synchronously incrementing or decrementing every digit of a known Sudoku (with a step size of 1), nine interlocking Sudokus can be generated, including the initial known Sudoku. At every grid position, the difference between any two of these nine Sudokus is the same constant. To simplify the solution process, the probability that the difference from the Sudoku to be solved equals 9 can be calculated for each of the nine Sudokus in turn. Relative to the initial known Sudoku, this yields the complete probability distribution of differences from 1 to 9 with respect to the Sudoku to be solved. During the calculation, the differences between the hints of the Sudoku to be solved and the known Sudoku must follow a single rule: all nine known Sudokus must use the same addition or subtraction operation to compute the difference. Using the standardization formula for the normal distribution and a normal distribution probability table, the probability range for a difference of 9 can be derived. Inferring the population from the sample then gives the overall probability that the known Sudoku has this difference relative to the target Sudoku. Multiplying this probability by the total number of cells in the 9x9 Sudoku (81) and rounding to the nearest whole number gives the number of cells in the entire 9x9 grid at which the known Sudoku has a difference of 9.
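As an illustration of the shifting step, the sketch below generates the nine Sudokus by cyclically incrementing every digit and converts a probability into an expected cell count. The wrap-around rule (9 + 1 becomes 1), the example grid `KNOWN`, and all function names are assumptions chosen for the example; the paper does not specify them.

```python
def shift_grid(grid, k):
    """Add k to every digit, wrapping 9 back to 1 (assumed cyclic rule)."""
    return [[(d - 1 + k) % 9 + 1 for d in row] for row in grid]

def nine_shifted_grids(grid):
    """The known Sudoku (k = 0) plus its eight cyclic shifts."""
    return [shift_grid(grid, k) for k in range(9)]

def is_valid_sudoku(grid):
    """Check that every row, column, and 3x3 box contains 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = all(set(row) == digits for row in grid)
    cols = all(set(col) == digits for col in zip(*grid))
    boxes = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == digits
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    )
    return rows and cols and boxes

def expected_cell_count(probability, cells=81):
    """Probability times the number of cells, rounded half up."""
    return int(cells * probability + 0.5)

# Hypothetical known Sudoku built from a standard shifting pattern.
KNOWN = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
GRIDS = nine_shifted_grids(KNOWN)
```

Because a cyclic shift only relabels the digits, each shifted grid remains a valid Sudoku, and any two of the nine grids differ by the same amount (modulo 9) at every cell.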
From the candidate digits of each cell, it can be determined whether the difference between the known digit and the digit in that cell can satisfy a specific difference value. However, the number of cells that can satisfy this condition is usually higher than the actual number. Based on the candidate digits in each cell of the unsolved Sudoku and the number of digits that satisfy each specific difference value, 18 equations can be established as follows:
A1+B1+C1+D1+E1+F1+G1+H1+I1 = the number of blanks with a difference value of 1    (1)
A2+B2+C2+D2+E2+F2+G2+H2+I2 = the number of blanks with a difference value of 2    (2)
A3+B3+C3+D3+E3+F3+G3+H3+I3 = the number of blanks with a difference value of 3    (3)
A4+B4+C4+D4+E4+F4+G4+H4+I4 = the number of blanks with a difference value of 4    (4)
A5+B5+C5+D5+E5+F5+G5+H5+I5 = the number of blanks with a difference value of 5    (5)
A6+B6+C6+D6+E6+F6+G6+H6+I6 = the number of blanks with a difference value of 6    (6)
A7+B7+C7+D7+E7+F7+G7+H7+I7 = the number of blanks with a difference value of 7    (7)
A8+B8+C8+D8+E8+F8+G8+H8+I8 = the number of blanks with a difference value of 8    (8)
A9+B9+C9+D9+E9+F9+G9+H9+I9 = the number of blanks with a difference value of 9    (9)
A1+A2+A3+A4+A5+A6+A7+A8+A9 = A    (10)
B1+B2+B3+B4+B5+B6+B7+B8+B9 = B    (11)
C1+C2+C3+C4+C5+C6+C7+C8+C9 = C    (12)
D1+D2+D3+D4+D5+D6+D7+D8+D9 = D    (13)
E1+E2+E3+E4+E5+E6+E7+E8+E9 = E    (14)
F1+F2+F3+F4+F5+F6+F7+F8+F9 = F    (15)
G1+G2+G3+G4+G5+G6+G7+G8+G9 = G    (16)
H1+H2+H3+H4+H5+H6+H7+H8+H9 = H    (17)
I1+I2+I3+I4+I5+I6+I7+I8+I9 = I    (18)
Here A, B, C, D, E, F, G, H, and I represent the number of digits that match a difference of 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively.
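To make the structure of equations (1) to (18) concrete, the sketch below builds their 18 x 81 coefficient matrix. The ordering of the 81 unknowns (A1..A9, then B1..B9, and so on) and the helper names are assumptions made for illustration only.

```python
LETTERS = "ABCDEFGHI"

def variable_index(letter, k):
    """Position of an unknown such as A1 or I9 in the 81-entry vector."""
    return LETTERS.index(letter) * 9 + (k - 1)

def build_system():
    """Coefficient matrix of equations (1)-(18), one row per equation."""
    rows = []
    # Equations (1)-(9): Ak + Bk + ... + Ik for k = 1..9.
    for k in range(1, 10):
        row = [0] * 81
        for letter in LETTERS:
            row[variable_index(letter, k)] = 1
        rows.append(row)
    # Equations (10)-(18): L1 + L2 + ... + L9 for L = A..I.
    for letter in LETTERS:
        row = [0] * 81
        for k in range(1, 10):
            row[variable_index(letter, k)] = 1
        rows.append(row)
    return rows

matrix = build_system()
# Column-wise sums of the first nine and the last nine equations.
first_nine = [sum(col) for col in zip(*matrix[:9])]
last_nine = [sum(col) for col in zip(*matrix[9:])]
```

Under this reading, each unknown appears in exactly two equations, and the first nine rows add up to the same all-ones vector as the last nine. The 18 equations of a single known Sudoku therefore contain at most 17 independent ones, which is far fewer than the 81 unknowns.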
Clearly, these 18 equations form a system of linear equations in up to 81 unknowns. Because constructing a complete Sudoku without given-digit constraints is relatively straightforward, new known Sudoku instances can be created, and the difference between each newly constructed known Sudoku and the original known Sudoku can be calculated exactly. Consequently, the equations written for each new Sudoku are related to the original 18 equations and, together with them, form a system of linearly independent equations. In this way further equations can be constructed, for example by using 8 new Sudoku puzzles to create 8 systems of 18 equations each, and Gaussian elimination can then be applied to solve the combined system of linear equations. It has been shown that when the number of linearly independent equations is at least the number of unknowns, solving the system by Gaussian elimination is a polynomial-time algorithm [7].
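For reference, a minimal sketch of Gaussian elimination is given below. It assumes a square, non-singular system and uses exact rational arithmetic; the function name and interface are hypothetical, and this is a generic textbook routine rather than the solver used in the paper.

```python
from fractions import Fraction

def gaussian_elimination(a, b):
    """Solve a x = b for a square matrix a using Gauss-Jordan elimination.

    Returns the solution as a list of Fractions.
    Raises ValueError if the matrix is singular.
    """
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot becomes 1.
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]
```

For n unknowns the routine performs three nested passes over the matrix (pivot column, row, and entry), so the number of arithmetic operations grows on the order of n^3, which is polynomial in n.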
After obtaining the values of A1 to I9, assign a unit value to the unknowns of each equation according to its difference. Specifically, assign the values 1 to 9 to A, B, C, D, E, F, G, H, and I, respectively, and then multiply these values by A1 to I9 in sequence. This yields a set of values that match the number of cells with each difference in each equation. Next, label the digits of the unsolved Sudoku as variables X1 to X81. Based on formulas (1) to (9), establish 9 equations for the specific cell positions at which the known Sudoku matches each specific difference. Similarly, 72 further equations can be established from the other 8 Sudokus. Then use Gaussian elimination to solve for the values of X1 to X81. Finally, using the computed differences between the unknown Sudoku and the known Sudoku, solve the unsolved Sudoku.
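The final step, recovering a digit of the unsolved Sudoku from the known digit and the difference, can be sketched as below. It assumes the same cyclic convention as before (differences counted upward from the known digit, with a difference of 9 meaning the digits coincide); the function names are hypothetical.

```python
def recover_digit(known_digit, difference):
    """Digit of the unsolved Sudoku given the known digit and a difference in 1..9."""
    return (known_digit + difference - 1) % 9 + 1

def recover_grid(known, differences):
    """Apply recover_digit cell by cell to two 9x9 lists of lists."""
    return [
        [recover_digit(k, d) for k, d in zip(known_row, diff_row)]
        for known_row, diff_row in zip(known, differences)
    ]
```

For example, a known digit of 5 with a difference of 9 gives 5 again, and a known digit of 9 with a difference of 1 wraps around to 1.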