Speeding up the Karatsuba Algorithm

This paper describes a $\sim O(n)$ pre-compute technique to speed up the Karatsuba algorithm for multiplying two numbers.

The Karatsuba algorithm [1,2], an $O(n^{\log_2 3})$ technique to multiply two n-digit numbers, has been surpassed by newer techniques that are $O(n \log n \log\log n)$ [3][4][5][6][7] and $O(n \log n)$ [8], respectively. However, the simplicity of the algorithm allows improvements that are easily implemented: the work can be reduced to fewer multiplications, supplemented by look-ups.

THE ORIGINAL ALGORITHM
For simplicity, consider multiplying two n-digit numbers x and y, written as

$$x = x_0 + x_1 \times 10^{m}, \qquad y = y_0 + y_1 \times 10^{m} \tag{1}$$

where we use $n = 2^k$ and $m = n/2$, and work in base 10. The product can be simplified to

$$x \cdot y = x_0 y_0 + (x_0 y_1 + x_1 y_0) \times 10^{m} + x_1 y_1 \times 10^{2m} = x_0 y_0 + \big((x_0 + x_1)(y_0 + y_1) - x_0 y_0 - x_1 y_1\big) \times 10^{m} + x_1 y_1 \times 10^{2m} \tag{2}$$

so that the product of the n-digit numbers can be reduced to the multiplication of three m-digit (and occasionally (m + 1)-digit) numbers, instead of four m-digit numbers. Note that multiplications by powers of 10 are ignored in the complexity calculations, as they can be reduced to shifts. The order of magnitude ("complexity") of the number of computations to multiply these two numbers can be reduced to the relation

$$\mathcal{O}\left(n^{\log_2 3}\right) \tag{3}$$

This may be seen simply as follows. At every step, the starting number (initially $n = 2^s$ digits long) is split into two numbers with half the number of digits. After s steps, there are $3^s$ multiplications of single-digit numbers that need to be performed. This number can be written as $3^s = 3^{\log_2 n}$, which can be further rewritten as $n^{\log_2 3}$, hence the result. This calculation is exact for a number whose number of digits is a power of 2.
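The recursion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the function name `karatsuba`, the reliance on Python's arbitrary-precision integers, and the choice to split at half the length of the longer operand are assumptions made for this example.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers by recursive base-10 splitting."""
    # Base case: once either factor is a single digit, multiply directly.
    if x < 10 or y < 10:
        return x * y

    # Split at m digits, where m is half the length of the longer operand.
    n = max(len(str(x)), len(str(y)))
    m = n // 2
    shift = 10 ** m

    x1, x0 = divmod(x, shift)  # x = x0 + x1 * 10^m
    y1, y0 = divmod(y, shift)  # y = y0 + y1 * 10^m

    # Three recursive products instead of four.
    low = karatsuba(x0, y0)
    high = karatsuba(x1, y1)
    cross = karatsuba(x0 + x1, y0 + y1) - low - high

    # Multiplications by powers of 10 are shifts in base 10.
    return low + cross * shift + high * shift * shift
```

For instance, `karatsuba(1234, 5678)` agrees with the built-in product `1234 * 5678`.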

THE IMPROVED VERSION
We generalize the above algorithm as follows and write (in base B, though for simplicity we will use B = 10)

$$x = \sum_{i=0}^{N} x_i B^{im}, \qquad y = \sum_{i=0}^{N} y_i B^{im}$$

As can be quickly checked, each of the numbers $x_0, \ldots, x_N, y_0, \ldots, y_N$ is m digits long and $(N + 1)m = n$, where n is the total number of digits in x. When one multiplies the two numbers x and y, the number of multiplications (the order of complexity) is $M = (N + 1) + C(N + 1, 2)$; for N = 1 this gives M = 3, which is what we use in the order-of-magnitude estimate in Equation (3).
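One way to realize a count of $(N + 1) + C(N + 1, 2)$ block multiplications is to compute the $N + 1$ diagonal products $x_i y_i$ once and obtain each pair of cross terms from a single extra product, using $x_i y_j + x_j y_i = (x_i + x_j)(y_i + y_j) - x_i y_i - x_j y_j$. The sketch below assumes this is the intended scheme; the names `split_blocks` and `block_multiply` are hypothetical, and `blocks` plays the role of $N + 1$.

```python
from itertools import combinations


def split_blocks(x: int, m: int, blocks: int) -> list:
    """Split x into `blocks` base-10 blocks of m digits, least significant first."""
    base = 10 ** m
    out = []
    for _ in range(blocks):
        x, block = divmod(x, base)
        out.append(block)
    return out


def block_multiply(x: int, y: int, m: int, blocks: int):
    """Return (x * y, number of block multiplications used)."""
    base = 10 ** m
    if x >= base ** blocks or y >= base ** blocks:
        raise ValueError("operands must fit in blocks * m digits")
    xs = split_blocks(x, m, blocks)
    ys = split_blocks(y, m, blocks)

    # Diagonal terms: N + 1 multiplications.
    diag = [xs[i] * ys[i] for i in range(blocks)]
    count = blocks
    total = sum(d * base ** (2 * i) for i, d in enumerate(diag))

    # Cross terms: one multiplication per unordered pair, C(N + 1, 2) in all.
    for i, j in combinations(range(blocks), 2):
        cross = (xs[i] + xs[j]) * (ys[i] + ys[j]) - diag[i] - diag[j]
        count += 1
        total += cross * base ** (i + j)
    return total, count
```

With four 3-digit blocks, `block_multiply(123456789012, 987654321098, 3, 4)` uses 4 + 6 = 10 block multiplications.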
In the Karatsuba technique, the m-digit numbers are further multiplied by the same technique, carrying on recursively till we are reduced to single-digit multiplications. That leads to the recursive complexity calculation noted in Equation (3).
However, note that if we simply pre-computed the individual m-digit multiplications and looked them up, we would end up with essentially $\sim n^2/(2m^2)$ lookups rather than actual multiplications. Indeed, lookups take, on average, 1/5 the time taken for a single-digit multiplication (and then we have to multiply by the number of operations L required to perform the lookup), hence the complexity when lookups are included is $\sim \frac{n^2}{5m^2} \times L$, to be compared with the Karatsuba method. As we will show below, $L \sim 6m$, so that the total complexity of the algorithm is $\sim n^2/m$. Since m can be chosen to be a fraction of n, i.e., $m = n/(N+1)$, the complexity is $\sim (N + 1)n$. For fixed N this grows linearly in n, which is much quicker than the $n^{1.58}$ of the Karatsuba technique. This is the main result of this short note.
The lookups of m-digit multiplications need to be performed against a table of size $B^m \times B^m$. This lookup, as can be verified by the "divide-and-conquer" technique, is (for B = 10) of complexity $\sim \log_2(10^{2m}) = 2m \log_2 10 \approx 6.6m$, which we write as $\sim 6m$. There are some further additions and subtractions, which contribute additional (though sub-dominant) complexity $\sim n/m$, as can be easily checked and as detailed in the example below.
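A pre-computed product table can be illustrated with a deliberately tiny block size. The sketch below makes several assumptions that are not taken from the text: it uses direct array indexing instead of the divide-and-conquer search costed at $\sim 6m$ above, it fixes `M_DIGITS = 2` so the table has only $10^4$ entries, and it handles the occasional (m + 1)-digit operand by peeling off the leading carry digit so that one lookup plus shifts and additions suffices. The names `TABLE` and `lookup_product` are hypothetical.

```python
M_DIGITS = 2            # block size m (kept tiny for illustration)
BASE = 10 ** M_DIGITS   # B^m with B = 10

# TABLE[a][b] holds a * b for all m-digit a and b: a B^m x B^m table.
TABLE = [[a * b for b in range(BASE)] for a in range(BASE)]


def lookup_product(a: int, b: int) -> int:
    """Product of a, b < 2 * BASE using one table lookup plus additions.

    Sums of two m-digit blocks can reach m + 1 digits, but the extra
    leading digit is only ever 0 or 1, so it is handled by shifted adds.
    """
    carry_a, a0 = divmod(a, BASE)  # carry_a is 0 or 1
    carry_b, b0 = divmod(b, BASE)  # carry_b is 0 or 1
    result = TABLE[a0][b0]
    if carry_a:
        result += b0 * BASE        # shift by m digits, then add
    if carry_b:
        result += a0 * BASE
    if carry_a and carry_b:
        result += BASE * BASE
    return result
```

For example, `lookup_product(134, 64)` reads `TABLE[34][64]` and adds `64 * 100` for the carry digit.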
Analyzing this further, we could choose to mix and match, i.e., apply k Karatsuba-style divide-by-two-and-conquer steps, then apply the lookup method to look up $3^k$ pre-calculated products of $n/2^k$-digit numbers. Alternatively, using the above technique (breaking up into m-digit blocks) with $m = n/2^k$, we would have to look up $2^k + C(2^k, 2)$ products. Since $3^k \le 2^k + C(2^k, 2)$, with strict inequality for $k \ge 2$, a hybrid divide-by-two-and-conquer with lookups algorithm is the quickest way to speed up the calculation. A graph of the reduced complexity (essentially $3^N (2/3)^{N-k} = 3^k\, 2^{N-k}$) achieved this way is plotted in Figure 1; clearly, cutting the recursion off early is advantageous.
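The hybrid scheme can be sketched as a Karatsuba recursion that stops at m-digit blocks and reads the remaining products from a table. The code below is an illustration under stated assumptions, not the author's implementation: the block size is fixed at `M_DIGITS = 2`, the operand width must be `M_DIGITS` times a power of two, the table is indexed directly, and the hypothetical `counter` argument exists only to count lookups. Carry digits from the sums $x_0 + x_1$ and $y_0 + y_1$ are peeled off and restored with shifted additions, so each level costs exactly three recursive calls.

```python
M_DIGITS = 2
BASE = 10 ** M_DIGITS
TABLE = [[a * b for b in range(BASE)] for a in range(BASE)]


def hybrid(x: int, y: int, width: int, counter: list) -> int:
    """Multiply x, y < 2 * 10**width, with width = M_DIGITS * 2**k.

    counter[0] is incremented once per table lookup.
    """
    top = 10 ** width
    cx, x = divmod(x, top)  # cx is a carry digit, 0 or 1
    cy, y = divmod(y, top)  # cy is a carry digit, 0 or 1

    if width == M_DIGITS:
        counter[0] += 1
        core = TABLE[x][y]  # pre-computed m-digit product
    else:
        half = width // 2
        shift = 10 ** half
        x1, x0 = divmod(x, shift)
        y1, y0 = divmod(y, shift)
        low = hybrid(x0, y0, half, counter)
        high = hybrid(x1, y1, half, counter)
        cross = hybrid(x0 + x1, y0 + y1, half, counter) - low - high
        core = low + cross * shift + high * top

    # Restore the carry digits; cx and cy are 0 or 1, so these are
    # conditional shifted additions rather than real multiplications.
    return core + (cx * y + cy * x) * top + cx * cy * top * top
```

With 8-digit operands and 2-digit blocks, k = 2, so the recursion performs $3^2 = 9$ lookups, against $4 + 6 = 10$ for the flat four-block scheme.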
A little reflection will show why divide-and-conquer by 2 for k times followed by lookup is the most efficient way to carry out the above procedure. Each time we divide an n-digit number into N + 1 blocks of m digits, we have to (recursively) perform $(N + 1) + C(N + 1, 2)$ multiplications. After k such recursions, we are left with $(N + 1)^k$ blocks of $n/(N+1)^k$ digits each and have to perform $\big((N + 1) + C(N + 1, 2)\big)^k$ multiplications. At this point, if we look up pre-computed products of numbers of this type, that is a complexity factor of $\sim n/(N+1)^k$. The total number of operations is

$$\big((N + 1) + C(N + 1, 2)\big)^k \times \frac{n}{(N+1)^k} = n \left(\frac{N+2}{2}\right)^k$$

which is smallest for the smallest N, i.e., N = 1. The complexity then matches exactly the complexity of the Karatsuba algorithm.
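The dependence on N can be checked numerically. The sketch below multiplies the number of products after k splits by the lookup factor, as in the argument above, and also computes an effective exponent $\log M / \log(N+1)$ with $M = (N+1) + C(N+1, 2)$. That exponent is an extra comparison introduced here, not taken from the text, and the function names are hypothetical.

```python
import math


def multiplications_per_split(N: int) -> int:
    """Products needed when splitting into N + 1 blocks."""
    return (N + 1) + math.comb(N + 1, 2)


def total_operations(n: int, N: int, k: int) -> int:
    """Products after k splits times the lookup factor n / (N + 1)**k."""
    return multiplications_per_split(N) ** k * n // (N + 1) ** k


def effective_exponent(N: int) -> float:
    """Exponent e such that a full recursion costs about n**e."""
    return math.log(multiplications_per_split(N)) / math.log(N + 1)
```

For N = 1 the exponent is $\log_2 3 \approx 1.585$, and `total_operations(512, 1, 9)` equals $3^9$, the Karatsuba count for a 512-digit number; larger N gives a larger exponent.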
We would need to perform additions and subtractions, of course, and there are 34 of them in the above example. That number of elementary operations depends, however, only upon $n/m$.

Memory Requirements
Typical RSA encryption algorithms use ∼1000-digit base-10 composite numbers that are the product of five-hundred-digit primes. If one were to attack the problem by pre-computing keys, i.e., pre-multiplying pairs of five-hundred-digit primes ($n = 500 \sim 2^9$) and storing the results of multiplying all possible 6-digit numbers ($m = 6 \sim 2^3$), one has a complexity $\sim n^2/m \approx 83n \approx 42{,}000$, which is worse than the new $\sim n \log_2 n \sim 4500$ complexity [8], although the newer approach also has constant multipliers, which we have not accounted for. If we use the hybrid method (Karatsuba followed by look-up of 6-digit products), the complexity is $\sim 3^6 \times 6 \approx 4400$, which is arguably much better (no pre-factors missing) than even the $n \log_2 n$ algorithms. We would need to store $\sim 10^{12}$ twelve-digit numbers, roughly 20 TB of memory, which is a very reasonable size.
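The order-of-magnitude figures in this section can be reproduced with a few lines of arithmetic. The sketch below is only a calculator for the expressions quoted above; the function names are hypothetical, the second factor in the hybrid cost is taken to be the block size m, and the storage size per table entry is left as a parameter because the text does not specify an encoding.

```python
import math


def flat_lookup_cost(n: int, m: int) -> float:
    """Flat block scheme with lookups: ~ n^2 / m."""
    return n * n / m


def n_log_n_cost(n: int) -> float:
    """Reference n * log2(n) scaling, ignoring constant multipliers."""
    return n * math.log2(n)


def hybrid_cost(k: int, m: int) -> int:
    """k Karatsuba halvings followed by lookups: ~ 3^k * m (m assumed)."""
    return 3 ** k * m


def table_entries(m: int) -> int:
    """Number of m-digit by m-digit products in base 10."""
    return 10 ** (2 * m)


def table_bytes(m: int, bytes_per_entry: int) -> int:
    """Storage for the full table under an assumed entry size."""
    return table_entries(m) * bytes_per_entry
```

With n = 500 and m = 6, `flat_lookup_cost` gives about 41,667, `hybrid_cost(6, 6)` gives 4374, and `table_entries(6)` gives $10^{12}$; the total storage then scales directly with the entry size chosen.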

CONCLUSION
This paper presents a rapid pre-computed approach to speeding up multiplications.
Though one needs to pre-compute and store all possible m-digit multiplications, one can compute the products of two integers with number of digits equal to any integer times m in time proportional to the number of digits (times the afore-mentioned integer). Memory is cheaper than CPU speed, so this is a method that can be exploited in other (for instance signal-processing) situations to speed up intensive calculations too.
Useful conversations with Dr. B. Kumar are acknowledged. As this paper was being prepared, an article about using pre-stored calculations was released, in which the calculation complexity of the Eratosthenes sieve was reduced [9].

FIG. 1. Plot of the Efficiency of cutting off Karatsuba early