Preprint
Article

This version is not peer-reviewed.

Optimizing Multi-Scalar Multiplication Over Fixed Bases

Submitted: 31 March 2026
Posted: 02 April 2026


Abstract
This work investigates multi-scalar multiplication (MSM) over a fixed base for small input sizes, where classical large-scale optimizations are less effective. We propose a novel variant of the Pippenger-based bucket method that enhances performance by using additional precomputation. In particular, our approach extends the BGMW method by introducing structured precomputations of point combinations, enabling the replacement of multiple point additions with table lookups. We further generalize this idea through chunk-based precomputation, allowing flexible trade-offs between memory usage and runtime performance. Experimental results demonstrate that the proposed variants significantly outperform the Fixed Window method for small MSM instances, achieving up to 3× speedup under practical memory constraints. These results challenge the common assumption that bucket-based methods are inefficient for small MSMs.

1. Introduction

Multi-scalar multiplication (MSM) is a fundamental operation in elliptic curve cryptography and underpins many cryptographic protocols. While MSM has been extensively studied, most recent advances focus on large-scale instances, where Pippenger’s bucket method and its variants dominate in performance.
In contrast, small MSM instances exhibit different characteristics and allow alternative optimization strategies. In this regime, precomputation becomes a powerful tool, as its cost can be amortized over relatively small inputs. This opens the door to algorithms that trade memory for improved runtime efficiency.
However, classical bucket-based methods such as the BGMW approach are often considered suboptimal for small MSMs. In this work, we challenge this assumption by introducing a novel precomputation technique that reduces the number of required group operations.
We propose a new variant that extends the BGMW method and demonstrate that, when combined with carefully designed precomputation strategies, bucket-based methods can outperform widely used approaches such as the Fixed Window method in the small MSM regime.

2. Previous Work

Multi-scalar multiplication (MSM) over a fixed base has been extensively studied over several decades. Recent advances have primarily targeted large-scale MSM instances, typically of size at least $2^{20}$, where state-of-the-art algorithms are based on variants of Pippenger’s bucket method [1].
In contrast, smaller MSM instances admit different optimization strategies, most notably through the use of precomputation. In this regime, the precomputation cost is often proportional to the MSM size, allowing relatively modest overhead. Furthermore, smaller instances enable significantly larger precomputation per point, creating opportunities for more sophisticated algorithms that trade memory for improved runtime performance.
This section reviews classical MSM algorithms.

2.1. Shamir’s Trick

Shamir’s Trick [2] computes a double-scalar multiplication $S = k_0 P_0 + k_1 P_1$ using a minimal precomputation table consisting of $O$, $P_0$, $P_1$, and $P_0 + P_1$. Only the latter requires explicit precomputation.
The scalars are processed simultaneously in binary form from the most significant bit to the least significant bit. At each step, the accumulator is doubled, and a precomputed value is conditionally added based on the current bit pair. This method achieves an efficient trade-off between precomputation and runtime cost.
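For concreteness, the bit-pair scan can be sketched as follows. The snippet uses plain integers under addition as a stand-in for elliptic-curve group elements, and the function name and the `bits` default are illustrative assumptions, not part of the cited method.

```python
# Sketch of Shamir's Trick. Integers under addition stand in for curve
# points; the identity element O is represented by 0.
def shamir(k0, P0, k1, P1, bits=8):
    # Minimal table: only P0 + P1 requires an explicit group addition.
    table = {(1, 0): P0, (0, 1): P1, (1, 1): P0 + P1}
    acc = 0  # the identity O
    for i in reversed(range(bits)):          # most significant bit first
        acc = acc + acc                      # one doubling per bit
        pair = ((k0 >> i) & 1, (k1 >> i) & 1)
        if pair != (0, 0):
            acc = acc + table[pair]          # at most one addition per bit
    return acc

assert shamir(13, 5, 7, 11) == 13 * 5 + 7 * 11
```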

2.2. Straus’s Method

Straus’s method [3] generalizes Shamir’s Trick to n scalars:
$S = \sum_{i=0}^{n-1} k_i P_i.$
The algorithm processes all scalars in parallel and relies on a precomputation table containing all possible sums of subsets of $\{P_0, \ldots, P_{n-1}\}$. At each bit position, the accumulator is doubled and the subset sum corresponding to the active bits is added.
While computationally efficient, requiring $k-1$ doublings and at most $k$ additions for $k$-bit scalars, the method suffers from exponential precomputation complexity, rendering it practical only for very small $n$.
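A minimal sketch, again with integers standing in for curve points, makes the exponential table size visible (all names are illustrative):

```python
# Sketch of Straus's method. The subset-sum table has 2^n entries,
# which is why the method is practical only for tiny n.
def straus(scalars, points, bits=8):
    n = len(points)
    table = [sum(points[i] for i in range(n) if (mask >> i) & 1)
             for mask in range(1 << n)]      # all 2^n subset sums
    acc = 0
    for i in reversed(range(bits)):
        acc += acc                           # one doubling per bit position
        mask = 0
        for j in range(n):
            mask |= ((scalars[j] >> i) & 1) << j
        if mask:
            acc += table[mask]               # one addition per bit position
    return acc

assert straus([13, 7, 21], [5, 11, 3]) == 13 * 5 + 7 * 11 + 21 * 3
```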

2.3. Bos-Coster Method

The Bos-Coster method [4] is a recursive MSM algorithm that iteratively reduces the largest scalars. At each step, the two largest scalars $k_0 > k_1$ are replaced as
$k_0 P_0 + k_1 P_1 \rightarrow (k_0 - k_1) P_0 + k_1 (P_0 + P_1).$
Each iteration requires one scalar subtraction and one group addition. The process continues until a single scalar remains. The method effectively reduces scalar sizes while increasing point reuse, making it attractive in settings where additions are significantly cheaper than scalar operations.
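The reduction is commonly driven by a max-heap, as in the following sketch (integers stand in for curve points; the heap-based organization and all names are the author's illustration):

```python
import heapq

# Sketch of the Bos-Coster reduction. A max-heap (realized with negated
# scalars) always yields the two largest remaining scalars.
def bos_coster(scalars, points):
    heap = [(-k, P) for k, P in zip(scalars, points) if k > 0]
    heapq.heapify(heap)
    while len(heap) > 1:
        nk0, P0 = heapq.heappop(heap)        # largest scalar
        nk1, P1 = heapq.heappop(heap)        # second largest
        k0, k1 = -nk0, -nk1                  # k0 >= k1
        # k0*P0 + k1*P1 = (k0 - k1)*P0 + k1*(P0 + P1): one group addition.
        heapq.heappush(heap, (-k1, P0 + P1))
        if k0 > k1:
            heapq.heappush(heap, (-(k0 - k1), P0))
    k, P = heap[0]
    return -k * P  # one final single-scalar multiplication

assert bos_coster([13, 7, 21], [5, 11, 3]) == 13 * 5 + 7 * 11 + 21 * 3
```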

2.4. Fixed Window Method

The Fixed Window method [5] accelerates scalar and multi-scalar multiplication through per-point precomputation. For a window size w, each point P i is expanded into
$\{P_i, 2P_i, \ldots, (2^w - 1) P_i\}.$
Scalars are partitioned into $w$-bit chunks, and each chunk selects a precomputed multiple. The contributions are combined using appropriate doublings. The method achieves a predictable trade-off between precomputation cost $O(n \cdot 2^w)$ and runtime efficiency.
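A sketch of the digit scan follows, with integers standing in for curve points; `m * P` abbreviates the per-point table that a real implementation would build by repeated group additions, and all names and defaults are illustrative:

```python
# Sketch of the Fixed Window method with window size w.
def fixed_window(scalars, points, w=4, bits=8):
    tables = [[m * P for m in range(1 << w)] for P in points]  # O(n * 2^w)
    windows = (bits + w - 1) // w
    acc = 0
    for j in reversed(range(windows)):       # highest window first
        for _ in range(w):
            acc += acc                       # w doublings between windows
        for i, k in enumerate(scalars):
            digit = (k >> (j * w)) & ((1 << w) - 1)
            if digit:
                acc += tables[i][digit]      # table lookup + one addition
    return acc

assert fixed_window([13, 7, 21], [5, 11, 3]) == 13 * 5 + 7 * 11 + 21 * 3
```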

2.5. Sliding Window Method

The Sliding Window method [5] improves upon the fixed window approach by dynamically selecting windows. Instead of fixed partitions, the algorithm scans the scalar representation and processes a window beginning with a non-zero bit.
This reduces the number of additions by skipping zero segments, making the method particularly effective for scalars with low Hamming weight.
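For a single scalar, the window selection can be sketched as follows; only odd multiples are precomputed, so each window ends in a non-zero bit (integers again stand in for curve points, and the names are illustrative):

```python
# Sketch of the Sliding Window method for a single scalar k.
def sliding_window(k, P, w=3):
    table = {m: m * P for m in range(1, 1 << w, 2)}  # odd multiples only
    acc = 0
    i = k.bit_length() - 1
    while i >= 0:
        if (k >> i) & 1 == 0:
            acc += acc                       # zero bit: doubling only
            i -= 1
        else:
            # Longest window of at most w bits ending in a set bit.
            l = max(i - w + 1, 0)
            while (k >> l) & 1 == 0:
                l += 1
            digit = (k >> l) & ((1 << (i - l + 1)) - 1)  # odd digit < 2^w
            for _ in range(i - l + 1):
                acc += acc
            acc += table[digit]              # one addition per window
            i = l - 1
    return acc

assert sliding_window(173, 7) == 173 * 7
```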

2.6. Pippenger’s Bucket Method

For large-scale MSM, Pippenger’s bucket method and its variants are widely regarded as the most efficient approaches [6]. The central idea is to group points according to scalar digits and accumulate them into buckets.
In the windowed variant, scalars are decomposed into base-$2^w$ digits. For each window, points are assigned to $2^w - 1$ buckets based on their digit values. After bucket accumulation, the result is computed using a Horner-like scheme, followed by appropriate doublings between windows.
This approach significantly reduces the number of required scalar multiplications by replacing them with structured additions.
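The windowed variant, including the Horner-like suffix-sum combination of buckets, can be sketched as follows (integers stand in for curve points; names and defaults are illustrative):

```python
# Sketch of the windowed bucket method.
def pippenger(scalars, points, w=4, bits=8):
    windows = (bits + w - 1) // w
    acc = 0
    for j in reversed(range(windows)):
        for _ in range(w):
            acc += acc                       # doublings between windows
        buckets = [0] * (1 << w)
        for k, P in zip(scalars, points):
            d = (k >> (j * w)) & ((1 << w) - 1)
            if d:
                buckets[d] += P              # bucket accumulation
        # Horner-like combination: window_sum = sum over d of d * buckets[d],
        # using running suffix sums (two additions per non-empty bucket).
        running = window_sum = 0
        for d in range(len(buckets) - 1, 0, -1):
            running += buckets[d]
            window_sum += running
        acc += window_sum
    return acc

assert pippenger([13, 7, 21], [5, 11, 3]) == 13 * 5 + 7 * 11 + 21 * 3
```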

2.7. BGMW Method

The BGMW method [7] extends Pippenger’s approach by incorporating structured precomputation. For each point P i , a table of size h is constructed:
$\{q^j P_i \mid i = 0, \ldots, n-1,\; j = 0, \ldots, h-1\},$
resulting in $O(nh)$ precomputed points.
This moderate precomputation cost enables efficient bucket accumulation and improves data locality. However, a key characteristic of BGMW is the existence of an optimal parameter regime: increasing precomputation (e.g., by reducing the window size $w$) does not necessarily improve performance and may even degrade it if $w$ falls below the optimum.
This behavior contrasts with fixed-window methods, where larger precomputation typically yields improvements. Although several extensions of BGMW have been proposed [6], they do not perform well for scalar multiplication and small MSMs.
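With the table $q^j P_i$ for $q = 2^w$, all windows can share a single bucket set, eliminating the doublings between windows. The following sketch illustrates this (integers stand in for curve points; names and defaults are illustrative):

```python
# Sketch of BGMW bucket accumulation with q = 2^w. The precomputed table
# Q[i][j] = q^j * P_i lets all windows share one bucket set, so no
# doublings are needed.
def bgmw(scalars, points, w=4, bits=8):
    q = 1 << w
    h = (bits + w - 1) // w                  # table size per point
    Q = [[(q ** j) * P for j in range(h)] for P in points]  # O(n*h) points
    buckets = [0] * q
    for i, k in enumerate(scalars):
        for j in range(h):
            d = (k >> (j * w)) & (q - 1)
            if d:
                buckets[d] += Q[i][j]        # one shared bucket set
    running = total = 0
    for d in range(q - 1, 0, -1):            # total = sum of d * buckets[d]
        running += buckets[d]
        total += running
    return total

assert bgmw([13, 7, 21], [5, 11, 3]) == 13 * 5 + 7 * 11 + 21 * 3
```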

3. Pippenger’s Bucket Method for Small MSMs

The bucket method improves performance by first aggregating points within each bucket and subsequently multiplying the accumulated sum by the corresponding bucket index. This strategy is particularly effective for large MSM instances, where the expected number of points per bucket grows with the MSM size n. In contrast, for small MSM instances, bucket occupancy is low, and the resulting performance gains are limited.
The BGMW method addresses this limitation by increasing the number of points per bucket through precomputation for the Pippenger method. In particular, points corresponding to the same bucket index across different windows can be aggregated. This is enabled by the BGMW precomputation, which provides, for each point, its representation across all windows.
A straightforward attempt to further increase bucket occupancy would be to decrease the window size w. However, this approach is not effective, as the BGMW method admits an optimal window size. Reducing w below this optimum increases the total number of group operations and consequently degrades overall performance.
The main contribution of this work is the introduction of additional precomputation techniques for the Pippenger method, building upon the precomputations proposed in the BGMW method, which further improve performance for small MSM instances. In particular, the proposed variant outperforms existing precomputation-based methods, such as the Fixed Window method, for small MSMs.

3.1. The Proposed Variant

The primary goal of the proposed method is to reduce the number of point additions required during bucket accumulation. For small MSM instances and moderate window sizes $w$, some buckets contain zero or one point, while others contain multiple points. For buckets with more than one point, explicit point additions can be replaced by lookups into precomputation tables containing sums of point pairs.
More precisely, if a bucket contains two points, their sum can be obtained via a single lookup, provided that the corresponding pair has been precomputed. For buckets containing three points, two additions can be replaced by one lookup and one addition, and so on. In principle, one could precompute all possible point pairs; however, this becomes prohibitively expensive for large n or small w. To address this, we restrict precomputation to chunks of points. This significantly reduces memory requirements while preserving most of the performance benefits. The resulting trade-off is well suited for scenarios where memory usage is constrained.
The variant proceeds as follows. First, the standard BGMW precomputation of size $nk/w$ is generated. Next, the resulting BGMW precomputation is used to precompute additional values of the form $P_i + P_j$ for indices
$i, j \in \{dt,\, dt+1,\, \ldots,\, (d+1)t - 1\}, \quad i \neq j, \quad d = 0, \ldots, \tfrac{nk}{tw} - 1,$
where $w$ denotes the window size in bits, $n$ is the MSM size, $k$ is the scalar size in bits, and $t$ is the chunk size.
The resulting algorithm follows the standard BGMW procedure, with modifications to the bucket accumulation phase. During accumulation, the algorithm scans each bucket and replaces pairs of points with their corresponding precomputed sums whenever possible. Increasing the chunk size t enlarges the set of available precomputed pairs, thereby reducing the number of explicit point additions.
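The procedure above can be sketched as follows. This is a simplified interpretation: integers stand in for curve points, the greedy pairing policy inside a bucket and all names are the author's illustration rather than the paper's implementation.

```python
# Simplified sketch of chunk-based pair precomputation on top of BGMW.
def bgmw_with_pairs(scalars, points, w=4, bits=8, t=4):
    q = 1 << w
    windows = (bits + w - 1) // w
    # Flattened BGMW table: base[i*windows + j] = q^j * P_i, size n*k/w.
    base = [(q ** j) * P for P in points for j in range(windows)]
    # Precompute pair sums only within chunks of t consecutive indices.
    pair = {}
    for c in range(0, len(base), t):
        chunk = range(c, min(c + t, len(base)))
        for a in chunk:
            for b in chunk:
                if a < b:
                    pair[(a, b)] = base[a] + base[b]
    # Buckets collect table indices instead of points.
    buckets = [[] for _ in range(q)]
    for i, k in enumerate(scalars):
        for j in range(windows):
            d = (k >> (j * w)) & (q - 1)
            if d:
                buckets[d].append(i * windows + j)
    running = total = 0
    for d in range(q - 1, 0, -1):
        idxs = sorted(buckets[d])
        s, m = 0, 0
        while m < len(idxs):
            if m + 1 < len(idxs) and (idxs[m], idxs[m + 1]) in pair:
                s += pair[(idxs[m], idxs[m + 1])]  # two points, one lookup
                m += 2
            else:
                s += base[idxs[m]]                 # fall back to one addition
                m += 1
        running += s
        total += running                           # suffix-sum combination
    return total

assert bgmw_with_pairs([13, 7, 21], [5, 11, 3]) == 13 * 5 + 7 * 11 + 21 * 3
```

Increasing `t` enlarges the pair table and lets more bucket entries be merged by lookup, matching the memory/runtime trade-off described above.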

4. Experiments

Authors of MSM algorithms often fork existing implementations, modify them, and publish the results. For example, [6] forked the BLST [8] library and released a modified version [9]. While this enables rapid prototyping, such implementations are typically tightly coupled to a specific elliptic curve cryptography (ECC) library, making fair comparisons across different libraries difficult. Moreover, these repositories are often not maintained, limiting reproducibility.
To address these issues, the author initiated the Rust-KZG project [10], a unified framework supporting multiple ECC backends, including Arkworks [11], BLST [8], Constantine [12], MCL [13], and ZKCrypto [14]. It integrates MSM implementations under a common interface, enabling fair and reproducible comparisons. The source code for the methods proposed in this work will be publicly released upon acceptance of the corresponding paper.
Rust-KZG is implemented in Rust, providing memory safety and performance comparable to C, and is already used by other researchers [15].
The proposed methods are compared against the Fixed Window method, specifically the optimized wbits implementation from BLST. Both the proposed variant and wbits use batched addition. Experiments are conducted on the widely used BLS12-381 curve [16], using the same backend to ensure consistency.
All experiments are performed on a system with an AMD Ryzen 9 5950X CPU, 64 GB of RAM, and Ubuntu 24.04, using the Criterion benchmarking library.

4.1. Experiment Results

The performance of the Fixed Window method improves as the window size increases, until further increases no longer reduce the number of scalar windows. However, each increase doubles the size of the precomputation table, leading to poor scalability with respect to memory.
In contrast, the proposed variant inherits from the BGMW method the property that an optimal window size exists. Deviating from this optimal value degrades performance, even as the precomputation size increases. Furthermore, the proposed variant introduces an additional parameter, the chunk size (a multiplier of $n$), resulting in an optimal combination of window size and chunk size for a given memory budget.
Due to the fundamentally different precomputation strategies, direct comparison with the Fixed Window method is not straightforward, as matching precomputation sizes is often not possible. Therefore, the proposed variant is compared using configurations with equal or smaller precomputation size than the Fixed Window method.
The experimental results indicate that the proposed variant achieves up to nearly 3× performance improvement over the Fixed Window method for small MSM sizes. In particular, the runtime is reduced by 63.05% for MSM of size $2^1$, 47.32% for size $2^2$, 32.31% for size $2^3$, 24.44% for size $2^4$, and 11.41% for size $2^5$. For MSM of size $2^6$, the proposed variant reaches very similar performance (only a -0.77% difference) compared to the Fixed Window method.
It should be noted that these improvements are achieved under specific memory budgets for precomputation, as detailed in the corresponding table. The proposed variant begins to outperform the Fixed Window method only when a sufficient memory budget for precomputation is available. However, the required memory remains relatively modest: 18.43 KB for MSM of size $2^1$, 48.96 KB for size $2^2$, 176.26 KB for size $2^3$, 665.86 KB for size $2^4$, and 2.59 MB for size $2^5$.
Furthermore, the proposed variant exhibits both globally optimal parameter settings for each MSM size and locally optimal settings for specific memory budgets. For example, for MSM of size $2^2$, the globally optimal configuration uses a window size of 4 and a chunk size of 64, resulting in a precomputation table of approximately 3.16 MB. However, if the available memory is limited to 1 MB, the optimal configuration changes to a window size of 4 and a chunk size of 9. The chunk parameter values are divided by $n$ in the tables below. All reported execution times are measured in milliseconds.
Table 1. MSM size $2^1$: Fixed-Window vs. the Proposed Variant. (table image not included)
Table 2. MSM size $2^2$: Fixed-Window vs. the Proposed Variant. (table image not included)
Table 3. MSM size $2^3$: Fixed-Window vs. the Proposed Variant. (table image not included)
Table 4. MSM size $2^4$: Fixed-Window vs. the Proposed Variant. (table image not included)
Table 5. MSM size $2^5$: Fixed-Window vs. the Proposed Variant. (table image not included)
Table 6. MSM size $2^6$: Fixed-Window vs. the Proposed Variant. (table image not included)

5. Conclusions

This work presented a novel variant of the Pippenger-based MSM method that enhances performance for small MSM instances. The proposed approach builds upon the BGMW framework and introduces additional structured precomputation, enabling the replacement of multiple point additions with efficient table lookups.
The experimental results show that the proposed variants consistently outperform the Fixed Window method for small MSM sizes, achieving up to approximately 3× speedup under realistic memory constraints. Importantly, these improvements are obtained without excessive memory requirements, making the approach practical.
The results also demonstrate that bucket-based methods, when augmented with carefully designed precomputation, are highly competitive even in regimes where they are traditionally considered inefficient.

6. Further Research

The author is currently working on several extensions of the proposed variant. First, the method can be generalized by increasing the number of precomputed combinations, including triples, quadruples, and higher-order groupings of points. While this may further improve runtime performance, it introduces additional memory and computational overhead during precomputation. Nevertheless, this approach yields performance gains for very small MSM instances and even for single-scalar multiplication, while still maintaining moderate memory requirements.
Second, additional strategies for increasing bucket occupancy, such as the use of Booth encoding and related techniques, are expected to further improve performance.
Third, alternative chunk layouts may increase performance. For example, overlapping chunks could enable more group operations to be replaced by lookup operations.
Finally, hybrid approaches that combine multiple MSM techniques, including representations based on wNAF, may provide further performance improvements.

References

  1. Pippenger, N. On the Evaluation of Powers and Related Problems. SIAM Journal on Computing 1976, 9, 230–250.
  2. Möller, B. Algorithms for multi-exponentiation. In Proceedings of the International Workshop on Selected Areas in Cryptography; Springer, 2001; pp. 165–180.
  3. Straus, E.G. Addition chains of vectors (problem 5125). American Mathematical Monthly 1964, 70, 16.
  4. Bos, J.; Coster, M. Addition chain heuristics. In Proceedings of the Conference on the Theory and Application of Cryptology; Springer, 1989; pp. 400–407.
  5. Hankerson, D.; Menezes, A.; Vanstone, S. Guide to Elliptic Curve Cryptography; Springer, 2004.
  6. Luo, G.; Fu, S.; Gong, G. Speeding up multi-scalar multiplication over fixed points towards efficient zkSNARKs. IACR Transactions on Cryptographic Hardware and Embedded Systems 2023, 358–380.
  7. Brickell, E.F.; Gordon, D.M.; McCurley, K.S.; Wilson, D.B. Fast exponentiation with precomputation. In Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques; Springer, 1992; pp. 200–207.
  8. Supranational. blst: Multilingual BLS12-381 signature library. https://github.com/supranational/blst, 2020–2026. Accessed: 2026-03-06.
  9. LuoGuiwen. MSM_blst: Multi-scalar multiplication over the BLS12-381 curve utilizing blst. https://github.com/LuoGuiwen/MSM_blst, 2023–2026. Accessed: 2026-03-06.
  10. Grandine. rust-kzg: A Multi-Backend KZG and MSM Framework. https://github.com/grandinetech/rust-kzg, 2020–2025. Accessed: 2026.
  11. arkworks contributors. arkworks-rs: Rust ecosystem for zkSNARK programming. https://github.com/arkworks-rs, 2022–2026. Accessed: 2026-03-06.
  12. mratsim. Constantine: High-performance cryptography stack for verifiable computation and blockchain protocols. https://github.com/mratsim/constantine, 2024–2026. Accessed: 2026-03-06.
  13. Shigeo Mitsunari (herumi). MCL: A portable and fast pairing-based cryptography library. https://github.com/herumi/mcl, 2015–2026. Accessed: 2026-03-06.
  14. zkcrypto contributors. zkcrypto: Rust ecosystem for zero-knowledge cryptography. https://github.com/zkcrypto, 2018–2026. Accessed: 2026-03-06.
  15. Dziembowski, S.; Faust, S.; Kędzior, P.; Mielniczuk, M.; Mohanty, S.K.; Pietrzak, K. Beholder Signatures. Cryptology ePrint Archive, 2025.
  16. Boneh, D.; et al. BLS Signatures, 2023. IETF CFRG draft.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.