tl;dr: When the discrete log $a$ of $a\cdot G$ is known to lie in a small range $[m)$, the baby-step giant-step (BSGS) algorithm recovers $a$ in $\ceil{\sqrt{m}}$ $\Gr$ additions using only a precomputed table of exactly $\ceil{\sqrt{m}}$ compressed points, trading the $O(1)$ time of the naive $m$-sized lookup table for much less memory.
This post describes vanilla BSGS, and two Ristretto255-optimized variants: BSGS-$k$, which batches $k$ giant steps to amortize the expensive point compression and truncated BSGS-$k$ (TBSGS-$k$), which stores only 8-byte truncated keys to shrink the table by 75% with negligible performance impact.
Notation
- We assume a prime-order group $\term{\Gr}$ of prime order $\term{p}$
- We use additive group notation: $a \cdot G$ denotes scalar multiplication in $\Gr$, where $a\in \Zp$ and $G\in\Gr$
Naive discrete log algorithm
Naively, computing the discrete logarithm (DL) $a$ on $a \cdot G$ when $a\in[m)$ can be done in constant-time via a single lookup in an $m$-sized precomputed table:
\begin{align} (\jG{j})_{j\in [m)} \end{align}
This works great for small enoguh $m$, depending on how much memory is available. e.g.,
- 2 MiB for $m=2^{16} \Rightarrow$ 512 KiB with truncation
- 32 MiB for $m=2^{20} \Rightarrow$ 4 MiB with truncation
Baby-step giant step (BSGS) discrete log algorithm
The BSGS algorithm compute a DL $a\in[m)$ on $a\cdot G$ with less memory but more computation. Specifically:
- it reduces the table size from $m$ to $\sqm$
- it increases the solving time from $O(1)$ to $\GaddG{(\sqm-1)}$.
Let us define a base, $s$, as:
\[\term{s}\bydef\sqm\]The key idea is that we can represent the value $a$ as a 2-digit number $(i,j)$ in base-$\emph{s}$:
\begin{align}
a = i\cdot s + j,\ \text{where}\ i,j\in[s)
\end{align}
As a result finding the discrete log $a$ of $H\bydef a\cdot G$, can be reduced to finding its two digits $i,j\in[s)$ such that:
\begin{align}
H &= (i \cdot s + j)\cdot G\Leftrightarrow\\
H &= i \cdot (s \cdot G) + j \cdot G\Leftrightarrow\\
\label{eq:bsgs-check}
H + i \cdot (\sG) &= \jG{j}
\end{align}
Now, imagine we have all $(\jG{j})_{j\in[s)}$ and $\sG$ precomputed. Then, finding $a$ can be reduced to computing all the left hand sides (LHS) of Eq. $\ref{eq:bsgs-check}$ for all possible $i\in[s)$ and checking if there exists $j\in[s)$ such that the LHS equals the right hand side (RHS). If it does, then $a = i\cdot s+ j$!
More concretely, we compute:
\begin{align}
V_0 &\gets H = H + 0 \cdot (\sG)\\
V_1 &\gets V_0 \sG = H + 1 \cdot (\sG)\\
V_2 &\gets V_1 \sG = H + 2 \cdot (\sG)\\
&\hspace{.7em}\vdots\\
V_i &\gets V_{i-1} \sG = H + i \cdot (\sG)\\
&\hspace{.7em}\vdots\\
V_{s-1} &\gets V_{s-2} \sG = H + (s-1) \cdot (\sG)\\
\end{align}
Then, for each computed $V_i$, we check (in constant-time) whether there exists a $j\in[s)$ such that $V_i = \jG{j}$.
In other words, we check if Eq. \ref{eq:bsgs-check} holds for some $i,j\in[s)$.
If it does, then we solved for the correct DL $a = i\cdot s + j$!
Note that this algorithm will take at most $s-1$ group additions in $\Gr$, so it is very efficient!
The maximum value $a$ can take in this base-$s$ representation is $(s-1) \cdot s + (s-1) = (s-1)(s+1) = s^2 - 1$. Since $\sqm \ge \sqrt{m}$, by squaring it, it follows that $\sqm^2 \ge \sqrt{m}^2\Leftrightarrow s^2 \ge m$. This means $a=m-1$ can be represented in base-$s$, so the algorithm will be able to solve DL for all $a\in[m)$. (Not only: it may also be able to solve it for slightly higher values, e.g., $s=4$ for both $m = 15,16$).
We give formal algorithms for BSGS below.
$\mathsf{BSGS.Precompute}(m\in \N, G\in\Gr) \rightarrow \table$
Recall that $\term{s}\bydef\sqm$.
Using $\GaddG{(s-1)}$ time, build a dictionary $\term{\dict}$ (e.g., a hash map) that maps each baby step to its index: \begin{align} \dict[\jG{j}] \gets j,\quad \forall j\in[s) \end{align} By convention, $\dict[P] = \bot$ for any $P$ that is not a key (i.e., for any $P\notin \{\jG{j}\}_{j\in[s)}$).
Return the table $\table \gets (s, \dict, \sG)$.
$\mathsf{BSGS.Solve}(\table, H\in\Gr) \rightarrow a \in [m)\times \{\bot\}$
Parse the table: \begin{align} (s, \dict, \sG) \parse \table \end{align}
Let $V_0 = H$.
For each $i\in[0, s)$:
- $j \gets \dict[V_i]$
- if $j \ne \bot$, then return $i\cdot s + j$
- $V_{i+1} \gets V_i + (\sG)$
If we reached this point, this means no $i,j\in[s)$ were found. This, in turn, means $a \ge s^2 \ge m$.
Therefore, return $\bot$.
BSGS-$k$ discrete log algorithm
BSGS-$k$ is a variant of BSGS that dramatically speeds up the solve algorithm on certain elliptic curves by batching $k$ giant steps together.
This addresses a key performance issue: when using BSGS on Ristretto255 curves, its main bottleneck is the expensive point compression required to index into the precomputed table after every giant step. BSGS-$k$ avoids this by first computing $k$ giant steps via $k-1$ group additions and then compressing all $k$ points at once. Unfortunately, batch compression is not supported in Ristretto255. (Feel free to open that can of worms on your own.) But, fortunately, “doubled-point” batch compression is supported via a $\doubleAndCompressBatch{\cdot}$ algorithm. As a result, we observe that the BSGS algorithm can be adjusted to work over these doubled points!
This algorithm was LLM-discovered.
Upon first glance at the Ristretto255 APIs in curve25519-dalek, I could not find any batch compression algorithms.
So I just threw the problem to Claude, with very little hope, thinking “Ha! There is no way it’ll find anything I couldn’t find. Let me watch it fail…“
Then, Claude found the $\doubleAndCompressBatch{\cdot}$ API I would have otherwise missed.
We give formal algorithms for BSGS-$k$ below, building on the BSGS algorithms above.
$\mathsf{Compress}(H\in \Gr) \rightarrow \{0,1\}^{256}$
Returns a 32-byte (256-bit) compressed canonical representation of the group element $H$.
$\mathsf{DoubleAndCompressBatch}\left((H_i)_{i\in[k]}\in \Gr\right) \rightarrow \left(\{0,1\}^{256}\right)^k$
Doubles all $H_i$’s and compress the results in batch, more efficiently than repeatedly calling $\compress{2\cdot H_i}$ for all $i\in [k]$.
$\mathsf{BSGS\text{-}k.Precompute}(m\in \N, G\in\Gr) \rightarrow \table$
Recall that $\term{s}\bydef\sqm$.
Using $\GaddG{(s-1)}$ time, build a dictionary $\dict$ that maps each doubled baby step to its index: \begin{align} \dict[\ctwojG{j}] \gets j,\quad \forall j\in[s) \end{align} (As before, $\dict[c] = \bot$ for any $c$ that is not a key.)
Return the table $\table \gets (s, \dict, \sG)$.
Note that the baby steps are stored doubled (as $\ctwojG{j}$), so that we can compare against them using $\doubleAndCompressBatch{\cdot}$. The giant step, however, remains $\sG$ (not doubled): as we’ll see below, the doubling is applied to each running point $V_{i+\ell}$ at compression time, not to the giant step itself.
$\mathsf{BSGS\text{-}k.Solve}(\table, H\in\Gr) \rightarrow a \in [m)\times \{\bot\}$
Parse the table: \begin{align} (s, \dict, \sG) \parse \table \end{align}
Let $V_0 = H$.
For each $i\in\{0,k,2k,\ldots\}$ subject to $i < s$:
- Compute $V_{i+1}, \ldots, V_{i+k-1}$ via $k-1$ additions of $\sG$, such that $\term{V_{i+\ell}} \bydef H + (i+\ell)\cdot (\sG)$
- Compute $\doubleAndCompressBatch{V_i, V_{i+1}, \ldots, V_{i+k-1}}$ to get compressed points $(c_i, \ldots, c_{i+k-1})$, where $c_{i+\ell} \bydef \compress{2\cdot V_{i+\ell}}$
- For each $\ell \in [0, k)$, run the match check:
- $j \gets \dict[c_{i+\ell}]$
- if $j \ne \bot$, then return $(i+\ell)\cdot s + j$
- $V_{i+k} \gets V_{i+k-1} + (\sG)$
If we reached this point, return $\bot$.
For simplicity, we describe the last batch as always having $k$ points, even when $i + k > s$. As a result, we may compute and look up a few extra points $V_n$ whose giant-step index $i + \ell$ is $\ge s$. This is harmless.
Truncated BSGS-$k$ (TBSGS-$k$) discrete log algorithm
TBSGS-$k$ is a variant of BSGS that dramatically reduces the table size from $\sqm \cdot 32$ bytes to $\sqm \cdot 8$ bytes (a 75% reduction) while maintaining nearly-identical performance for large secrets.
The key idea is: instead of keying the dictionary on the full doubled baby steps $\ctwojG{j}$, we key it on their 8-byte truncations $\tctwomjG{j}$, where $\term{\mathsf{Trunc}}$ returns the first 8 bytes and $\term{\mathsf{Compress}}$ is the standard Ristretto255 point compression.
We give formal algorithms for TBSGS-$k$ below, building on the BSGS-$k$ algorithms above.
$\mathsf{TBSGS\text{-}k.Precompute}(m\in \N, G\in\Gr) \rightarrow \table$
Recall that $\term{s}\bydef\sqm$.
Using $\GaddG{(s-1)}$ time, build a dictionary $\dict$ that maps each truncated, doubled baby step to its index: \begin{align} \dict[\tctwomjG{j}] \gets j,\quad \forall j\in[s) \end{align} (As before, $\dict[t] = \bot$ for any $t$ that is not a key.) The dictionary’s keys are now only 8 bytes each, hence the $\sqm \cdot 8$-byte table size.
Return the table $\table \gets (s, \dict, \sG)$.
$\mathsf{TBSGS\text{-}k.Solve}(\table, H\in\Gr) \rightarrow a \in [m)\times \{\bot\}$
TBSGS-$k$.Solve is identical to BSGS-$k$.Solve above, except for the match check. Since the table is now keyed by 8-byte truncations (which can collide), a truncated hit must be verified via a scalar multiplication before we trust it. So the match check becomes:
- $j \gets \dict[\trunc{c_{i+\ell}}]$
- if $j \ne \bot$ and $V_{i+\ell} \equals \jG{j}$ (verify the match), then return $(i+\ell)\cdot s + j$
Application to Aptos confidential assets
We use the TBSGS-k algorithm to decrypt confidential balances quickly for the Aptos confidential assets features.
Appendix: Benchmarks
All benchmarks are run single-threaded on an Apple Macbook Pro M4 Max.
All algorithms are implemented for Ristretto255 in this ristretto255-dlog repo.
Discrete log precomputed table sizes
| Algorithm | Size |
|---|---|
| TBSGS-k 32-bit | 512 KiB |
| BSGS 32-bit | 2.00 MiB |
| BSGS-k 32-bit | 2.00 MiB |
| BL12 32-bit | 258 KiB |
TBSGS-k is the default algorithm in Aptos confidential assets, offering the best balance between WASM size and DL solving time.
Rust DLP: [BL12]1
[BL12] 32-bit secrets time: [4.6197 ms 4.8080 ms 5.0194 ms]
On my old M1 Max, for 48 bits, BL12 times were: [763.90 ms 1.1598 s 1.6174 s].
Rust DLP: (Truncated-)BSGS with batch size $k$
Summary: For 32-bit secrets, TBSGS-k performs identically to BSGS-k (~11 ms). For smaller secrets, TBSGS-k is ~3x slower due to its verification overhead (i.e., one scalar multiplication per truncated match) However, this is negligible in absolute terms (35 µs vs 12 µs). TBSGS-k should be preferred for its dramatically smaller table size with minimal performance impact.
For discrete logs on 32-bit values with tables for 32-bits:
[BSGS-k1], 32-bit secrets
time: [61.928 ms 67.501 ms 77.980 ms]
[TBSGS-k1], 32-bit secrets
time: [64.498 ms 69.626 ms 74.543 ms]
[BSGS-k2], 32-bit secrets
time: [40.716 ms 42.236 ms 44.658 ms]
[TBSGS-k2], 32-bit secrets
time: [41.303 ms 43.666 ms 46.133 ms]
[BSGS-k4], 32-bit secrets
time: [24.406 ms 25.578 ms 26.663 ms]
[TBSGS-k4], 32-bit secrets
time: [25.164 ms 26.253 ms 27.185 ms]
[BSGS-k8], 32-bit secrets
time: [16.921 ms 17.727 ms 18.506 ms]
[TBSGS-k8], 32-bit secrets
time: [17.375 ms 18.174 ms 18.895 ms]
[BSGS-k16], 32-bit secrets
time: [12.398 ms 13.299 ms 14.351 ms]
[TBSGS-k16], 32-bit secrets
time: [12.851 ms 13.537 ms 14.255 ms]
[BSGS-k32], 32-bit secrets
time: [11.833 ms 12.347 ms 12.783 ms]
[TBSGS-k32], 32-bit secrets
time: [11.379 ms 12.037 ms 12.533 ms]
[BSGS-k64], 32-bit secrets
time: [11.760 ms 12.257 ms 12.909 ms]
[TBSGS-k64], 32-bit secrets
time: [10.717 ms 11.302 ms 11.772 ms]
[BSGS-k128], 32-bit secrets
time: [10.264 ms 10.677 ms 11.013 ms]
[TBSGS-k128], 32-bit secrets
time: [10.363 ms 10.843 ms 11.202 ms]
[BSGS-k256], 32-bit secrets
time: [10.674 ms 11.057 ms 11.478 ms]
[TBSGS-k256], 32-bit secrets
time: [10.512 ms 10.976 ms 11.631 ms]
[BSGS-k512], 32-bit secrets
time: [9.8338 ms 10.664 ms 11.260 ms]
[TBSGS-k512], 32-bit secrets
time: [10.967 ms 11.461 ms 12.073 ms]
[BSGS-k1024], 32-bit secrets
time: [10.329 ms 11.076 ms 11.718 ms]
[TBSGS-k1024], 32-bit secrets
time: [11.144 ms 11.453 ms 11.714 ms]
[BSGS-k2048], 32-bit secrets
time: [10.394 ms 10.811 ms 11.373 ms]
[TBSGS-k2048], 32-bit secrets
time: [10.800 ms 11.248 ms 11.783 ms]
[BSGS-k4096], 32-bit secrets
time: [10.686 ms 11.277 ms 11.681 ms]
[TBSGS-k4096], 32-bit secrets
time: [10.911 ms 11.467 ms 12.043 ms]
[BSGS-k8192], 32-bit secrets
time: [11.520 ms 12.029 ms 12.341 ms]
[TBSGS-k8192], 32-bit secrets
time: [11.663 ms 12.255 ms 12.650 ms]
[BSGS-k16384], 32-bit secrets
time: [12.228 ms 13.168 ms 14.037 ms]
[TBSGS-k16384], 32-bit secrets
time: [13.009 ms 13.615 ms 14.469 ms]
For discrete log on 17-24 bit values but with the same tables for 32 bits, demonstrating that smaller values are solved for faster:
[BSGS-k32], 17-bit secrets (32-bit table)
time: [12.096 µs 12.118 µs 12.144 µs]
[TBSGS-k32], 17-bit secrets (32-bit table)
time: [35.155 µs 35.252 µs 35.368 µs]
[BSGS-k32], 18-bit secrets (32-bit table)
time: [12.144 µs 12.166 µs 12.189 µs]
[TBSGS-k32], 18-bit secrets (32-bit table)
time: [35.003 µs 35.073 µs 35.164 µs]
[BSGS-k32], 19-bit secrets (32-bit table)
time: [12.211 µs 12.234 µs 12.261 µs]
[TBSGS-k32], 19-bit secrets (32-bit table)
time: [35.060 µs 35.131 µs 35.218 µs]
[BSGS-k32], 20-bit secrets (32-bit table)
time: [12.286 µs 12.307 µs 12.331 µs]
[TBSGS-k32], 20-bit secrets (32-bit table)
time: [35.063 µs 35.098 µs 35.137 µs]
[BSGS-k32], 21-bit secrets (32-bit table)
time: [12.424 µs 12.444 µs 12.466 µs]
[TBSGS-k32], 21-bit secrets (32-bit table)
time: [35.119 µs 35.153 µs 35.190 µs]
[BSGS-k32], 22-bit secrets (32-bit table)
time: [18.917 µs 18.954 µs 18.999 µs]
[TBSGS-k32], 22-bit secrets (32-bit table)
time: [41.477 µs 41.553 µs 41.628 µs]
[BSGS-k32], 23-bit secrets (32-bit table)
time: [31.782 µs 31.871 µs 31.971 µs]
[TBSGS-k32], 23-bit secrets (32-bit table)
time: [54.167 µs 54.306 µs 54.453 µs]
[BSGS-k32], 24-bit secrets (32-bit table)
time: [57.521 µs 57.709 µs 57.893 µs]
[TBSGS-k32], 24-bit secrets (32-bit table)
time: [79.229 µs 79.522 µs 79.798 µs]
For varying K values with 18-bit secrets:
[BSGS-k64], 18-bit secrets (32-bit table)
time: [22.715 µs 22.763 µs 22.816 µs]
[TBSGS-k64], 18-bit secrets (32-bit table)
time: [45.064 µs 45.104 µs 45.144 µs]
[BSGS-k128], 18-bit secrets (32-bit table)
time: [43.323 µs 43.387 µs 43.456 µs]
[TBSGS-k128], 18-bit secrets (32-bit table)
time: [65.512 µs 65.641 µs 65.812 µs]
[BSGS-k1024], 18-bit secrets (32-bit table)
time: [338.10 µs 338.59 µs 339.12 µs]
[TBSGS-k1024], 18-bit secrets (32-bit table)
time: [354.13 µs 354.48 µs 354.88 µs]
[BSGS-k2048], 18-bit secrets (32-bit table)
time: [673.72 µs 674.26 µs 674.87 µs]
[TBSGS-k2048], 18-bit secrets (32-bit table)
time: [681.17 µs 681.61 µs 682.10 µs]
Decision for Aptos:
Stick with 16-bit chunks and use the naive DL algorithm: store all solutions in tables of $2^{16}$ group elements ($2^{16}\times 32$ bytes $\Rightarrow 2$ MiB) and use naive lookup to compute the DL in constant time, when chunks are small!
Note that the TBSGS-$k$ table can be reused for this naive lookup approach!
When chunks are big, resort to TBSGS-$k$ (or, in the future, to [BL12]1).
Rationale:
We have to be conservative because a user may be using multiple confidential apps at the same time and/or the browser may be busy doing other things.
Rust DLP: BSGS (compression after every step)
[BSGS] 32-bit secrets time: [63.673 ms 69.557 ms 75.040 ms]
WASM DLP: [BL12]1
These were run on an older version of the TS SDK repo via: pnpm jest discrete-log
WASM [BL12] 16-bit: avg=0.24ms, min=0.19ms, max=0.30ms
WASM [BL12] 32-bit: avg=42.59ms, min=34.17ms, max=56.06ms
WASM DLP: TBSGS-$k$ and NaiveTruncatedDoubledLookup
WASM NaiveTruncatedDoubledLookup 16-bit: avg=1.50ms, min=1.33ms, max=2.29ms
WASM TBSGS-k32 32-bit: avg=20.15ms, min=1.84ms, max=39.55ms
TypeScript DLP: BSGS
These were run in the TS SDK repo via: pnpm jest discrete-log
TS BSGS 16-bit: avg=7.47ms, min=5.82ms, max=8.24ms
TS BSGS 32-bit: avg=2252.36ms, min=1539.60ms, max=3359.79ms