Groth16 and Aptos Keyless

 

tl;dr: Notes on our use of Groth16 for Aptos Keyless.

$ % \def\one#1{\left[#1\right]_\textcolor{green}{1}} \def\two#1{\left[#1\right]_\textcolor{red}{2}} \def\three#1{\left[#1\right]_\textcolor{blue}{\top}} \def\pair#1#2{e\left(#1, #2\right)} \def\GGen{\mathsf{GGen}} $

$ $

Performance

Circuit size

As of October 29th, 2025:

  • 1,438,805 constraints
  • 1,406,686 variables

Not relevant for Groth16, but for other zkSNARKs like Spartan:

  • The matrix $A$ has 4,203,190 non-zero terms
  • The matrix $B$ has 3,251,286 non-zero terms
  • The matrix $C$ has 2,055,196 non-zero terms
  • Total number of nonzero terms: 9,509,672

I think our task.sh script in the keyless-zk-proofs repo can be used to reproduce these “# of non-zero entries” numbers.

Proving time breakdown

These are the times taken on a t2d-standard-4 VM for an older version of the circuit with 1.3M constraints and variables.

Operation Time (millis)
MSM $\one{A}$ 276
MSM $\one{B}$ 248
MSM $\two{B}$ 885
MSM $\one{C}$ 366
Total $A, B, C$ MSM 1,775
Calculating C time 18
iFFT A time 242
Shift A time 11
FFT A time 237
iFFT B time 240
Shift B time 10
FFT B time 237
iFFT C time 239
Shift C time 11
FFT C time 238
Total FFT time 1,465
ABC time 21
MSM $h(X)$ time 2,785
Total 6,209

Appendix: PLONK

snarkjs compiler

# of “PLONK constraints” for the keyless relation is 6,421,050 (addition + multiplication gates, I believe.)

How? Set up a PLONK proving key using a downloaded BN254 12 GiB powers-of-tau file and used ran a snarkjs command: node --max-old-space-size=$((8192*2)) $(which snarkjs) plonk setup main.r1cs ~/Downloads/powersOfTau28_hez_final_23.ptau plonk.zkey

Benchmarks

These were run on my 10-core Macbook Pro M1 Max.

For jellyfish, I modified the benchmarks to fix a bug, exclude batch verification benches and increase the circuit size to $2^{22}$. The git diff shows:

diff --git a/plonk/benches/bench.rs b/plonk/benches/bench.rs
index 256f0d53..9b8aec21 100644
--- a/plonk/benches/bench.rs
+++ b/plonk/benches/bench.rs
@@ -12,6 +12,7 @@ use ark_bls12_377::{Bls12_377, Fr as Fr377};
 use ark_bls12_381::{Bls12_381, Fr as Fr381};
 use ark_bn254::{Bn254, Fr as Fr254};
 use ark_bw6_761::{Fr as Fr761, BW6_761};
+use ark_serialize::{CanonicalSerialize, Compress};
 use ark_ff::PrimeField;
 use jf_plonk::{
     errors::PlonkError,
@@ -22,8 +23,9 @@ use jf_plonk::{
 use jf_relation::{Circuit, PlonkCircuit};
 use std::time::Instant;
 
+const NUM_PROVE_REPETITIONS: usize = 1;
 const NUM_REPETITIONS: usize = 10;
-const NUM_GATES_LARGE: usize = 32768;
+const NUM_GATES_LARGE: usize = 4_194_304;
 const NUM_GATES_SMALL: usize = 8192;
 
 fn gen_circuit_for_bench<F: PrimeField>(
@@ -58,31 +60,33 @@ macro_rules! plonk_prove_bench {
 
         let start = Instant::now();
 
-        for _ in 0..NUM_REPETITIONS {
+        for _ in 0..NUM_PROVE_REPETITIONS {
             let _ = PlonkKzgSnark::<$bench_curve>::prove::<_, _, StandardTranscript>(
                 rng, &cs, &pk, None,
             )
             .unwrap();
         }
 
+        let elapsed = start.elapsed();
+        println!(
+            "proving time total {}, {}: {} milliseconds",
+            stringify!($bench_curve),
+            stringify!($bench_plonk_type),
+            elapsed.as_millis() / NUM_PROVE_REPETITIONS as u128
+        );
+
         println!(
             "proving time for {}, {}: {} ns/gate",
             stringify!($bench_curve),
             stringify!($bench_plonk_type),
-            start.elapsed().as_nanos() / NUM_REPETITIONS as u128 / $num_gates as u128
+            elapsed.as_nanos() / NUM_PROVE_REPETITIONS as u128 / $num_gates as u128
         );
     };
 }
 
 fn bench_prove() {
-    plonk_prove_bench!(Bls12_381, Fr381, PlonkType::TurboPlonk, NUM_GATES_LARGE);
-    plonk_prove_bench!(Bls12_377, Fr377, PlonkType::TurboPlonk, NUM_GATES_LARGE);
     plonk_prove_bench!(Bn254, Fr254, PlonkType::TurboPlonk, NUM_GATES_LARGE);
-    plonk_prove_bench!(BW6_761, Fr761, PlonkType::TurboPlonk, NUM_GATES_SMALL);
-    plonk_prove_bench!(Bls12_381, Fr381, PlonkType::UltraPlonk, NUM_GATES_LARGE);
-    plonk_prove_bench!(Bls12_377, Fr377, PlonkType::UltraPlonk, NUM_GATES_LARGE);
     plonk_prove_bench!(Bn254, Fr254, PlonkType::UltraPlonk, NUM_GATES_LARGE);
-    plonk_prove_bench!(BW6_761, Fr761, PlonkType::UltraPlonk, NUM_GATES_SMALL);
 }
 
 macro_rules! plonk_verify_bench {
@@ -91,7 +95,7 @@ macro_rules! plonk_verify_bench {
         let cs = gen_circuit_for_bench::<$bench_field>($num_gates, $bench_plonk_type).unwrap();
 
         let max_degree = $num_gates + 2;
-        let srs = PlonkKzgSnark::<$bench_curve>::universal_setup(max_degree, rng).unwrap();
+        let srs = PlonkKzgSnark::<$bench_curve>::universal_setup_for_testing(max_degree, rng).unwrap();
 
         let (pk, vk) = PlonkKzgSnark::<$bench_curve>::preprocess(&srs, &cs).unwrap();
 
@@ -99,6 +103,14 @@ macro_rules! plonk_verify_bench {
             PlonkKzgSnark::<$bench_curve>::prove::<_, _, StandardTranscript>(rng, &cs, &pk, None)
                 .unwrap();
 
+        let mut bytes = Vec::with_capacity(proof.serialized_size(Compress::Yes));
+        proof.serialize_with_mode(&mut bytes, Compress::Yes).unwrap();
+
+        println!(
+            "proof size: {} bytes",
+            bytes.len()
+        );
+
         let start = Instant::now();
 
         for _ in 0..NUM_REPETITIONS {
@@ -117,14 +129,8 @@ macro_rules! plonk_verify_bench {
 }
 
 fn bench_verify() {
-    plonk_verify_bench!(Bls12_381, Fr381, PlonkType::TurboPlonk, NUM_GATES_LARGE);
-    plonk_verify_bench!(Bls12_377, Fr377, PlonkType::TurboPlonk, NUM_GATES_LARGE);
-    plonk_verify_bench!(Bn254, Fr254, PlonkType::TurboPlonk, NUM_GATES_LARGE);
-    plonk_verify_bench!(BW6_761, Fr761, PlonkType::TurboPlonk, NUM_GATES_SMALL);
-    plonk_verify_bench!(Bls12_381, Fr381, PlonkType::UltraPlonk, NUM_GATES_LARGE);
-    plonk_verify_bench!(Bls12_377, Fr377, PlonkType::UltraPlonk, NUM_GATES_LARGE);
-    plonk_verify_bench!(Bn254, Fr254, PlonkType::UltraPlonk, NUM_GATES_LARGE);
-    plonk_verify_bench!(BW6_761, Fr761, PlonkType::UltraPlonk, NUM_GATES_SMALL);
+    plonk_verify_bench!(Bn254, Fr254, PlonkType::TurboPlonk, NUM_GATES_SMALL);
+    plonk_verify_bench!(Bn254, Fr254, PlonkType::UltraPlonk, NUM_GATES_SMALL);
 }
 
 macro_rules! plonk_batch_verify_bench {
@@ -168,19 +174,7 @@ macro_rules! plonk_batch_verify_bench {
     };
 }
 
-fn bench_batch_verify() {
-    plonk_batch_verify_bench!(Bls12_381, Fr381, PlonkType::TurboPlonk, 1000);
-    plonk_batch_verify_bench!(Bls12_377, Fr377, PlonkType::TurboPlonk, 1000);
-    plonk_batch_verify_bench!(Bn254, Fr254, PlonkType::TurboPlonk, 1000);
-    plonk_batch_verify_bench!(BW6_761, Fr761, PlonkType::TurboPlonk, 1000);
-    plonk_batch_verify_bench!(Bls12_381, Fr381, PlonkType::UltraPlonk, 1000);
-    plonk_batch_verify_bench!(Bls12_377, Fr377, PlonkType::UltraPlonk, 1000);
-    plonk_batch_verify_bench!(Bn254, Fr254, PlonkType::UltraPlonk, 1000);
-    plonk_batch_verify_bench!(BW6_761, Fr761, PlonkType::UltraPlonk, 1000);
-}
-
 fn main() {
-    bench_prove();
+    // bench_prove();
     bench_verify();
-    bench_batch_verify();
 }

Then, to run the benchmarks, in the root of the repo as per the README, I ran:

time cargo bench --bench plonk-benches --features=test-srs

To run single-threaded:

time RAYON_NUM_THREADS=1 cargo bench --bench plonk-benches --features=test-srs

Some jellyfish notes/caveats:

  • it generates a circuit with just addition gates
    • it just insert $n$ gates where gate $i$ simply adds 1 to gate $i-1$
    • it is unclear what gate 0 is initialized to: cs.zero() does not mean the gate is given the value 0!
    • not really sure what the relation being proved is here: the public input in the code is the empty vector
    • also unclear what values the gates get assigned
  • by default runs multithread on all cores (judging from htop output)
  • single-threaded verification is a little bit slower
  • TurboPLONK has custom gates and UltraPLONK adds lookups
Library Scheme # gates # threads Prove Verify Proof
jellyfish TurboPLONK $2^{22}$ default 104.16 secs 1.36 ms 769 bytes
jellyfish UltraPLONK $2^{22}$ default 190.86 secs 1.41 ms 1,481 bytes
jellyfish TurboPLONK $2^{13}$ 1 don’t care 1.55 ms 769 bytes
jellyfish UltraPLONK $2^{13}$ 1 don’t care 1.72 ms 1,481 bytes

Benchmark single-thread jellyfish proving times. Play with the witness size: figure out how to initialize the first gate to 0 vs. $p/2$. Benchmark dusk-network/plonk PLONK.

References

For cited works, see below 👇👇