Circom

 

tl;dr: Everything I wanted to know but was afraid to ask about circom.

$ $

Notes

As put by Palladino1:

The main challenges [in] Circom is understanding when you are writing constraint-generation code, and when you are writing witness-generation code, and tracking what values are known at compile-time versus only known at runtime. […] Keeping in mind the difference between constraint and witness generation times can guard you against underconstrained computation bugs.

Terminology

  • witness generation: computation of hints a.k.a. “witness data” that aids in satisfying the constraints encoded in a circom template
  • variables
    • circom docs say: “variables are identifiers that hold non-signal data and are mutable.”
    • but, a more accurate mental model, AFAICT, is that they are compile-time symbolic expressions (e.g., often they are used to symbolically-store a linear combination of signals at compile time; see the Num2Bits example)
  • functions: reusable pieces of code used only during witness generation
  • signals
    • input signals; i.e.,:
      • part of the NP relation’s statement or witness
      • you cannot do s <== expression for signal input s; only for signal s.
    • output signals
    • intermediate signals (computed via witness generation)
  • template: a family of circuits, parameterized by one or more values known at compile-time (think C++ templates)
    • template parameters
  • components
  • assignment
  • constraints

Operators

  • a \ b computes the $\lfloor a / b\rfloor$ division quotient as if dividing natural numbers (i.e., s.t. $a=b q + r$ and $0 \le r < b$, when viewed as natural numbers)

Signals

Input signals versus output signals.

I understand that marking a signal in a template as input or output helps define a “function-like” interface for the template, with some nicer calling syntax that avoids explicitly instantiating a template via the component keyword.

An example:

pragma circom 2.0.0;

template Multiplier2(){
   signal input in1;
   signal input in2;
   signal output out <== in1 * in2;
}

component main {public [in1, in2]} = Multiplier2();

As emphasized in the circom docs: “all output signals of the main component are public (and cannot be made private)”.

In contrast, “the input signals of the main component are private if not stated otherwise using the keyword public as above”

Variables: var versus signal

As far as I can tell, var is a poor-name for a “compile-time derivable linear combination of signals” (including the trivial one like var = some_signal; or var = SomeTemplateCall(N)(in);).

That’s typically what you put in a var, AFAICT.

You can also put (compile-time) constants or any other compile-time expression.

An example of how a var is a compile-time linear combination of signals is in Num2Bits.

Witness generation versus constraints

Find a more natural circuit that showcases this difference. Not IsZero().

As explained by Palladino1:

When writing a circuit you’re writing two different programs, that belong to two different programming paradigms, in a single one.

The most important thing to know: when (not) to use <--.

Tags

Very useful for safety. See nice article by Hess2.

e.g., the result of the IsZero template could be tagged as binary to indicate that the returned output is either 0 or 1:

template IsZero() {
  signal input in;
  signal output {binary} out;
  signal inv;

  inv <-- in != 0 ? 1/in : 0;
  out <== -in * inv +1;

  in * out === 0;
}

I found Circom’s docs confusing because they make it seem like binary is a language-reserved tag with this actual meaning? (e.g., “This tag means that each signal in the array is always expected to be 0 or 1, in order to compute the corresponding number correctly.”)

The code below would propagate the binary tag to the a signal:

component isz = IsZero();
isz.in <== 10;
signal a <=== isz.out

Tags can have compile-time values:

“Valued tags behave like parameters which means that they can only be assigned to values known at compilation time.”

e.g., a template’s compile time parameter can be used to tag its output signal’s max bit length:

template Bits2Num(n) {
    signal input {binary} in[n];
    signal output {maxbit} out;
    var num = 0;

    var pow_2 = 1;
    for (var i = 0; i < n; i++) {
        num += in[i] * pow_2;
        pow_2 = pow2 + pow_2;
    }
    out.maxbit = n;

    num ==> out;
}

This suggests max lengths of strings could be tags.

Buses

Very useful for readable, maintainable code. And therefore, for safety. See here.

Buses, combined with tags can start emulating a safe type system in circom.

For example, an elliptic curve point can be represented as:

bus Point() {
    signal x;
    signal y;
}

You can even tag buses: e.g., an Edwards Point() vs. a Montgomery one! So, you can have templates that work like this:

template Edwards2Montgomery () {
  input  Point() { edwards_point    } in;
  output Point() { montgomery_point } out;

  out.x <–- (1 + in.y) / (1 - in.y);
  out.y <–- out.x / in.x;

  out.x * (1 - in.y) === (1 + in.y);
  out.y * in.x === out.x;
}

Or, you can even tag signals in buses:

bus Book () {
  signal {maxvalue} title[50];
  signal {maxvalue} author[50];
  signal {maxvalue} sold_copies;
  signal {maxvalue} year;
};

Buses can be nested (but not recursed):

bus Date() {
    signal day;
    signal month;
    signal year;
}

bus Person() {
    signal name[50];
    Date() birthday;
}

Buses can be parameterized by compile-time constants:

bus Vector(N) {
    signal v[N];
}

bus Matrix(NUM_ROWS, NUM_COLS) {
	Vector(NUM_ROWS) column[NUM_COLS];
}

Assert

The assert keyword is particularly “interesting.” I think I’m gonna stay away from it.

Quoting from here, it is supposed to add constraints:

// if all values are known at compile-time the assertion is checked then
// otherwise, the assertion creates a constraint.
assert(a <= b);
assert(a * a == b);

But I don’t think that’s true, from looking at the circom docs and from here or here: it does not add constraints.

Non-trivial examples

IsZero() template

This is a template that outputs 1 when the input signal is 0 and outputs 0 otherwise.

It should be the most trivial thing to implement, no? Yet it turns out that, while checking whether something is zero is trivial (i.e., something === 0), assigning a signal with the truth value of something === 0 is not as trivial… (This is because the only operations in circom are equality checks, addition and multiplication.)

This example will also illustrate the difference between witness generation versus constraining.

1
2
3
4
5
6
7
8
9
10
template IsZero() {
  signal input in;
  signal output out;
  signal inv;

  inv <-- in != 0 ? 1/in : 0;

  out <== -in * inv + 1;
  in * out === 0;
}

How does it work? Here’s a simple analysis:

  • When in is 0, then:
    • out <== -in * inv + 1 (line 8) implies out <== 0 * inv + 1.
    • Which implies out <== 1, ensuring the desired behaviour.
    • Q: But then inv does not even matter for this case? Which is why it is assigned 0? But why 0? It could be assigned anything it seems.
  • When in is not 0, then:
    • in * out === 0 (line 9) implies out === 0.
    • As a result, out <== -in * inv + 1 implies 0 <== -in * inv + 1
    • Which implies in * inv === 1
    • Which can be satisfied, since the witness generation can compute an inverse via the inv <-- in != 0 ? 1/in : 0 line.

Observation: I find the explanation in circom’s docs a bit confusing, perhaps even in a dangerous way. It says: “If in is 0, then […]. Otherwise, in * inv is always 1, then out is 0.” This could be interpreted to suggest that circom enforces that inv be correctly computed as the inverse of in. Which is not the case. It’s not that circom guarantees in * inv will be always 1; it’s that when in is not zero, the only way to satisfy line 8 and 9’s constraints is to correctly compute the inverse inv via the witness generation on line 6.

In a way, this rather-elegant IsZero() template perfectly illustrates how non-intuitive circom can be when it comes to efficiently-programming ZK relations.

Num2Bits() template

Let’s also consider the Num2Bits() example from circom’s docs, which returns the binary representation of its input, as n bits (although edited with more sane variable names and annotated with comments).

This example illustrates:

  • template parameters (e.g., n)
  • variables versus signals
  • witness generation versus constraining
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
/**
 * Outputs a truth bit for whether in[0] < in[1] when viewed as N-bit 
 * integers.
 *
 * Parameters:
 *   N      in[0] and in[1] MUST be in [0, 2^N)
 *
 * Input signals:
 *   in     array of the two numbers to be compared
 *
 * Output signals:
 *   out    a bit indicating whether in[0] < in[1] when viewed as integers 
 */
template LessThan(N) {
    // This seems to (unfortunately) be hard-coded for the scalar field of BN254,
    // where scalars are \le (p - 1) and where 2^253 < p < 2^254.
    //
    // The choice of N <= 252 ensures that the maximum value assigned to `bits` 
    // below is 2^N + (2^N - 1) = 2^{N+1} - 1 = 2^253 - 1, and thus does not
    // exceed p-1.
    //
    // (If N were <= 253, then this maximum value would have been 2^254 - 1
    //  which would be larger than p-1 and would thus not fit in a signal.)
    assert(N <= 252);

    signal input in[2];
    signal output out;  // TODO(Tag): add {binary} tag?

    // (N+1)-bit representation of m = (2^N + in[0]) - in[1]
    // e.g., for N = 3
    //  in[0] = 100
    //  in[1] = 001
    //  m     = (1000 + 100) - 001
    //        = 1100 - 001
    //        = 1011
    signal bits <== Num2Bits(N + 1)(in[0] + (1 << N) - in[1]);

    // if in[0] >= in[1], then:
    //   bits[N], the nth bit toggled above by adding 2^N, remains 1 => out is 0
    // else:
    //   bits[N] turns into 0 => out is 1
    out <== 1 - bits[N];
}

It was actually ChatGPT who helped me understand that acc merely stores a symbolic expression that is derivable at compile-time (see here), because the circom docs, as far as I could tell, do not do a good job at clarifying the distinction between circom signals and circom variables. So: “thx chat!”

DivRem() template

It’s surprisingly tricky to implement a basic “division with remainder” in circom because emulating integer division in a finite field is rife with dangers. For example:

Normally, for any $a,b\in \N$, Euclid’s division theorem says there exist unique $q,r\in\N$ s.t.: \begin{align} a = b q + r, 0\le r < b \end{align}

Now, suppose we are implementing a circom template that checks $a,b,q,r$ for the properties above via:

a === b * q + r;
is_small_remainder <== LessThan([r, b]);

Are we good? Is this enough to guarantee uniqueness of $q$ and $r$?

Nope! If the field were $\F_{13}$, then dividing $a=10$ by $b=3$ we would have two solutions: \begin{align} q = 3 \wedge r = 1\Rightarrow 3\times 3 + 1 = 10 \end{align} But also: \begin{align} q = 7 \wedge r = 2\Rightarrow 7\times 3 + 2 = 23 \end{align} And $23 \bmod 13 = 10$.

To fix this, we actually have to check that quotient $q$ is “in range”: i.e., $|q| = |a| - |b| + 1$ ($|x| = $ minimum # of bits needed to represent $x$):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Assuming a is not zero, it should be enough to ensure that
//   2^{n-1} <= a < 2^n
// TODO: There should be more efficient methods based on Num2Bits.
//
// e.g., 1 bit  to represent 1: 2^{1-1} = 1 <= 1 < 2^1 = 2
// e.g., 2 bit  to represent 2: 2^{2-1} = 2 <= 2 < 2^2 = 4
// e.g., 2 bits to represent 3: 2^{2-1} = 2 <= 3 < 2^2 = 4
// e.g., 3 bits to represent 4: 2^{3-1} = 4 <= 4 < 2^3 = 8
// e.g., 3 bits to represent 5: 2^{3-1} = 4 <= 5 < 2^3 = 8
// e.g., 3 bits to represent 6: 2^{3-1} = 4 <= 6 < 2^3 = 8
// e.g., 3 bits to represent 7: 2^{3-1} = 4 <= 7 < 2^3 = 8
template MinNumBits(N)() {
    signal input {nonzero} a;
    signal output n;

    // TODO: Impl
}

// a < 2^N
// b < 2^M
// Note: We won't really be interesting in dividing zero, I suppose?
// (If we were, we can add simpler constraints that q == 0 and r == 0.)
template DivRem(N, M) {
    signal input {nonzero} a, b;
    signal output q, r;

    signal q <-- a \ b;
    signal r <-- a % b;

    // Check Euclidean division theorem, including r \in [0, b)
    a == q * b + r;

    signal valid_remainder <== LessThan(M)([r, b]);
    valid_remainder === 1;

    // Check that the quotient is bounded, otherwise we are in trouble
    signal n <== MinNumBits(N)(a);
    signal m <== MinNumBits(M)(b);

    // We cannot use anything smaller than N here since if b is 1, then q == a
    signal k === MinNumBits(N)(q);
    k == n - m + 1

}

You can see an implementation of a related template that computes just the remainder $r$, when a bound on $a$ is known, here.

TODOs

How do we assert non-overflows on variables like? What if they exceed the field size?

var MAX_QUOTIENT = (3 * MAX_ENCODED_LEN) \ 4;

Conclusion

circom is an amazing tools for implementing ZK relations that (obviously) makes the developer’s job 10,000x easier than writing R1CS constraints directly.

I am still trying to understand the many pitfalls developers can fall into when writing circom code, most importantly, under-constraining bugs.

Looking forward to looking at NoirLang to see if it makes it easier to distinguish between constraints and witness generation?

Awesome list

References

For cited works and other resources3$^,$4$^,$5$^,$6$^,$7, see below 👇👇

  1. A beginner’s intro to coding zero-knowledge proofs, by Santiago Palladino, May 4th 2023  2

  2. Tag, you’re it: Signal tagging in Circom, by Tjaden Hess, January 2nd, 2024 

  3. 0xPARC ZK bug tracker 

  4. Breaking down TornadoCash, a live explanation of TorandoCash’s circom implementation 

  5. Ecne, “verify that R1CS equations uniquely determine outputs given inputs (i.e. that the constraints are sound)” 

  6. Circom101 book 

  7. CIRCOM: A Robust and Scalable Language for Building Complex Zero-Knowledge Circuits, by Jose L. Muñoz-Tapia and Marta Belles and Miguel Isabel and Albert Rubio and Jordi Baylina, 2022, [URL]