Introduction

On May 20th, 2024, Linea requested an audit of the gnark std library. The audit was conducted by three zkSecurity engineers over a period of three weeks. The audit focused on the correctness and security of parts of the gnark std library.

We found the documentation and the code to be of high quality, and the team was especially helpful in discussing the various issues brought up during the engagement.

Scope

The scope of the audit included the following components:

Before we list the findings, we give a brief overview of some of the components of the gnark std library that were audited.

Non-native Arithmetic Implementation

Simulating operations of non-native fields in the native field (i.e. the circuit field) can be problematic as the values might be larger than the circuit field, and the operations might also lead to wrap-arounds which would lead to an incorrect result.

The general idea behind implementations of non-native arithmetic operations is to represent the non-native field elements as a number of limbs in the native field. For example, an element in the base field (or scalar field) of secp256k1 (~256 bits) can fit into 4 limbs of 64-bit integers, which can be represented as field elements in our circuit.

Then, operations are performed on the limbs themselves, potentially overflowing them, and the result is then reduced modulo the prime of the non-native field.

Reduction

The idea of doing a modular reduction from a value a (that might have become bigger than the non-native modulus f) to a value b, is to prove the following:

$$a = b \bmod f$$

This is essentially the same as proving that we have values $b$ and $k$ in the integers, with $b, k < f$, such that:

$$a = b + k \cdot f$$
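As a quick sanity check of this decomposition, here is a toy Python example over the integers (purely illustrative, not gnark code):

def toy_reduction(a, f):
    # the prover would hint b (the reduced value) and k (the quotient)
    b, k = a % f, a // f
    # the relation that must then be checked over the integers
    assert a == b + k * f
    return b, k

# e.g. reducing 40 modulo 7: b = 5, k = 5, and 40 == 5 + 5 * 7
assert toy_reduction(40, 7) == (5, 5)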

Now, if we replace the variables with their limbs (of t bits) we obtain something like this:

$$\sum_i a_i \cdot 2^{ti} = \sum_i b_i \cdot 2^{ti} + \left(\sum_i k_i \cdot 2^{ti}\right) \cdot \left(\sum_i f_i \cdot 2^{ti}\right)$$

The first idea is that the limbs $b_i$ and $k_i$ are provided as a hint by the prover, are range-checked (using table lookups), and then the equation is checked.

But checking the equation as-is is not possible, as we already know that the modulus $f$ is too large for our circuit field.

So the second idea is to represent the different values as polynomials with their limbs as coefficients. (This way we can instead check that the left-hand side polynomial is equal to the right-hand side polynomial.) For example, for $a$ we have the polynomial:

$$a(x) = \sum_i a_i \cdot x^i$$

Note that this polynomial only takes the value $a$ at $x = 2^t$.

So checking our previous equation is reduced to checking that the following equation is true at the point $x = 2^t$:

$$a(x) = b(x) + k(x) \cdot f(x)$$

Since $2^t$ is a root of the difference of the two sides, we know that there exists a polynomial $c(x)$ such that:

$$a(x) - b(x) - k(x) \cdot f(x) = c(x) \cdot (x - 2^t)$$

So the third idea is to let the prover hint this polynomial $c(x)$, and to let them show us that this identity holds. We can either do that at $d+1$ points if the maximum degree of the polynomials is $d$, or we can do that at a single random point using the Schwartz-Zippel lemma.

The latter solution is what gnark does, using an externalized Fiat-Shamir protocol (which we go over in Out-of-circuit Fiat-Shamir with Plonk) to compute a random challenge based on all of the inputs to the identity check.

Furthermore, since the same challenge is used for ALL field operations in a circuit, the computation of the challenge AND the identity checks are deferred to the end of the circuit compilation, once the externalized Fiat-Shamir protocol has amassed all of the inputs.
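To make this concrete, here is a minimal out-of-circuit Python sketch of the final identity check (all names, like CIRCUIT_PRIME and eval_limbs, are ours for illustration; in gnark the challenge comes from the externalized Fiat-Shamir protocol and the check is deferred, as described above):

import random

CIRCUIT_PRIME = 2**61 - 1  # stand-in for the circuit field modulus
T = 16                     # limb size in bits

def eval_limbs(limbs, x):
    # evaluate the limb polynomial sum_i limbs[i] * x^i mod the circuit prime
    result = 0
    for limb in reversed(limbs):
        result = (result * x + limb) % CIRCUIT_PRIME
    return result

def check_reduction(a_limbs, b_limbs, k_limbs, c_limbs, f_limbs):
    alpha = random.randrange(CIRCUIT_PRIME)  # the random challenge
    lhs = eval_limbs(a_limbs, alpha)
    rhs = (eval_limbs(b_limbs, alpha)
           + eval_limbs(k_limbs, alpha) * eval_limbs(f_limbs, alpha)
           + eval_limbs(c_limbs, alpha) * (alpha - 2**T)) % CIRCUIT_PRIME
    return lhs == rhs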

Multiplication

Multiplication is pretty straightforward: using the previous technique, we check that $a \cdot b = r \bmod f$ by checking the following identity:

$$a(x) \cdot b(x) = r(x) + k(x) \cdot f(x) + c(x) \cdot (x - 2^t)$$

Thus, after evaluating the different polynomials at some random point α, we check that

$$a(\alpha) \cdot b(\alpha) = r(\alpha) + k(\alpha) \cdot f(\alpha) + c(\alpha) \cdot (\alpha - 2^t)$$

Note that $f(\alpha)$ and $(\alpha - 2^t)$ only need to be computed once for all multiplication checks.

Subtraction bounds

We are trying to perform $a - b \bmod p$ on limbs of $t$ bits.

To avoid underflow, we want to increase every limb $a_i$ with some padding $u_i$ such that:

  1. they won’t underflow in the integers: $(a_i + u_i) - b_i > 0$.
  2. the overall padding $u = \sum_i u_i \cdot 2^{ti}$ won’t matter in the equation, as it’ll be a multiple of the modulus: $u = k \cdot p$.

To satisfy 2, they create a value $u_1 > p$, then find its negation modulo $p$ by computing $u_2 = (-u_1) \bmod p$. Since we have that $u_1 > p$ and $u_2 < p$, we also have that $u_1 + u_2 = k \cdot p$ for some $k > 0$.

To satisfy 1, $u_1$ and $u_2$ are created such that each limb satisfies $u_i = u_{1,i} + u_{2,i} > b_i$.

First, $u_1$ is created by setting its limbs to an overflowed value: $u_{1,i} = 2^t$.

Second, $u_2$ is created as its negated congruent value as explained above ($u_2 = (-u_1) \bmod p$), and decomposed into $t$-bit limbs such that $u_{2,i} < 2^t$ for all $i$.

The argument goes as follows:

  1. we need to prove that $u_i > b_i$,
  2. which is equivalent to proving that $u_{1,i} + u_{2,i} > b_i$,
  3. by construction, $2^t > b_i$,
  4. $u_{1,i} + u_{2,i} \geq 2^t$, since $u_{1,i} = 2^t$ and $u_{2,i} \geq 0$,
  5. by 3 and 4, we have that $u_{1,i} + u_{2,i} > b_i$.

One can easily extend the proof to include $b$’s potential overflow $o$, so that we have $2^{t+o} > b_i$ and set $u_{1,i} = 2^{t+o}$.
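Here is a small Python sketch of how such a padding can be computed (illustrative only; the helper name is ours):

def subtraction_padding(p, num_limbs, t, o=0):
    # u1: every limb set to the overflowed value 2^(t+o)
    u1_limbs = [2**(t + o)] * num_limbs
    u1 = sum(limb << (t * i) for i, limb in enumerate(u1_limbs))

    # u2: the negation of u1 modulo p, decomposed into t-bit limbs
    u2 = (-u1) % p
    u2_limbs = [(u2 >> (t * i)) & (2**t - 1) for i in range(num_limbs)]

    # the overall padding is a multiple of p, so it vanishes modulo p
    assert (u1 + u2) % p == 0
    # per-limb padding u_i = u_{1,i} + u_{2,i}
    return [l1 + l2 for l1, l2 in zip(u1_limbs, u2_limbs)]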

Overflows

When adding two $n$-bit limbs together, the maximum value obtained can be of $n+1$ bits, for the simple reason that using the maximum values we see that:

$$(2^n - 1) + (2^n - 1) = 2^{n+1} - 2$$

When multiplying two $n$-bit limbs together, we potentially obtain a $2n$-bit value. We can see that by multiplying the maximum values again:

$$(2^n - 1) \cdot (2^n - 1) = 2^{2n} - 2 \cdot 2^n + 1 < 2^{2n}$$

As such, …

Sometimes, it is easier to perform an addition or multiplication without directly reducing modulo the non-native field modulus. Reducing would allow us to get back to a normal representation of our non-native field element (e.g. four limbs of 64 bits).

Not reducing means that we’re operating on the limbs directly, and as we’ve seen above this means overflow! And overflow means potentially wrapping around our circuit modulus, which is a big no-no.

For this reason, the code tracks the size of the limbs (including overflows that might have occurred) and ensures that the next resulting limbs will have a manageable overflow (meaning that they won’t wrap around the circuit modulus).

The addition is quite simple to understand, so let’s focus on the multiplication here. If we were to perform a multiplication between two integer elements and their limbs, we would compute something like this:

$$a \cdot b = \left(\sum_i a_i \cdot 2^{ti}\right)\left(\sum_i b_i \cdot 2^{ti}\right) = \sum_{i,j} a_i \cdot b_j \cdot 2^{(i+j)t}$$

If we want to imagine three limbs, we would get something like this:

$$a_0 \cdot b_0 + (a_0 \cdot b_1 + a_1 \cdot b_0) \cdot 2^t + (a_0 \cdot b_2 + a_1 \cdot b_1 + a_2 \cdot b_0) \cdot 2^{2t} + (a_1 \cdot b_2 + a_2 \cdot b_1) \cdot 2^{3t} + (a_2 \cdot b_2) \cdot 2^{4t}$$

Which gives us a result scattered across 5 limbs, with the middle limb being the largest.

If we generalize that, we obtain a result of $l_0 + l_1 - 1$ limbs, if $l_0$ (resp. $l_1$) is the number of limbs of $a$ (resp. $b$), with the middle limb being of size $2t + o_1 + o_2 + \log_2(n)$ bits, where $n$ is the number of terms in the largest limb (that middle limb).

We can see that by taking the maximum values again. We have $n$ pairwise multiplications of limbs, where $a$’s limbs have an overflow of $o_1$ and $b$’s limbs have an overflow of $o_2$. Thus we have:

$$(2^{t+o_1} - 1) \cdot (2^{t+o_2} - 1) \cdot n = (2^{2t+o_1+o_2} - 2^{t+o_1} - 2^{t+o_2} + 1) \cdot n < 2^{2t+o_1+o_2} \cdot 2^{\log_2(n)}$$
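The schoolbook limb multiplication and its resulting overflow bound can be sketched in Python as follows (a toy model; gnark tracks this on its emulated element type):

import math

def mul_limbs(a_limbs, b_limbs, t, o1, o2):
    # schoolbook multiplication: the result spans l0 + l1 - 1 limbs
    result = [0] * (len(a_limbs) + len(b_limbs) - 1)
    for i, a in enumerate(a_limbs):
        for j, b in enumerate(b_limbs):
            result[i + j] += a * b

    # the largest (middle) limb sums n cross terms, so each result limb
    # is bounded by 2t + o1 + o2 + log2(n) bits, i.e. an overflow of
    # t + o1 + o2 + ceil(log2(n)) on top of the limb size t
    n = min(len(a_limbs), len(b_limbs))
    new_overflow = t + o1 + o2 + math.ceil(math.log2(n))
    return result, new_overflow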

Note: Interestingly, the overflows are tracked element-wise, and not limb-wise. Tracking at the limb level would arguably not make sense: what matters is the largest limb, which determines whether the whole element (and all of its limbs) needs to be reduced or not.

Note: we don’t care about overflow of the total value, because we never really act on the total value. The overflow only ever impacts the operation between two limbs and that’s it!

Elliptic curve gadgets

Several elliptic curves and their operations (including pairings for pairing-friendly curves) are implemented on top of different circuit fields.

When a curve’s base field is the same as the circuit field, then the curve is implemented “natively”, meaning that its coordinates are represented directly as field elements in the circuit field, and operations are implemented using the default arithmetic supported by the circuit field.

On the other hand, when a curve’s base field differs from the circuit field, then the emulated field types are used (see Non-native arithmetic for fields) to emulate the coordinates of a point, increasing the number of constraints in elliptic curve operations implemented on top.

There are three curves implemented via emulated field types: BLS12-381, BN254, and BW6-761.

For the native curves, BLS12-377 is implemented natively on top of BW6-761, and BLS24-315 is implemented natively on top of BW6-633.

In addition, native twisted Edwards curves are defined for every circuit field; for example, Baby Jubjub for BN254.

Multiplexer gadgets

The multiplexer libraries implement dynamic indexing into data structures. Since we’re in a circuit, dynamic indexing means that lookup algorithms are at best linear, as they have to touch every value that could potentially be selected.

This makes implementing array lookups, array slicing, and associative arrays quite complicated and costly in practice. This section explains how the implementations work.

N-to-1 multiplexers in mux and multiplexer

There are two ways that multiplexers are implemented. The first way is used when the length of the array of values is a power of 2, allowing the use of a binary tree:

mux
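The tree can be sketched in Python as a recursion over the bits of the selector (an out-of-circuit model; in-circuit, each 2-to-1 mux costs one constraint of the form $out = x + b \cdot (y - x)$):

def mux_tree(selector_bits, values):
    # values has power-of-2 length; selector_bits are the bits of the
    # queried index, least-significant first, one bit per tree layer
    if len(values) == 1:
        return values[0]
    bit, rest = selector_bits[0], selector_bits[1:]
    # pair up the values and select within each pair using the current bit
    halved = [x + bit * (y - x) for x, y in zip(values[0::2], values[1::2])]
    return mux_tree(rest, halved)

# selecting index 6 = 0b110 out of 8 values
assert mux_tree([0, 1, 1], list(range(8))) == 6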

The second way is to simply create a mask: an array of bits $b$ where a single bit is set to 1 at the position where the value must be retrieved. The mask is then verified in-circuit to be a well-formed one-hot vector (i.e. that it only has a single bit set, in the correct position, surrounded by 0s). The value $v$ is finally obtained as the dot product of the mask and the values $v_i$:

$$v = \sum_i b_i \cdot v_i$$
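In Python, this second approach looks as follows (out-of-circuit model; in the gadget the mask is hinted and then constrained to be well-formed):

def mask_mux(index, values):
    # one-hot mask: a single 1 at the queried position
    mask = [1 if i == index else 0 for i in range(len(values))]
    # dot product of the mask and the values
    return sum(b * v for b, v in zip(mask, values))

assert mask_mux(2, [10, 20, 30, 40]) == 30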

From these two ways of constructing N-to-1 multiplexers, arrays and associative arrays are implemented as simple wrappers around these constructions.

N-to-n multiplexers in slices

The slice implementation allows a caller to provide an interval within an array, and nullify anything not in that interval. The implementation uses masks to nullify the right side, and then the left side, of that interval, as pictured below:

slice

The mask is used so that values are copied from the input only when the matching bit within the mask is 1, and not copied when the matching bit within the mask is 0. In pseudo-code:

def apply_masks(input_, mask1, mask2):
    out = [0] * len(input_)

    for i in range(0, len(out)):
        # mask1 = [1, 1, 1, 1, 0, 0, 0, 0]
        #                      ^
        #                     end
        out[i] = input_[i] * mask1[i]

    for i in range(0, len(out)):
        # mask2 = [0, 0, 1, 1, 1, 1, 1, 1]
        #                 ^
        #                start
        out[i] = out[i] * mask2[i]

    # at this point out only contains the [start, end) range of input:
    # out = [0, 0, _, _, 0, 0, 0, 0]
    return out
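For example, keeping the range [2, 4) of an 8-element array (using the apply_masks pseudo-code above):

input_ = [5, 6, 7, 8, 9, 10, 11, 12]
mask1 = [1, 1, 1, 1, 0, 0, 0, 0]  # nullifies indices >= end = 4
mask2 = [0, 0, 1, 1, 1, 1, 1, 1]  # nullifies indices < start = 2
assert apply_masks(input_, mask1, mask2) == [0, 0, 7, 8, 0, 0, 0, 0]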

The mask is passed as a hint, and the following function that constrains the correctness of the mask is implemented in the gadget:

def verify_mask(out, pos, start_val, end_val):
    # pos = 0 means that everything will be end_val
    if pos != 0:
        assert out[0] == start_val

    # pos = -1 means that everything will be start_val
    if pos != len(out) - 1:
        assert out[-1] == end_val

    # ensure the following:
    #
    # [start_val, start_val, end_val, end_val, end_val]
    #                        ^^^^^^^
    #                          pos
    for i in range(1, len(out)):
        if i != pos:
            assert out[i] == out[i-1]

Range checks implementation

Range checks are gadgets that check if a value $v$ belongs to some interval $[a, b]$. The lower-level APIs that are exposed in gnark’s std library allow developers to check specifically that a value is in the range $[0, 2^n)$ for some $n$. In other words, a range check enforces that a value $v$ is $n$ bits.

These range checks are implemented in two ways: either by verifiably decomposing the value into an $n$-bit array (where verifiably means that $\sum_i b_i \cdot 2^i = v$ is added as a constraint in the circuit), or by using a table lookup.

To range-check values with a table lookup, the idea is pretty simple: create a table of all the values between 0 and some power of 2 (decided dynamically at compile time, based on the bit sizes requested by the circuit’s range checks). Then cut a value $v$ into limbs of that bit size (verifiably, as explained above), and check that each limb is in the table (via a table lookup argument, which is out of scope for this document).

As the last limb might cover fewer bits than the table’s limb size, its value is shifted to the left to fit the max value of the table.

For example, if we need to check that a value $v$ is 9 bits using a table that gathers all elements from 0 to $2^4 - 1$, then we will need three limbs of 4 bits, with the last limb being only 1 bit. To check the last limb, its value is shifted by 3 bits to the left.
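This decomposition can be modeled in Python as follows (names are ours; the in-circuit version constrains the recomposition and uses a lookup argument instead of set membership):

TABLE_BITS = 4
TABLE = set(range(2**TABLE_BITS))  # lookup table for [0, 2^4)

def range_check(v, n_bits):
    # cut v into TABLE_BITS-bit limbs (in-circuit: hinted, plus a
    # recomposition constraint), then look up each limb in the table
    limbs = []
    for _ in range((n_bits + TABLE_BITS - 1) // TABLE_BITS):
        limbs.append(v & (2**TABLE_BITS - 1))
        v >>= TABLE_BITS
    assert v == 0  # no leftover bits beyond the limbs

    # the last limb only carries n_bits % TABLE_BITS useful bits:
    # shift it left so the table lookup enforces the tighter bound
    last_bits = n_bits % TABLE_BITS
    if last_bits != 0:
        limbs[-1] <<= TABLE_BITS - last_bits

    assert all(limb in TABLE for limb in limbs)

range_check(0b101010101, 9)  # 9-bit value: limbs of 4, 4, and 1 bit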

Out-of-circuit Fiat-Shamir with Plonk

The computation of expensive functions in circuits is quite challenging, or even self-defeating in some cases. For this reason, Linea designed and implemented a scheme to delegate a specific computation: deriving randomness in a circuit. This design was introduced for the Fiat-Shamir computation of GKR statements in Recursion over Public-Coin Interactive Proof Systems; Faster Hash Verification (BSB22), but is used in the context of non-native arithmetic in the code we looked at.

To understand how the scheme works, let’s first take a simpler scenario and solution. Imagine that at some point in a circuit we want to compute the hash of some values, HASH(a, b, c), but that the hash function is very expensive to compute.

One way to solve that is to just hint the result of the computation d = HASH(a, b, c) in the circuit (as the prover), and then expose a, b, c, d as public inputs. (And of course, make sure that the circuit wires these new public input values to their associated witness values in the circuit.) We illustrate this in the following diagram (where the pink rows are the new public inputs that are exposed, and d is the result of the hash function):

plonk1

The verifier can then verify that the digest d was computed correctly before interpolating the public input vector. (Or better, produce d themselves and insert it into the public input vector, to avoid forgetting to check it.) This works as Plonk checks that the following constraint holds as part of the protocol (if you imagine that other wires and gates are not enabled on the public input rows):

$$L(x) - \text{PI}(x) = 0$$

But at this point, the witness values that take part in the hash computation are exposed in the public input, and are thus leaked…

Can we hide these values from the verifier? Notice that for Fiat-Shamir use cases, we do not need to compute exactly the HASH function; rather, we need to generate random values based on the inputs. That’s right, it does not matter how we generate these random values as long as they are based on the private inputs. As such, we can relax our requirements and do something like this instead: d = HASH(commit(a, b, c)).
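A toy Python model of this relaxation, using a hash-based hiding commitment as a stand-in for the actual commitment scheme (all names here are illustrative):

import hashlib, secrets

def commit(values, blinding):
    # stand-in hiding commitment; the real scheme uses a polynomial
    # commitment, not a hash
    data = b"".join(v.to_bytes(32, "big") for v in values)
    return hashlib.sha256(data + blinding).digest()

def derive_challenge(a, b, c):
    blinding = secrets.token_bytes(32)
    com = commit([a, b, c], blinding)
    # the verifier only ever sees com, yet the challenge d still
    # depends on the private inputs a, b, c
    d = int.from_bytes(hashlib.sha256(com).digest(), "big")
    return com, d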

As the challenge is used for Fiat-Shamir, we can leave it in the public input, but now the inputs can be kept in the witness. They are, however, kept in rows that resemble public input rows, in that the left wire is enabled and no gate is selected.

Then, a hiding commitment of the values is produced by the prover and sent to the verifier, as we show in this diagram:

plonk2

We are almost there; there’s still one more problem: we need to prevent the prover from using this new commitment to alter random witness values. To do this, the verifier key includes one more selector vector that dictates where this new commitment can apply:

plonk3

Which ends up producing a constraint that looks like this (if, again, we ignore the gates and wires not enabled in these rows):

$$L(x) - \text{PI}(x) - V(x) \cdot Q_{cp}(x) = 0$$

Note that the implementation we looked at was generalized to work with $n$ Fiat-Shamir instances by having $n$ commitments $[V_i(x)]$ and $n$ committed selectors $[Q_{cp,i}(x)]$, as well as $n$ reserved rows of public inputs for the $n$ digests.

On top of that, the commitments to the selectors $[Q_{cp,i}(x)]$ are evaluated during the linearization phase of Plonk, so as not to care about the randomization of the committed inputs to Fiat-Shamir, although the commitment is still randomized by randomizing entries of the $Q_{cp}$ vector in rows that are not activated.

In-Circuit KZG Verification

The in-circuit KZG implementation in gnark/std/commitments/kzg closely follows the one in the gnark-crypto library.

Single-Point Opening Proof Verification

For a claimed value $v = f(u)$ of polynomial $f(X)$ at point $u$, KZG polynomial commitment verification is implemented as a pairing check:

$$e(v \cdot G_1 - [f(\alpha)]_{G_1} - u \cdot [H(\alpha)]_{G_1},\ G_2) \cdot e([H(\alpha)]_{G_1},\ [\alpha]_{G_2}) \stackrel{?}{=} 1$$

The commitment to $f(X)$ is a $\mathbb{G}_1$ element $[f(\alpha)]_{G_1}$.

The evaluation point is $u \in \mathbb{F}_r$ — an element of the scalar field of $\mathbb{G}_1$.

The opening proof is just 2 values:

- $[H(\alpha)]_{G_1}$ — commitment to the quotient polynomial $H(X)$,
- $v = f(u) \in \mathbb{F}_r$ — claimed value of $f(X)$ at evaluation point $u$.

The verification key consists of:

- $G_2, [\alpha]_{G_2}$ — the generator of group $\mathbb{G}_2$ scaled to 1 and to $\alpha$, the toxic secret (supposed to be) destroyed after the generation of the SRS (the powers of $\alpha$ multiplied with $G_1$ and $G_2$),
- $G_1$ — the generator of group $\mathbb{G}_1$.

The above check is equivalent to checking that

$$e([f(u) - f(\alpha)]_{G_1},\ G_2) = e([(u - \alpha) \cdot H(\alpha)]_{G_1},\ G_2)$$

Which holds if there exists the quotient polynomial $H(X)$, defined as:

$$H(X) = \frac{f(u) - f(X)}{u - X}$$

The prover provides $[H(\alpha)]_{G_1}$ as part of the proof; passing the check while not knowing the polynomial $H(X)$ is highly unlikely (it happens with probability $1/|\mathbb{F}_r|$).
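To see why the check works, here is a toy Python model where the pairing check is carried out directly on the exponents ($\alpha$ is known here only because this is a simulation):

# toy parameters: a small prime r and a "toxic" alpha
r, alpha = 101, 29

def f(x):
    return (3 * x**2 + 2 * x + 7) % r

u = 15
v = f(u)
# quotient H(X) = (f(u) - f(X)) / (u - X), evaluated at alpha
H_alpha = (v - f(alpha)) * pow(u - alpha, -1, r) % r

# the pairing check, rewritten in the exponents:
# (v - f(alpha) - u * H(alpha)) * 1 + H(alpha) * alpha == 0
assert (v - f(alpha) - u * H_alpha + H_alpha * alpha) % r == 0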

Fold Batch Opening Proof at a Single Point

This method is used by the in-circuit Plonk verifier to fold multiple evaluation proofs at a single point ζ.

It takes:

- a list of commitments $[f_i(\alpha)]_{G_1}$,
- an evaluation point $u$,
- a batch opening proof, which consists of:
  - a single quotient polynomial commitment $[H(\alpha)]_{G_1}$,
  - a list of values of the committed polynomials at the evaluation point, $v_i = f_i(u)$.

And produces:

- a folded commitment $[f]_{G_1}$, and
- a folded opening proof $([H(\alpha)]_{G_1}, v)$ — here, the quotient commitment is simply passed through from the input, and $v$ is the produced folded evaluation.

A random challenge $\lambda$ is obtained by hashing all the input data, except the quotient commitment $[H(\alpha)]_{G_1}$, because it is computed by the prover at a later stage and is itself dependent on $\lambda$.

The logic from here is straightforward. Folded commitment:

$$[f]_{G_1} = \sum_{i=0}^{n-1} \lambda^i \cdot [f_i(\alpha)]_{G_1}$$

Folded evaluation:

$$v = \sum_{i=0}^{n-1} \lambda^i \cdot f_i(u)$$
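The folding itself is a powers-of-$\lambda$ linear combination, as in this sketch (group operations replaced by scalar arithmetic modulo a prime r):

def fold(commitments, evaluations, lam, r):
    folded_com, folded_eval, power = 0, 0, 1
    for com, ev in zip(commitments, evaluations):
        # accumulate lam^i * commitment_i and lam^i * evaluation_i
        folded_com = (folded_com + power * com) % r
        folded_eval = (folded_eval + power * ev) % r
        power = (power * lam) % r
    return folded_com, folded_eval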

Multi-Point Batch Verification

This method is used by the in-circuit Plonk verifier instead of verification steps 11-12 in the Plonk paper.

Multi-point batch verification of KZG commitments and opening proofs takes a list of triplets (for $i \in [0, n-1]$) of:

- a polynomial commitment $[f_i(\alpha)]_{G_1}$,
- an evaluation point $u_i \in \mathbb{F}_r$,
- an opening proof $([H_i(\alpha)]_{G_1}, v_i = f_i(u_i))$.

All of them are checked against a single verification key $(G_1, G_2, [\alpha]_{G_2})$.

The final check boils down to a pairing check:

$$e(v \cdot G_1 - [f]_{G_1} - [uH]_{G_1},\ G_2) \cdot e([H]_{G_1},\ [\alpha]_{G_2}) \stackrel{?}{=} 1$$

It looks similar to the one for a single KZG proof verification. We will see shortly how its components are obtained.

A random value λ is produced by hashing all the input data. Then it’s used to produce folded values for the final pairing check.

Folded quotients:

$$[H]_{G_1} = \sum_{i=0}^{n-1} \lambda^i \cdot [H_i(\alpha)]_{G_1}$$

Folded products of evaluation points and quotients:

$$[uH]_{G_1} = \sum_{i=0}^{n-1} \lambda^i \cdot u_i \cdot [H_i(\alpha)]_{G_1}$$

Folded evaluations:

$$v = \sum_{i=0}^{n-1} \lambda^i \cdot f_i(u_i)$$

Folded commitments:

$$[f]_{G_1} = \sum_{i=0}^{n-1} \lambda^i \cdot [f_i(\alpha)]_{G_1}$$
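Putting the folds and the final check together in the same toy exponent model as before (a sketch; $\alpha$ is known only because this is a simulation):

r, alpha = 101, 29

# toy instances (f_i(alpha), u_i, v_i) in the exponents, with the quotient
# H_i(alpha) = (v_i - f_i(alpha)) / (u_i - alpha) computed honestly
instances = []
for f_alpha, u, v in [(44, 15, 20), (9, 3, 77), (60, 50, 5)]:
    H = (v - f_alpha) * pow(u - alpha, -1, r) % r
    instances.append((f_alpha, u, H, v))

lam = 13  # challenge; would come from hashing all the input data
f_fold = uH_fold = H_fold = v_fold = 0
for i, (f_alpha, u, H, v) in enumerate(instances):
    p = pow(lam, i, r)
    f_fold = (f_fold + p * f_alpha) % r
    uH_fold = (uH_fold + p * u * H) % r
    H_fold = (H_fold + p * H) % r
    v_fold = (v_fold + p * v) % r

# final pairing check in the exponents:
# (v - [f] - [uH]) * 1 + [H] * alpha == 0
assert (v_fold - f_fold - uH_fold + alpha * H_fold) % r == 0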