Introduction
On May 20th, 2024, Linea requested an audit of the gnark std library. The audit was conducted by three zkSecurity engineers over a period of three weeks. The audit focused on the correctness and security of parts of the gnark std library.
We found the documentation and the code to be of high quality, and the team was especially helpful in discussing the various issues brought up during the engagement.
Scope
The scope of the audit included the following components:
- Non-native arithmetic for fields, which is used to emulate various foreign fields, their elements and their operations in a circuit.
- Non-native arithmetic for curves, which is used to implement elliptic curves.
- Multiplexers, which are used to dynamically index or range over a fixed-size array of values in a circuit.
- Range checks API, which is used to check if a value is within a given range in a circuit.
- In-circuit Plonk verifier, which is used to verify a Plonk proof in a circuit.
- In-circuit KZG verifier, which is used to batch verify KZG polynomial commitments and evaluation proofs by the in-circuit Plonk verifier.
Before we list the findings, we give a brief overview of some of the components of the gnark std library that were audited.
Non-native Arithmetic Implementation
Simulating operations of non-native fields in the native field (i.e. the circuit field) can be problematic: the values might be larger than the circuit field's modulus, and the operations might lead to wrap-arounds, which would produce incorrect results.
The general idea behind implementations of non-native arithmetic operations is to represent the non-native field elements as a number of limbs in the native field. For example, an element in the base field (or scalar field) of secp256k1 (~256 bits) can fit into 4 limbs of 64-bit integers, which can be represented as field elements in our circuit.
Then, operations are performed on the limbs themselves, potentially overflowing them, and the result is then reduced modulo the prime of the non-native field.
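To make the representation concrete, here is a minimal sketch in plain Python (illustrative code, not gnark's API) of splitting a secp256k1 base-field element into four 64-bit limbs and recomposing it:

```python
# A minimal sketch (plain Python, not gnark) of the limb representation:
# a ~256-bit secp256k1 base-field element as 4 limbs of 64 bits each.
W = 64        # bits per limb
N = 4         # number of limbs

def to_limbs(x, w=W, n=N):
    """Decompose an integer into n little-endian w-bit limbs."""
    return [(x >> (w * i)) & ((1 << w) - 1) for i in range(n)]

def from_limbs(limbs, w=W):
    """Recompose the integer from its little-endian w-bit limbs."""
    return sum(l << (w * i) for i, l in enumerate(limbs))

p = 2**256 - 2**32 - 977                 # secp256k1 base field modulus
x = pow(3, 100, p)                       # some field element
assert from_limbs(to_limbs(x)) == x      # the round-trip is lossless
```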
Reduction
The idea of doing a modular reduction from a value $a$ (that might have become bigger than the non-native modulus $p$) to a value $r$, is to prove the following:

$$a \equiv r \pmod{p}$$

This is essentially the same as proving that we have the values $k$ and $r$ in the integers, such that:

$$a = k \cdot p + r$$

Now, if we replace the variables with their limbs (of $w$ bits) we obtain something like this:

$$\sum_i a_i 2^{wi} = \left(\sum_i k_i 2^{wi}\right) \cdot \left(\sum_i p_i 2^{wi}\right) + \sum_i r_i 2^{wi}$$
The first idea is that the limbs $k_i$ and $r_i$ are provided as a hint by the prover, are range-checked (using table lookups), and then the equation is checked.

But checking the equation as-is is not possible, as we already know that the modulus $p$ is too large for our circuit field.
So the second idea is to represent the different values as polynomials with their limbs as coefficients. (This way we can instead check that the left-hand side polynomial is equal to the right-hand side polynomial.) For example, for $a$ we have the polynomial:

$$a(X) = a_0 + a_1 X + a_2 X^2 + \cdots$$

Note that this polynomial only takes the value $a$ for $X = 2^w$.

So checking our previous equation is reduced to checking that the following equation is true at the point $X = 2^w$:

$$a(X) - k(X) \cdot p(X) - r(X) = 0$$

Since $2^w$ is a root, we know that there exists a polynomial $c(X)$ such that:

$$a(X) - k(X) \cdot p(X) - r(X) = c(X) \cdot (X - 2^w)$$

So the third idea is to let the prover hint this polynomial $c(X)$, and to let them show us that this identity holds. We can either do that at $d+1$ points if the maximum degree of the polynomials is $d$, or we can do that at a single random point using the Schwartz-Zippel lemma.
The latter solution is what gnark does, using an externalized Fiat-Shamir protocol (which we go over in Out-of-circuit Fiat-Shamir with Plonk) to compute a random challenge based on all of the input to the identity check.
Furthermore, since the same challenge is used for ALL field operations in a circuit, the computation of the challenge AND the identity checks are deferred to the end of the circuit compilation, once the externalized Fiat-Shamir protocol has amassed all of the inputs.
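To make these three ideas concrete, here is a self-contained plain-Python sketch of the whole reduction check; integers stand in for circuit variables, `Q` is a stand-in circuit modulus, and all names are ours rather than gnark's:

```python
# Sketch of the reduction check a = k*p + r over limb polynomials,
# verified at a random point z in the native field (Schwartz-Zippel).
import random

W = 64                             # bits per limb
Q = 2**61 - 1                      # stand-in for the native (circuit) modulus
P = 2**256 - 2**32 - 977           # emulated modulus (secp256k1 base field)

def limbs(x, n):                   # little-endian w-bit limbs of x
    return [(x >> (W * i)) & ((1 << W) - 1) for i in range(n)]

def ev(poly, z):                   # evaluate a limb polynomial at z mod Q
    return sum(c * pow(z, i, Q) for i, c in enumerate(poly)) % Q

a = random.randrange(5 * P)        # an overflowed value to be reduced
k, r = divmod(a, P)                # hinted by the prover: a = k*P + r

A, K, R, PL = limbs(a, 5), limbs(k, 1), limbs(r, 4), limbs(P, 4)

# lhs(X) = A(X) - K(X)*P(X) - R(X); by construction lhs(2^w) = 0
lhs = [0] * 5
for i, ai in enumerate(A):
    lhs[i] += ai
for i, ki in enumerate(K):
    for j, pj in enumerate(PL):
        lhs[i + j] -= ki * pj
for i, ri in enumerate(R):
    lhs[i] -= ri

# the carry polynomial c(X) = lhs(X) / (X - 2^w) is also hinted;
# here we compute it by synthetic division
c = [0] * 4
c[3] = lhs[4]
for i in range(2, -1, -1):
    c[i] = lhs[i + 1] + (1 << W) * c[i + 1]
assert lhs[0] + (1 << W) * c[0] == 0        # the division is exact

# the only in-circuit check: lhs(z) == c(z) * (z - 2^w) at a random z
z = random.randrange(Q)
assert ev(lhs, z) == ev(c, z) * ((z - (1 << W)) % Q) % Q
```

In the actual circuit, `k`, `r` and the carry limbs are provided as hints and range-checked, and the final evaluation check is the one deferred until the Fiat-Shamir challenge is available.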
Multiplication
Multiplication is pretty straightforward: using the previous technique, we check that $a \cdot b \equiv r \pmod{p}$ by checking the following identity:

$$a(X) \cdot b(X) - k(X) \cdot p(X) - r(X) = c(X) \cdot (X - 2^w)$$

Thus, after evaluating the different polynomials at some random point $z$, we check that

$$a(z) \cdot b(z) = k(z) \cdot p(z) + r(z) + c(z) \cdot (z - 2^w)$$

Note that $p(z)$ and $(z - 2^w)$ only need to be computed once for all multiplication checks.
Subtraction bounds
We are trying to perform $a - b$ on limbs of $w$ bits.

To avoid underflow, we want to increase every limb $a_i$ with some padding $u_i$ such that:

1. they won't underflow in the integers: $u_i + a_i - b_i \geq 0$,
2. the overall padding won't matter in the equation as it'll be a multiple of the modulus: $u = \sum_i u_i 2^{wi} \equiv 0 \pmod{p}$.
To satisfy 2, they create a value $u_1$, then find its additive inverse modulo $p$ by computing $u_2 = (-u_1) \bmod p$. Since we have that $u = u_1 + u_2$ and $u_2 \equiv -u_1 \pmod{p}$, we also have that $u = k \cdot p$ for some $k$.

To satisfy 1, $u_1$ and $u_2$ are created such that $u_i = u_{1,i} + u_{2,i} \geq b_i$ for every limb $i$.

First, $u_1$ is created by setting its limbs to an overflowed value:

$$u_{1,i} = 2^w$$

Second, $u_2$ is created as its negated congruent value as explained above: $u_2 = (-u_1) \bmod p$, and decomposed into $w$-bit limbs such that $u_{2,i} < 2^w$ for all $i$. The proof that no limb underflows then goes as follows:
1. we need to prove that $u_i + a_i - b_i \geq 0$ for every limb $i$,
2. which is equivalent to proving that $u_i \geq b_i$ (since $a_i \geq 0$),
3. by construction, $u_i = u_{1,i} + u_{2,i}$,
4. $u_{1,i} + u_{2,i} \geq 2^w$, since $u_{1,i} = 2^w$ and $u_{2,i} \geq 0$,
5. by 3 and 4, we have that $u_i \geq 2^w > b_i$.
One can easily extend the proof to include $b$'s potential overflow $o$, so that we have $b_i < 2^{w+o}$ and set $u_{1,i} = 2^{w+o}$.
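The construction can be checked numerically; below is a small plain-Python sketch (illustrative names, assuming $b$ has no overflow) of building the padding and verifying both properties:

```python
# Sketch of the subtraction padding u = u1 + u2 for a - b.
import random

W, N = 64, 4
P = 2**256 - 2**32 - 977                  # emulated modulus

def limbs(x, n):
    return [(x >> (W * i)) & ((1 << W) - 1) for i in range(n)]

# u1: every limb set to the overflowed value 2^w
u1_limbs = [1 << W] * N
u1 = sum(l << (W * i) for i, l in enumerate(u1_limbs))

# u2 = (-u1) mod p, decomposed into plain w-bit limbs
u2 = (-u1) % P
u2_limbs = limbs(u2, N)

# property 2: the padding u = u1 + u2 is a multiple of p
assert (u1 + u2) % P == 0

# property 1: limb-wise, u_i = u1_i + u2_i >= 2^w > b_i, so no underflow
a, b = random.randrange(P), random.randrange(P)
al, bl = limbs(a, N), limbs(b, N)
diff = [u1_limbs[i] + u2_limbs[i] + al[i] - bl[i] for i in range(N)]
assert all(d >= 0 for d in diff)
# and the result is still congruent to a - b mod p
assert (sum(d << (W * i) for i, d in enumerate(diff)) - (a - b)) % P == 0
```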
Overflows
When adding two limbs of $w$ bits together, the maximum value obtained can be of $w + 1$ bits. For the simple reason that using the maximum values we see that:

$$(2^w - 1) + (2^w - 1) = 2^{w+1} - 2 < 2^{w+1}$$

When multiplying two limbs of $w$ bits together, we potentially obtain a $2w$-bit value. We can see that by multiplying the maximum values again:

$$(2^w - 1) \cdot (2^w - 1) = 2^{2w} - 2^{w+1} + 1 < 2^{2w}$$
As such, …
Sometimes, it is easier to perform an addition or multiplication without directly reducing modulo the non-native field modulus. Reducing would allow us to get back to a normal representation of our non-native field element (e.g. four limbs of 64 bits).
Not reducing means that we keep operating on the limbs directly, and as we've seen above this means overflow! And overflow risks wrapping around our circuit modulus, which is a big no-no.

For this reason, the code tracks the size of the limbs (including overflows that might have occurred) and ensures that the next resulting limbs will have manageable overflow (meaning that they won't wrap around the circuit modulus).
The addition is quite simple to understand, so let's focus on the multiplication here. If we were to perform a multiplication between two integer elements and their limbs, we would compute something like this:

$$a \cdot b = \left(\sum_i a_i 2^{wi}\right) \cdot \left(\sum_j b_j 2^{wj}\right) = \sum_{i,j} a_i b_j \, 2^{w(i+j)}$$

If we want to imagine three limbs, we would get something like this:

$$a \cdot b = a_0 b_0 + (a_0 b_1 + a_1 b_0)\,2^w + (a_0 b_2 + a_1 b_1 + a_2 b_0)\,2^{2w} + (a_1 b_2 + a_2 b_1)\,2^{3w} + a_2 b_2\,2^{4w}$$

Which gives us a result scattered across 5 limbs, with the middle limb being the largest.

If we generalize that, we obtain a result of $n_a + n_b - 1$ limbs, if $n_a$ (resp. $n_b$) is the number of limbs of $a$ (resp. $b$), with the middle limb being of size $2w + o_a + o_b + \lceil \log_2 n \rceil$, where $n$ is the number of terms in the largest limb (that middle limb).

We can see that by taking the maximum values again. We have $n$ pairwise multiplications of limbs, where the limbs of $a$ have an overflow of $o_a$ and the limbs of $b$ have an overflow of $o_b$. Thus we have:

$$n \cdot (2^{w + o_a} - 1) \cdot (2^{w + o_b} - 1) < 2^{2w + o_a + o_b + \lceil \log_2 n \rceil}$$
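A quick numeric check of this picture, in plain Python with illustrative parameters:

```python
# Schoolbook convolution of two worst-case 3-limb elements: 5 result limbs,
# the middle one being the largest, matching the stated size bound.
import math

w, o_a, o_b = 64, 2, 3                      # illustrative overflows
max_a = (1 << (w + o_a)) - 1                # largest possible limb of a
max_b = (1 << (w + o_b)) - 1                # largest possible limb of b
a, b = [max_a] * 3, [max_b] * 3             # worst-case 3-limb elements

res = [0] * (len(a) + len(b) - 1)           # 3 + 3 - 1 = 5 result limbs
for i, ai in enumerate(a):
    for j, bj in enumerate(b):
        res[i + j] += ai * bj

n = 3                                       # terms in the middle limb
assert max(res) == res[2]                   # the middle limb is the largest
assert res[2] < 1 << (2*w + o_a + o_b + math.ceil(math.log2(n)))
print([r.bit_length() for r in res])        # [133, 134, 135, 134, 133]
```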
Note: interestingly, the overflows are tracked element-wise, and not limb-wise. Tracking at the limb level would arguably not be useful: what matters is the largest limb, since it determines whether the whole element (and all of its limbs) needs to be reduced or not.
Note: we don’t care about overflow of the total value, because we never really act on the total value. The overflow only ever impacts the operation between two limbs and that’s it!
Elliptic curve gadgets
Several elliptic curves and their operations (including pairings for pairing-friendly curves) are implemented on top of different circuit fields.
When a curve’s base field is the same as the circuit field, then the curve is implemented “natively”, meaning that its coordinates are represented directly as field elements in the circuit field, and operations are implemented using the default arithmetic supported by the circuit field.
On the other hand, when a curve’s base field differs from the circuit field, then the emulated field types are used (see Non-native arithmetic for fields) to emulate the coordinates of a point, increasing the number of constraints in elliptic curve operations implemented on top.
There are three curves implemented via emulated field types: BLS12-381, BN254, and BW6-761.

As for native curves, BLS12-377 is implemented natively on top of BW6-761 and BLS24-315 is implemented natively on top of BW6-633.

In addition, native twisted Edwards curves are defined for every circuit field. For example, Baby Jubjub for BN254.
Multiplexer gadgets
The multiplexer libraries implement dynamic indexing in data structures. Since we're in a circuit, dynamic indexing means that lookup algorithms are at best linear, as they have to handle every value potentially being selected.

This makes implementing array lookups, array slicing, and associative arrays quite complicated and costly in practice. This section explains how the implementations work.
N-to-1 multiplexers in `mux` and `multiplexer`
There are two ways that multiplexers are implemented. The first way is used when the length of the array of values is a power of 2, allowing the use of a binary tree:
The second way is to simply create a mask: an array of bits where a single bit is set to 1 at the position of the value that must be retrieved. The mask is then verified in-circuit to be a well-formed one-hot vector, i.e. that it has a single bit set to 1, at the correct position, surrounded by 0s. The value is finally obtained as the dot product of the mask $m$ and the values $v$:

$$\text{out} = \sum_i m_i \cdot v_i$$
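Here is a sketch of the mask-based approach in plain Python (the constraints are written as assertions; the names are ours, not gnark's API):

```python
# Mask-based N-to-1 mux: the prover hints a one-hot mask, the circuit
# constrains it, and the output is the dot product with the values.
def mux(sel, values, mask):
    # each mask entry is a bit: m * (m - 1) == 0
    assert all(m * (m - 1) == 0 for m in mask)
    # exactly one bit is set
    assert sum(mask) == 1
    # the set bit sits at position sel
    assert sum(i * m for i, m in enumerate(mask)) == sel
    # the dot product extracts the selected value
    return sum(m * v for m, v in zip(mask, values))

values = [7, 11, 13, 17, 19]
mask = [0, 0, 0, 1, 0]            # hinted by the prover for sel = 3
assert mux(3, values, mask) == 17
```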
From these two ways of constructing N-to-1 multiplexers, arrays and associative arrays are implemented as simple wrappers.
N-to-n multiplexers in slices
The slice implementation allows a caller to provide an interval within an array, and nullify anything not in that interval. The implementation uses one mask to nullify the right side, and then another mask to nullify the left side, of that interval, as pictured below:
The mask is used so that values are copied from the input only when the matching bit within the mask is 1, and not copied when the matching bit within the mask is 0. In pseudo-code:
```python
def slice_range(input_, mask1, mask2):
    out = [0] * len(input_)
    for i in range(len(out)):
        # mask1 = [1, 1, 1, 1, 0, 0, 0, 0]
        #                  ^
        #                 end
        out[i] = input_[i] * mask1[i]
    for i in range(len(out)):
        # mask2 = [0, 0, 1, 1, 1, 1, 1, 1]
        #                ^
        #              start
        out[i] = out[i] * mask2[i]
    # at this point out only contains the [start, end] range of input:
    # out = [0, 0, _, _, 0, 0, 0, 0]
    return out
```
The mask is passed as a hint, and the following function that constrains the correctness of the mask is implemented in the gadget:
```python
def verify_mask(out, pos, start_val, end_val):
    # pos == 0 means that everything will be end_val
    if pos != 0:
        assert out[0] == start_val
    # pos == len(out) - 1 means that everything will be start_val
    if pos != len(out) - 1:
        assert out[-1] == end_val
    # ensure the following:
    #
    # [start_val, start_val, end_val, end_val, end_val]
    #                        ^^^^^^^
    #                          pos
    for i in range(1, len(out)):
        if i != pos:
            assert out[i] == out[i - 1]
```
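For example, a well-formed mask for an interval starting at index 2 passes all the constraints:

```python
# hinted "start" mask for an interval starting at index 2 (8 slots)
mask2 = [0, 0, 1, 1, 1, 1, 1, 1]
verify_mask(mask2, pos=2, start_val=0, end_val=1)   # all assertions pass
```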
Range checks implementation
Range checks are gadgets that check if a value belongs to some interval $[a, b]$. The lower-level APIs that are exposed in gnark's std library allow developers to check specifically that a value is in the range $[0, 2^n)$ for some $n$. In other words, a range check enforces that a value is $n$ bits.

These range checks are implemented in two ways: either by verifiably decomposing the value $x$ into an $n$-bit array (where verifiably means that the recomposition $x = \sum_i b_i 2^i$ is added as a constraint in the circuit), or by using a table lookup.
To range check values with a table lookup, the idea is pretty simple: create a table of all the values between 0 and some power of 2 (decided dynamically at compile time, based on the bit sizes requested by the circuit's range checks). Then cut the value into limbs of that power-of-2 size (verifiably, as explained above), and check that each limb is in the table (via a table lookup argument, which is out of scope for this document).
As the last limb might cover fewer bits than the table width, its value is shifted to the left so that the lookup still enforces the tighter bound.

For example, if we need to check that a value is 9 bits using a table that gathers all elements from $0$ to $2^4 - 1$, then we will need three limbs of 4 bits, with the last limb being only 1 bit. To check the last limb, its value is shifted by 3 bits to the left.
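As a plain-Python sketch of that example (illustrative names, with the lookup modeled as set membership):

```python
# 9-bit range check using a 4-bit lookup table, with the final 1-bit limb
# shifted left by 3 bits so that the 4-bit table enforces the 1-bit bound.
TABLE = set(range(1 << 4))          # lookup table for [0, 2^4)

def range_check_9(x):
    limb0 = x & 0b1111              # bits 0..3
    limb1 = (x >> 4) & 0b1111       # bits 4..7
    limb2 = x >> 8                  # bit 8: only 1 bit left
    # decomposition constraint: the limbs recompose to x
    assert x == limb0 + (limb1 << 4) + (limb2 << 8)
    # lookups: any limb2 > 1 would be shifted out of the table
    assert limb0 in TABLE
    assert limb1 in TABLE
    assert (limb2 << 3) in TABLE

range_check_9(0b110100110)          # 422 fits in 9 bits
```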
Out-of-circuit Fiat-Shamir with Plonk
The computation of expensive functions in circuits is quite challenging, or even self-defeating in some cases. For this reason, Linea designed and implemented a scheme to delegate a specific computation: deriving randomness in a circuit. This design was introduced for the Fiat-Shamir computation of GKR statements in Recursion over Public-Coin Interactive Proof Systems; Faster Hash Verification (BSB22), but is used in the context of non-native arithmetic in the code we looked at.
To understand how the scheme works, first let's take a simpler scenario and solution. Imagine that at some point in a circuit we want to compute the hash of some values, `HASH(a, b, c)`, but that the hash function is very expensive to compute.

One way to solve that is to just hint the result of the computation `d = HASH(a, b, c)` in the circuit (as the prover), and then expose `a, b, c, d` as public inputs. (And of course, make sure that the circuit wires these new public input values to their associated witness values in the circuit.) We illustrate this in the following diagram (where the pink rows are the new public inputs that are exposed, and `d` is the result of the hash function):
The verifier can then verify that the digest `d` was computed correctly before interpolating the public input vector. (Or better, produce `d` themselves and insert it into the public input vector, to avoid forgetting to check it.) This works because Plonk checks that the following constraint holds as part of the protocol (if you imagine that other wires and gates are not enabled on the public input rows):

$$q_{L,i} \cdot a_i + \text{PI}_i = 0$$
But at this point, the witness values that take part in the hash computation are exposed in the public input, and are thus leaked…
Can we hide these values from the verifier? Notice that for Fiat-Shamir use cases, we do not need to compute exactly the `HASH` function; rather, we need to generate random values based on the inputs. That's right, it does not matter how we generate these random values as long as they are based on the private inputs. As such, we can relax our requirements and do something like this instead: `d = HASH(commit(a, b, c))`.
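Here is a plain-Python sketch of this relaxed derivation, with stand-in hash and commitment functions (the real scheme commits to a witness column, as described next):

```python
# Sketch of d = HASH(commit(a, b, c)): the challenge is derived from a
# hiding commitment, so it can be made public without leaking the witness.
import hashlib, random

def commit(values, blinding):
    # stand-in hiding commitment; the real scheme commits to the witness
    # column holding a, b, c (with randomization to keep it hiding)
    data = b"".join(v.to_bytes(32, "big") for v in values + [blinding])
    return hashlib.sha256(data).digest()

a, b, c = 3, 5, 7                        # private witness values
blinding = random.randrange(1 << 128)    # randomness for hiding
com = commit([a, b, c], blinding)        # sent to the verifier

# the challenge can be recomputed by the verifier from the commitment alone
d = int.from_bytes(hashlib.sha256(com).digest(), "big")
```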
As the challenge is used for Fiat-Shamir, we can leave it in the public input, but now the inputs can be kept in the witness. They are, however, kept in rows that resemble public input rows, in that the left wire is enabled and no gate is selected.

Then, a hiding commitment to these values is produced by the prover and sent to the verifier, as we show in this diagram:
We are almost there; there's still one more problem: we need to prevent the prover from using this new commitment to alter arbitrary witness values. To do this, the verifying key includes one more selector vector that dictates where this new commitment can apply:

Which ends up producing a constraint that looks like this (if, again, we ignore the gates and wires not enabled in these rows):

$$a_i + q_{\text{cc},i} \cdot \text{PI2}_i = 0$$

where $q_{\text{cc}}$ is the committed selector and $\text{PI2}$ is the polynomial carrying the committed values.
Note that the implementation we looked at was generalized to work with $t$ Fiat-Shamir instances, by having $t$ commitments and $t$ committed selectors $q_{\text{cc},1}, \dots, q_{\text{cc},t}$, as well as $t$ reserved rows of public inputs for the digests.

On top of that, they evaluate the commitments to the selectors during the linearization phase of Plonk, so as not to have to care about the randomization of the committed inputs to Fiat-Shamir, although the commitment is still randomized, by randomizing entries of the vector in rows that are not activated.
In-Circuit KZG Verification
The in-circuit KZG implementation in `gnark/std/commitments/kzg` closely follows the one in the gnark-crypto library.
Single-Point Opening Proof Verification
For claimed value $v$ of polynomial $f$ at point $\alpha$, KZG polynomial commitment verification is implemented as a pairing check:

$$e\big(C - [v]_1 + \alpha \cdot \pi,\ [1]_2\big) \cdot e\big(-\pi,\ [\tau]_2\big) = 1$$

The commitment of $f$ is a $\mathbb{G}_1$ element $C = [f(\tau)]_1$.

The evaluation point is $\alpha$ — an element of the scalar field of $\mathbb{G}_1$.

The opening proof is just 2 values:
- $\pi = [q(\tau)]_1$ — commitment of the quotient polynomial $q(X) = \frac{f(X) - v}{X - \alpha}$,
- $v$ — claimed value of $f$ at evaluation point $\alpha$.

The verification key consists of:
- $[1]_2$ and $[\tau]_2$ — the generator of group $\mathbb{G}_2$ scaled by $1$ and by $\tau$ — the toxic secret (supposed to be) destroyed after the generation of the SRS (powers of $\tau$ multiplied with $[1]_1$ and $[1]_2$),
- $[1]_1$ — the generator of group $\mathbb{G}_1$.
The above check is equivalent to checking that

$$f(\tau) - v = q(\tau) \cdot (\tau - \alpha)$$

which holds if there exists a quotient polynomial $q$, defined as

$$q(X) = \frac{f(X) - v}{X - \alpha}$$

and such a polynomial exists (i.e. the division is exact) if and only if $f(\alpha) = v$. The prover provides $\pi = [q(\tau)]_1$ as part of the proof; passing the check without knowing such a polynomial is highly unlikely (with probability on the order of $d/r$, for polynomials of degree at most $d$ over a scalar field of order $r$).
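To see the algebra that the pairing check enforces, here is a scalar-level sketch in plain Python over a toy field, where group elements are replaced by their discrete logarithms (which a real verifier of course never sees):

```python
# The pairing check "in the exponent": f(tau) - v = q(tau) * (tau - alpha).
import random

R = 2**61 - 1                                 # toy scalar field order

def ev(poly, x):
    return sum(c * pow(x, i, R) for i, c in enumerate(poly)) % R

def quotient(poly, root):
    # synthetic division: poly(X) - poly(root) = q(X) * (X - root)
    n = len(poly) - 1
    q = [0] * n
    q[n - 1] = poly[n]
    for i in range(n - 2, -1, -1):
        q[i] = (poly[i + 1] + root * q[i + 1]) % R
    return q

f = [random.randrange(R) for _ in range(8)]   # the committed polynomial
tau = random.randrange(R)                     # SRS trapdoor (toxic waste)
alpha = random.randrange(R)                   # evaluation point
v = ev(f, alpha)                              # claimed value
q = quotient(f, alpha)                        # the prover's quotient

# e(C - [v]_1 + alpha*pi, [1]_2) * e(-pi, [tau]_2) = 1 is this identity:
assert (ev(f, tau) - v) % R == ev(q, tau) * ((tau - alpha) % R) % R
```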
Fold Batch Opening Proof at a Single Point
This method is used by the in-circuit Plonk verifier to fold multiple evaluation proofs at a single point $\alpha$.
It takes:
- a list of commitments $C_1, \dots, C_n$,
- an evaluation point $\alpha$,
- a batch opening proof, which consists of:
  - a single quotient polynomial commitment $\pi$,
  - a list of claimed values $v_1, \dots, v_n$ of the committed polynomials at the evaluation point.

And produces:
- a folded commitment $\bar{C}$, and
- a folded opening proof $(\pi, \bar{v})$ — here, the quotient commitment $\pi$ is simply passed through from the input, and $\bar{v}$ is the produced folded evaluation.
A random challenge $\rho$ is obtained by hashing all the input data, except the quotient commitment $\pi$, because it is computed by the prover at a later stage and is itself dependent on $\rho$.
The logic from here is straightforward. Folded commitment:

$$\bar{C} = \sum_{i=1}^{n} \rho^{i-1} \cdot C_i$$

Folded evaluation:

$$\bar{v} = \sum_{i=1}^{n} \rho^{i-1} \cdot v_i$$
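A scalar-level sketch of the folding (toy field, illustrative names), which also checks that the folded claim is an opening claim for the $\rho$-weighted combination of the underlying polynomials:

```python
# Folding n opening claims at the same point alpha with powers of rho.
import random

R = 2**61 - 1

def ev(poly, x):
    return sum(c * pow(x, i, R) for i, c in enumerate(poly)) % R

fs = [[random.randrange(R) for _ in range(4)] for _ in range(3)]
tau, alpha, rho = (random.randrange(R) for _ in range(3))

cs = [ev(f, tau) for f in fs]        # "commitments" (discrete logs)
vs = [ev(f, alpha) for f in fs]      # claimed values at alpha

folded_c = sum(pow(rho, i, R) * c for i, c in enumerate(cs)) % R
folded_v = sum(pow(rho, i, R) * v for i, v in enumerate(vs)) % R

# the folded pair is an opening claim for the folded polynomial
folded_f = [sum(pow(rho, i, R) * f[j] for i, f in enumerate(fs)) % R
            for j in range(4)]
assert folded_c == ev(folded_f, tau)
assert folded_v == ev(folded_f, alpha)
```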
Multi-Point Batch Verification
This method is used by the in-circuit Plonk verifier instead of verification steps 11-12 in the Plonk paper.
Multi-point batch verification of KZG commitments and opening proofs takes a list of triplets (for $i \in \{1, \dots, n\}$) of:
- a polynomial commitment $C_i$,
- an evaluation point $\alpha_i$,
- an opening proof $(\pi_i, v_i)$, with quotient commitment $\pi_i = [q_i(\tau)]_1$ and claimed value $v_i$.

All of them are checked against a single verification key $([1]_1, [1]_2, [\tau]_2)$.
The final check boils down to a pairing check:

$$e\Big(\bar{C} - [\bar{v}]_1 + \overline{\alpha\pi},\ [1]_2\Big) \cdot e\big(-\bar{\pi},\ [\tau]_2\big) = 1$$

It looks similar to the one for a single KZG proof verification. We will see in a second how its components are obtained.

A random value $\lambda$ is produced by hashing all the input data. Then it's used to produce the folded values for the final pairing check.
Folded quotients:

$$\bar{\pi} = \sum_{i=1}^{n} \lambda^{i-1} \cdot \pi_i$$

Folded products of evaluation points and quotients:

$$\overline{\alpha\pi} = \sum_{i=1}^{n} \lambda^{i-1} \cdot \alpha_i \cdot \pi_i$$

Folded evaluations:

$$\bar{v} = \sum_{i=1}^{n} \lambda^{i-1} \cdot v_i$$

Folded commitments:

$$\bar{C} = \sum_{i=1}^{n} \lambda^{i-1} \cdot C_i$$
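Putting it together, here is a scalar-level sketch (again with discrete logs standing in for group elements) of why honest claims satisfy the folded pairing check:

```python
# Each honest claim gives f_i(tau) - v_i + alpha_i*q_i(tau) = tau*q_i(tau);
# folding with powers of lambda preserves the identity.
import random

R = 2**61 - 1

def ev(poly, x):
    return sum(c * pow(x, i, R) for i, c in enumerate(poly)) % R

def quotient(poly, root):
    n = len(poly) - 1
    q = [0] * n
    q[n - 1] = poly[n]
    for i in range(n - 2, -1, -1):
        q[i] = (poly[i + 1] + root * q[i + 1]) % R
    return q

tau, lam = random.randrange(R), random.randrange(R)

claims = []                        # (C_i, alpha_i, pi_i, v_i) per opening
for _ in range(3):
    f = [random.randrange(R) for _ in range(5)]
    alpha = random.randrange(R)
    claims.append((ev(f, tau), alpha,
                   ev(quotient(f, alpha), tau), ev(f, alpha)))

pw = [pow(lam, i, R) for i in range(len(claims))]
folded_quotients   = sum(l * pi for l, (_, _, pi, _) in zip(pw, claims)) % R
folded_pts_quots   = sum(l * a * pi for l, (_, a, pi, _) in zip(pw, claims)) % R
folded_evals       = sum(l * v for l, (_, _, _, v) in zip(pw, claims)) % R
folded_commitments = sum(l * c for l, (c, _, _, _) in zip(pw, claims)) % R

# the final pairing check, "in the exponent":
assert (folded_commitments - folded_evals + folded_pts_quots) % R == \
       tau * folded_quotients % R
```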