Introduction

On June 17th, 2024, Linea requested an audit of the wizard crypto and math libraries, as part of their zkEVM project. The audit was conducted by two zkSecurity engineers over a period of three weeks. The audit focused on the correctness and security of parts of the libraries, as well as the safety of their interface.

We found the code to be well documented and organized.

Scope

The scope of the audit included two inner components of the zkEVM repository.

The math/ component, which consisted of:

  • Definition of the scalar field of the BLS12-377 curve.
  • An implementation of FFT functions and related polynomial utility functions.
  • An implementation of Smartvectors, an abstraction over field element vectors.

The crypto/ component, which consisted of:

  • Three cryptographic hash implementations: Keccak, MiMC, and RingSIS.
  • A sparse merkle tree implementation, and an abstraction for a key-value store built as a wrapper around the sparse merkle tree.
  • A code-based polynomial commitment scheme: Vortex.

Below, we provide an overview of some of the subcomponents, before diving into findings and recommendations.

The crypto/ Accumulator and Sparse Merkle Tree

In order to enable verifiable accesses to the state of the world (in Linea), the zkEVM repository implements a key-value store on top of a sparse merkle tree implementation. In this section we describe exactly how.

A doubly-linked list for a key-value store. An array is used to emulate a contiguous region of memory that is written to in order, from left to right, in a write-once fashion.

[Figure: the contiguous write-once memory region]

This array of memory is used to store a doubly-linked list. To do that, the first two entries are initialized with the head and the tail of the doubly-linked list. An element of the doubly-linked list is a tuple (hash(key), hash(value)) and the doubly-linked list is ordered by the hashed key of its elements.

Update to the key-value store. The doubly-linked list serves as a nice way to provide verifiable updates to the key-value store. For example, insertions can prove that they are inserting a key in the right place in the doubly-linked list, by proving that:

  • the previous element in the list has a smaller hashed key
  • the next element in the list has a bigger hashed key
  • these “sandwiching elements” point at each other, and can be updated to point at the newly inserted element instead

In addition, the key-value store can make its entries unique by enforcing strict comparisons between the hashed keys.
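To make the insertion logic concrete, here is a minimal Go sketch of the sandwich checks described above, using plain integers as stand-ins for hashed keys and values (the names Leaf, Accumulator, and Insert are illustrative, not the codebase's identifiers):

```go
package main

import "fmt"

// Leaf is one entry of the doubly-linked list stored in the write-once memory.
type Leaf struct {
	Prev, Next int    // indices of the neighbors in the memory array
	HKey       uint64 // stand-in for hash(key)
	HVal       uint64 // stand-in for hash(value)
}

// Accumulator wraps the write-once memory: existing entries are only
// re-linked, never moved.
type Accumulator struct {
	mem []Leaf
}

// NewAccumulator seeds slots 0 and 1 with the head and tail sentinels.
func NewAccumulator() *Accumulator {
	return &Accumulator{mem: []Leaf{
		{Prev: -1, Next: 1, HKey: 0},          // head: minimal hashed key
		{Prev: 0, Next: -1, HKey: ^uint64(0)}, // tail: maximal hashed key
	}}
}

// Insert appends a new leaf after `prev`, enforcing the sandwich conditions:
// prev.HKey < hkey < next.HKey (strict, so entries stay unique), and the two
// neighbors currently point at each other.
func (a *Accumulator) Insert(prev int, hkey, hval uint64) error {
	next := a.mem[prev].Next
	if !(a.mem[prev].HKey < hkey && hkey < a.mem[next].HKey) {
		return fmt.Errorf("not sandwiched (or duplicate key)")
	}
	if a.mem[next].Prev != prev {
		return fmt.Errorf("neighbors are not linked to each other")
	}
	idx := len(a.mem)
	a.mem = append(a.mem, Leaf{Prev: prev, Next: next, HKey: hkey, HVal: hval})
	a.mem[prev].Next = idx
	a.mem[next].Prev = idx
	return nil
}

func main() {
	acc := NewAccumulator()
	_ = acc.Insert(0, 50, 500)
	_ = acc.Insert(0, 30, 300) // lands between the head and 50
	for i := acc.mem[0].Next; i != 1; i = acc.mem[i].Next {
		fmt.Println(acc.mem[i].HKey, acc.mem[i].HVal)
	}
}
```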

Merkelizing the memory region. In order to record the state of the world between proved state transitions, and verifiably access and update the doubly-linked-list, the memory containing the elements of the list is merkelized.

This way, its root can be passed in between state transitions, and its state can be deconstructed to verifiably access entries of the heap and update them. We provide an example of such a merkelized memory region (containing a simple doubly-linked list) below.

[Figure: merkelized memory region containing a doubly-linked list]

Implementation details. Note that the implementation accesses and updates the merkle tree in multiple transitions instead of a single atomic update. We illustrate this in the following diagram.

[Figure: accumulator insertion performed over multiple transitions]

The Vortex polynomial commitment scheme

In this section we briefly explain how the Vortex PCS works.

First, Vortex is code-based, so let’s briefly review how encoding with Reed-Solomon works. Imagine that you want to encode a message m = [m_0, m_1, m_2, m_3] into a larger codeword. The idea is that an invalid message is easier to detect if it is stretched into a larger array of values (which amplifies its anomalies).

The Reed-Solomon code does that by viewing the message as either the coefficients or the evaluations of a polynomial, and then evaluating that polynomial at more points. For example:

[Figure: Reed-Solomon encoding of a message]

In the previous example we chose to see the message as a polynomial in Lagrange form (we have its evaluations over some domain). Since the larger domain we evaluate on is an extension that preserves the points of the original domain, we obtain a systematic code: the original message is embedded in the codeword.

In ZK, we deal with arithmetic circuits, so we are interested in domains that are multiplicative subgroups of a field, specifically subgroups whose order is a power of two. These play nicely with FFTs: the FFT lets us quickly evaluate a polynomial over such a domain, while the inverse FFT lets us quickly interpolate a polynomial from its evaluations.

In our previous example, the original “message domain” was D, of some power-of-two size. So what happens if we change the domain? The original elements of the message are still embedded in the codeword, just at different positions: this is still a systematic code.

[Figure: the code remains systematic over a different domain]

This is because of how subgroups of power-of-two order embed into one another in our field:

[Figure: nested power-of-two subgroups of the field]
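The encoding described above can be sketched as follows, over a toy prime field (p = 17, standing in for the BLS12-377 scalar field) where 2 generates a subgroup of order 8 and 4 = 2^2 generates its subgroup of order 4; naive O(n^2) DFTs stand in for the FFT/iFFT used in practice:

```go
package main

import "fmt"

const p = 17 // toy prime field

// pow computes b^e mod p by square-and-multiply.
func pow(b, e uint64) uint64 {
	r := uint64(1)
	b %= p
	for ; e > 0; e >>= 1 {
		if e&1 == 1 {
			r = r * b % p
		}
		b = b * b % p
	}
	return r
}

func inv(a uint64) uint64 { return pow(a, p-2) } // Fermat inverse

// dft evaluates the polynomial with coefficients c at w^0, w^1, ...
// (naive O(n^2); a real implementation uses an FFT).
func dft(c []uint64, w uint64) []uint64 {
	out := make([]uint64, len(c))
	for j := range out {
		x := pow(w, uint64(j))
		acc, xp := uint64(0), uint64(1)
		for _, ci := range c {
			acc = (acc + ci*xp) % p
			xp = xp * x % p
		}
		out[j] = acc
	}
	return out
}

// idft interpolates coefficients from evaluations over the subgroup <w>.
func idft(e []uint64, w uint64) []uint64 {
	c := dft(e, inv(w))
	ninv := inv(uint64(len(e)))
	for i := range c {
		c[i] = c[i] * ninv % p
	}
	return c
}

func main() {
	// message = evaluations of a degree-3 polynomial over the order-4 subgroup <4>
	msg := []uint64{3, 1, 4, 1}
	coeffs := idft(msg, 4) // Lagrange form -> coefficient form
	// encode by evaluating over the order-8 subgroup <2>
	codeword := dft(append(coeffs, 0, 0, 0, 0), 2)
	fmt.Println(codeword)
}
```

Because 2^2 = 4, evaluating at the even powers of 2 revisits the original domain, so the message reappears at the even positions of the codeword: the code is systematic.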

Back to what Vortex provides: it allows you to get the evaluations of a number of polynomials at the same point.

So we start with a number of polynomials in Lagrange form (their evaluations over some points w_0, w_1, w_2, …), which you can see as a matrix W:

[Figure: the matrix W of polynomial evaluations]

In Vortex, the prover encodes all the rows of the matrix, getting a larger matrix. The greater the “blowup factor” of the Vortex parameters, the more columns we get. And since we have a systematic code, it preserves the original messages on each row:

[Figure: row-wise encoding with the blowup factor]

From there, the verifier samples a random challenge β, which the prover uses to take a linear combination of all the encoded rows:

[Figure: linear combination of the encoded rows with β]

In the paper, the non-encoded u is sent instead, and the verifier encodes it. This is equivalent to interpolating u and checking that it is a polynomial of degree at most d = m − 1 (which is what the implementation does):

[Figure: degree check on the interpolation of u]

We know that u interpolates to Int_u(x), the linear combination of all the polynomials, i.e. Int_u(x) = f(x) + β·g(x) + β^2·h(x) + β^3·i(x) + β^4·j(x).

As such, if we want to check that for some z we have

f(z) = y_f,  g(z) = y_g,  h(z) = y_h,  i(z) = y_i,  j(z) = y_j

we can check that

Int_u(z) = y_f + β·y_g + β^2·y_h + β^3·y_i + β^4·y_j

The verifier can then query random columns, and check that the linear combination is correctly formed at the matching entries:

[Figure: random column queries]
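The per-column consistency check can be sketched as follows, over a toy modulus (checkColumn is a hypothetical name, not the codebase's):

```go
package main

import "fmt"

const q = 17 // toy prime modulus standing in for the scalar field

// checkColumn verifies, for one opened column of the encoded matrix, that
// the claimed entry of the linear combination u matches
// sum_i beta^i * column[i].
func checkColumn(column []uint64, uAtCol, beta uint64) bool {
	acc, betaPow := uint64(0), uint64(1)
	for _, entry := range column {
		acc = (acc + entry*betaPow) % q
		betaPow = betaPow * beta % q
	}
	return acc == uAtCol
}

func main() {
	// one column of the encoded matrix, e.g. (f(x_j), g(x_j), h(x_j))
	column := []uint64{5, 7, 11}
	beta := uint64(3)
	// honest u entry: 5 + 3*7 + 9*11 = 125 = 6 mod 17
	fmt.Println(checkColumn(column, 6, beta))
}
```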

Implementation note: as the verifier needs to dynamically index into the linear combination vector u, it places the vector values in an indexed lookup table and retrieves its entries by performing table lookups.

The verifier also wants to verifiably query the encoded matrix in an efficient way (without having to know the whole matrix). To do that, the RS-encoded matrix is committed to (using a Merkle tree), and only the queried columns are sent to the verifier (accompanied by Merkle proofs):

[Figure: Merkle commitment to the columns of the encoded matrix]
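A minimal sketch of such a column commitment, using SHA-256 over a binary Merkle tree for brevity (the helper names are hypothetical, and a power-of-two number of columns is assumed):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashColumn compresses one serialized column into a leaf.
func hashColumn(col []byte) [32]byte { return sha256.Sum256(col) }

// merkleRoot folds the leaves pairwise up to the root (len must be a power of two).
func merkleRoot(leaves [][32]byte) [32]byte {
	level := leaves
	for len(level) > 1 {
		next := make([][32]byte, 0, len(level)/2)
		for i := 0; i < len(level); i += 2 {
			next = append(next, sha256.Sum256(append(level[i][:], level[i+1][:]...)))
		}
		level = next
	}
	return level[0]
}

// merkleProof returns the sibling hashes on the path from leaf `idx` to the root.
func merkleProof(leaves [][32]byte, idx int) [][32]byte {
	var path [][32]byte
	level := leaves
	for len(level) > 1 {
		path = append(path, level[idx^1])
		next := make([][32]byte, 0, len(level)/2)
		for i := 0; i < len(level); i += 2 {
			next = append(next, sha256.Sum256(append(level[i][:], level[i+1][:]...)))
		}
		level = next
		idx /= 2
	}
	return path
}

// verify recomputes the root from one leaf and its authentication path.
func verify(root [32]byte, idx int, leaf [32]byte, path [][32]byte) bool {
	h := leaf
	for _, sib := range path {
		if idx&1 == 0 {
			h = sha256.Sum256(append(h[:], sib[:]...))
		} else {
			h = sha256.Sum256(append(sib[:], h[:]...))
		}
		idx /= 2
	}
	return h == root
}

func main() {
	cols := [][]byte{[]byte("c0"), []byte("c1"), []byte("c2"), []byte("c3")}
	leaves := make([][32]byte, len(cols))
	for i, c := range cols {
		leaves[i] = hashColumn(c)
	}
	root := merkleRoot(leaves)
	fmt.Println(verify(root, 2, leaves[2], merkleProof(leaves, 2)))
}
```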

A note on RingSIS modulo x^n − 1

The RingSIS implementation is mostly a wrapper over a core implementation, although it does contain a few more helper functions, including a function to compute the result of the hash modulo the polynomial x^n − 1.

The function’s goal is to compute the product a(x)·b(x) = r(x) mod (x^n − 1), which is the same as:

a(x)·b(x) = r(x) + (x^n − 1)·t(x)

for some quotient polynomial t(x).

Notice that x^n − 1 is the vanishing polynomial of the n-th roots of unity, so for any n-th root of unity ω we have:

a(ω)·b(ω) = r(ω) + (ω^n − 1)·t(ω) = r(ω) + 0·t(ω) = r(ω)

So we can obtain n evaluations of r(x) by evaluating the left-hand side on all n of the n-th roots of unity, and then interpolating (since r is of degree at most n − 1, n evaluations suffice).
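This evaluate, multiply pointwise, then interpolate strategy can be sketched over a toy prime field (p = 17, where 4 is a primitive 4th root of unity); mulModXn1 is a hypothetical name, and naive O(n^2) evaluation/interpolation stands in for the FFTs:

```go
package main

import "fmt"

const p = 17 // toy prime; 4 is a primitive 4th root of unity mod 17

// pw computes b^e mod p by square-and-multiply.
func pw(b, e uint64) uint64 {
	r := uint64(1)
	b %= p
	for ; e > 0; e >>= 1 {
		if e&1 == 1 {
			r = r * b % p
		}
		b = b * b % p
	}
	return r
}

// eval evaluates the polynomial with coefficients c at x (Horner's rule).
func eval(c []uint64, x uint64) uint64 {
	acc := uint64(0)
	for i := len(c) - 1; i >= 0; i-- {
		acc = (acc*x + c[i]) % p
	}
	return acc
}

// mulModXn1 computes a(x)*b(x) mod (x^n - 1), where w is a primitive n-th
// root of unity and len(a) == len(b) == n: evaluate both inputs on the n-th
// roots of unity, multiply pointwise, and interpolate back.
func mulModXn1(a, b []uint64, w uint64) []uint64 {
	n := uint64(len(a))
	evals := make([]uint64, n)
	for j := uint64(0); j < n; j++ {
		x := pw(w, j)
		evals[j] = eval(a, x) * eval(b, x) % p
	}
	// inverse DFT: r_k = n^{-1} * sum_j evals_j * w^{-jk}
	winv, ninv := pw(w, p-2), pw(n, p-2)
	r := make([]uint64, n)
	for k := uint64(0); k < n; k++ {
		acc := uint64(0)
		for j := uint64(0); j < n; j++ {
			acc = (acc + evals[j]*pw(winv, j*k)) % p
		}
		r[k] = acc * ninv % p
	}
	return r
}

func main() {
	a := []uint64{1, 2, 3, 4}
	b := []uint64{5, 6, 7, 8}
	fmt.Println(mulModXn1(a, b, 4))
}
```

The result matches the naive cyclic convolution (where coefficient indices wrap around modulo n), which is exactly what reduction by x^n − 1 does.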

Using SmartVectors for Optimizing Mathematical Operations

We further provide some background on smartvectors, an important abstraction used for optimizing math operations in Linea’s implementation.

The Linea prover defines a specialized abstraction over vectors to optimize both memory consumption and performance when working with vectors of field elements. This abstraction is smartvectors, which are implemented by four different types designed to handle specific common vector types. The types of smartvectors are Regular, Constant, Rotated, and PaddedCircularWindow.

Each smartvector type implements specific functions such as retrieving its length, accessing a specific element, producing a subvector, and rotating the vector.

Regular represents a normal vector (i.e., slice) of field elements. This is the default type, and every other smartvector implements a function to convert and create a regular smartvector.

type Regular []field.Element

Constant is a smartvector obtained by repeating the same value for the specified length. This implementation provides maximum memory optimization and speeds up some operations.

type Constant struct {
    val    field.Element
    length int
}

Rotated works by abstractly storing an offset and only applying the rotation when the vector is written, sub-vectored, or evaluated for specific operations. This makes rotations essentially free.

type Rotated struct {
    v      Regular
    offset int
}
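As a toy illustration of why rotations are essentially free, the following sketch stores only an offset and resolves it at access time (Get and RotateRight are hypothetical names, and plain ints stand in for field elements; this is not the smartvectors API):

```go
package main

import "fmt"

// rotated stores an offset instead of moving data; the rotation is only
// resolved when an element is actually accessed.
type rotated struct {
	v      []int // stand-in for []field.Element
	offset int
}

// Get resolves the lazy rotation at access time.
func (r rotated) Get(i int) int {
	n := len(r.v)
	return r.v[((i+r.offset)%n+n)%n]
}

// RotateRight composes rotations by just adjusting the offset: O(1), no copying.
func (r rotated) RotateRight(k int) rotated {
	return rotated{v: r.v, offset: r.offset - k}
}

func main() {
	r := rotated{v: []int{10, 20, 30, 40}}
	fmt.Println(r.RotateRight(1).Get(0)) // the last element moves to the front
}
```

Composing two rotations just adds offsets; no element is copied until the vector is materialized (e.g., converted to a Regular smartvector).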

PaddedCircularWindow represents a vector padded up to a certain length and possibly rotated. It combines the optimizations of all the previous smartvectors and implements extensive optimizations when performing arithmetic operations (cf. processWindowedOnly in windowed.go).

type PaddedCircularWindow struct {
    window     []field.Element
    paddingVal field.Element
    totLen, offset int
}

The following figure visualizes the different types of smartvectors.

[Figure: the different smartvector types]

Usage and Benefits

Smartvectors are used for memory and performance optimizations. They play a crucial role in performing FFTs in the Linea prover, representing polynomials, and performing optimized operations such as batch interpolation, by reducing computation through pre-processing of constant vectors.

The main operations involving smartvectors are defined in arithmetic_basic.go, arithmetic_gen.go, arithmetic_op.go, fft.go, and polynomial.go. A critical function is processOperator in arithmetic_gen.go, which handles various operations and performs optimizations based on the type of input smartvectors.

Precautions and Testing

While smartvectors are designed for optimization purposes, they should be used with care. The API is designed to prevent mutating smartvectors directly; instead, users should produce new instances. During our audit, we looked for potential issues arising from API misuse, identifying corner cases that could cause unexpected behavior, and ensured that the API was clear and not prone to misuse. We also ran multiple test cases and tweaked the existing test suite to validate that smartvectors work as expected.

In summary, despite the complexity of some optimizations, the code is well-documented and thoroughly tested. During our audit, we identified the following issues: