Introduction
On July 21st, 2025, the Ethereum Foundation engaged zkSecurity to perform a security assessment of the KZG libraries used by PeerDAS. These libraries include blst, c-kzg-4844, rust-eth-kzg, and go-eth-kzg. The goal of this assessment was to conduct an in-depth analysis of the various KZG polynomial commitment implementations (C, Rust, and Go) used for Ethereum’s EIP-4844 and EIP-7594, as well as some of their dependencies in the blst BLS12-381 library. Over the course of the engagement, we evaluated compliance with relevant specifications, compared cross-implementation behaviors, and identified potential safety issues. Certain bindings and assembly routines were excluded from the scope, as detailed below.
The engagement lasted four weeks and was conducted by two consultants. During this period, several observations and findings were identified and communicated to the respective library development teams. The detailed findings and their implications are discussed in the subsequent sections of this report.
Scope
- blst library at target commit v0.3.15: The scope includes the implementations of blst_p1s_mult_pippenger_scratch_sizeof and blst_p1s_mult_pippenger, covering lower-level elliptic curve operations. However, the ASM implementations of finite field arithmetic are excluded.
- c-kzg-4844 library at target commit 669e7484: The scope includes code associated with EIP-7594, such as common/lincomb.c and all of src/eip7594. It also includes bindings, except for bindings/python, bindings/elixir, and bindings/node.js.
- rust-eth-kzg library at target commit v0.7.1: The scope covers code related to both EIP-4844 and EIP-7594, including bindings, except for bindings/golang.
- go-eth-kzg library at target commit v1.3.0: The scope includes code associated with EIP-7594, excluding internal/kzg, internal/multiexp, internal/utils, tests, and all _test.go files.
Overview
EIP-7594
Following EIP-4844, a blob-carrying transaction in Ethereum can publish a blob containing 4096 field elements along with its KZG commitment. Validators must retrieve the entire blob and verify the commitment against it. EIP-7594 introduces 1-D peer DAS (Data Availability Sampling) for Ethereum. The blob is first extended using erasure coding (doubling its size). The extended blob is then divided into 128 cells, each containing 64 field elements. The blob is associated with a KZG commitment, and each cell has a KZG multi-proof to verify that its 64 field elements are consistent with the blob commitment. To ensure data availability, each validator only needs to receive and verify a small subset (e.g., 8) of the cells from the network, instead of downloading and verifying the entire blob as required in EIP-4844.
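To make the sizes concrete, here is a small sketch of these parameters and their consistency as C constants (values are from EIP-7594; the identifier names mirror those in c-kzg-4844 but are used here purely for illustration):

#define FIELD_ELEMENTS_PER_BLOB 4096  /* field elements in the original blob */
#define EXTENSION_FACTOR        2     /* erasure coding doubles the blob */
#define CELLS_PER_EXT_BLOB      128   /* cells in the extended blob */
#define FIELD_ELEMENTS_PER_CELL 64    /* field elements per cell */

/* 4096 * 2 = 8192 = 128 * 64 field elements in the extended blob. */
_Static_assert(FIELD_ELEMENTS_PER_BLOB * EXTENSION_FACTOR ==
               CELLS_PER_EXT_BLOB * FIELD_ELEMENTS_PER_CELL,
               "extended blob must split evenly into cells");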
Cell Proof Generation
During block proposal, the proposer splits the blob into cells and generates a proof for each cell. First, the blob is extended with erasure coding to double its size, and then the extended blob is divided into cells, each containing 64 field elements. Next, the KZG multi-point proof for each cell is computed. The FK20 paper describes an efficient algorithm for computing all cell proofs of a blob. Below is the function signature for cell proof generation in c-kzg-4844:
C_KZG_RET compute_cells_and_kzg_proofs(
Cell *cells, // The output cells
KZGProof *proofs, // The output proofs for each cell
const Blob *blob, // The input blob data
const KZGSettings *s // The settings, including the trusted setup and precomputation
);
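As an illustration of how this API is driven, a minimal hypothetical proving wrapper (the wrapper name is ours; it assumes the c-kzg-4844 header is included and a KZGSettings was already loaded from the trusted setup):

/* Hedged sketch: split a blob into 128 cells and compute all cell proofs. */
C_KZG_RET prove_blob_cells(Cell cells[CELLS_PER_EXT_BLOB],
                           KZGProof proofs[CELLS_PER_EXT_BLOB],
                           const Blob *blob, const KZGSettings *s) {
    return compute_cells_and_kzg_proofs(cells, proofs, blob, s);
}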
Cell Proof Verification
Validators retrieve a subset of cells and proofs from the network and verify these KZG multi-point proofs against the blob commitment. Since a block can contain multiple blobs, validators use a universal verification method to efficiently verify cells from different blobs in a single batch. Below is the function signature for cell proof verification in c-kzg-4844:
C_KZG_RET verify_cell_kzg_proof_batch(
bool *ok, // The verification result
const Bytes48 *commitments_bytes, // The commitment of the entire blob that the cell belongs to
const uint64_t *cell_indices, // The indices of the cells in the blob
const Cell *cells, // The array of cell data
const Bytes48 *proofs_bytes, // The proofs for the cells
uint64_t num_cells, // The number of cells in this batch, specifying the length of the above arrays
const KZGSettings *s // The settings, including the trusted setup and precomputation
);
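A sketch of the corresponding verification call (again a hypothetical wrapper of ours; commitments, cell_indices, cells, and proofs are assumed to be parallel arrays of length num_cells):

/* Hedged sketch: returns true only if the call succeeds and verifies. */
bool verify_cells(const Bytes48 *commitments, const uint64_t *cell_indices,
                  const Cell *cells, const Bytes48 *proofs,
                  uint64_t num_cells, const KZGSettings *s) {
    bool ok = false;
    if (verify_cell_kzg_proof_batch(&ok, commitments, cell_indices,
                                    cells, proofs, num_cells, s) != C_KZG_OK)
        return false;
    return ok;
}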
Blob Recovery
To recover the original blob, other nodes (e.g., index nodes) can retrieve cell data from the network. Once more than half of the cell data is retrieved, the entire blob can be reconstructed using erasure coding. Below is the function signature for blob recovery in c-kzg-4844:
C_KZG_RET recover_cells_and_kzg_proofs(
Cell *recovered_cells, // The output recovered cells
KZGProof *recovered_proofs, // The output recovered proofs for each cell
const uint64_t *cell_indices, // The input indices of each cell
const Cell *cells, // The input cell data
uint64_t num_cells, // The number of input cells
const KZGSettings *s // The settings, including the trusted setup and precomputation
);
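A recovery sketch along the same lines (hypothetical wrapper; the library itself rejects insufficient input, so the explicit half-threshold check is only there to make the requirement visible):

/* Hedged sketch: recover all 128 cells and proofs from a partial set. */
C_KZG_RET recover_blob_cells(Cell recovered[CELLS_PER_EXT_BLOB],
                             KZGProof proofs[CELLS_PER_EXT_BLOB],
                             const uint64_t *cell_indices, const Cell *cells,
                             uint64_t num_cells, const KZGSettings *s) {
    if (num_cells < CELLS_PER_EXT_BLOB / 2)
        return C_KZG_BADARGS;  /* fewer than 64 cells: not recoverable */
    return recover_cells_and_kzg_proofs(recovered, proofs, cell_indices,
                                        cells, num_cells, s);
}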
Multi-Scalar Multiplication (MSM) in blst
The blst library implements the BLS12-381 elliptic curve and pairing operations in C and assembly. Our scope focuses on the MSM (Multi-Scalar Multiplication) implementation, which primarily uses the Pippenger algorithm. Note that blst is designed for cryptographic signatures and implements constant-time operations to prevent side-channel attacks. However, for data availability sampling use cases, constant-time operations are not required and may reduce efficiency.
Given a list of base points $P_1, \ldots, P_n$ and scalars $k_1, \ldots, k_n$ (all of fixed bit length, e.g., 256), the Pippenger algorithm efficiently computes $Q = \sum_{i=1}^{n} k_i P_i$.
The main idea is to divide the scalars into smaller windows. Within each window, the scalars are grouped into buckets based on their values, allowing for efficient computation. For example, consider four points $P_1, \ldots, P_4$ and four 4-bit scalars $k_1, \ldots, k_4$. If the window size is 2 bits, each scalar is split into two parts: the lower half $l_i$ and the upper half $h_i$, so that $k_i = 2^2 h_i + l_i$. The computation is then performed separately for each window:
$$Q_{lo} = \sum_{i} l_i P_i, \qquad Q_{hi} = \sum_{i} h_i P_i.$$
The final result is obtained by summing the contributions from each window: $Q = 2^2 Q_{hi} + Q_{lo}$. Within each window, points with the same scalar value are grouped into the same bucket to reduce the number of operations. For example, if $l_1 = l_3$, then $l_1 P_1 + l_3 P_3$ can be computed as $l_1 (P_1 + P_3)$, and similarly in the upper window.
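To make the bucket method concrete, here is a toy windowed MSM over plain integers standing in for curve points (purely illustrative, unrelated to the blst code):

#include <stdio.h>

int main(void) {
    int P[4] = {3, 5, 7, 11};   /* toy "points" (integers instead of EC points) */
    int k[4] = {9, 6, 9, 13};   /* 4-bit scalars */
    int result = 0;
    for (int win = 0; win < 2; win++) {       /* two 2-bit windows */
        int buckets[4] = {0};                 /* one bucket per window value */
        for (int i = 0; i < 4; i++)
            buckets[(k[i] >> (2 * win)) & 3] += P[i];  /* group by window value */
        int window_sum = 0;
        for (int v = 1; v < 4; v++)
            window_sum += v * buckets[v];     /* each bucket is scaled once */
        result += window_sum << (2 * win);    /* shift by the window's offset */
    }
    int expected = 0;
    for (int i = 0; i < 4; i++) expected += k[i] * P[i];
    printf("%d == %d\n", result, expected);   /* prints 263 == 263 */
    return 0;
}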
Booth Encoding is an optimization technique that reduces the bucket size by half. Instead of using scalar values in the range $[0, 2^w - 1]$, Booth encoding maps them to $[-2^{w-1}, 2^{w-1}]$. This optimization leverages the fact that the negative of a point (denoted $-P$) can be easily computed by negating its $y$-coordinate.
For example, with a 2-bit window, a window value of $3$ is mapped to $-1$ with a carry of $1$ into the next window, since $3P = 4P - P$: the algorithm negates the point and places it in the bucket for $1$. Given a scalar $k$ and a window of $w$ bits, Booth encoding splits the scalar into smaller components $m_j$ within the range $[-2^{w-1}, 2^{w-1}]$, while ensuring that $k = \sum_j m_j \cdot 2^{wj}$. This approach reduces the number of buckets required and improves efficiency.
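A minimal sketch of the signed-digit mapping for a single window (our illustration; blst's booth_encode is implemented differently, in constant time):

typedef struct { int digit; int carry; } booth_digit;

/* Map a w-bit window value v in [0, 2^w) to a signed digit in
   [-2^(w-1), 2^(w-1)] plus a carry, such that v = digit + carry * 2^w. */
booth_digit booth_map(unsigned v, unsigned w) {
    booth_digit out;
    if (v > (1u << (w - 1))) {
        out.digit = (int)v - (1 << w);  /* use the negated point's bucket */
        out.carry = 1;                  /* compensate in the next window */
    } else {
        out.digit = (int)v;
        out.carry = 0;
    }
    return out;
}

For instance, booth_map(3, 2) yields digit $-1$ with carry $1$, matching the $3P = 4P - P$ identity above.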
The ptype##s_mult_pippenger function computes the multiplication for each window dynamically, while the ptype##s_mult_wbits function precomputes the point multiples for each window and caches them. This allows for faster selection of points during computation but consumes more memory.
Below are the findings identified during the engagement. High severity findings can be seen as so-called "priority 0" issues that need fixing (potentially urgently). Medium severity findings are most often serious findings that have less impact (or are harder to exploit) than high-severity findings. Low severity findings are most often exploitable in contrived scenarios, if at all, but still warrant reflection. Findings marked as informational are general comments that did not fit any of the other criteria.
Description. The function ptype##s_to_affine_row_wbits (defined in blst/src/multi_scalar.c) fails to correctly handle projective points at infinity (Z = 0). Specifically, in the section where the accumulator is built:
for (i = 0; i < npoints; i++)
for (j = nwin; --src, --j; acc++)
mul_##field(acc[0], acc[-1], src->Z);
the Z coordinate is multiplied directly into the accumulator without guarding against the case where Z = 0. This means that if any point in the input batch is infinity, the accumulator becomes zero. Since batch inversion relies on this accumulator, the reciprocal computation:
--acc; reciprocal_##field(acc[0], acc[0]);
will fail silently, propagating invalid zeroes into the affine coordinates of all points.
By contrast, the related function ptype##s_to_affine correctly applies:
vec_select(acc++, BLS12_381_Rx.p, point->Z, sizeof(vec##bits),
vec_is_zero(point->Z, sizeof(point->Z)));
which substitutes a neutral Montgomery radix (1) when Z = 0, ensuring that infinity points do not corrupt the batch.
Fuzzer Context. To find this and another finding, we developed and ran a custom fuzzer. The fuzzer targets the blst BLS12-381 scalar multiplication routines, covering the single-point, wbits-precompute, and Pippenger multi-point paths. Inputs are variable-length byte arrays that are split into a scalar (up to 320 bits) and a compressed curve point.
For each input, the fuzzer:
- Parses and validates the scalar and curve point.
- Executes all three multiplication paths, using zero scalars and points to pad the second and third paths as required.
- Compares results to detect inconsistencies or crashes.
The fuzzer uses AFL++ in persistent mode but can also run standalone for testing. Any mismatches trigger an immediate abort to report potential bugs.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>
#include <unistd.h>
#include "bindings/blst.h"
#ifndef __AFL_FUZZ_TESTCASE_LEN
ssize_t fuzz_len;
#define __AFL_FUZZ_TESTCASE_LEN fuzz_len
unsigned char fuzz_buf[1024*1024];
#define __AFL_FUZZ_TESTCASE_BUF fuzz_buf
#define __AFL_FUZZ_INIT() void sync(void);
#define __AFL_LOOP(x) ((fuzz_len = read(0, fuzz_buf, sizeof(fuzz_buf))) > 0 ? 1 : 0)
#define __AFL_INIT() sync()
#endif
__AFL_FUZZ_INIT();
#define SCALAR_SIZE 32
#define POINT_SIZE 48
#define MAX_SCALAR_SIZE 40
#define MAX_INPUT_SIZE (MAX_SCALAR_SIZE + POINT_SIZE)
#define PIPPENGER_NPOINTS 32
static limb_t *g_scratch;
static size_t g_scratch_size;
static byte *g_zero_scalars;
static blst_p1_affine *g_zero_points;
static blst_p1_affine *g_points_array;
static byte *g_scalars_array;
static const blst_p1_affine **g_point_ptrs;
static const byte **g_scalar_ptrs;
static void init_fuzzer() {
g_scratch_size = blst_p1s_mult_pippenger_scratch_sizeof(PIPPENGER_NPOINTS);
size_t wbits_scratch = blst_p1s_mult_wbits_scratch_sizeof(PIPPENGER_NPOINTS);
if (wbits_scratch > g_scratch_size) g_scratch_size = wbits_scratch;
g_scratch = malloc(g_scratch_size);
g_zero_scalars = calloc(PIPPENGER_NPOINTS, MAX_SCALAR_SIZE);
g_zero_points = calloc(PIPPENGER_NPOINTS, sizeof(blst_p1_affine));
g_points_array = malloc(PIPPENGER_NPOINTS * sizeof(blst_p1_affine));
g_scalars_array = malloc(PIPPENGER_NPOINTS * MAX_SCALAR_SIZE);
g_point_ptrs = malloc(PIPPENGER_NPOINTS * sizeof(blst_p1_affine *));
g_scalar_ptrs = malloc(PIPPENGER_NPOINTS * sizeof(byte *));
}
static size_t calc_scalar_nbits(const byte *scalar, size_t max_bytes) {
for (size_t i = max_bytes; i > 0; i--) {
if (scalar[i-1]) {
size_t bits = i*8;
byte b = scalar[i-1];
while ((b & 0x80) == 0 && bits > (i-1)*8) { bits--; b <<= 1; }
return bits;
}
}
return 0;
}
static int parse_input(const uint8_t *data, size_t size, byte *scalar, size_t *scalar_nbits, blst_p1_affine *point) {
byte tmp[MAX_INPUT_SIZE];
if (size > MAX_INPUT_SIZE) return 0;
memcpy(tmp, data, size); if (size < MAX_INPUT_SIZE) memset(tmp+size, 0, MAX_INPUT_SIZE-size);
memcpy(scalar, tmp, MAX_SCALAR_SIZE);
*scalar_nbits = calc_scalar_nbits(scalar, MAX_SCALAR_SIZE);
return blst_p1_uncompress(point, tmp + MAX_SCALAR_SIZE) == BLST_SUCCESS;
}
static void test_path(const byte *scalar, size_t nbits, const blst_p1_affine *point, blst_p1 *r1, blst_p1 *r2, blst_p1 *r3) {
const blst_p1_affine *p1[1] = {point};
const byte *s1[1] = {scalar};
blst_p1s_mult_pippenger(r1, p1, 1, s1, nbits, g_scratch);
g_point_ptrs[0]=point; g_point_ptrs[1]=g_zero_points;
g_scalar_ptrs[0]=scalar; g_scalar_ptrs[1]=g_zero_scalars;
blst_p1s_mult_pippenger(r2, g_point_ptrs, 2, g_scalar_ptrs, nbits, g_scratch);
memcpy(g_points_array, point, sizeof(blst_p1_affine));
memcpy(g_scalars_array, scalar, MAX_SCALAR_SIZE);
g_point_ptrs[0]=g_points_array; g_point_ptrs[1]=NULL;
g_scalar_ptrs[0]=g_scalars_array; g_scalar_ptrs[1]=NULL;
blst_p1s_mult_pippenger(r3, g_point_ptrs, PIPPENGER_NPOINTS, g_scalar_ptrs, nbits, g_scratch);
}
static int points_equal(const blst_p1 *a, const blst_p1 *b) { return blst_p1_is_equal(a,b); }
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
byte scalar[MAX_SCALAR_SIZE]; blst_p1_affine point; size_t scalar_nbits;
if (!parse_input(data, size, scalar, &scalar_nbits, &point)) return 0;
size_t max_nbits = scalar_nbits + 10; if (max_nbits > MAX_SCALAR_SIZE*8) max_nbits = MAX_SCALAR_SIZE*8;
for (size_t nbits = scalar_nbits; nbits <= max_nbits; nbits++) {
blst_p1 r1,r2,r3; test_path(scalar, nbits, &point, &r1,&r2,&r3);
if (!points_equal(&r1,&r2) || !points_equal(&r1,&r3) || !points_equal(&r2,&r3)) abort();
}
return 0;
}
int main(int argc, char **argv) {
    init_fuzzer();
    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
    while (__AFL_LOOP(10000)) {
        int len = __AFL_FUZZ_TESTCASE_LEN;
        if (len > 0 && len <= MAX_INPUT_SIZE) {
            memset(g_scratch, 0, g_scratch_size);
            LLVMFuzzerTestOneInput(buf, len);
        }
    }
    if (argc == 2) {
        FILE *fp = fopen(argv[1], "rb");
        uint8_t data[MAX_INPUT_SIZE];
        size_t read = fread(data, 1, MAX_INPUT_SIZE, fp);
        fclose(fp);
        LLVMFuzzerTestOneInput(data, read);
    }
    free(g_scratch); free(g_zero_scalars); free(g_zero_points);
    free(g_points_array); free(g_scalars_array);
    free(g_point_ptrs); free(g_scalar_ptrs);
    return 0;
}
Impact. ptype##s_to_affine_row_wbits() is used in ptype##s_precompute_wbits() to batch-convert the computed point table from Jacobian to affine representation. In turn, ptype##s_precompute_wbits() is used as a precomputation step of the main MSM routine prefix##s_mult_pippenger(), in the case that the number of points is between 2 and 31:
void prefix##s_mult_pippenger(ptype *ret, \
const ptype##_affine *const points[], \
size_t npoints, \
const byte *const scalars[], size_t nbits, \
ptype##xyzz scratch[]) \
{ \
// ...
if ((npoints * sizeof(ptype##_affine) * 8 * 3) <= SCRATCH_LIMIT && \
npoints < 32) { \
ptype##_affine *table = alloca(npoints * sizeof(ptype##_affine) * 8); \
ptype##s_precompute_wbits(table, 4, points, npoints); \
ptype##s_mult_wbits(ret, table, 4, npoints, scalars, nbits, NULL); \
return; \
} \
ptype##s_mult_pippenger(ret, points, npoints, scalars, nbits, scratch, 0); \
}
According to our analysis, the issue is unlikely to affect the KZG libraries that depend on prefix##s_mult_pippenger(). For example, g1_lincomb_fast() in c-kzg-4844 hits the problematic code path for between 8 and 31 input points, but it avoids zero input points by filtering them out before calling into blst. The same is true for the Rust library.
Nevertheless, the issue is subtle and could affect the soundness of similar protocols, or of the EIP-7594 verifier if point filtering were removed. Inherently, g1_lincomb_fast() is called on prover-supplied input points (the KZG cell proofs) and is therefore exposed to any mishandling of invalid points.
Recommendation. Adopt the same protective logic used in ptype##s_to_affine:
- Use vec_select to substitute Z = 1 when Z = 0 before contributing to the batch product.
- Apply vec_czero on the affine outputs to zero them out if the original point was infinity.
This ensures that infinity points are handled locally and do not corrupt other valid points in the batch.
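For illustration, the guarded loop could look roughly as follows (our paraphrase of the recommendation in non-macro form, with vec384/mul_fp standing in for the macro-expanded field type and multiplication; not the actual patch):

vec384 z_or_one;
for (i = 0; i < npoints; i++)
    for (j = nwin; --src, --j; acc++) {
        /* substitute Montgomery 1 when Z == 0, as ptype##s_to_affine does */
        vec_select(z_or_one, BLS12_381_Rx.p, src->Z, sizeof(z_or_one),
                   vec_is_zero(src->Z, sizeof(src->Z)));
        mul_fp(acc[0], acc[-1], z_or_one);
    }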
Client Response. Fixed in https://github.com/supranational/blst/commit/f48500c1fdbefa7c0bf9800bccd65d28236799c1.
Description. In blst, the ptype##s_mult_wbits macro (expanding to blst_p1s_mult_wbits and blst_p2s_mult_wbits) implements multi-scalar multiplication (MSM) with a fixed window size. When extracting the most-significant (top) window, the code incorrectly uses wbits instead of the computed window = nbits % wbits in the call to get_wval_limb. This can cause the function to read past the end of the scalar byte array when the top window is shorter than wbits (including when window == 0). Although the out-of-bounds value is masked and discarded, the memory access still occurs, which may cause a segmentation fault if the scalar buffer is followed by a protected or unmapped page, or trigger memory sanitizers.
/* top excess bits modulo target window size */
window = nbits % wbits; /* yes, it may be zero */
wmask = ((limb_t)1 << (window + 1)) - 1;
nbits -= window;
z = is_zero(nbits);
/* BUG: uses wbits instead of window for the top slice length */
wval = (get_wval_limb(scalar, nbits - (z^1), wbits + (z^1)) << z) & wmask;
wval = booth_encode(wval, wbits);
ptype##_gather_booth_wbits(&scratch[0], row, wbits, wval);
If nbits is an exact multiple of wbits, then window == 0 and nbits -= window leaves nbits unchanged. The call get_wval_limb(scalar, nbits - (z^1), wbits + (z^1)) then requests wbits + 1 bits at the top boundary, which reads one byte past the end of the scalar. Even when window > 0 but smaller than wbits, using wbits instead of window can extend the read beyond the buffer.
For example, with wbits = 4 and nbits = 320 (a 40-byte scalar), window = nbits % wbits = 0, and:
- The buggy line requests wbits + (z^1) bits, causing a 1-byte over-read.
- The correct line should request window + (z^1) bits, i.e., 1 bit in this case.
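To spell out the over-read arithmetic, assuming get_wval_limb touches the bytes from bit offset off through bit off + n - 1 (a simplification of its byte-by-byte reads):

/* Last byte index touched when extracting n bits starting at bit `off`. */
size_t last_byte(size_t off, size_t n) { return (off + n - 1) / 8; }

/* With nbits = 320 and z = 0, the top-window read starts at off = 319:   */
/*   buggy: last_byte(319, 4 + 1) == 40 -> one byte past a 40-byte scalar */
/*   fixed: last_byte(319, 0 + 1) == 39 -> last valid byte index          */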
The bug was validated with Valgrind by allocating the scalar on the heap so that the next byte is outside the allocated block:
valgrind --tool=memcheck --track-origins=yes ./test_pippenger
==33966== Invalid read of size 1
==33966== at 0x10A321: get_wval_limb (in /home/marco/repo/blst/test_pippenger)
==33966== by 0x118990: POINTonE1s_mult_wbits (in /home/marco/repo/blst/test_pippenger)
==33966== by 0x121367: blst_p1s_mult_pippenger (in /home/marco/repo/blst/test_pippenger)
==33966== by 0x109D79: main (test_pippenger.c:130)
==33966== Address 0x4a88482 is 0 bytes after a block of size 2 alloc'd
==33966== at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==33966== by 0x1098BD: main (test_pippenger.c:58)
The bug can be reliably triggered by placing a protected page immediately after the scalar buffer. Any read of ptr[size] will fault, exposing the over-read:
// Allocate a vector of bytes and right after a protected page
// so that any call right after size crashes; i.e., ptr[size] faults.
uint8_t* alloc_vec_with_protected_page_end(size_t size) {
size_t pagesize = sysconf(_SC_PAGESIZE);
size_t alloc_size = ((size + pagesize - 1) / pagesize) * pagesize;
size_t total_size = alloc_size + pagesize;
void *base = mmap(NULL, total_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (base == MAP_FAILED) return NULL;
// Protect the last page (immediately after data region)
if (mprotect((uint8_t*)base + alloc_size, pagesize, PROT_NONE) != 0) {
munmap(base, total_size);
return NULL;
}
// Return pointer so that ptr[size] crashes
return (uint8_t*)base + alloc_size - size;
}
void free_vec_with_protected_page_end(uint8_t *ptr, size_t size) {
size_t pagesize = sysconf(_SC_PAGESIZE);
size_t alloc_size = ((size + pagesize - 1) / pagesize) * pagesize;
size_t total_size = alloc_size + pagesize;
void *base = (void*)(ptr - (alloc_size - size));
munmap(base, total_size);
}
Note: If scalars are stored contiguously (e.g., scalar_0 immediately followed by scalar_1), the over-read will typically fetch the first byte of scalar_1, and the result gets masked out and discarded. This hides the issue but is still incorrect behavior.
Impact. blst_p1s_mult_wbits and blst_p2s_mult_wbits can read out of bounds of the scalar array and may cause segmentation faults in some cases. As KZG libraries typically store scalars contiguously, this is unlikely to be triggered in those contexts, but the bug remains.
Recommendation. Patch ptype##s_mult_wbits to use window (not wbits) for the top-window call to get_wval_limb, to avoid reading past the end of the scalar buffer:
- wval = (get_wval_limb(scalar, nbits - (z^1), wbits + (z^1)) << z) & wmask;
+ wval = (get_wval_limb(scalar, nbits - (z^1), window + (z^1)) << z) & wmask;
Client Response. Fixed in https://github.com/supranational/blst/commit/01d167c8bfb0a76a6f44dd479902e2662983a1e9.
Description. The core KZG verifier method allows each entry of bit_reversed_coset_evals to be a vector of dynamic size, and does not check that their sizes are all equal:
pub fn verify_multi_opening(
&self,
deduplicated_commitments: &[G1Point],
commitment_indices: &[CommitmentIndex],
bit_reversed_coset_indices: &[CosetIndex],
// [ZKSECURITY] the sizes of these vectors could be different
bit_reversed_coset_evals: &[Vec<Scalar>],
bit_reversed_proofs: &[G1Point],
) -> Result<(), VerifierError> {
However, the Fiat-Shamir logic implicitly assumes the sizes to be the same, and does not commit the sizes of these vectors individually.
fn compute_fiat_shamir_challenge(
// [ZKSECURITY] ...
) -> Scalar {
const DOMAIN_SEP: &str = "RCKZGCBATCH__V1_";
let hash_input_size = DOMAIN_SEP.len()
// [ZKSECURITY] ...
+ coset_evals.len() * verification_key.coset_size * size_of::<Scalar>()
+ proofs.len() * G1Point::compressed_size();
let mut hash_input: Vec<u8> = Vec::with_capacity(hash_input_size);
hash_input.extend(DOMAIN_SEP.as_bytes());
hash_input.extend((verification_key.num_coefficients_in_polynomial as u64).to_be_bytes());
// [ZKSECURITY] we commit to the _intended_ size of each cell of coset evals
hash_input.extend((verification_key.coset_size as u64).to_be_bytes());
// [ZKSECURITY] ...
for k in 0..num_cosets {
hash_input.extend(row_indices[k as usize].to_be_bytes());
hash_input.extend(coset_indices[k as usize].to_be_bytes());
// [ZKSECURITY] we hash each cell of coset evals without hashing their sizes individually
for eval in &coset_evals[k as usize] {
hash_input.extend(eval.to_bytes_be());
}
hash_input.extend(proofs[k as usize].to_compressed());
}
// [ZKSECURITY] this assertion forces the _total_ size of coset evals to be as expected
assert_eq!(hash_input.len(), hash_input_size);
In another place in the verifier, coset eval sizes are forced to be powers of two, but other than that, the verifier places no restrictions on these sizes; the polynomial interpolation logic resizes them to the expected length before performing an IFFT, potentially truncating or padding with zeros.
For example, empirically, the verifier runs through to the end when resizing the first three coset evals from 64, 64, 64 to 128, 32, 32 (keeping the total size the same, as required by the final assertion above).
This means that we could shift values from the coset evals to other places, like the proofs and row_indices, while keeping the same Fiat-Shamir challenge. This is undesirable, as it represents a potential weakness in the soundness of Fiat-Shamir.
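To illustrate the underlying transcript ambiguity, here is a toy C example of ours (independent of the library): two different splits of the same bytes serialize identically when per-vector lengths are not committed.

#include <stdio.h>
#include <string.h>

/* Concatenate the elements of several vectors without committing to
   each vector's length, mirroring the flawed transcript construction. */
static size_t serialize(const unsigned char *vecs[], const size_t lens[],
                        size_t n, unsigned char *out) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(out + total, vecs[i], lens[i]);
        total += lens[i];
    }
    return total;
}

int main(void) {
    const unsigned char a0[] = {1, 2, 3}, a1[] = {4};
    const unsigned char b0[] = {1}, b1[] = {2, 3, 4};
    const unsigned char *A[] = {a0, a1}, *B[] = {b0, b1};
    const size_t lenA[] = {3, 1}, lenB[] = {1, 3};
    unsigned char bufA[4], bufB[4];
    serialize(A, lenA, 2, bufA);
    serialize(B, lenB, 2, bufB);
    /* Different inputs, identical transcript -> identical challenge. */
    printf("%d\n", memcmp(bufA, bufB, sizeof bufA) == 0); /* prints 1 */
    return 0;
}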
Impact. In the main EIP-7594 verifier, coset evals are static-size vectors, so this issue does not affect production code.
However, the core FK20 library ekzg-multi-open is published as a standalone crate where verify_multi_opening() is part of the public API. In general, the core library aims to enforce soundness on its own and frequently adds assertions to prevent invalid inputs. That’s why we report this issue, even though it seems unlikely that verify_multi_opening() will be used in a deployment that allows dynamic-sized evaluation inputs.
Recommendation. At the beginning of verify_multi_opening(), add an assertion that every entry of bit_reversed_coset_evals is of size verification_key.coset_size.
Client Response. Fixed in https://github.com/crate-crypto/rust-eth-kzg/pull/415.
In go-eth-kzg, the comment on the nextPowerOfTwo function is misleading: the function actually returns the smallest power of two strictly greater than n. For example, if the input n is 2, it returns 4.
// nextPowerOfTwo returns the next power of two greater than or equal to n
func nextPowerOfTwo(n int) int {
if n == 0 {
return 1
}
k := 1
for k <= n {
k <<= 1
}
return k
}
In rust-eth-kzg, in fixed_base_msm_window.rs, the following comment is misleading. In fact, each point takes 2 * 48 = 96 bytes, not 64:
// The total amount of memory is roughly (numPoints * 2^{wbits} - 1)
// where each point is 64 bytes.
In lincomb.rs, this comment above g1_lincomb() is misleading:
/// A multi-scalar multiplication algorithm over G1 elements
///
/// Returns None if the points and the scalars are not the
/// same length.
In fact, this method will silently perform the scalar multiplication over fewer points or scalars if one of the inputs is shorter. The same is true for g2_lincomb(). Tests in the same file document the actual behavior while still carrying a self-contradictory comment about returning None:
fn g1_lincomb_length_mismatch_not_empty() {
// MSM returns None when point and scalar lengths differ
let points = vec![G1Point::generator(); 4];
let scalars = vec![Scalar::from(1), Scalar::from(2), Scalar::from(3)];
assert_eq!(
g1_lincomb(&points, &scalars),
Some(
G1Point::generator() * Scalar::from(1)
+ G1Point::generator() * Scalar::from(2)
+ G1Point::generator() * Scalar::from(3)
)
);
}
This incorrect documentation is repeated in fixed_base_msm.rs above msm():
/// Panics if the number of scalars doesn't match the number of generators.
In reed-solomon.rs, the following comment inside construct_vanishing_poly_from_block_erasures() is misleading:
// Expand the vanishing polynomial, so that it vanishes on all blocks in the codeword
// at the same indices.
//
// Example; consider the following polynomial f(x) = x - r
// It vanishes/has roots at `r`.
//
// Now if we expand it by a factor of three which is the process of shifting all coefficients
// up three spaces, we get the polynomial g(x) = x^3 - r.
// g(x) has all of the roots of f(x) and a few extra roots.
//
// The roots of g(x) can be characterized as {r, \omega * r, \omega^2 * r}
// where \omega is a third root of unity.
In the example given, the polynomial x^3 - r does not actually vanish at {r, \omega * r, \omega^2 * r}. More generally, the operation of “expanding” alone is not what produces the desired roots. Instead, this is achieved by a combination of the following (see the worked example after this list):
- Using roots of a smaller domain, which are roots of the large domain raised to a power.
- “Expanding” the coefficients, which corresponds to evaluating the original polynomial at a power.
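As a hedged reconstruction of what the example should say: pick a large-domain root $\mu$ and let $r = \mu^3$ be the corresponding small-domain root. Expanding $f(x) = x - r$ by a factor of three computes $g(x) = f(x^3)$, and
$$g(x) = x^3 - \mu^3, \qquad g(\omega^j \mu) = \omega^{3j} \mu^3 - \mu^3 = 0 \quad (j = 0, 1, 2),$$
where $\omega$ is a primitive third root of unity. The roots of $g$ are therefore $\{\mu, \omega\mu, \omega^2\mu\}$ with $\mu^3 = r$, not $\{r, \omega r, \omega^2 r\}$.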
Client Response. Partially fixed in https://github.com/crate-crypto/go-eth-kzg/pull/118, https://github.com/crate-crypto/rust-eth-kzg/pull/416, https://github.com/crate-crypto/rust-eth-kzg/pull/417, and https://github.com/crate-crypto/rust-eth-kzg/pull/418.
Description. The FK20 verifier is instantiated with three parameters:
- num_points_to_open, the full domain size we operate on
- num_cosets
- verification_key.coset_size
The parameters are supposed to be related and satisfy:
num_cosets * verification_key.coset_size = num_points_to_open
Furthermore, all three of these parameters are supposed to be powers of two. The FK20Verifier constructor, however, takes in these parameters independently and does not validate their relation.
impl FK20Verifier {
pub fn new(
verification_key: VerificationKey,
num_points_to_open: usize,
num_cosets: usize,
) -> Self {
const BIT_REVERSED: bool = true;
let coset_gens = coset_gens(num_points_to_open, num_cosets, BIT_REVERSED);
// [ZKSECURITY] `coset_size` is recalculated here but not linked to `verification_key.coset_size`
let coset_size = num_points_to_open / num_cosets;
assert!(
verification_key.g2s.len() >= coset_size,
"need as many g2 points as coset size"
);
// [ZKSECURITY] this line will internally use the next power of two from
// `verification_key.coset_size` to create the domain
let coset_domain = Domain::new(verification_key.coset_size);
// [ZKSECURITY] however, `verification_key.coset_size` itself enters the cryptographic protocol
// as an essential parameter without being rounded to a power of two
let n = verification_key.coset_size;
// [tau^n]_2
let tau_pow_n = G2Prepared::from(verification_key.g2s[n]);
Empirically, the verifier can be instantiated and run successfully with num_cosets not equal to num_points_to_open / verification_key.coset_size. In this case, the relation being checked refers to different coset generators than intended, which is a footgun.
Even more surprisingly, VerificationKey and FK20Verifier can be instantiated with verification_key.coset_size not being a power of two. For creating roots of unity internally, Domain::new(verification_key.coset_size) will replace it with the next power of two. Meanwhile, for preparing the universal verification equation, n = verification_key.coset_size is used directly and is allowed to be a non-power of two.
(This is not the case for num_points_to_open and num_cosets: these are asserted to be powers of two.)
To see why a non-power-of-two coset size is bad, let $n = \text{verification\_key.coset\_size}$ and let $\ell$ be the (power-of-two) size of the domain actually constructed by Domain::new. The verification equation guarantees that, for a collection of coset generators $h_k$, committed polynomials $f_k$, and input values $y_{k,i}$, we have
$$f_k(h_k \omega^i) = y_{k,i} + q_k(h_k \omega^i) \cdot \left((h_k \omega^i)^n - h_k^n\right)$$
for all $i$, where $\omega$ is an $\ell$-th root of unity and $q_k$ is some quotient polynomial (committed to by the KZG proof).
In the intended protocol, $n = \ell$, so $\omega^{in} = 1$ and the second term on the RHS vanishes, giving $f_k(h_k \omega^i) = y_{k,i}$, i.e. we prove that the $y_{k,i}$ are evaluations of $f_k$ as desired.
However, if we allow $n \neq \ell$, the verification equation becomes meaningless, as the prover can arbitrarily change the purported evaluations by tweaking the quotient $q_k$.
Incidentally, while the constructor allows $n \neq \ell$, verify_multi_opening() will throw due to checks on the length of evaluations. Nevertheless, it seems like an unnecessary footgun. And as documented in another finding, the lengths of evaluations are only weakly restricted as well.
Impact. None of the above affects the end-to-end EIP-7594 verifier, where parameters are hardcoded to correct values.
Recommendation. Enforce both the relation num_cosets * verification_key.coset_size = num_points_to_open and that verification_key.coset_size is a power of two in the FK20Verifier constructor.
Client Response. Fixed in https://github.com/crate-crypto/rust-eth-kzg/pull/420.