Introduction
On July 21st, 2025, the Ethereum Foundation engaged zkSecurity to perform a security assessment of the KZG libraries used by PeerDAS. These libraries include blst, c-kzg-4844, rust-eth-kzg, and go-eth-kzg. The goal of this assessment was to conduct an in-depth analysis of the various KZG polynomial commitment implementations (C, Rust, and Go) used for Ethereum’s EIP-4844 and EIP-7594, as well as some of their dependencies in the blst BLS12-381 library. Over the course of the engagement, we evaluated compliance with relevant specifications, compared cross-implementation behaviors, and identified potential safety issues. Specified bindings and assembly routines were excluded from the scope as defined.
The engagement lasted four weeks and was conducted by two consultants. During this period, several observations and findings were identified and communicated to the respective library development teams. The detailed findings and their implications are discussed in the subsequent sections of this report.
Scope
blst library at target commit v0.3.15: The scope includes implementations for blst_p1s_mult_pippenger_scratch_sizeof and blst_p1s_mult_pippenger, covering lower-level elliptic curve operations. However, the ASM implementations of finite field arithmetic are excluded.
c-kzg-4844 library at target commit 669e7484: The scope includes code associated with EIP-7594, such as common/lincomb.c and all of src/eip7594. It also includes bindings, except for bindings/python, bindings/elixir, and bindings/node.js.
rust-eth-kzg library at target commit v0.7.1: The scope covers code related to both EIP-4844 and EIP-7594, including bindings, except for bindings/golang.
go-eth-kzg library at target commit v1.3.0: The scope includes code associated with EIP-7594, excluding internal/kzg, internal/multiexp, internal/utils, tests, and all _test.go files.
Overview
EIP-7594
Following EIP-4844, a blob-carrying transaction in Ethereum can publish a blob containing 4096 field elements along with its KZG commitment. Validators must retrieve the entire blob and verify the commitment against it. EIP-7594 introduces 1-D peer DAS (Data Availability Sampling) for Ethereum. The blob is first extended using erasure coding (doubling its size). The extended blob is then divided into 128 cells, each containing 64 field elements. The blob is associated with a KZG commitment, and each cell has a KZG multi-proof to verify that its 64 field elements are consistent with the blob commitment. To ensure data availability, each validator only needs to receive and verify a small subset (e.g., 8) of the cells from the network, instead of downloading and verifying the entire blob as required in EIP-4844.
Cell Proof Generation
During block proposal, the proposer splits the blob into cells and generates a proof for each cell. First, the blob is extended with erasure coding to double its size, and then the extended blob is divided into cells, each containing 64 field elements. Next, the KZG multi-point proof for each cell is computed. The FK20 paper describes an efficient algorithm for computing all cell proofs of a blob. Below is the function signature for cell proof generation in c-kzg-4844:
C_KZG_RET compute_cells_and_kzg_proofs(
Cell *cells, // The output cells
KZGProof *proofs, // The output proofs for each cell
const Blob *blob, // The input blob data
const KZGSettings *s // The settings, including the trusted setup and precomputation
);
Cell Proof Verification
Validators retrieve a subset of cells and proofs from the network and verify these KZG multi-point proofs against the blob commitment. Since a block can contain multiple blobs, validators use a universal verification method to efficiently verify cells from different blobs in a single batch. Below is the function signature for cell proof verification in c-kzg-4844:
C_KZG_RET verify_cell_kzg_proof_batch(
bool *ok, // The verification result
const Bytes48 *commitments_bytes, // The commitment of the entire blob that the cell belongs to
const uint64_t *cell_indices, // The indices of the cells in the blob
const Cell *cells, // The array of cell data
const Bytes48 *proofs_bytes, // The proofs for the cells
uint64_t num_cells, // The number of cells in this batch, specifying the length of the above arrays
const KZGSettings *s // The settings, including the trusted setup and precomputation
);
Blob Recovery
To recover the original blob, other nodes (e.g., index nodes) can retrieve cell data from the network. Once more than half of the cell data is retrieved, the entire blob can be reconstructed using erasure coding. Below is the function signature for blob recovery in c-kzg-4844:
C_KZG_RET recover_cells_and_kzg_proofs(
Cell *recovered_cells, // The output recovered cells
KZGProof *recovered_proofs, // The output recovered proofs for each cell
const uint64_t *cell_indices, // The input indices of each cell
const Cell *cells, // The input cell data
uint64_t num_cells, // The number of input cells
const KZGSettings *s // The settings, including the trusted setup and precomputation
);
Multi-Scalar Multiplication (MSM) in blst
The blst library implements the BLS12-381 elliptic curve and pairing operations in C and assembly. Our scope focuses on the MSM (Multi-Scalar Multiplication) implementation, which primarily uses the Pippenger algorithm. Note that blst is designed for cryptographic signatures and implements constant-time operations to prevent side-channel attacks. However, for data availability sampling use cases, constant-time operations are not required and may reduce efficiency.
Given a list of base points and scalars (all of fixed bit length, e.g., 256) , the Pippenger algorithm efficiently computes .
The main idea is to divide the scalars into smaller windows. Within each window, the scalars are grouped into buckets based on their values, allowing for efficient computation. For example, consider four points and four 4-bit scalars . If the window size is 2 bits, the scalars are split into two parts: the lower half and the upper half . The computation is then performed separately for each window:
The final result is obtained by summing the contributions from each window: . Within each window, points with the same scalar value are grouped into the same bucket to reduce the number of operations. For example, can be computed as , and as .
Booth Encoding is an optimization technique that reduces the bucket size by half. Instead of using scalar values in the range , Booth encoding maps them to . This optimization leverages the fact that the negative of a point (denoted ) can be easily computed by negating its coordinate.
For example, if a scalar is , the algorithm negates the point and places it in the bucket for , as . Given a scalar and a window of bits, Booth encoding splits the scalar into smaller components within the range , while ensuring that . This approach reduces the number of buckets required and improves efficiency.
The ptype##s_mult_pippenger function computes the multiplication for each window dynamically, while the ptype##s_mult_wbits function precomputes the multiplications for each window (e.g., ) and caches them. This allows for faster selection of points during computation but consumes more memory.
Below are listed the findings found during the engagement. High severity findings can be seen as
so-called
"priority 0" issues that need fixing (potentially urgently). Medium severity findings are most often
serious
findings that have less impact (or are harder to exploit) than high-severity findings. Low severity
findings
are most often exploitable in contrived scenarios, if at all, but still warrant reflection. Findings
marked
as informational are general comments that did not fit any of the other criteria.
Description. The function ptype##s_to_affine_row_wbits (defined in blst/src/multi_scalar.c) fails to correctly handle projective points at infinity (Z = 0).
Specifically, in the section where the accumulator is built:
for (i = 0; i < npoints; i++)
for (j = nwin; --src, --j; acc++)
mul_##field(acc[0], acc[-1], src->Z);
the Z coordinate is multiplied directly into the accumulator without guarding against the case where Z = 0. This means that if any point in the input batch is infinity, the accumulator becomes zero. Since batch inversion relies on this accumulator, the reciprocal computation:
--acc; reciprocal_##field(acc[0], acc[0]);
will fail silently, propagating invalid zeroes into the affine coordinates of all points.
By contrast, the related function ptype##s_to_affine correctly applies:
vec_select(acc++, BLS12_381_Rx.p, point->Z, sizeof(vec##bits),
vec_is_zero(point->Z, sizeof(point->Z)));
which substitutes a neutral Montgomery radix (1) when Z = 0, ensuring that infinity points do not corrupt the batch.
Fuzzer Context. To find this and another findings we developed and ran a custom fuzzer. The fuzzer targets the blst BLS12-381 scalar multiplication routines, including both single-point, wbits-precompute, and Pippenger multi-point paths. Inputs are variable-length byte arrays that are split into a scalar (up to 320 bits) and a compressed curve point.
For each input, the fuzzer:
- Parses and validates the scalar and curve point.
- Executes all three multiplication paths. Using the zero scalar and point for the second and third paths as required.
- Compares results to detect inconsistencies or crashes.
The fuzzer uses AFL++ in persistent mode but can also run standalone for testing. Any mismatches trigger an immediate abort to report potential bugs.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>
#include <unistd.h>
#include "bindings/blst.h"
#ifndef __AFL_FUZZ_TESTCASE_LEN
ssize_t fuzz_len;
#define __AFL_FUZZ_TESTCASE_LEN fuzz_len
unsigned char fuzz_buf[1024*1024];
#define __AFL_FUZZ_TESTCASE_BUF fuzz_buf
#define __AFL_FUZZ_INIT() void sync(void);
#define __AFL_LOOP(x) ((fuzz_len = read(0, fuzz_buf, sizeof(fuzz_buf))) > 0 ? 1 : 0)
#define __AFL_INIT() sync()
#endif
__AFL_FUZZ_INIT();
#define SCALAR_SIZE 32
#define POINT_SIZE 48
#define MAX_SCALAR_SIZE 40
#define MAX_INPUT_SIZE (MAX_SCALAR_SIZE + POINT_SIZE)
#define PIPPENGER_NPOINTS 32
static limb_t *g_scratch;
static size_t g_scratch_size;
static byte *g_zero_scalars;
static blst_p1_affine *g_zero_points;
static blst_p1_affine *g_points_array;
static byte *g_scalars_array;
static const blst_p1_affine **g_point_ptrs;
static const byte **g_scalar_ptrs;
static void init_fuzzer() {
g_scratch_size = blst_p1s_mult_pippenger_scratch_sizeof(PIPPENGER_NPOINTS);
size_t wbits_scratch = blst_p1s_mult_wbits_scratch_sizeof(PIPPENGER_NPOINTS);
if (wbits_scratch > g_scratch_size) g_scratch_size = wbits_scratch;
g_scratch = malloc(g_scratch_size);
g_zero_scalars = calloc(PIPPENGER_NPOINTS, MAX_SCALAR_SIZE);
g_zero_points = calloc(PIPPENGER_NPOINTS, sizeof(blst_p1_affine));
g_points_array = malloc(PIPPENGER_NPOINTS * sizeof(blst_p1_affine));
g_scalars_array = malloc(PIPPENGER_NPOINTS * MAX_SCALAR_SIZE);
g_point_ptrs = malloc(PIPPENGER_NPOINTS * sizeof(blst_p1_affine *));
g_scalar_ptrs = malloc(PIPPENGER_NPOINTS * sizeof(byte *));
}
static size_t calc_scalar_nbits(const byte *scalar, size_t max_bytes) {
for (size_t i = max_bytes; i > 0; i--) {
if (scalar[i-1]) {
size_t bits = i*8;
byte b = scalar[i-1];
while ((b & 0x80) == 0 && bits > (i-1)*8) { bits--; b <<= 1; }
return bits;
}
}
return 0;
}
static int parse_input(const uint8_t *data, size_t size, byte *scalar, size_t *scalar_nbits, blst_p1_affine *point) {
byte tmp[MAX_INPUT_SIZE];
if (size > MAX_INPUT_SIZE) return 0;
memcpy(tmp, data, size); if (size < MAX_INPUT_SIZE) memset(tmp+size, 0, MAX_INPUT_SIZE-size);
memcpy(scalar, tmp, MAX_SCALAR_SIZE);
*scalar_nbits = calc_scalar_nbits(scalar, MAX_SCALAR_SIZE);
return blst_p1_uncompress(point, tmp + MAX_SCALAR_SIZE) == BLST_SUCCESS;
}
static void test_path(const byte *scalar, size_t nbits, const blst_p1_affine *point, blst_p1 *r1, blst_p1 *r2, blst_p1 *r3) {
const blst_p1_affine *p1[1] = {point};
const byte *s1[1] = {scalar};
blst_p1s_mult_pippenger(r1, p1, 1, s1, nbits, g_scratch);
g_point_ptrs[0]=point; g_point_ptrs[1]=g_zero_points;
g_scalar_ptrs[0]=scalar; g_scalar_ptrs[1]=g_zero_scalars;
blst_p1s_mult_pippenger(r2, g_point_ptrs, 2, g_scalar_ptrs, nbits, g_scratch);
memcpy(g_points_array, point, sizeof(blst_p1_affine));
memcpy(g_scalars_array, scalar, MAX_SCALAR_SIZE);
g_point_ptrs[0]=g_points_array; g_point_ptrs[1]=NULL;
g_scalar_ptrs[0]=g_scalars_array; g_scalar_ptrs[1]=NULL;
blst_p1s_mult_pippenger(r3, g_point_ptrs, PIPPENGER_NPOINTS, g_scalar_ptrs, nbits, g_scratch);
}
static int points_equal(const blst_p1 *a, const blst_p1 *b) { return blst_p1_is_equal(a,b); }
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
byte scalar[MAX_SCALAR_SIZE]; blst_p1_affine point; size_t scalar_nbits;
if (!parse_input(data, size, scalar, &scalar_nbits, &point)) return 0;
size_t max_nbits = scalar_nbits + 10; if (max_nbits > MAX_SCALAR_SIZE*8) max_nbits = MAX_SCALAR_SIZE*8;
for (size_t nbits = scalar_nbits; nbits <= max_nbits; nbits++) {
blst_p1 r1,r2,r3; test_path(scalar, nbits, &point, &r1,&r2,&r3);
if (!points_equal(&r1,&r2) || !points_equal(&r1,&r3) || !points_equal(&r2,&r3)) abort();
}
return 0;
}
int main(int argc, char **argv) {
init_fuzzer();
unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
while (__AFL_LOOP(10000)) { int len=__AFL_FUZZ_TESTCASE_LEN; if(len>0 && len<=MAX_INPUT_SIZE){memset(g_scratch,0,g_scratch_size); LLVMFuzzerTestOneInput(buf,len);}}
if(argc==2){ FILE *fp=fopen(argv[1],"rb"); uint8_t data[MAX_INPUT_SIZE]; size_t read=fread(data,1,MAX_INPUT_SIZE,fp); fclose(fp); LLVMFuzzerTestOneInput(data,read);}
free(g_scratch); free(g_zero_scalars); free(g_zero_points); free(g_points_array); free(g_scalars_array); free(g_point_ptrs); free(g_scalar_ptrs);
return 0;
}
Impact.
ptype##s_to_affine_row_wbits() is used in ptype##s_precompute_wbits() to batch-convert the computed point table from Jabobi to affine representation. In turn, ptype##s_precompute_wbits() is used as precomputation step of the main MSM routine prefix##s_mult_pippenger(), in the case that the number of points is between 2 and 31:
void prefix##s_mult_pippenger(ptype *ret, \
const ptype##_affine *const points[], \
size_t npoints, \
const byte *const scalars[], size_t nbits, \
ptype##xyzz scratch[]) \
{ \
// ...
if ((npoints * sizeof(ptype##_affine) * 8 * 3) <= SCRATCH_LIMIT && \
npoints < 32) { \
ptype##_affine *table = alloca(npoints * sizeof(ptype##_affine) * 8); \
ptype##s_precompute_wbits(table, 4, points, npoints); \
ptype##s_mult_wbits(ret, table, 4, npoints, scalars, nbits, NULL); \
return; \
} \
ptype##s_mult_pippenger(ret, points, npoints, scalars, nbits, scratch, 0); \
}
According to our analysis, the issue is unlikely to affect the KZG libraries that depend on prefix##s_mult_pippenger(). For example, g1_lincomb_fast() in c-kzg-4844 depends on the problematic code path when the number of points is between 8 and 31 input points. However, it avoids zero input points by filtering them out before calling into blst. The same is true for the Rust library.
Nevertheless, the issue is subtle and could affect the soundness of similar protocols or of the EIP-7594 verifier if point filtering was removed. Inherently, g1_lincomb_fast() is called on prover-supplied input points (the KZG cell proofs) and is therefore vulnerable to any mishandling of invalid points.
Recommendation. Adopt the same protective logic used in ptype##s_to_affine:
- Use
vec_select to substitute Z = 1 when Z = 0 before contributing to the batch product.
- Apply
vec_czero on the affine outputs to zero them out if the original point was infinity.
This ensures that infinity points are handled locally and do not corrupt other valid points in the batch.
Client Response. Fixed in https://github.com/supranational/blst/commit/f48500c1fdbefa7c0bf9800bccd65d28236799c1.
Description. The FK20 verifier is instantiated with three parameters:
num_points_to_open, the full domain size we operate on
num_cosets
verification_key.coset_size
The parameters are supposed to be related and satisfy:
num_cosets * verification_key.coset_size = num_points_to_open
Furthermore, all three of these parameters are supposed to be powers of two.
The FK20Verifier constructor, however, takes in these parameters independently and does not validate their relation.
impl FK20Verifier {
pub fn new(
verification_key: VerificationKey,
num_points_to_open: usize,
num_cosets: usize,
) -> Self {
const BIT_REVERSED: bool = true;
let coset_gens = coset_gens(num_points_to_open, num_cosets, BIT_REVERSED);
// [ZKSECURITY] `coset_size` is recalculated here but not linked to `verification_key.coset_size`
let coset_size = num_points_to_open / num_cosets;
assert!(
verification_key.g2s.len() >= coset_size,
"need as many g2 points as coset size"
);
// [ZKSECURITY] this line will internally use the next power of two from
// `verification_key.coset_size` to create the domain
let coset_domain = Domain::new(verification_key.coset_size);
// [ZKSECURITY] however, `verification_key.coset_size` itself enters the cryptographic protocol
// as an essential parameter without being rounded to a power of two
let n = verification_key.coset_size;
// [tau^n]_2
let tau_pow_n = G2Prepared::from(verification_key.g2s[n]
Empirically, the verifier can be instantiated and run successfully with num_cosets not equal to num_points_to_open / verification_key.coset_size. In this case, the relation being checked refers to different coset generators than intended, which feels like a footgun.
Even more surprising, VerificationKey and FK20Verifier can be instantiated with verification_key.coset_size not being a power of two. For creating roots of unity internally, Domain::new(verification_key.coset_size) will replace it with the next power of two. Meanwhile, for preparing the universal verification equation, we use n = verification_key.coset_size directly which is allowed to be a non-power of two.
(This is not the case for num_points_to_open and num_cosets: these are asserted to be powers of two.)
To see why being a non-power of two is bad, let . Note that the verification equation guarantees that, for a collection of coset generators , committed polynomials , input values we have
for all , where is a -th root of unity and is some quotient polynomial (committed to by the KZG proof).
In the intended protocol, and the second term on the RHS vanishes, giving , i.e. we prove that are evaluations of as desired.
However, if we allow , the verification equation becomes meaningless as the prover can arbitrarily change the purported evaluations by tweaking the quotient .
Incidentally, while the constructor allows , verify_multi_opening() will throw due to checks on the length of evaluations. Nevertheless, it seems like an unnecessary footgun. And as documented in another finding, the lengths of evaluations are only weakly restricted as well.
Impact. None of the above affects the end-to-end EIP-7594 verifier, where parameters are hardcoded to correct values.
Recommendation. Enforce both the relation num_cosets * verification_key.coset_size = num_points_to_open and that verification_key.coset_size is a power of two, in the FK20Verifier constructor.
Client Response. Fixed in https://github.com/crate-crypto/rust-eth-kzg/pull/420.
Description. For the recover_cells_and_kzg_proofs() method of EIP-7594, the spec uses a subroutine called recover_polynomialcoeff. The method assumes a number of given cells, interprets them as polynomial evaluations and aims to recover the full polynomial in coefficient form. We found that the algorithm could be a bit more efficient than currently specified and implemented.
Let be the polynomial degree, and let the evaluation domain be the -th roots of unity . We assume the domain is evenly divided into blocks of size , where .
A cell with cell index is the structured subset of length of the form
We are given evaluations for all and , where are the missing cells. Recovery works by constructing the following vanishing polynomial that vanishes exactly on the missing cells:
Indeed, for all elements of a cell , we have
and therefore . Furthermore, these are the only roots of .
Finally, observe that is a function of and we could write where . As it happens, is a vanishing polynomial on a subset of the small domain of -th roots .
Recovery, as implemented, works as follows:
- Compute the in coefficient form. This can be done by first computing the small-domain in , using naive polynomial multiplications. And then, expanding those coefficients to a sparse polynomial of times the degree, by shifting a coefficient at position into position . This corresponds to replacing by .
- Perform a size- FFT to obtain all values of , on the full evaluation domain.
- Perform a size- coset-FFT to obtain evaluations , , for a coset generator .
- Compute products , . Observe that the vanishing polynomial kills the contributions of the missing evaluations.
- Perform a size- IFFT to obtain in coefficient form.
- Perform a size- coset-FFT to obtain evaluations , .
- Compute on the coset. Thanks to evaluating on a coset, denominators are non-zero. Note: this step can be done in field multiplications using batch inversion.
- Compute a size- coset-IFFT to recover in coefficient form.
This algorithm is dominated by 5 size- FFTs. It will recover the original degree- polynomial, if at most half the cell evaluations were missing.
Our observation is that steps (2) and (3) are mostly unnecessary, because the vanishing polynomial evaluations are repeating in each block:
Similarly, on the coset we have
It is therefore enough to compute ’s evaluations on the small size- domain and its -coset, which can be done by 2 size- FFTs. Whenever one of the evaluations of or are needed, we can look them up in an array of length .
In practice, is much smaller than , and small FFTs have negligible effort compared to full-domain operations. So we expect this optimization to save close to 40% of the effort (2 out of 5 full-domain FFTs).