Introduction
Starting September 29th, 2025, zkSecurity conducted a security audit of new serialization, hash and signature instructions added to the SnarkVM. A single consultant reviewed the code for a week.
We reviewed the snarkVM repository on the feat/bytes branch at commit 41d32c5f.
The audit focused on the addition of new instructions for serialization (to bits) of types, hashing of serializable types, and ECDSA signature operations.
The new hash and ECDSA signature instructions are usable only out-of-circuit, in the finalize:
a snippet of code that runs after the in-circuit part of a function and can read account state and apply effects (beyond consuming/creating records).
Motivation
The primary motivation for the addition of these instructions is to interoperate with systems outside of Aleo,
for instance, enabling the verification of Ethereum-style ECDSA signatures produced by bridges.
Since the complexity and constraint count of circuits for ECDSA verification are substantial,
requiring the implementation of elliptic curve operations over a foreign curve (not defined over the field of the SNARK)
and the arithmetization of the hash functions, Aleo has opted to only support these operations in the out-of-circuit component of the VM, namely in the finalize.
This substantially reduces the scope and complexity of this effort.
Native & Raw Encoding
The changes introduce a new form of encoding: raw.
Prior to this, Aleo types were serialized using a custom Type-Length-Value (TLV) encoding scheme
during signature verification / hashing etc. This encoding is referred to as native encoding and is the “default” encoding for instructions not having the .raw postfix.
This encoding ensures that values of distinct types are serialized as distinct sequences of bits: any valid encoding corresponds to exactly one value of exactly one type within the system.
This is done to avoid confusion about the “semantics” of the sequences of bits/field elements being signed or hashed:
by signing a sequence of encoded bits, the signer is signing a unique message of a distinct type within the overall system.
As a side-effect the particular encoding is prefix-free, meaning that a sequence of bits can be padded with a constant to e.g. a multiple of 8 bits, and still correspond uniquely to a single type/value pair.
The problem with this encoding is that it prevents interoperability (and is verbose): using the existing hashing/signing interfaces it is not possible to verify a signature on a value encoded in another format, e.g. an ASN.1 DER encoded message or one encoded using RLP (used in Ethereum).
To enable this, Provable introduced a raw encoding, which serializes Aleo/SnarkVM types without the type and length prefix, for instance a struct:
struct example:
v0 as u32;
v1 as u32;
Is serialized simply as 64 bits: the bits of v0 in little endian followed by the bits of v1 in little endian.
The result is an encoding which is much more efficient, and which allows “parsing” signed sequences of bits into Aleo structs by “deserializing” (effectively casting) the bits into a struct.
One obvious thing to observe is that this format does not uniquely describe the type of a value, for instance, the struct above, once encoded, has the same bit representation as this:
struct example2:
vx as u64;
Which is also serialized as 64 bits: the bits of vx in little endian.
Therefore, implementing / ensuring adequate domain separation is left to the application by design.
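The ambiguity described above can be illustrated with a minimal sketch. The helper names (`u32_bits_le`, `raw_example`, `raw_example2`) are hypothetical and only model the raw encoding as described in this section (little-endian bits of each field, concatenated, with no type or length prefix); they are not the snarkVM implementation.

```rust
/// Little-endian bits of a u32 (least significant bit first).
fn u32_bits_le(x: u32) -> Vec<bool> {
    (0..32).map(|i| (x >> i) & 1 == 1).collect()
}

/// Little-endian bits of a u64 (least significant bit first).
fn u64_bits_le(x: u64) -> Vec<bool> {
    (0..64).map(|i| (x >> i) & 1 == 1).collect()
}

/// Hypothetical model of the raw encoding of `example { v0, v1 }`:
/// the fields are concatenated without any type or length prefix.
fn raw_example(v0: u32, v1: u32) -> Vec<bool> {
    let mut bits = u32_bits_le(v0);
    bits.extend(u32_bits_le(v1));
    bits
}

/// Hypothetical model of the raw encoding of `example2 { vx }`.
fn raw_example2(vx: u64) -> Vec<bool> {
    u64_bits_le(vx)
}
```

Under this model, `raw_example(7, 9)` and `raw_example2(7 | (9 << 32))` yield the same 64 bits, so a hash or signature over one is also valid for the other unless the application adds its own domain separation.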
New Instructions
For context, let’s provide a reference of every new instruction added to the SnarkVM.
A total of 37 new instructions were added in the reviewed pull request:
Serialization Instructions
Instructions for converting types to bit arrays:
serialize.bits — converts types to bit arrays in the native encoding.
serialize.bits.raw — converts types to raw bit arrays without metadata
Deserialization Instructions
Instructions for converting bit arrays back to typed values:
deserialize.bits — converts bit arrays with metadata back to typed values
deserialize.bits.raw — converts raw bit arrays back to typed values
Hash Instructions
Instructions for hashing SnarkVM types:
- Keccak hash of “native” TLV encoded values:
hash.keccak256.native
hash.keccak384.native
hash.keccak512.native
- Sha3 hash of “native” TLV encoded values:
hash.sha3_256.native
hash.sha3_384.native
hash.sha3_512.native
- Keccak hash of “raw” encoded values:
hash.keccak256.raw
hash.keccak384.raw
hash.keccak512.raw
- Sha3 hash of “raw” encoded values:
hash.sha3_256.raw
hash.sha3_384.raw
hash.sha3_512.raw
ECDSA Signature Verification Instructions
Adds support for ECDSA signatures over the Secp256k1 (“Bitcoin”/“Ethereum”) curve.
Verification of an ECDSA signature against message digest:
ecdsa.verify.digest
ecdsa.verify.digest.eth
ECDSA verification/signing starts by computing the digest of the message; the rest of the signing/verification is independent of the hash function (except that the output may be truncated). This makes it possible to support hash functions besides Keccak* and Sha3*,
as well as interop with other systems where the hash is computed separately.
Verification (Native Encoded Messages)
Verification of ECDSA signatures with various hash functions on messages encoded using the native encoding scheme:
ecdsa.verify.keccak256
ecdsa.verify.keccak384
ecdsa.verify.keccak512
ecdsa.verify.sha3_256
ecdsa.verify.sha3_384
ecdsa.verify.sha3_512
Verification (Raw Encoded Messages)
Verification of signatures, with various hash functions on messages encoded using the raw encoding scheme:
ecdsa.verify.keccak256.raw
ecdsa.verify.keccak384.raw
ecdsa.verify.keccak512.raw
ecdsa.verify.sha3_256.raw
ecdsa.verify.sha3_384.raw
ecdsa.verify.sha3_512.raw
Verification (Raw Encoded Messages, Ethereum Addresses)
Verification of signatures, with various hash functions on messages against “Ethereum addresses” with raw encoding:
ecdsa.verify.keccak256.eth
ecdsa.verify.keccak384.eth
ecdsa.verify.keccak512.eth
ecdsa.verify.sha3_256.eth
ecdsa.verify.sha3_384.eth
ecdsa.verify.sha3_512.eth
Native Signatures with Raw Encoding
Verification of native Aleo signatures is extended to support raw encoded messages:
Which works by:
- Serializing the type using raw encoding.
- Packing the bits into field elements.
- Signing the sequence of field elements.
Primary Considerations
The implementation uses the well-known k256 crate,
which has previously undergone audit and is out of scope for this report.
As indicated by findings, the primary source of “subtle behavior” (both bugs and questions about intended behavior),
is in the way that values are encoded/packed/interpreted when fed to the hash functions,
subsequently used directly or as part of the ECDSA signature verification.
Below are listed the findings found during the engagement. High severity findings can be seen as so-called "priority 0" issues that need fixing (potentially urgently). Medium severity findings are most often serious findings that have less impact (or are harder to exploit) than high-severity findings. Low severity findings are most often exploitable in contrived scenarios, if at all, but still warrant reflection. Findings marked as informational are general comments that did not fit any of the other criteria.
Description.
Unlike Pedersen/Bowe-Hopwood-Pedersen hashes, which are well-defined for sequences of bits,
the new hash functions, like Keccak, are only defined on byte sequences*.
This leaves the question: what are the desired semantics when hashing a sequence of bits whose length is not a multiple of 8?
The proposed implementation does this by padding with zero bits up to the next byte boundary.
The result is that the hash of a bit sequence is the same as the hash of the same bit sequence padded with zero bits up to the next byte boundary, e.g.
hash([]) = hash([0])
The result is that these hash functions are not collision-resistant (over bit sequences)
and as a result, different values of different types with different sizes can produce the same digest when using the raw encoding.
This is unexpected behavior: users expect that for raw encoding, types of the same size may produce the same sequence of serialized bits,
however, it is unexpected that different types of different sizes, serializing to different bit sequences under raw encoding, can produce the same digest.
A concrete example demonstrates the severity: a 33-bit Vote struct can collide with a 34-bit Send struct:
struct Vote {
    choice: u32, // who to vote for
    is_final: bool // can this vote be updated?
}
struct Send {
    approval: bool, // approval required from controller?
    fast: bool, // fast transfer?
    dst: u32, // index of dst account
}
Signing Vote { choice: 4, is_final: true } produces the bit sequence:
0100000000000000000000000000000 | 1 | 0000000
This is identical to Send { approval: false, fast: true, dst: 2 }:
0 | 1 | 0000000000000000000000000000010 | 000000
A signature on one type can be replayed as a valid signature for a completely different message type.
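The collision can be reproduced with a small sketch that models only the padding step, not the hash itself: if two bit sequences pad to the same bytes, any byte-oriented hash of them is identical. The helper names are hypothetical, and the bit order within a byte (least significant bit first) is an assumption, so the concrete field values below are chosen to collide under this model rather than copied from the figures above.

```rust
/// Little-endian bits of a u32 (least significant bit first).
fn u32_bits_le(x: u32) -> Vec<bool> {
    (0..32).map(|i| (x >> i) & 1 == 1).collect()
}

/// Packs bits into bytes, zero-padding the final byte up to the byte boundary.
/// Assumption: the first bit of each group of 8 is the least significant bit.
fn bits_to_bytes_padded(bits: &[bool]) -> Vec<u8> {
    bits.chunks(8)
        .map(|chunk| chunk.iter().enumerate().fold(0u8, |byte, (i, &b)| byte | ((b as u8) << i)))
        .collect()
}

/// Model of the raw bits of `Vote { choice, is_final }`: 32 + 1 = 33 bits.
fn vote_bits(choice: u32, is_final: bool) -> Vec<bool> {
    let mut bits = u32_bits_le(choice);
    bits.push(is_final);
    bits
}

/// Model of the raw bits of `Send { approval, fast, dst }`: 1 + 1 + 32 = 34 bits.
fn send_bits(approval: bool, fast: bool, dst: u32) -> Vec<bool> {
    let mut bits = vec![approval, fast];
    bits.extend(u32_bits_le(dst));
    bits
}
```

In this model, `vote_bits(2, true)` has 33 bits and `send_bits(false, true, 1 << 30)` has 34 bits, yet both pad to the same five bytes, so they would produce the same digest.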
Impact. Developers may assume that different-sized types are automatically domain-separated, leading to signature replay attacks and hash collisions across different message types. This is especially dangerous when structs contain boolean flags or when protocol versions introduce new message formats.
This affects the following instructions:
hash.keccak256.raw
hash.keccak384.raw
hash.keccak512.raw
hash.keccak256.native.raw
hash.keccak384.native.raw
hash.keccak512.native.raw
hash.sha3_256.raw
hash.sha3_384.raw
hash.sha3_512.raw
hash.sha3_256.native.raw
hash.sha3_384.native.raw
hash.sha3_512.native.raw
Recommendation. Implement a check that rejects inputs whose bit length is not a multiple of 8 (i.e., not byte-aligned).
We believe this is the most reasonable behavior: Keccak hashes of bit sequences not a multiple of 8 bits are not well-defined.
By rejecting these inputs as “not in the domain of Keccak”,
developers are forced to explicitly handle padding, making the collision risk explicit
and requiring them to make a deliberate choice about how to handle padding.
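A minimal sketch of the recommended check, assuming a hypothetical helper `bits_to_bytes_strict` that sits between serialization and hashing (the name, the error type, and the bit order within a byte are assumptions, not taken from snarkVM):

```rust
/// Converts bits to bytes, rejecting inputs whose length is not a multiple of 8
/// instead of silently zero-padding them.
/// Assumption: the first bit of each group of 8 is the least significant bit.
fn bits_to_bytes_strict(bits: &[bool]) -> Result<Vec<u8>, String> {
    if bits.len() % 8 != 0 {
        return Err(format!("bit length {} is not a multiple of 8", bits.len()));
    }
    Ok(bits
        .chunks(8)
        .map(|chunk| chunk.iter().enumerate().fold(0u8, |byte, (i, &b)| byte | ((b as u8) << i)))
        .collect())
}
```

With such a check, a 33-bit input is rejected outright, and the caller has to choose a padding rule before the value can be hashed.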
Description. The sign function creates signatures that are not unforgeable in the general case due to potential hash collisions for variable-length messages:
/// Returns a signature on a `message` using the given `signing_key` and hash function.
pub fn sign<H: Hash<Output = Vec<bool>>>(
    signing_key: &SigningKey,
    hasher: &H,
    message: &[H::Input],
) -> Result<Self> {
    // Hash the message.
    let hash_bits = hasher.hash(message)?;
    // Convert the hash output to bytes.
    let hash_bytes = bytes_from_bits_le(&hash_bits);
    // Sign the prehashed message.
    signing_key
        .sign_prehash(&hash_bytes)
        .map(|(signature, recovery_id)| {
            let recovery_id = RecoveryID { recovery_id, chain_id: None };
            Self { signature, recovery_id }
        })
        .map_err(|e| anyhow!("Failed to sign message: {e:?}"))
}
The issue occurs because hash functions like the Bool hasher Keccak256 may have collisions for different-length messages. For example:
sigma = sign(sk, Keccak256, [0, 0, 1, 1])
This signature can be verified against a different message:
verify(vk, Keccak256, sigma, [0, 0, 1, 1, 0, 0, 0, 0])
Impact.
An attacker could exploit the hash collisions across structs of different sizes
to produce valid signatures on structures without possessing the signing key. This allows signature replay attacks where a signature on one message is reused to validate a different message that hashes to the same value.
Recommendation. Ensure the hash function includes message length in its domain separation or use a collision-resistant encoding that prevents different-length messages from hashing to the same value. Consider encoding the message length as part of the hash input.
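One way to realize this recommendation is sketched below: the bit length is prepended to the padded bytes before hashing. The function names and the choice of an 8-byte little-endian length prefix are assumptions made for illustration; the actual fix may use a different encoding.

```rust
/// Packs bits into bytes, zero-padding the final byte (models the current behavior).
/// Assumption: the first bit of each group of 8 is the least significant bit.
fn pad_to_bytes(bits: &[bool]) -> Vec<u8> {
    bits.chunks(8)
        .map(|chunk| chunk.iter().enumerate().fold(0u8, |byte, (i, &b)| byte | ((b as u8) << i)))
        .collect()
}

/// Hypothetical length-prefixed encoding: the bit length as 8 little-endian bytes,
/// followed by the zero-padded bytes. Messages of different bit lengths
/// always map to different byte strings.
fn encode_with_length(bits: &[bool]) -> Vec<u8> {
    let mut out = (bits.len() as u64).to_le_bytes().to_vec();
    out.extend(pad_to_bytes(bits));
    out
}
```

For the messages `[0, 0, 1, 1]` and `[0, 0, 1, 1, 0, 0, 0, 0]` from the example above, `pad_to_bytes` returns the same single byte, whereas `encode_with_length` returns distinct byte strings, so a collision-resistant hash over the latter separates the two messages.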
Description. Multiple FromStr/FromBytes implementations do not validate that the entire input is consumed during parsing.
This means that an encoding with junk appended to it still deserializes successfully.
For instance ECDSASignature and CircuitVerifyingKey:
impl FromStr for ECDSASignature {
    type Err = Error;
    /// Parses a hex-encoded string into an ECDSASignature.
    fn from_str(signature: &str) -> Result<Self, Self::Err> {
        let mut s = signature.trim();
        // Accept optional 0x prefix
        if let Some(rest) = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
            s = rest;
        }
        // Decode the hex string into bytes.
        let bytes = hex::decode(s)?;
        // Construct the signature from the bytes.
        Self::from_bytes_le(&bytes)
    }
}
This occurs because the default from_bytes_le implementation treats the byte slice as a reader without checking for EOF:
pub trait FromBytes {
    /// Reads `Self` from `reader` as little-endian bytes.
    fn read_le<R: Read>(reader: R) -> IoResult<Self>
    where
        Self: Sized;
    /// Returns `Self` from a byte array in little-endian order.
    fn from_bytes_le(bytes: &[u8]) -> anyhow::Result<Self>
    where
        Self: Sized,
    {
        Ok(Self::read_le(bytes)?)
    }
}
As a result, any additional bytes appended to a valid signature hex string are silently ignored during parsing.
This is expected behavior for read_le as it can be used to parse a sequence of objects from a reader, one after another,
however from_bytes_le is given a slice and does not indicate the number of bytes consumed.
This issue is also present for verification keys:
impl<E: PairingEngine> FromStr for CircuitVerifyingKey<E> {
    type Err = anyhow::Error;
    #[inline]
    fn from_str(vk_hex: &str) -> Result<Self, Self::Err> {
        Self::from_bytes_le(&hex::decode(vk_hex)?)
    }
}
Which means that any verification key followed by junk will still be parsed successfully.
A similar issue is present in the deserializing of, for instance, native signatures from bech32:
impl<N: Network> FromStr for Signature<N> {
    type Err = Error;
    /// Reads in the signature string.
    fn from_str(signature: &str) -> Result<Self, Self::Err> {
        // Decode the signature string from bech32m.
        let (hrp, data, variant) = bech32::decode(signature)?;
        if hrp != SIGNATURE_PREFIX {
            bail!("Failed to decode signature: '{hrp}' is an invalid prefix")
        } else if data.is_empty() {
            bail!("Failed to decode signature: data field is empty")
        } else if variant != bech32::Variant::Bech32m {
            bail!("Found an signature that is not bech32m encoded: {signature}");
        }
        // Decode the signature data from u5 to u8, and into the signature.
        Ok(Self::read_le(&Vec::from_base32(&data)?[..])?)
    }
}
Meaning that any signature followed by junk will still be parsed successfully.
Recommendation. Add a length check to ensure the input contains exactly the expected number of bytes, for instance, by replacing the existing default implementation of from_bytes_le with a version that checks that all the bytes of the slice have been consumed, e.g.
pub trait FromBytes {
    /// Reads `Self` from `reader` as little-endian bytes.
    fn read_le<R: Read>(reader: R) -> IoResult<Self>
    where
        Self: Sized;
    /// Returns `Self` from a byte array in little-endian order.
    fn from_bytes_le(bytes: &[u8]) -> anyhow::Result<Self>
    where
        Self: Sized,
    {
        use std::io::Cursor;
        let mut buf = Cursor::new(bytes);
        let value = Self::read_le(&mut buf)?;
        if buf.position() != bytes.len() as u64 {
            Err(anyhow::anyhow!("Unexpected trailing bytes"))
        } else {
            Ok(value)
        }
    }
}
And then use from_bytes_le consistently throughout the codebase, in place of read_le(Vec::from(..)) as done in e.g.
impl<N: Network> FromStr for Ciphertext<N> {
    type Err = Error;
    /// Reads in the ciphertext string.
    fn from_str(ciphertext: &str) -> Result<Self, Self::Err> {
        // Decode the ciphertext string from bech32m.
        let (hrp, data, variant) = bech32::decode(ciphertext)?;
        if hrp != CIPHERTEXT_PREFIX {
            bail!("Failed to decode ciphertext: '{hrp}' is an invalid prefix")
        } else if data.is_empty() {
            bail!("Failed to decode ciphertext: data field is empty")
        } else if variant != bech32::Variant::Bech32m {
            bail!("Found an ciphertext that is not bech32m encoded: {ciphertext}");
        }
        // Decode the ciphertext data from u5 to u8, and into the ciphertext.
        Ok(Self::read_le(&Vec::from_base32(&data)?[..])?)
    }
}
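The difference between the lenient and the strict behavior can be seen in a self-contained sketch using only the standard library. `Toy` is a hypothetical four-byte stand-in for a fixed-size value such as a signature, and the two `from_bytes_le_*` functions are illustrative names that mirror the existing and the recommended default implementations; none of them exist in snarkVM.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical fixed-size value, standing in for e.g. a signature.
#[derive(Debug, PartialEq)]
struct Toy([u8; 4]);

impl Toy {
    /// Reads exactly four bytes from `reader`, leaving any remaining bytes unread.
    fn read_le<R: Read>(mut reader: R) -> io::Result<Self> {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        Ok(Toy(buf))
    }

    /// Lenient variant: trailing bytes in the slice are silently ignored.
    fn from_bytes_le_lenient(bytes: &[u8]) -> io::Result<Self> {
        Self::read_le(bytes)
    }

    /// Strict variant: fails unless the whole slice was consumed.
    fn from_bytes_le_strict(bytes: &[u8]) -> io::Result<Self> {
        let mut cursor = Cursor::new(bytes);
        let value = Self::read_le(&mut cursor)?;
        if cursor.position() != bytes.len() as u64 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "unexpected trailing bytes"));
        }
        Ok(value)
    }
}
```

Given the input `[1, 2, 3, 4, 0xff]`, the lenient variant returns `Toy([1, 2, 3, 4])` and drops the last byte, while the strict variant returns an error; both accept the exact four-byte input.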