Introduction
On September 11th, 2023, zkSecurity was tasked to audit Aleo’s synthesizer for use in the Aleo blockchain. Two consultants worked over the next 3 weeks to review Aleo’s codebase for security issues.
The code was found to be thoroughly documented, rigorously tested, and well specified. A number of findings were reported in this document.
Scope
zkSecurity used the testnet3-audit-zks
branch (commit 9f5246d8c28afc0aef769b723eeaa4ec4a402b9c
).
Included in scope was the synthesizer part of the snarkVM which lives in the synthesizer folder of the snarkVM repository. We provide more detail on the concepts behind the synthesizer in the Aleo synthesizer overview section.
Note that we were also provided with a specification of Aleo covering some of the protocols behind the synthesizer, as well as a specification of Varuna, Aleo’s proof system.
Recommendations
In addition to addressing the findings of this report, we offer the following strategic recommendations to Aleo:
Improved specifications for complex parts of the protocol. A number of important technical decisions are currently not well-documented or specified. This led to a number of findings, including issues with the deployment workflow (Incorrect Fiat-Shamir Transform in Circuit Certification), the authorization/request workflow (Caller Is Not Fixed Throughout Function Execution), the transition design (Non-Committing Encryption Used in InputID::Private, Non-Committing Encryption Used in OutputID::Private), and the delegation design (Proof Delegation Is Subject To Truncation). We strongly recommend more time investment on specifying and documenting the rationale behind the different subprotocols of the synthesizer.
Refactor complex parts of the codebase. Some parts of the synthesizer are quite complex and contain code that is difficult to read and understand. This is especially the case for the deployment, authorization, and execution flows supported by the synthesizer. All of these flows are implemented for both the prover and the verifier, and all of these entry points end up calling the same set of functions that share common logic. Specifying and refactoring these parts to improve readability would benefit the project.
Review of lower-level gadgets. Most of the audit focused on the synthesizer, its many flows and edge-cases, as well as the protocols around it. The synthesizer relies on a number of lower-level gadgets, such as hash functions, arithmetic, and bitwise operations, that might benefit from further review.
Aleo synthesizer overview
The Aleo synthesizer is the core protocol used in the deployment and execution of user programs on the Aleo blockchain. It wraps Varuna (Aleo’s Marlin-based zero-knowledge proof system) and provides a high-level interface for users to deploy and execute programs and for the network to verify correct deployments and executions.
There’s in theory a third flow supported by the synthesizer, the fee collection process, which is significantly less complex and which we’ll omit in this overview.
This section provides an overview of the deployment and the execution flows of the synthesizer, from the user and from the network perspectives.
Aleo Programs
Aleo provides two ways of writing programs for its blockchain: using the Leo programming language or using a lower-level VM instruction set called the Aleo instruction language.
Leo programs are eventually compiled down to Aleo instructions, which is the form used to deploy and execute programs. For this reason this is the last time we mention Leo.
At a high-level, an Aleo program includes a number of structures that are similar to Ethereum smart contracts:
- A list of other (already-deployed) programs imported by the program.
- A list of public mappings that can be mutated by the program (only through
finalize
functions, see later explanations). - A list of functions that can be executed by users.
So far, this shouldn’t look that much different from Ethereum’s smart contract design. The exception is that functions are only executed once, locally on a user’s machine, and then verified by the network via the use of zero-knowledge proofs.
Because user transactions execute functions before they can be ordered by the network, they introduce contention issues on the resources they consume. Imagine the following: if two transactions rely on the state of the program being A
, and then both try to mutate the state to B
and C
respectively, then only one of them can succeed. As they are conflicting transactions, the first one being processed by the network will invalidate the other.
For this reason, functions in Aleo programs are actually split into two parts.
The first part is simply called the “function”, and contains highly-parallelizable logic that should only act on single-user objects. These single-user objects are called “records” and are akin to UTXOs in Bitcoin, but stored in a shielded pool similar to Zcash. As previously explained, the user can execute a function and provide a proof of correct execution to the network.
As such, Aleo programs define their own custom records, which always have an owner
field (enforcing who can consume them) and that can contain application-specific data.
The second part is some logic, which execution is delegated to the network (like in Ethereum). Its execution is deferred to after its wrapping transaction has been ordered and sequenced by Aleo’s consensus. A function can encode such logic by creating a finalize
function with the same name.
Finalize functions cost a fee to run depending on the instructions it contains (similar to the “gas” notion of Ethereum). A finalize function is run in the clear by the network, and can perform public state transitions (specifically, mutate any mappings defined in the program). This is the only way to mutate the public state of a program.
To simplify this concept of split logic, one can think of developing on Aleo as a similar experience as developing on Ethereum if one would stick to hosting most of the logic in finalize
functions. But to decrease the cost of execution, or to provide privacy features, developers have the choice to move part of the logic in the non-finalize part.
Synthesizing the circuits associated to the functions
The SnarkVM synthesizer’s main goal is to produce the zero-knowledge proof circuits that encode the functions of an Aleo program. In practice, the synthesizer does that by parsing an Aleo program into its functions written in Aleo instructions, and then by converting each of the Aleo instructions into their matching circuit gadget.
In addition, a number of subcircuits are added at the beginning and end of a program’s function:
Request verification. As Aleo programs offer user privacy, the caller of the function cannot be leaked. For this reason, each circuit enforces that the caller of the function is in possession of their private key. While this could be done by simply witnessing the private key and enforcing that it correctly derives to the caller’s address, signatures on “requests” are used instead.
These signed requests (as seen below) authorize the execution of some program’s function with specific arguments (or inputs). A user can use them to delegate the creation of the proof to a third party prover.
pub struct Request<N: Network> {
/// The request caller.
caller: Address<N>,
/// The network ID.
network_id: U16<N>,
/// The program ID.
program_id: ProgramID<N>,
/// The function name.
function_name: Identifier<N>,
/// The input ID for the transition.
input_ids: Vec<InputID<N>>,
/// The function inputs.
inputs: Vec<Value<N>>,
/// The signature for the transition.
signature: Signature<N>,
/// The tag secret key.
sk_tag: Field<N>,
/// The transition view key.
tvk: Field<N>,
/// The transition secret key.
tsk: Scalar<N>,
/// The transition commitment.
tcm: Field<N>,
}
Inputs and outputs as public inputs. Each input to the function is committed to and exposed as a public input. The same is done with any output created by the function.
Call to finalize. Finally, if the function calls a finalize function, a hash of all of the inputs to the finalize function is also produced. These values will also be sent in the clear in the Transition
object below (since the network needs them to run the finalize function in the clear).
Encoding nested function calls
A large part of the complexity of the synthesizer comes from allowing functions to call other programs’ functions. This is akin to smart contracts calling other smart contracts in Ethereum.
In Aleo, functions are translated into circuits, and so a function call means that two circuits are created: one for the caller and one for the callee. To drive the point home, if a function ends up producing function calls, then there’ll be circuits that will be run (and proofs will be created when executing the function).
In practice, the execution of a “root” function is encoded as a list of “transitions”, where each transition represents the execution of a single function call. Transitions are ordered from most-nested calls to least-nested calls. This means that the last transition is the main (or root) function being called by the user.
pub struct Transition<N: Network> {
/// The transition ID.
id: N::TransitionID,
/// The program ID.
program_id: ProgramID<N>,
/// The function name.
function_name: Identifier<N>,
/// The transition inputs.
inputs: Vec<Input<N>>,
/// The transition outputs.
outputs: Vec<Output<N>>,
/// The inputs for finalize.
finalize: Option<Vec<Value<N>>>,
/// The transition public key.
tpk: Group<N>,
/// The transition commitment.
tcm: Field<N>,
}
In a function’s synthesized circuit, a call to an external function is replaced by witnessing (publicly) the arguments of the function call, and then by witnessing (publicly) the outputs of the resulting call.
It is thus the role of the verifier to “glue” together the different function calls (or transitions) by ensuring that the publicly-witnessed arguments are used to verify the callee’s circuit, and that the publicly-witnessed outputs are indeed the values output by the callee’s circuit.
Note that we also don’t want to leak inputs and outputs between calls, and so inputs and outputs are committed before being exposed as public inputs. In Aleo the hiding commitments are referred to as input IDs and output IDs.
Program deployment flow
Users can deploy their own Aleo programs which will be uploaded in their totality to the Aleo blockchain. This means that the program’s code (made out of Aleo instructions) will be stored on-chain.
In addition, a deployment has the user (who wishes to deploy their program) produce as many verifier keys as there are functions. The verifier keys are then deployed on-chain.
pub struct Deployment<N: Network> {
/// The edition.
edition: u16,
/// The program.
program: Program<N>,
/// The mapping of function names to their verifying key and certificate.
verifying_keys: Vec<(Identifier<N>, (VerifyingKey<N>, Certificate<N>))>,
}
Since verifier keys are expensive to produce (they consist of a number of large MSMs which commit to R1CS-related structures), the user produces them and then a proof of correctness (the Certificate
above).
The proof is basically a Sigma protocol that the commitments encode low-degree extensions of R1CS-related vectors, by evaluating the low-degree extensions at a random point.
Function execution flow
To execute a function, a user uses the synthesizer in a similar way as the deployment process. As such, the user will produce transition proofs if they wish to execute a function call that triggers nested calls. The synthesizer is run with the actual values (as opposed to random values) for the inputs as well as the different registers in the circuits.
In addition to producing these proofs, a user also produces “inclusion proofs”. These inclusion proofs are used to prove that any record being used as input in the transition or function call indeed exists. Inclusion proofs prove that records exist either in some previous block that has been included in the blockchain, or in one of the previous transition outputs.
An inclusion proof also publicly outputs the serial numbers (also called a nullifier in Zcash-like systems) that uniquely identify the records without leaking any information about them. This way, records cannot be consumed more than once. (In addition, the network enforces that no serial number is seen twice within the same transaction.)
Note that all of these different proofs are eventually aggregated together into a single proof using Varuna’s batching capabilities.
Review of Mitigations
After the original audit, ZKSecurity was subsequently contracted for one man-week to review the mitigations of Aleo to the findings in this report. In this section we provide a description of the mitigations implemented by the Aleo team, as well as arguments for why they are sufficient.
Fixed: Non-Committing Encryption in Inputs/Outputs
During the audit an issue with the binding of values across transition/function call boundaries was uncovered,
namely, the hash of a counter-mode encryption was used as a cryptographic commitment, without committing to the encryption key.
This was successfully mitigated by Aleo via exposing of the tcm
(“transition commitment”) of the callee as public input on the callers side.
This ensures that the symmetric encryption, of both inputs and outputs, is now committing:
- The exposed
tcm
is a commitment totvk
, from which the encryption key is derived. - The encryption is derandomized (the nonce is a monotonic counter, enforced in circuit).
In simplified terms, the commitment is now:
(Com(k), Hash(PRG(k) + m))
Which is an injective function from (k, m)
to commitments; assuming collision intractability of Com(.)
and Hash(.)
.
The correspondence between tcm
exposed as public input of the callee/caller is checked during verify_execution
:
the mitigation correctly adds the tcm
of the child call (callee) to the public input of the parent (caller).
Fixed: Incorrect Fiat-Shamir Transform in Circuit Certification
During the audit a vulnerability in the circuit certification proof was uncovered:
a small oversight meant that the circuit_id
included in the verification was not added to the Fiat-Shamir transcript,
meaning a malicious prover to create proofs of invalid certifications.
Aleo successfully mitigated this issue by hashing the entire verification key, rather than just the commitments.
The relevant parts of the new code are:
#[derive(Debug, Clone, PartialEq, Eq, CanonicalSerialize, CanonicalDeserialize)]
pub struct CircuitVerifyingKey<E: PairingEngine> {
pub circuit_info: CircuitInfo,
pub circuit_commitments: Vec<sonic_pc::Commitment<E>>,
pub id: CircuitId,
}
...
fn init_sponge_for_certificate(
fs_parameters: &FS::Parameters,
verifying_key: &CircuitVerifyingKey<E>,
) -> Result<FS> {
let mut sponge = FS::new_with_parameters(fs_parameters);
sponge.absorb_bytes(&to_bytes_le![&Self::PROTOCOL_NAME].unwrap());
sponge.absorb_bytes(&verifying_key.circuit_info.to_bytes_le()?);
sponge.absorb_native_field_elements(&verifying_key.circuit_commitments);
sponge.absorb_bytes(&verifying_key.id.0);
Ok(sponge)
}
Fixed: Proof Delegation is Subject to Truncation
During the audit an issue was uncovered in the proof delegation, enabling a malicious prover to prove another execution that the one intended by the client: the call stack of an execution could be truncated to avoid executing the parent calls.
This issue is mitigated by Aleo with the introduction of a Boolean is_root
field into the signed requests.
Only the root call of the execution is marked with is_root = True
and all internal/child calls are marked with is_root = False
.
The is_root
field of requests is exposed as public input when verifying the SNARK,
this enables the verifier to enforce that the root call of the execution indeed corresponds to a signed request with is_root = True
.
Mitigating the truncation issue: where a child transition was treated as the root by a malicious prover.
During the review, ZKS verified more broadly that the signed requests can only produce state transitions intended by the signer. The issue of truncation was a special case of such an issue: where a child request could be executed without the parent call. Below we outline the security argument:
In essence, a request signs the arguments of a function call within the SnarkVM.
This authorizes the function call, ensuring that self.caller
did indeed execute this call, while allowing for delegation of the actual proof computation.
The resulting function execution is referred to as the transition.
These requests/transition are forced to be unique globally by the network, which enforces that the transition IDs are globally unique:
a transition ID uniquely determines a request assuming the collision intractability of Poseidon.
This is because a transition ID is computed as a hash (Merkle tree) over the InputID’s of the request and the tcm,
these are (randomized) collision intractable commitments and their values/randomness is explicitly signed by the request.
Since each request is signed separately two requests with the same set of inputs can be successfully switched during proving by a malicious prover. Hence even though the authorization of individual transitions is immediate, the integrity of the entire execution is slightly more involved: the witness is not (computationally) unique.
The central observation in the security argument is that:
the arguments of the request marked with is_root = True
uniquely determines the arguments of all internal calls.
This is easily seen by induction over the call graph, noting that the output of a function in the SnarkVM is uniquely determined by its arguments.
Hence unforgability of the root request (marked with is_root = True
) is sufficient to ensure soundness of the entire execution:
all arguments of the children follow (deterministically) from the arguments (inputs) of the root call (request) and hence are implicitly signed.
In particular the unforgability of the internal requests is not central to the security argument:
one may consider the internal requests as being provided solely to enable an honest prover to successfully compute the proof;
they are needed for “completeness” since any function might be the root of the execution.
Note that the argument above crucially relies on:
-
The root request being clearly demarcated from the internal/child requests (the truncation issue). \ Otherwise, the client effectively signs more than one possible execution. Since there is now only one request in an execution marked with
is_root = True
, this is avoided. -
Replay protection at the request/transition level: \ To argue that an execution can only be applied once you can reduce it to replay protection on the root request.
-
Transitions in the SnarkVM being functions (in the mathematical sense): \ Meaning the return value and finalise arguments are uniquely determined by the arguments. \ This is non-trivial in general: if for instance, a request was allowed to “witness” the value of a variable in the function scope, then it would need to be ensured that this value is (computationally) unique given the arguments.
ZKSecurity has verified that these preconditions are satisfied in the SnarkVM codebase after applying the mitigations detailed in this report.
Note that self.caller
not being fixed across the execution (another finding in this report) violated the 3rd condition above and would invalidate the security argument:
self.caller
is essentially an implicit argument to all functions and
the reasoning above therefore requires that self.caller
is uniquely determined by the root request.
This was fixed by https://github.com/AleoHQ/snarkVM/pull/2076/ and is covered further below.
Fixed: Caller is Not Fixed Throughout Function Execution
ZKSecurity observed that the self.caller
might change at every function call in an execution: each request is signed individually and might therefore have distinct signers.
ZKSecurity cautioned that this subtlety might lead to logic bugs and furthermore it violates the security argument above:
allowing the prover to change the self.caller
of a call with no records during delegation.
This issue was successfully mitigated by Aleo with the introduction of a “signer commitment” (scm
), exposed as public input from every transition proof.
The commitment is opened in every transition proof and checked against the signer:
let scm = A::hash_psd2(&[self.signer.to_field(), root_tvk]);
...
scm.is_equal(&self.scm)
The randomness for the signer commitment, root_tvk
, which needs to be the same across all calls, is the viewing key of the root transition.
The mitigation does not enforce this relation (not even in the root) and does not need to:
it sound relying solely on the binding of the Poseidon commitment.
During verification, equality is enforced between the signer commitments of all transitions:
// Ensure the same signer is used for all transitions.
execution.transitions().try_fold(None, |signer, transition| {
Ok(match signer {
None => Some(transition.scm()),
Some(signer) => {
ensure!(
signer == transition.scm(),
"The transitions did not use the same signer"
);
Some(signer)
}
})
})?;
Fixed: Proof Delegation Leaks User Signing Key
ZKSecurity observed that the randomness used to create the Chaum-Pedersen signature, the transition secret key (tsk
), was included in the requests.
This would result in leaking the signing secret key whenever delegation was used.
Aleo has mitigated this issue by simply removing tsk
from Request<A: Aleo>
along with the explicit checks for the correct relationship between tsk
and tvk
/tpk
:
tvk = [tsk] * self.signer
and tpk = [tsk] * G
.
The omission of these checks is not an issue: the Chaum-Pedersen proof in the request already ensures this relation
and therefore the additional in-SNARK checks on tpk
and tvk
were redundant.
Fixed: Ambiguous Encoding in Varuna Fiat-Shamir Transcript
ZKSecurity had an informational finding about the ambiguity of the Veruna transcript. There was no concrete vulnerability discovered, however Aleo has added explicit length checks as a defense in depth measure.