Introduction
Between June 1st and June 14th, zkSecurity was tasked by Provable to audit the upcoming “Program Upgradability” update for Aleo.
Two consultants worked over two weeks to review the codebase for potential security issues and provide design feedback and recommendations.
Prior to the engagement, the team spent two weeks getting acquainted with the SnarkVM codebase and the Aleo ecosystem. In the last several days of the audit, we also verified the fixes for all findings reported in this assessment.
Aleo Program Upgradability
Prior to this upgrade, all Aleo programs were immutable:
a program was deployed once and could not be modified or upgraded thereafter.
With the upcoming “Program Upgradability” update, programs may now change after initial deployment.
This has implications throughout the system, parts of which may have relied on the immutability of programs up until this point.
We explore these and note a number of security considerations with the proposed design and implementation (prior to the release).
Glossary
For the reader’s convenience, we include a brief glossary of central terms used within the Aleo SnarkVM:
- Program : Collection of functions, mappings, records, closures. Called a “contract” in other systems.
- Program ID : Unique program identifier, composed of a name (e.g. example.aleo) and a network identifier.
- Transition : Call to a single function in a program.
- Execution : Sequence of transitions, for the root call and any internal calls.
- Deployment : Deployment of a new program or (after this update) an upgrade of an existing program.
- Transaction : An execution or a deployment.
- Constructor (new) : Function run during deployment; restricts upgradability of the program.
Constructors
This upgrade introduces the ability to upgrade programs on-chain by redeploying them.
Every time a program is upgraded, the edition of the program must increment by one. Prior to the update, the edition was not exposed to the snarkVM and was internally fixed to zero.
When and under which conditions a program can be upgraded is controlled by a method added to all newly deployed programs, called “the constructor”.
For instance, the following constructor disallows any upgrades by requiring the edition of the new program to be zero:
program example.aleo;
constructor:
    assert.eq example.aleo/edition 0u16;
Note that the constructor is also run during the initial deployment of the program,
and that the constructor above can only be satisfied during the initial deployment.
As a result, a program whose constructor can never be satisfied, such as the following, is undeployable:
program example.aleo;
constructor:
    assert.eq false true;
The following constructor requires that the program and any upgrades be deployed by a specific address:
program example.aleo;
constructor:
    assert.eq example.aleo/program_owner <ADDRESS>;
Constructors have access to the mappings of a program. Hence the rules
for upgrading a program can be controlled dynamically by manipulating the mappings using the other functions in the program.
However, the constructor itself is immutable and cannot be modified or upgraded.
All legacy programs, which do not have a constructor, remain immutable.
Permissible Upgrades
Upgrades are only allowed to expand or leave unchanged the interface of a program,
e.g. by only adding new functions or new mappings (which can be read externally).
This is important to avoid breaking dependent programs which call methods of the upgraded program: such programs would no longer type-check after the upgrade,
since they would reference e.g. functions which no longer exist.
Note, however, that functions can be “functionally” deleted by making them trivially unsatisfiable,
and may otherwise change their behavior in arbitrary ways. Thus there is no
guarantee that a dependent program will remain satisfiable after the upgrade.
Program Owner
The program owner is the address which deployed (the latest edition of) a program.
Depending on the constructor logic, this party may have special privileges.
The program owner is used in the constructor to identify the party deploying the upgrade,
allowing the constructor to check whether that party is eligible to deploy the program.
Cryptographically, the program owner is bound to the deployment by signing the “deployment id”,
which is meant to uniquely identify the program being deployed in the transition.
Below are the findings from the engagement. High severity findings can be seen as so-called "priority 0" issues that need fixing (potentially urgently). Medium severity findings are most often serious findings that have less impact (or are harder to exploit) than high-severity findings. Low severity findings are most often exploitable in contrived scenarios, if at all, but still warrant reflection. Findings marked as informational are general comments that did not fit any of the other criteria.
During node bootup, programs are loaded in order of block height. However, within a single block, the load order of multiple programs is not stable. This instability can cause loading failures and stall node bootup.
/// Initializes the VM from storage.
#[inline]
pub fn from(store: ConsensusStore<N, C>) -> Result<Self> {
    [...]
    // Retrieve the list of deployment transaction IDs and their associated block heights.
    let deployment_ids = transaction_store.deployment_transaction_ids().collect::<Vec<_>>();
    let mut deployment_ids = cfg_into_iter!(deployment_ids)
        .map(|transaction_id| {
            // Retrieve the height.
            let height =
                match block_store.find_block_hash(&transaction_id)?.map(|hash| block_store.get_block_height(&hash))
                {
                    Some(Ok(Some(height))) => height,
                    _ => {
                        bail!("Block height for deployment transaction '{transaction_id}' is not found in storage.")
                    }
                };
            Ok((transaction_id, height))
        })
        .collect::<Result<Vec<_>>>()?;
    // Sort the deployment transaction IDs by their block heights.
    deployment_ids.sort_unstable_by(|(_, a), (_, b)| a.cmp(b));
    // Load the deployments in order of their block heights.
    const PARALLELIZATION_FACTOR: usize = 256;
    for (i, chunk) in deployment_ids.chunks(PARALLELIZATION_FACTOR).enumerate() {
        // Load the deployments.
        let deployments = cfg_iter!(chunk)
            .map(|(transaction_id, _)| {
                // Retrieve the deployment from the transaction ID.
                match transaction_store.get_deployment(transaction_id)? {
                    Some(deployment) => Ok(deployment),
                    None => bail!("Deployment transaction '{transaction_id}' is not found in storage."),
                }
            })
            .collect::<Result<Vec<_>>>()?;
        // Add the deployments to the process.
        // Note: This iterator must be serial, to ensure deployments are loaded in the order of their dependencies.
        deployments.iter().try_for_each(|deployment| process.load_deployment(deployment))?;
    }
    [...]
}
SnarkVM enforces restrictions on finalize_cost and number_of_calls for programs, which are checked during program initialization. If Program B imports and calls Program A, an upgrade to Program A may cause Program B’s functions to exceed these restrictions. This is not an issue during deployment execution, since Program B is not re-checked after Program A is upgraded. However, during node bootup, every program is re-checked, and because the program load order within a block is not stable, Program B may be loaded after Program A’s upgrade. This can trigger a restriction check failure and prevent the node from booting.
Example sequence:
- Block 1: Deploy Program A.
- Block 2: Deploy Program B (which imports A) and upgrade Program A, increasing its calls or finalize instructions.
- During node bootup, when loading programs in Block 2, if Program B is loaded after Program A’s upgrade, the restriction check fails and node bootup is stalled.
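The sequence above can be reproduced with a toy loader. This is a minimal sketch and not snarkVM code: `ProgramLoad`, `MAX_CALLS`, `load_all`, and `scenario` are hypothetical names, and the limit of 4 calls is an arbitrary stand-in for the real restrictions.

```rust
use std::collections::HashMap;

/// Hypothetical limit on the total number of calls reachable from a program.
pub const MAX_CALLS: u32 = 4;

/// Hypothetical, simplified deployment: a program name, its own call count, and its imports.
pub struct ProgramLoad {
    pub program: &'static str,
    pub own_calls: u32,
    pub imports: Vec<&'static str>,
}

/// Loads deployments in the given order, checking the call limit of each one
/// against the versions of its imports that are loaded at that moment.
pub fn load_all(deployments: &[ProgramLoad]) -> Result<(), String> {
    // Latest known total call count per program (an upgrade overwrites the entry).
    let mut totals: HashMap<&str, u32> = HashMap::new();
    for deployment in deployments {
        let mut total = deployment.own_calls;
        for import in &deployment.imports {
            match totals.get(import) {
                Some(calls) => total += *calls,
                None => return Err(format!("{} imports unknown program {}", deployment.program, import)),
            }
        }
        if total > MAX_CALLS {
            return Err(format!("{} exceeds the call limit ({} > {})", deployment.program, total, MAX_CALLS));
        }
        totals.insert(deployment.program, total);
    }
    Ok(())
}

/// Block 1 deploys A; block 2 deploys B (importing A) and then upgrades A.
/// The flag selects whether B is loaded before or after A's upgrade.
pub fn scenario(b_before_upgrade: bool) -> Vec<ProgramLoad> {
    let deploy_a = ProgramLoad { program: "a.aleo", own_calls: 1, imports: vec![] };
    let deploy_b = ProgramLoad { program: "b.aleo", own_calls: 3, imports: vec!["a.aleo"] };
    let upgrade_a = ProgramLoad { program: "a.aleo", own_calls: 4, imports: vec![] };
    if b_before_upgrade {
        vec![deploy_a, deploy_b, upgrade_a]
    } else {
        vec![deploy_a, upgrade_a, deploy_b]
    }
}
```

In this model, loading in the on-chain order succeeds, while loading B after A's upgrade fails the limit check, which mirrors the stalled bootup described above.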
Recommendation
It is recommended to load programs in the order of their transaction index within each block during node bootup to ensure a stable and deterministic load order.
Client Response
The client fixed this by sorting the deployment transactions by block height and then by transaction index within the block:
let mut deployment_ids = cfg_into_iter!(deployment_ids)
    .map(|transaction_id| {
        // Retrieve the block hash for the deployment transaction ID.
        let Some(hash) = block_store.find_block_hash(&transaction_id)? else {
            bail!("Deployment transaction '{transaction_id}' is not found in storage.")
        };
        // Retrieve the height.
        let Some(height) = block_store.get_block_height(&hash)? else {
            bail!("Block height for deployment transaction '{transaction_id}' is not found in storage.")
        };
        // Get the corresponding block's transactions.
        let Some(transactions) = block_store.get_block_transactions(&hash)? else {
            bail!("Transactions for deployment transaction '{hash}' is not found in storage.")
        };
        // Find the index of the deployment transaction ID in the block's transactions.
        let Some(index) = transactions.transactions().get_index_of(transaction_id.deref()) else {
            bail!("Transaction for deployment transaction '{transaction_id}' is not found in storage.")
        };
        Ok((transaction_id, (height, index)))
    })
    .collect::<Result<Vec<_>>>()?;
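The excerpt above collects (height, index) pairs but does not show the sort itself. As a minimal sketch of how such pairs give a deterministic load order (the `LocatedDeployment` type and both function names are hypothetical, not taken from snarkVM):

```rust
/// Hypothetical stand-in for a deployment transaction located in a block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocatedDeployment {
    pub transaction_id: String,
    pub height: u32,
    pub index: usize,
}

/// Sorts deployments by block height, breaking ties by transaction index within the block.
pub fn sort_for_loading(deployments: &mut [LocatedDeployment]) {
    // The key (height, index) is unique per transaction, so even an unstable
    // sort yields exactly one possible ordering.
    deployments.sort_unstable_by_key(|d| (d.height, d.index));
}

/// Returns the transaction IDs in the order in which they would be loaded.
pub fn load_order(mut deployments: Vec<LocatedDeployment>) -> Vec<String> {
    sort_for_loading(&mut deployments);
    deployments.into_iter().map(|d| d.transaction_id).collect()
}
```

Because tuples compare lexicographically, two deployments in the same block are ordered by their position in that block, regardless of the order in which storage returned them.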
When a program is deployed, the deployment structure contains an edition field that tracks the version number:
Deployment {
    edition: 0,
    program: PROGRAM_A,
    verification_keys: VKS_A,
    program_checksum: CHECKSUM_A,
}
The program owner signs this deployment, and subsequent upgrades increment the edition number.
However, since the edition field is not included in the signature, an attacker can:
- Take an old deployment with its valid signature.
- Modify the edition number to be higher than the current version.
- Redeploy the old program version with the manipulated edition number.
Attack
Consider this sequence of events:
- Initial deployment (edition 0) with PROGRAM_A, signed by the program owner.
- Upgrade deployment (edition 1) with PROGRAM_B, signed by the program owner.
- Attacker takes the old deployment, changes edition to 2, and redeploys PROGRAM_A.
This works because the edition field is not included in the signature,
and results in a potentially unauthorized (as defined by the constructor) rollback to an older version.
Note that the rollback must satisfy the conditions in check_upgrade_is_valid, which means that the old version of the program, PROGRAM_A, must have the same interface as the new version, PROGRAM_B; for instance, PROGRAM_B might be an updated version of PROGRAM_A which includes a bug fix, but otherwise has the same functionality.
In the case where a program is only deployed once, an attacker can still cause a denial of service by redeploying the program with the maximum edition number, u16::MAX. This makes the program unupgradable regardless of the conditions in the constructor.
Recommendation
Include the edition field in the program owner’s signature (or add it to the deployment id).
We recommend making the deployment id dependent on the contents of the whole deployment to avoid any possible malleability issues.
Client Response
The fix implemented by Provable changes the computation of the deployment_tree (of which the deployment_id is the root) into:
pub fn deployment_tree_v2(deployment: &Deployment<N>) -> Result<DeploymentTree<N>> {
    // Ensure the number of leaves is within the Merkle tree size.
    Self::check_deployment_size(deployment)?;
    // Compute a hash of the deployment bytes.
    let deployment_hash = N::hash_sha3_256(&to_bits_le!(deployment.to_bytes_le()?))?;
    // Prepare the header for the hash.
    let header = to_bits_le![deployment.version()? as u8, deployment_hash];
    // Prepare the leaves.
    let leaves = deployment.program().functions().values().enumerate().map(|(index, function)| {
        // Construct the transaction leaf.
        Ok(TransactionLeaf::new_deployment(
            u16::try_from(index)?,
            N::hash_bhp1024(&to_bits_le![header, function.to_bytes_le()?])?,
        )
        .to_bits_le())
    });
    // Compute the deployment tree.
    N::merkle_tree_bhp::<TRANSACTION_DEPTH>(&leaves.collect::<Result<Vec<_>>>()?)
}
This means that every leaf in the tree (one per function) is bound to the hash deployment_hash of the entire deployment, which includes the edition (and the verification keys as well).
As a result, the owner signature is computed over the whole deployment, as recommended.
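The difference between the two approaches can be illustrated with a toy model. This is a sketch under stated assumptions, not the snarkVM implementation: `ToyDeployment` and all function names are hypothetical, the pre-fix message is assumed to depend on the program contents only, and an ideal signature scheme is modelled as "the signature transfers exactly when the signed messages are equal".

```rust
/// Hypothetical, simplified deployment (not the snarkVM `Deployment` type).
#[derive(Clone)]
pub struct ToyDeployment {
    pub edition: u16,
    pub program: Vec<u8>,
    pub verification_keys: Vec<u8>,
}

/// Assumed pre-fix behavior: the signed message depends on the program only.
pub fn legacy_message(d: &ToyDeployment) -> Vec<u8> {
    d.program.clone()
}

/// Post-fix idea: the signed message covers the entire serialized deployment.
pub fn full_message(d: &ToyDeployment) -> Vec<u8> {
    let mut message = Vec::new();
    message.extend_from_slice(&d.edition.to_le_bytes());
    // Length prefixes keep the encoding unambiguous.
    message.extend_from_slice(&(d.program.len() as u64).to_le_bytes());
    message.extend_from_slice(&d.program);
    message.extend_from_slice(&(d.verification_keys.len() as u64).to_le_bytes());
    message.extend_from_slice(&d.verification_keys);
    message
}

/// Models an ideal signature scheme: a signature made over `signed` also
/// verifies for `candidate` exactly when both map to the same message.
pub fn signature_transfers(
    signed: &ToyDeployment,
    candidate: &ToyDeployment,
    message: fn(&ToyDeployment) -> Vec<u8>,
) -> bool {
    message(signed) == message(candidate)
}
```

With `legacy_message`, a copy of an old deployment whose edition has been bumped still carries a valid owner signature; with `full_message`, any change to the edition changes the signed message and the replay is rejected.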
The new operands Operand::Edition and Operand::Checksum are designed to retrieve the edition and checksum of a given program. Currently, they are valid in both the function scope (off-chain execution) and the finalize scope (on-chain execution). Since the edition and checksum of a program can change after an upgrade, these operands are expected to always provide the latest values. However, in the function scope, they are assigned as constants in the circuit:
match operand {
    // If the operand is the checksum, retrieve the checksum from the stack.
    Operand::Checksum(program_id) => {
        let checksum = match program_id {
            Some(program_id) => *self.get_external_stack(program_id)?.program_checksum(),
            None => *self.program_checksum(),
        };
        Ok(circuit::Value::Plaintext(circuit::Plaintext::from(checksum.map(circuit::U8::constant))))
    }
    // If the operand is the edition, retrieve the edition from the stack.
    Operand::Edition(program_id) => {
        let edition = match program_id {
            Some(program_id) => *self.get_external_stack(program_id)?.program_edition(),
            None => *self.program_edition(),
        };
        Ok(circuit::Value::Plaintext(circuit::Plaintext::from(circuit::Literal::U16(
            circuit::U16::new(circuit::Mode::Constant, edition),
        ))))
    }
}
The verifying key of the circuit is fixed at program deployment. As a result, in the function scope, the edition and checksum values are also fixed. Even if another program is upgraded, these values remain unchanged, so the operands may return stale information. For example:
- Deploy program foo.aleo, which retrieves the edition of bar.aleo using the Operand::Edition operand. The current edition of bar.aleo is 0.
- Upgrade bar.aleo, increasing its edition to 1.
- Call foo.aleo to get the edition of bar.aleo. It still returns 0, which is now outdated.
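The staleness in this example can be modelled with a few lines of Rust. This is a minimal sketch: `Registry` and `SynthesizedFunction` are hypothetical types standing in for on-chain state and for a circuit whose constants were fixed at deployment.

```rust
use std::collections::HashMap;

/// Hypothetical registry of the latest edition of each deployed program.
#[derive(Default)]
pub struct Registry {
    editions: HashMap<String, u16>,
}

impl Registry {
    /// Records an initial deployment at edition 0.
    pub fn deploy(&mut self, program: &str) {
        self.editions.insert(program.to_string(), 0);
    }

    /// Increments the edition of an already deployed program.
    pub fn upgrade(&mut self, program: &str) {
        if let Some(edition) = self.editions.get_mut(program) {
            *edition += 1;
        }
    }

    /// Live lookup, analogous to reading the edition in the finalize scope.
    pub fn edition(&self, program: &str) -> Option<u16> {
        self.editions.get(program).copied()
    }
}

/// Hypothetical model of a function whose circuit was synthesized at deployment,
/// with the dependency's edition baked in as a constant.
pub struct SynthesizedFunction {
    baked_edition: u16,
}

impl SynthesizedFunction {
    /// Captures the dependency's edition once, at synthesis time.
    pub fn synthesize(registry: &Registry, dependency: &str) -> Option<Self> {
        registry.edition(dependency).map(|baked_edition| Self { baked_edition })
    }

    /// Analogous to reading the edition in the function scope: always the baked value.
    pub fn edition_in_function_scope(&self) -> u16 {
        self.baked_edition
    }
}
```

After `bar.aleo` is upgraded, the live lookup reports edition 1 while the previously synthesized function keeps reporting 0.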
Recommendation
It is recommended to disallow the use of Operand::Edition and Operand::Checksum in the function scope to prevent returning stale values.
Client Response
The client opted to remove both Operand::Edition and Operand::Checksum from the set of allowed operands for in-circuit Aleo instructions.
They remain accessible from “finalize”, which can also be used to access them from within the circuit, should the user wish to do so:
the values are witnessed in the circuit and returned to finalize, which then checks that the values exported from the function agree with Operand::Edition and Operand::Checksum as obtained in finalize.
Aleo allows the delegation of SNARK computation to a third party by creating
“requests”, which are subsequently proved by that third party.
These are signatures on the inputs/outputs of every function called (“transition”) during the execution of the transaction.
Prior to the upgradability update, all programs were immutable, meaning the transitions of a request were guaranteed
to lead to the execution of a specific and static set of instructions.
With the upgradability update, programs can now be upgraded.
Since a request only signs the inputs/outputs of the transitions it calls and not the program itself,
there is no binding between a request and the current version of the program being invoked,
or any of its dependencies.
This means that the semantics of a request can change between the point of creation (when the user signs the request)
and the point of execution (when the request is proved).
This is most obvious when the callgraph remains the same but the functions in the callgraph are upgraded.
However, because there is no explicit binding between a parent transition (with is_root = True)
and its child transitions (from functions invoked by the parent transition),
a malicious prover could theoretically “stitch together” a request which proves the execution of a newer version of the root program.
For example, suppose the root program contains a function of the form:
import foo.aleo;
program bar.aleo;
function root:
    ...
    call foo.aleo/sub r0 into r1;
    ...
With one call from bar.aleo/root to foo.aleo/sub.
The user creates two transactions calling bar.aleo/root; together these include two requests invoking foo.aleo/sub.
The bar.aleo program is now upgraded to a newer version:
import foo.aleo;
program bar.aleo;
function root:
    ...
    call foo.aleo/sub r0 into r1;
    call foo.aleo/sub r2 into r3;
    ...
With two calls from bar.aleo/root to foo.aleo/sub.
Note that this upgrade is allowed as the interfaces of both foo.aleo and bar.aleo remain unchanged.
However, the malicious prover can still “stitch together” a request which proves the execution of the newer version of the root program,
assuming the two original calls to foo.aleo/sub have the same inputs as the calls in the new version of bar.aleo/root.
Recommendation
A request should be bound to the (versions of) all programs in the callgraph:
by including a hash of all the checksums into the signature, ensuring that if any of the programs involved in the transaction changes, the request becomes invalid.
Additionally, the monotonically increasing “edition” of all the programs should be included in the hash as well,
such that an invalid request cannot become valid again at a later time due to a program rolling back to an older version.
Observe that since the request is verified in-circuit,
this requires feeding the hash as public input to the circuit.
For security, this hash is only required to be included for the root transition.
We believe that this is the most straightforward semantics for the user to reason about.
Client Response
The chosen mitigation differs from the one suggested above, but successfully mitigates the issue as well.
The signature of every request is computed over a message which includes:
- The checksum of the program to which the called function belongs.
- The root_tvk (the root transition view key).
The checksum is exposed from the SNARK as public input directly,
whereas the root_tvk is exposed from the SNARK as public input indirectly via the scm, which is a commitment to the root_tvk:
let root_tvk = root_tvk.unwrap_or(tvk);
let scm = N::hash_psd2(&[signer.deref().to_x_coordinate(), root_tvk])?;
This means that the root_tvk acts as a per-authorization nonce (an authorization consists of multiple transitions/requests).
This serves to bind each request to a unique authorization, which prevents “cut-and-paste” attacks,
where a malicious prover constructs a new valid authorization from requests in multiple different authorizations.
Observe that this alone does not prevent “cut” attacks, where a malicious prover might try to simply remove requests from an authorization.
The overall security argument is then fairly straightforward:
- The checksum of a request uniquely identifies the program and its version.
- Since the Aleo VM is deterministic, the set of child calls is uniquely determined by:
  - The checksum of the parent
  - The arguments to the parent
Applying this observation inductively over the callgraph from the root,
we conclude that the execution is uniquely identified by the set of checksums.
Finally, the set of checksums cannot be mauled across different authorizations, because the checksum
of each request is bound to the authorization via a unique nonce, the root_tvk,
which separates the domain of signatures across different authorizations.
An honestly produced authorization has exactly one signed request per call in the callgraph,
as determined by the arguments to the root function and the set of checksums.
Since the set of calls is uniquely determined,
removing any request from such an authorization
would result in a call with a missing request.
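The two bindings in this argument can be sketched as a simple acceptance check. This is an illustrative model only: `SignedRequest` and `check_authorization` are hypothetical names, the signature is modelled by treating the struct fields as the signed message, and `u64` values stand in for the real checksum and root_tvk. It does not model the callgraph, so the argument about removed requests is not covered here.

```rust
use std::collections::HashMap;

/// Hypothetical model of the fields a request signature covers after the fix.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SignedRequest {
    pub program: String,
    /// Stand-in for the checksum of the program the called function belongs to.
    pub checksum: u64,
    /// Stand-in for the root transition view key, acting as a per-authorization nonce.
    pub root_tvk: u64,
}

/// Accepts an authorization only if every request carries the nonce of the root
/// request and the checksum of the currently deployed version of its program.
pub fn check_authorization(
    requests: &[SignedRequest],
    current_checksums: &HashMap<String, u64>,
) -> Result<(), String> {
    // The first request is treated as the root transition.
    let root = requests.first().ok_or_else(|| "empty authorization".to_string())?;
    for request in requests {
        // Rejects requests pasted in from a different authorization.
        if request.root_tvk != root.root_tvk {
            return Err(format!("request for {} belongs to a different authorization", request.program));
        }
        // Rejects requests signed against a different version of the program.
        match current_checksums.get(&request.program) {
            Some(checksum) if *checksum == request.checksum => {}
            _ => return Err(format!("request for {} was signed for a different program version", request.program)),
        }
    }
    Ok(())
}
```

In this model, a request copied from another authorization fails the nonce comparison, and a request signed before an upgrade fails the checksum comparison.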
For deployments, the snarkVM’s constructor finalization process uses the transition ID of the fee transition to seed the ChaCha random number generator. Since there is no cryptographic binding between the deployment transition (in particular, the program_owner signature)
and the fee transition paying for the deployment,
an attacker can manipulate the “randomness” produced by rand.chacha inside the constructor finalize
by creating a new transaction with a different fee transition while reusing the same deployment.
During program deployment, when a constructor exists, the system executes:
if deployment.program().contains_constructor() {
    let operations = finalize_constructor(state, store, &stack, *fee.transition_id())?;
    finalize_operations.extend(operations);
    lap!(timer, "Execute the constructor");
}
The fee.transition_id() is passed to the constructor finalization process and subsequently used in the ChaCha random number generator’s seed computation. The seed preimage includes the transition ID as a key component:
let preimage = if (ConsensusVersion::V1..=ConsensusVersion::V2).contains(&consensus_version) {
    to_bits_le![
        registers.state().random_seed(),
        **registers.transition_id(), // This comes from fee.transition_id()
        stack.program_id(),
        registers.function_name(),
        self.destination.locator(),
        self.destination_type.type_id(),
        seeds
    ]
} else {
    // Similar structure with additional nonce field
}
Attack
An attacker can exploit this vulnerability through the following steps:
- Create Initial Deployment: The attacker creates a legitimate program deployment transaction with a constructor that uses rand.chacha operations.
- Extract Deployment Transition: The attacker extracts the deployment transition from the original transaction, leaving it completely unchanged.
- Generate New Fee Transitions: The attacker creates multiple new transactions, each containing:
  - The same unchanged deployment transition
  - A different fee transition with a different transition ID
- Grind rand.chacha Outputs: By controlling the fee transition ID, the attacker can influence the ChaCha seed and potentially affect the execution of the constructor.
Recommendation
The constructor execution should be deterministic based upon:
- The new deployment.
- The current program state.
To achieve this, the constructor should use a deterministic seed that cannot be manipulated by changing fee transitions.
Two obvious solutions exist:
- Use Deployment ID: Replace fee.transition_id() with the deployment ID.
- Use Constant Seed: Use a constant transition ID for all deployment finalizations.
Client Response
The client decided to seed rand.chacha using the default transition ID.
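The effect of this choice on the seed preimage can be sketched as follows. This is a simplified model, not the snarkVM code: the preimage contains only three of the real components, all names are hypothetical, and the default transition ID is assumed here to be all zeros purely for illustration.

```rust
/// Hypothetical 32-byte transition ID.
pub type TransitionId = [u8; 32];

/// Stand-in for the default transition ID (assumed to be all zeros in this sketch).
pub const DEFAULT_TRANSITION_ID: TransitionId = [0u8; 32];

/// Simplified seed preimage: block-level randomness, a transition ID, and the program ID.
pub fn seed_preimage(random_seed: u64, transition_id: &TransitionId, program_id: &str) -> Vec<u8> {
    let mut preimage = Vec::new();
    preimage.extend_from_slice(&random_seed.to_le_bytes());
    preimage.extend_from_slice(transition_id);
    preimage.extend_from_slice(program_id.as_bytes());
    preimage
}

/// Pre-fix behavior: the preimage depends on the attacker-chosen fee transition ID.
pub fn vulnerable_preimage(random_seed: u64, fee_transition_id: &TransitionId, program_id: &str) -> Vec<u8> {
    seed_preimage(random_seed, fee_transition_id, program_id)
}

/// Post-fix behavior: the fee transition ID is ignored in favor of a constant.
pub fn fixed_preimage(random_seed: u64, _fee_transition_id: &TransitionId, program_id: &str) -> Vec<u8> {
    seed_preimage(random_seed, &DEFAULT_TRANSITION_ID, program_id)
}
```

With the constant in place, swapping the fee transition no longer changes the preimage, so the grinding step from the attack has nothing to vary.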