Introduction
On April 7th, 2025, NEAR requested zkSecurity to perform a security assessment of the Rust p256
crate (https://crates.io/crates/p256). No major issues were found and the codebase was found to be thoroughly tested and well-architected.
In-Scope Components
The scope of the audit included:
- Elliptic Curve. The p256 elliptic curve group operations, including (de)serialization and exposed functionalities relevant to NEAR.
- ECDSA. ECDSA-related code, focused on signature verification (as it is NEAR’s main use case).
- Field Arithmetic. The internal scalar and base field implementations for the curve, including dependencies on the crypto-bigint crate (https://github.com/RustCrypto/crypto-bigint).
Out-of-Scope Components
As NEAR's usage mainly focuses on signature verification on 32-bit machines, the following items were not the focus of this audit:
- Signature generation. The audit focused on signature verification.
- Constant-time code. Due to the previous point, while we did look at the constant-time code to ensure correctness of implementation, we did not focus on ensuring that the code was indeed constant-time (as signature verification does not involve secret data).
- 64-bit specific code. This is important as many algorithms are reimplemented for 32-bit and 64-bit machines, and thus the absence of bugs in the 32-bit implementation might not necessarily mean that the 64-bit implementation is bug-free.
- Unrelated primitives. We did not look at ECDH or hash-to-curve implementations, as they are not used by NEAR.
Code Base Reviewed
Noteworthy: the audit focused on the last commit of the p256 crate, which was 32343a78f1522aa5bd856556f114053d4bb938e0 at the time of the audit. This commit integrated changes, such as the move from generic-array to hybrid-array (https://github.com/RustCrypto/elliptic-curves/pull/1011), that are not yet used in NEAR's code. We expect NEAR to upgrade to newer releases as they become available.
Reference Integrations Provided by NEAR
In addition, we were provided the following PRs as examples of NEAR's usage:
- usage of the crate example - https://github.com/near/intents/pull/43/files
- example of other precompiles implemented in nearcore - https://github.com/near/nearcore/pull/9317
Overview Of The p256 Crate
The p256 crate implements the NIST P-256 elliptic curve along with a number of cryptographic schemes that make use of P-256. The P-256 curve is a widely used elliptic curve defined over a prime field and with a prime order. We define the curve later in this document.
Organization Of The Code
The p256 crate is part of the elliptic-curves repository and provides high-level APIs to access primitives (like ECDSA and ECDH) built on top of P-256 as well as low-level P-256 functionalities. The crate includes local code but also builds on top of many other dependencies from the same ecosystem of RustCrypto crates:
- elliptic-curve (https://github.com/RustCrypto/traits): it provides the basic Curve trait and the PublicKey struct.
- crypto-bigint (https://github.com/RustCrypto/crypto-bigint): it provides big integer implementations based on 32-bit or 64-bit machine words.
- hybrid-array (https://github.com/RustCrypto/hybrid-array): it provides the Array type with const generics.
- primeorder (https://github.com/RustCrypto/elliptic-curves): it implements the affine/projective point structs and point arithmetic (add and double operations). This crate provides some macros (e.g., impl_mont_field_element) to define Field and implement field arithmetic. The p256 crate does not use these macros for performance reasons.
- ff (https://github.com/zkcrypto/ff): some field arithmetic operations are pulled from this crate, for example the square root functions.
The signature primitives rely on:
- ecdsa (https://github.com/RustCrypto/signatures): it provides the core structs (VerifyingKey and Signature) and functions (verify_prehashed) for signature verification in ECDSA.
- signature (https://github.com/RustCrypto/traits): it defines the basic traits (Verifier and PrehashVerifier) for signature verification.
- sec1 (https://github.com/RustCrypto/formats): it implements the SEC1 (https://www.secg.org/sec1-v2.pdf) encoding format.
Note that, as mentioned in the introduction, we did not look into the generic-array crate (https://github.com/fizyk20/generic-array) as it is no longer used in the latest commit of the p256 crate. Furthermore, while we looked at parts of the subtle crate (https://github.com/dalek-cryptography/subtle) to understand its logic, we did not focus on ensuring that it indeed provides constant-time operations, as this was not the focus of the audit.
The P-256 Curve
The P-256 elliptic curve is a widely used elliptic curve defined over a prime field, and with a prime order. It is known under different names and specified in different standards: P-256 in NIST’s SP 800-186 (https://csrc.nist.gov/pubs/sp/800/186/final), secp256r1 in SECG’s SEC2v1 (https://www.secg.org/SEC2-Ver-1.0.pdf), prime256v1 in ANSI X9.62 (https://www.x9.org/).
The curve is defined by the short Weierstrass equation:

$$y^2 = x^3 + ax + b$$

over the finite field $\mathbb{F}_p$, where $a = -3 \bmod p$ and the concrete values of $p$, $a$, and $b$ are listed in the snippet below.
The following is an implementation of the P-256 curve in SageMath:
p = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
a = 0xffffffff00000001000000000000000000000000fffffffffffffffffffffffc
b = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296, 0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)
n = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
h = 0x1
# Create the base field
Fp = FiniteField(p)
# Define the elliptic curve E: y^2 = x^3 + ax + b over Fp
E = EllipticCurve(Fp, [a, b])
# Define the generator point G
G = E(G[0], G[1])
Some of these properties have implications on the performance and security of implementations.

Optimize multiplication operations. The constant coefficient $a = -3$ allows performance optimizations for point addition and doubling by avoiding full field multiplications by the constant $a$.

Optimize modular arithmetic. The sparse structure of the prime $p$ can be represented as 32-bit limbs in little-endian format:

[0xffffffff, 0xffffffff, 0xffffffff, 0x00000000, 0x00000000, 0x00000000, 0x00000001, 0xffffffff]

These simple limb values allow for efficient modular arithmetic using bitwise operations during the reduction process.

Avoid small subgroup attacks. When the cofactor $h$ is 1, all points on the curve belong to a single subgroup of prime order $n$. As there are no small subgroups, extra checks against small subgroup attacks are unnecessary.
BigInt Implementation Overview
In order to take advantage of the architecture's native word size, the crypto-bigint crate uses a representation of integers based on machine words. This allows for efficient arithmetic operations on large integers.

To implement this, the crate defines the type Word (and the wrapper type Limb(Word)) as u32 or u64 depending on the target_pointer_width configuration.

A Uint<LIMBS> type is then defined generically as a fixed-size array of Limbs. For example, the p256 code will make use of Uint<8> (resp. Uint<4>) to represent 256-bit integers on 32-bit (resp. 64-bit) architectures.
Furthermore, limbs are ordered in a little-endian way, meaning the least-significant limb comes first (on the left). This allows for efficient serialization and deserialization of integers, as well as efficient arithmetic operations, without the need to reorder limbs on little-endian architecture (which makes up the majority of modern systems).
For example, any u8 value n can be represented as [n, 0, 0, 0, 0, 0, 0, 0] on a 32-bit architecture.
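This limb layout can be sketched in a few lines of Python (a minimal illustration with hypothetical helper names, not the crate's API):

```python
# Decompose a 256-bit integer into eight 32-bit limbs, least-significant first,
# mirroring how a Uint<8> stores values on a 32-bit target.
def to_limbs(x, limb_bits=32, num_limbs=8):
    mask = (1 << limb_bits) - 1
    return [(x >> (limb_bits * i)) & mask for i in range(num_limbs)]

def from_limbs(limbs, limb_bits=32):
    return sum(l << (limb_bits * i) for i, l in enumerate(limbs))

p = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
limbs = to_limbs(p)
# The P-256 prime exhibits the sparse limb pattern noted earlier:
assert limbs == [0xffffffff, 0xffffffff, 0xffffffff, 0, 0, 0, 1, 0xffffffff]
assert from_limbs(limbs) == p
```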
Field Implementation
As seen above, the P-256 curve is defined over a prime field that we call the base field (implemented in the field submodule). The order of the curve (i.e., the number of points) is prime, producing a field we call the scalar field (implemented in the scalar submodule).

Both fields are 256 bits in size, and have similar implementations relying on the same U256 = Uint<8> type (on a 32-bit machine).
A notable difference is that:
- the base field implementation is based on the Montgomery reduction
- the scalar field implementation is based on Barrett reduction
As Montgomery reduction requires converting inputs to Montgomery form (and then back to canonical form), it is generally best suited for applications that perform many multiplications in a row. Barrett reduction, on the other hand, is more efficient for applications that perform only a few multiplications. ECDSA signature verification follows the latter pattern, and thus Barrett reduction is used for the scalar field.
We also note that while the underlying implementation supports Karatsuba multiplication for larger integers (i.e. with at least 16 limbs), the P-256 implementation uses schoolbook multiplication in the integers.
In the next sections we explain the implementations based on Montgomery and Barrett reduction.
Montgomery Form and Montgomery Reduction
Montgomery reduction is explained at a low level in section 14.3.2 of the Handbook of Applied Cryptography. Below, we provide a more accessible explanation of the algorithm and its implementation.
What Is The Montgomery Form?
For a modulus $m$ and an element $x$ modulo $m$, we can shift the element with some known Montgomery value $R$ in the following way:

$$\bar{x} = x \cdot R \bmod m$$

This is the Montgomery form of the value $x$!

Naturally, we can unshift to recover the value:

$$x = \bar{x} \cdot R^{-1} \bmod m$$

Montgomery reduction does exactly this: it multiplies a value with the inverse of $R$ and produces the result modulo $m$. As such, it should be immediately clear that Montgomery reduction is at least useful to convert a value in Montgomery form back to its canonical form.

Note that for the inverse of $R$ modulo $m$ to exist, we need $R$ and $m$ to be coprime. In other words, we need $\gcd(R, m) = 1$.
When we perform field operations (add and multiply) we constantly need to perform modular reductions (to reduce the result of an operation if it became larger than $m$). Modular reductions can get quite expensive; in fact, they are the most expensive part of handling field elements. For example, reducing a large number modulo a 256-bit prime involves multiple division and subtraction steps, which are computationally intensive.
On the other hand, it turns out that modular reductions in Montgomery form can be faster to compute. For example, consider two elements in Montgomery form, $\bar{a} = aR \bmod m$ and $\bar{b} = bR \bmod m$, each 256 bits. When we multiply them, the result is $\bar{a} \cdot \bar{b} = abR^2$, a 512-bit value. Using Montgomery reduction, we can efficiently reduce it modulo $m$ to obtain $abR \bmod m$. This result, $\overline{ab}$, is still in Montgomery form and ready for further operations.

Note that for addition things are much simpler: adding two values in Montgomery form still gets us a value in Montgomery form, which can easily be reduced by subtracting $m$ if needed. As such, Montgomery form is mainly useful for optimizing modular reductions after multiplications.
It remains to show: why is modular reduction faster in Montgomery form, and how to perform that modular reduction.
Let's tl;dr the first question first: why is it faster? This is because the Montgomery value is usually chosen as $R = 2^k$ for a big enough $k$ such that $R > m$, which makes operations extremely straightforward and fast, as we can use bitwise operations, which are cheap in hardware. Indeed, if you didn't know that yet, a left shift << is a multiplication by 2 and a right shift >> is a division by 2.
Next, we explain how the Montgomery reduction works.
How Does The Montgomery Reduction Work?
Notice that we would like to perform the following division by $R$ in the integers to simulate the multiplication by $R^{-1} \bmod m$:

$$\frac{x}{R}$$

This will only work if $x$ is divisible by $R$ in the integers. As most likely it is not, we need to find a different element of the same residue class that is divisible by $R$. That is, a $k$ such that $x + k \cdot m$ is divisible by $R$ in the integers. Or in other words:

$$x + k \cdot m \equiv 0 \pmod{R}$$

We can directly solve for $k$:

$$k = x \cdot (-m^{-1}) \bmod R$$

Finally, if this all makes sense, you now understand that we can calculate $x \cdot R^{-1}$ modulo $m$ using the following formula in the integers:

$$\frac{x + k \cdot m}{R}$$

This should give us a value that is equivalent to $x \cdot R^{-1} \pmod m$, but not necessarily the standard representative in $[0, m)$.

In the bound analysis we will see that the above formula is in $[0, 2m)$, producing either exactly $x \cdot R^{-1} \bmod m$ (the "canonical representative") or that value plus $m$ (which is easy to correct with one subtraction).
Here’s a quick way to see that in SageMath:
class Montgomery:
    def __init__(self, modulus, R):
        self.modulus = modulus
        self.R = R
        assert R > modulus
        assert gcd(modulus, R) == 1

    def to_montgomery(self, x):
        return (x * self.R) % self.modulus

    def from_montgomery(self, x):
        R_inv = inverse_mod(self.R, self.modulus)
        return (x * R_inv) % self.modulus

    def redc(self, x):
        assert x >= 0
        assert x < self.modulus * self.R
        # this can be precomputed
        inv_mod_R = inverse_mod(-self.modulus, self.R)
        # we find a representative that is divisible by R
        k = (x * inv_mod_R) & (self.R - 1)  # k = (x * -m^(-1)) % R
        numerator = x + k * self.modulus
        assert numerator % self.R == 0
        # numerator / R
        res = numerator >> self.R.log(2)
        # we correct the representative we found
        if res >= self.modulus:
            return res - self.modulus
        else:
            return res

# examples
M = Montgomery(modulus=13 * 17, R=2^8)  # we use an RSA modulus in this example
# (notice that it works for 0 and 1 as well, which are special cases)
for x in range(0, 100):
    assert M.redc(M.to_montgomery(x)) == x
Bound analysis
In Montgomery reduction, the input $x$ is usually the multiplication result of two field elements. Thus, we assume $x < m^2 < m \cdot R$. Since $k < R$, we have:

$$\frac{x + k \cdot m}{R} < \frac{mR + Rm}{R} = 2m$$

This shows that the above formula is within the range $[0, 2m)$.
Word-by-Word Montgomery Reduction
In practice, Montgomery reduction is usually implemented in a word-by-word fashion. Rather than computing with large integers directly, the algorithm operates over fixed-size words—typically 32 or 64 bits wide—and processes them one at a time. This enables the use of fast hardware operations like word multiplication, and allows division to be replaced with bit shifts, which are significantly more efficient.
Let the word (or base) be $b$ (usually $2^{32}$ or $2^{64}$), and suppose the modulus $m$ is an $n$-word integer. Define $R = b^n$ and assume $\gcd(m, b) = 1$. The goal is to compute the Montgomery reduction of a $2n$-word integer $x$ (we assume $x < mR$), i.e., compute:

$$x \cdot R^{-1} \bmod m$$

The idea is to reduce $x$ by one word at a time until it becomes an $n$-word integer. Notice that

$$R^{-1} = b^{-n} = (b^{-1})^n \pmod m$$

We can just divide by $b$ per round, and repeat the round $n$ times to produce the final $x \cdot R^{-1} \bmod m$. In each round, the limb size of $x$ decreases by one; after $n$ rounds, $x$ becomes an $n$-word integer, which is close to the final target. Below is the process of the idea:
- Step 0. Initialize $x_0 = x$.
- Step 1. Compute $x_1 = (x_0 + k_1 \cdot m) / b$, where $k_1 = x_0 \cdot (-m^{-1}) \bmod b$. This ensures that the numerator is divisible by $b$. Now, $x_1 \equiv x \cdot b^{-1} \pmod m$, and $x_1$ is a $(2n-1)$-word integer.
- Step i (from 1 to n). Compute $x_i = (x_{i-1} + k_i \cdot m) / b$, where $k_i = x_{i-1} \cdot (-m^{-1}) \bmod b$. This reduces the word size of $x$ by one in each step, ensuring $x_i \equiv x \cdot b^{-i} \pmod m$.
- Final Step. After $n$ iterations, $x_n$ is an $n$-word integer equivalent to $x \cdot R^{-1} \bmod m$. If $x_n \geq m$, perform a final subtraction: $x_n \leftarrow x_n - m$.
Note that the calculation of $k_i$ can be simplified. First, precompute $m' = -m^{-1} \bmod b$. Then, $x_{i-1} \bmod b$ is simply the least significant limb of $x_{i-1}$, denoted as $x_{i-1}[0]$. Thus, $k_i$ can be computed as:

$$k_i = x_{i-1}[0] \cdot m' \bmod b$$

The overall algorithm proceeds as follows:
- For each step from $1$ to $n$, reduce $x$ by one word: $x \leftarrow (x + (x[0] \cdot m' \bmod b) \cdot m) / b$.
- If $x \geq m$, then $x \leftarrow x - m$.
- Output $x$.
This algorithm is efficient because it avoids costly division operations. Instead, it uses word-sized multiplication and addition, which are highly optimized on modern CPUs, making it well-suited for large integers.
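The steps above can be sketched in Python with big integers standing in for limb arrays (an illustrative sketch with hypothetical names, not the crate's code; the base is assumed to be $b = 2^{32}$):

```python
# Word-by-word Montgomery reduction: n rounds, each dividing exactly by b.
def mont_reduce(x, m, n, word_bits=32):
    b = 1 << word_bits
    m_prime = (-pow(m, -1, b)) % b          # precomputed -m^(-1) mod b
    for _ in range(n):
        # k is derived from the least significant limb of x
        k = ((x & (b - 1)) * m_prime) % b
        x = (x + k * m) >> word_bits        # exact division by b
    return x - m if x >= m else x           # final conditional subtraction

# Check against the definition x * R^(-1) mod m for the P-256 scalar field:
m = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
n = 8                                        # 8 limbs of 32 bits
R = 1 << (32 * n)
x = 1234567890123456789 * 987654321          # any x < m * R
assert mont_reduce(x, m, n) == (x * pow(R, -1, m)) % m
```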
Barrett Reduction
Barrett reduction is an algorithm to reduce a value modulo $m$. We have a value $x$ that's in $[0, m^2)$ because a multiplication must have occurred between two values modulo $m$. We would thus like to reduce the value $x$ modulo $m$.
Intuition on why it’s fast
Barrett reduction is a way to do that efficiently, relying on the same principles as the Montgomery reduction algorithm:
- we do things with limbs whose size is the size of the machine words on which we're executing the algorithm
- we mostly do division, multiplication, and modular operations with an operand that's a power of 2, because these are easy to do on our binary machines ($x / 2^n$ translates to x >> n, $x \cdot 2^n$ is x << n, $x \bmod 2^n$ is x & (2^n - 1)).

To make things even more optimal, we also define limbs using a base $b$ corresponding to the machine word size (let's say $b = 2^{32}$ or $b = 2^{64}$) and a $k$ such that $b^k > m$. In other words, $k$ is the smallest integer such that $m$ fits in $32k$ or $64k$ bits.
Main idea
We want to compute $x \bmod m$, which is the same as $x - q \cdot m$ for some quotient $q$.

One technique to compute $x \bmod m$ is to compute $q$ as:

$$q = \left\lfloor \frac{x}{m} \right\rfloor$$

And once you find $q$, compute $x - q \cdot m$. Barrett reduction allows us to compute $q$ (or an approximation $\hat{q}$ of $q$, as you will see) more efficiently than with the formula above.

First, let's write the formula above as:

$$q = \left\lfloor \frac{x}{b^{k-1}} \cdot \frac{b^{2k}}{m} \cdot \frac{1}{b^{k+1}} \right\rfloor$$

This allows us to precompute $\mu = \left\lfloor \frac{b^{2k}}{m} \right\rfloor$ and to rewrite the above, approximately, as:

$$\hat{q} = \left\lfloor \frac{\left\lfloor \frac{x}{b^{k-1}} \right\rfloor \cdot \mu}{b^{k+1}} \right\rfloor$$

So far the computation of $q$ is exact; $\hat{q}$ approximates it by truncating both $\frac{x}{b^{k-1}}$ and $\frac{b^{2k}}{m}$ to integers.

Note: notice that $\left\lfloor \frac{x}{b^{k-1}} \right\rfloor$ and the division by $b^{k+1}$ are fast to compute with right shifts.

Since the two truncations can only make the result smaller, we have that $\hat{q} \leq q$. The analysis section will show that $q - 2 \leq \hat{q} \leq q$.
Computing the result in $[0, m)$

If we had computed $q$ exactly, then what would be left for us to do would be to compute the result as:

$$r = x - q \cdot m$$

and since we know that $r < m$, we also know that only the least significant $k+1$ limbs are involved in that subtraction. As such, we can also compute it faster by only involving the least significant limbs:

$$r = (x \bmod b^{k+1}) - (q \cdot m \bmod b^{k+1})$$

(where modulo $b^{k+1}$ can be computed efficiently with a bitwise mask)

But remember, we have computed an approximation $\hat{q}$ of $q$, that is $q - 2 \leq \hat{q} \leq q$.

We know that $x - \hat{q} \cdot m$ is either:

$$x - q \cdot m, \qquad x - (q-1) \cdot m, \qquad \text{or} \qquad x - (q-2) \cdot m$$

So what we do is that we compute $x - \hat{q} \cdot m$ first, then if we don't get something smaller than $m$, we subtract $m$ once or twice to get there.

This means that we might not have the result right away, but instead a value $m$ or $2m$ larger.

Since $x - q \cdot m < m$, we have that $x - \hat{q} \cdot m \leq x - (q-2) \cdot m < 3m$. This allows us to upperbound our approximation:

$$0 \leq x - \hat{q} \cdot m < 3m < b^{k+1}$$

Then we can calculate the result in a more efficient way:

$$r = \left( (x \bmod b^{k+1}) - (\hat{q} \cdot m \bmod b^{k+1}) \right) \bmod b^{k+1}$$

Again, here the $\bmod\ b^{k+1}$ operation is very efficient on a binary machine.
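The whole procedure can be sketched in Python (a minimal illustration in the style of HAC Algorithm 14.42, assuming base $b = 2^{32}$; names are illustrative, not the crate's code):

```python
# Barrett reduction: estimate the quotient with shifts and a precomputed mu,
# then correct with at most two subtractions.
def barrett_reduce(x, m, k, word_bits=32):
    b = 1 << word_bits
    mu = b ** (2 * k) // m                      # precomputed floor(b^2k / m)
    q_hat = ((x >> (word_bits * (k - 1))) * mu) >> (word_bits * (k + 1))
    mask = b ** (k + 1) - 1                     # arithmetic modulo b^(k+1)
    r = ((x & mask) - ((q_hat * m) & mask)) & mask
    # q_hat undershoots q by at most 2, so at most two corrections are needed:
    while r >= m:
        r -= m
    return r

m = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
k = 8  # m fits in 8 32-bit limbs
a = 0x123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef % m
c = 0xfedcba9876543210fedcba9876543210fedcba9876543210fedcba987654321 % m
assert barrett_reduce(a * c, m, k) == (a * c) % m
```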
Analysis of the bound of Barrett reduction
Let's analyze the bound of the calculated $\hat{q}$ with respect to the actual quotient $q$, where

$$q = \left\lfloor \frac{x}{m} \right\rfloor, \qquad \hat{q} = \left\lfloor \frac{\left\lfloor \frac{x}{b^{k-1}} \right\rfloor \cdot \mu}{b^{k+1}} \right\rfloor, \qquad \mu = \left\lfloor \frac{b^{2k}}{m} \right\rfloor$$

First, since $\hat{q}$ is derived from a truncated approximation of $q$, it naturally satisfies $\hat{q} \leq q$.

Let's denote $\mu = \frac{b^{2k}}{m} - \alpha$, where $\alpha \in [0, 1)$. Denote $\left\lfloor \frac{x}{b^{k-1}} \right\rfloor = \frac{x}{b^{k-1}} - \beta$, where $\beta \in [0, 1)$. Then we can remove the floor operations:

$$\hat{q} = \left\lfloor \frac{\left( \frac{x}{b^{k-1}} - \beta \right) \left( \frac{b^{2k}}{m} - \alpha \right)}{b^{k+1}} \right\rfloor$$

Then we can simplify $\hat{q}$:

$$\hat{q} = \left\lfloor \frac{x}{m} - \left( \frac{\alpha x}{b^{2k}} + \frac{\beta b^{k-1}}{m} - \frac{\alpha \beta}{b^{k+1}} \right) \right\rfloor$$

We use $e$ to denote the error term $e = \frac{\alpha x}{b^{2k}} + \frac{\beta b^{k-1}}{m} - \frac{\alpha \beta}{b^{k+1}}$ (note that $e \geq 0$). Then we have $\hat{q} = \left\lfloor \frac{x}{m} - e \right\rfloor$. By the floor function inequality $\lfloor u - v \rfloor \geq \lfloor u \rfloor - \lceil v \rceil$, we have:

$$\hat{q} \geq \left\lfloor \frac{x}{m} \right\rfloor - \lceil e \rceil = q - \lceil e \rceil$$

That is saying $q - \lceil e \rceil \leq \hat{q} \leq q$.

The bound of $e$ is the key to analyzing the bound of $\hat{q}$. If we can prove that $e \leq 2$, then we will have $\lceil e \rceil \leq 2$ and finally have $q - 2 \leq \hat{q} \leq q$. That's exactly our goal. Let's analyze the bound of $e$.
Prove that $e \leq 2$

We have:

$$e = \frac{\alpha x}{b^{2k}} + \frac{\beta b^{k-1}}{m} - \frac{\alpha \beta}{b^{k+1}} \leq \frac{x}{b^{2k}} + \frac{b^{k-1}}{m}$$

We know that $m \geq b^{k-1}$ (because $m$ is a $k$-limb integer) and $x < m^2 < b^{2k}$. Then we have:

$$\frac{x}{b^{2k}} < 1, \qquad \frac{b^{k-1}}{m} \leq 1$$

And thus:

$$e < 2$$

That's exactly what we want. Therefore, we conclude that $q - 2 \leq \hat{q} \leq q$.
Tighter bound: in practice
We have proved that $q - 2 \leq \hat{q} \leq q$ under all circumstances. However, that is a loose bound. Here we claim that for almost all of the moduli used in practice, we have a tighter bound $e \leq 1$, and thus $q - 1 \leq \hat{q} \leq q$.

Let's see what kind of modulus $m$ will have that tighter bound. Remembering that $e \leq \frac{x}{b^{2k}} + \frac{b^{k-1}}{m}$ and that $x < m^2$, the bound $e \leq 1$ holds if the following holds:

$$\frac{m^2}{b^{2k}} + \frac{b^{k-1}}{m} \leq 1$$

Which means:

$$m^3 + b^{3k-1} \leq b^{2k} \cdot m$$

Furthermore:

$$m^2 \leq b^{2k} - \frac{b^{3k-1}}{m}$$

That's saying: if $m^2$ is in the range $\left[ b^{2k-2},\ b^{2k} - \frac{b^{3k-1}}{m} \right]$, then $e \leq 1$.

We know that $b^{k-1} \leq m < b^k$, so $m^2$ is always in $[b^{2k-2}, b^{2k})$. In practice, $b$ is $2^{32}$ or $2^{64}$, and $\frac{b^{3k-1}}{m}$ is at most $b^{2k}/b$, which is tiny compared to $b^{2k}$. Therefore $b^{2k} - \frac{b^{3k-1}}{m}$ is very close to $b^{2k}$. That means the allowed range fills most of the range $[b^{2k-2}, b^{2k})$.

Since a random modulus $m$ rarely lands at the very top of its range, it's highly likely that $m^2 \leq b^{2k} - \frac{b^{3k-1}}{m}$. As a result, for most moduli $m$, we have $e \leq 1$ and thus $q - 1 \leq \hat{q} \leq q$.
We assume that $b$ is $2^{32}$ or $2^{64}$ (which is the common case in practice) and that $m$ is uniformly random in $[b^{k-1}, b^k)$. Only moduli within roughly $\frac{b^{k-1}}{2}$ of the top of that range can violate the condition above, a fraction of about $\frac{1}{2(b-1)}$. Then:
- If $b = 2^{32}$, the probability that the tighter bound holds is at least about $1 - 2^{-32}$.
- If $b = 2^{64}$, the probability is at least about $1 - 2^{-64}$.

That is, for nearly all moduli in practice, the tighter bound holds.
The bound for P-256 scalar field
We have proven that, given a modulus $m$ with $b^{k-1} \leq m < b^k$, if $m^2 \leq b^{2k} - \frac{b^{3k-1}}{m}$ holds, then the tighter bound holds.

For the P-256 scalar field, it's not surprising that its modulus satisfies this condition, for both $b = 2^{32}$ and $b = 2^{64}$. This immediately implies the tighter bound:

$$q - 1 \leq \hat{q} \leq q$$

That is, the computed $\hat{q}$ is at most 1 less than the actual quotient $q$.

Therefore, the second sub here and here in the Barrett reduction, which conditionally subtracts another copy of the modulus, is unnecessary and can be safely removed in this case.
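The tighter bound can also be checked empirically for the P-256 scalar modulus. The sketch below (assuming base $b = 2^{32}$ and $k = 8$ limbs; names are illustrative) samples products of reduced scalars and verifies that the quotient estimate is off by at most one:

```python
import random

# Empirically check that the Barrett quotient approximation for the P-256
# scalar modulus satisfies q - 1 <= q_hat <= q.
b_bits, k = 32, 8
n = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
mu = (1 << (2 * k * b_bits)) // n            # precomputed floor(b^2k / m)
rng = random.Random(0)
for _ in range(1000):
    x = rng.randrange(n) * rng.randrange(n)  # product of two reduced values
    q = x // n
    q_hat = ((x >> (b_bits * (k - 1))) * mu) >> (b_bits * (k + 1))
    assert q - 1 <= q_hat <= q
```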
Elliptic Curve Implementation Detail
The p256 module defines the basic properties of the curve shown in the P-256 section above, as well as the arithmetic for both the base field and the scalar field. The primeorder module provides the curve arithmetic specific to curves with the property $a = -3$.
Encoding Points
The primeorder module uses the sec1 crate to define how points on the curve should be encoded and decoded. Depending on the encoding tag, a point can be encoded in different formats: identity, compressed, uncompressed, and compact.

Identity (0x00) represents the point at infinity (or the zero point), which has no affine coordinates. The encoding only uses one byte, which is the tag 0x00.
Compressed (0x02, 0x03) represents a point on the curve with only the x-coordinate provided. The y-coordinate is computed from the x-coordinate using the curve equation. Substituting the x-coordinate into the curve equation yields two possible values, $y$ and $p - y$, resulting from the square root $\sqrt{x^3 + ax + b}$. Because the prime $p$ is odd, one of $y$ and $p - y$ is even and the other is odd.

The tag 0x02 is used to choose the even y-coordinate, while 0x03 is used to choose the odd y-coordinate. Whether the y-coordinate is even or odd can be determined by taking $y \bmod 2$.
The total length of the compressed point is 33 bytes, which includes the tag byte and the x-coordinate (32 bytes).
Uncompressed (0x04) represents a point on the curve with both $x$- and $y$-coordinates provided. The tag is followed by both the $x$-coordinate (32 bytes) and the $y$-coordinate (32 bytes). The total length of the uncompressed point is 65 bytes.

Compact (0x05) is similar to the compressed tags, but it always chooses the larger $y$-coordinate resulting from the square root. This is not an encoding standard specified in SEC1 v2, but it is implemented in the sec1 crate dependency.
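Point decompression for the 0x02/0x03 tags can be sketched as follows (a minimal illustration, not the crate's code; it exploits the fact that $p \equiv 3 \pmod 4$ so a square root is a single exponentiation):

```python
# SEC1-style point decompression for P-256: recover y from x via the curve
# equation, then pick the root whose parity matches the tag.
p = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
a = p - 3
b = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b

def decompress(tag, x):
    assert tag in (0x02, 0x03)
    rhs = (pow(x, 3, p) + a * x + b) % p
    y = pow(rhs, (p + 1) // 4, p)       # square root, valid since p % 4 == 3
    assert (y * y) % p == rhs           # x must actually be on the curve
    # tag 0x02 selects the even root, 0x03 the odd one
    return y if (y & 1) == (tag & 1) else p - y

# Decompressing the standard generator's x-coordinate (its y is odd, tag 0x03):
gx = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296
gy = 0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5
assert decompress(0x03, gx) == gy
```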
Formulas In Affine Coordinates
To perform curve arithmetic we need to implement one main low-level operation, point addition, and sometimes a distinct point doubling operation as well. The latter is needed because, when the two points are identical, the vanilla formula for point addition would lead to a division by zero (unlike complete addition formulas, which handle this case).

When $P_1 \neq \pm P_2$, the formula for adding two distinct points $P_3 = P_1 + P_2$ is:

$$\lambda = \frac{y_2 - y_1}{x_2 - x_1}, \qquad x_3 = \lambda^2 - x_1 - x_2, \qquad y_3 = \lambda (x_1 - x_3) - y_1$$

This formula can't be applied to two identical points, because the denominator in $\lambda$ would be zero.

Therefore, when $P_1 = P_2$, the formula should be based on the tangent line of the curve at the point $P_1$, instead of the secant line between $P_1$ and $P_2$:

$$\lambda = \frac{3 x_1^2 + a}{2 y_1}, \qquad x_3 = \lambda^2 - 2 x_1, \qquad y_3 = \lambda (x_1 - x_3) - y_1$$

In affine form, these addition methods are sufficient to compute any point $[k]P$ resulting from the scalar multiplication of a point $P$ on the curve.
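The two affine formulas can be sketched together in a few lines of Python (illustrative only; real implementations use projective, constant-time code, and this sketch ignores the point at infinity):

```python
# Affine point addition on P-256: tangent slope when the points coincide,
# secant slope otherwise.
p = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
a = p - 3

def affine_add(P1, P2):
    (x1, y1), (x2, y2) = P1, P2
    if P1 == P2:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p         # secant slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)
# Consistency check: computing 4G as (3G + G) and as (2G + 2G) must agree.
G2 = affine_add(G, G)
G3 = affine_add(G2, G)
assert affine_add(G3, G) == affine_add(G2, G2)
```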
Projective Formulas And Optimizations
The problem is that the division operation (realized as a multiplication by a field inverse) needed for $\lambda$ is expensive and harder to compute in constant time. By converting the points to projective form, these division operations can be replaced with multiplications and additions, which are more efficient and easier to implement in constant time.

The implementation uses the algorithms from Renes-Costello-Batina 2015, which take advantage of the fact that the constant $a = -3$, allowing further optimizations.
Point addition follows Algorithm 4 from [RCB 2015], which expresses the result in projective coordinates.

The constants in the formulas come from the curve property $a = -3$. Because the constants are small, instead of full field multiplications, the algorithm uses doublings and additions of the intermediate results to compute the multiplications by these constants.

For example, it applies a double-and-add step to an intermediate result to achieve a multiplication by 3, as shown in the code:
let bzz_part = xz_pairs - (C::EQUATION_B * zz); // 19, 20
let bzz3_part = bzz_part.double() + bzz_part; // 21, 22
For the point doubling formula, the addition formula can be simplified because the points $P_1$ and $P_2$ are the same.

Similarly, instead of doing multiplications with the small constants, the implementation uses field addition operations on the intermediate results.
Mixed Addition
Mixed addition is another optimization for point addition, in which one of the points is in affine coordinates and the other is in projective coordinates. When the affine point is converted to projective coordinates, the converted point has $Z_2 = 1$. Plugging this into the addition formula cancels the multiplications by $Z_2$.

This avoids the multiplication operations for the terms involving $Z_2$.
Scalar Multiplication
In elliptic curve cryptography, the scalar multiplication $[k]P$ is the fundamental building block for popular cryptographic schemes, such as ECDSA and ECDH. It is crucial to implement scalar multiplication in constant time to prevent side-channel attacks, while keeping performance efficient even though it is done in constant time.

Typically, the scalar value $k$ can be represented as an $n$-bit value:

$$k = \sum_{i=0}^{n-1} k_i \cdot 2^i$$

where $k_i$ is the $i$-th bit of the scalar value $k$.

The scalar multiplication can be computed using the constant-time Double and Add algorithm, in which the accumulator is doubled in each bit iteration and the point multiplied by the bit value, $[k_i]P$, is added.

Instead of relying on this typical Double and Add algorithm, the implementation uses a windowed method to speed up the computation while preserving the constant-time property.

The windowed method precomputes a table of points $[d]P$ for $d \in \{0, 1, \ldots, 2^w - 1\}$, where $w$ is the window width, and rewrites the scalar in base $2^w$, with $k_i$ now denoting the $i$-th digit in base $2^w$.

With the precomputed points, the typical Double and Add algorithm can be generalized as:

$$[k]P = \sum_i 2^{wi} \cdot [k_i]P$$

The implementation uses a precomputed table of 16 points, which means that $w = 4$ and the scalar multiplication can be computed as:

$$[k]P = \sum_{i=0}^{63} 16^i \cdot [k_i]P$$

With Horner's method, it computes the scalar multiplication from the most significant digit to the least significant digit:

$$[k]P = 16 \cdot ( \cdots 16 \cdot ( 16 \cdot [k_{63}]P + [k_{62}]P ) + \cdots ) + [k_0]P$$

In the implementation, the scalar value is first encoded in a byte array of size 32. Because the precomputed table holds 16 points, the scalar value needs to be processed 4 bits at a time. Thus, it treats the byte array like an array of 4-bit values.
The pos variable tracks the current position of the 4-bit window in the byte array, where pos is always a multiple of 4. The pos starts at the most significant window and decrements by 4 bit positions after each windowed step.

With the current position, pos >> 3 determines the index into the byte array. Then pos & 7 determines the bit offset of the window within that byte, which is either 0 or 4. Right-shifting the byte by this offset yields the slot value, which is either the top or the bottom 4 bits of the byte.

The slot value is computed as:

let slot = (k[pos >> 3] >> (pos & 7)) & 0xf;
To choose the correct point from the table, it compares the slot value with each index $i$ of the precomputed table using XOR. When the indexes match, slot ^ i is zero, so wrapping_sub(1) underflows to the maximum value, whose high bits are all 1, producing a Choice of 1 for conditional_assign.
for i in 1..16 {
t.conditional_assign(
&pc[i],
Choice::from(((slot as usize ^ i).wrapping_sub(1) >> 8) as u8 & 1),
);
}
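The window extraction and Horner accumulation can be mimicked with plain integers (a standalone sketch where "double" is multiplication by 2 and "table lookup plus add" is integer addition; the byte layout here is little-endian, an assumption of this sketch rather than a statement about the crate's internal representation):

```python
# Walk the 4-bit windows from most to least significant, Horner-style,
# and check that the recombined value equals the original scalar.
def windowed_recombine(k_bytes):
    acc = 0
    pos = 8 * len(k_bytes) - 4                    # most significant nibble
    while pos >= 0:
        acc <<= 4                                  # four doublings per window
        slot = (k_bytes[pos >> 3] >> (pos & 7)) & 0xF
        acc += slot                                # stands in for adding [slot]P
        pos -= 4
    return acc

k = 0xC9AFA9D845BA75166B5C215767B1D6934E50C3DB36E89B127B8A622B120F6721
k_bytes = k.to_bytes(32, "little")                 # so k_bytes[31] is the MSB
assert windowed_recombine(k_bytes) == k
```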
ECDSA Overview
ECDSA has a number of standards, such as FIPS 186-5 (https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-5.pdf) and SEC1 (https://www.secg.org/sec1-v2.pdf), which specify the algorithm and its parameters.
RustCrypto’s ecdsa crate follows the RFC 6979 for deterministic nonce generation instead of relying on random nonce generation, which can be vulnerable to attacks if the same nonce is reused for different messages.
It also handles DER-encoded signatures and OIDs for compatibility with certificate systems, as specified in RFCs 3279, 5758, 5480, 5280 and 5912.
Besides, the crate supports signature normalization to avoid signature malleability attacks, and supports recovering public keys from signatures.
Note that the crate partially implements the hash truncation bits2int specified in RFC 6979. Even when following the truncation from the standards, it doesn't guarantee resistance to hash value collisions. Below, we explain the implementation details and the security implications.
Signing Algorithm
At the core, the crate uses the following ECDSA signing algorithm:
1. Hash the Message
Compute the cryptographic hash of the message $M$:

$$H = \mathrm{Hash}(M)$$

Let $N$ be the bit-length of the curve order $n$, and define $z$ as the leftmost $N$ bits of $H$, reduced modulo $n$.

2. Generate Nonce
Deterministically generate a nonce $k \in [1, n-1]$ based on $z$, the private key $d$, and an optional entropy input via HMAC.

3. Compute Point
Multiply the generator point $G$ by $k$:

$$(x_1, y_1) = [k]G$$

4. Compute $r$
Reduce the x-coordinate modulo $n$:

$$r = x_1 \bmod n$$

5. Compute $s$
Compute the signature's second component:

$$s = k^{-1} \cdot (z + r \cdot d) \bmod n$$

where $d$ is your private key.

6. Output the Signature
The final signature is the pair $(r, s)$.
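The signing steps (and the matching verification equation) can be condensed into a from-scratch sketch (variable-time toy Python, with the nonce passed in explicitly rather than derived via RFC 6979; not the crate's code, and the key below is only the RFC 6979 test-vector key reused for illustration):

```python
import hashlib

p = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
n = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
a = p - 3
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)

def add(P, Q):  # affine addition with the point at infinity as None
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0: return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, P):  # simple double-and-add (not constant time)
    R = None
    while k:
        if k & 1: R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def sign(d, msg, k):
    z = int.from_bytes(hashlib.sha256(msg).digest(), "big") % n  # step 1
    x1, _ = mul(k, G)                                            # step 3
    r = x1 % n                                                   # step 4
    s = pow(k, -1, n) * (z + r * d) % n                          # step 5
    return r, s

def verify(Q, msg, sig):
    r, s = sig
    z = int.from_bytes(hashlib.sha256(msg).digest(), "big") % n
    u1, u2 = z * pow(s, -1, n) % n, r * pow(s, -1, n) % n
    x1, _ = add(mul(u1, G), mul(u2, Q))
    return r == x1 % n

d = 0xC9AFA9D845BA75166B5C215767B1D6934E50C3DB36E89B127B8A622B120F6721
sig = sign(d, b"sample", k=12345)
assert verify(mul(d, G), b"sample", sig)
```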
Hazmat Functions
The hazmat functions in the RustCrypto ecdsa crate are low-level functions that provide the fundamental algorithms for the crate's higher-level ECDSA functions. These hazmat functions require careful handling due to their security assumptions. They include:
- bits2field: Converts a hash value into field bytes to represent $z$. It handles truncation and padding to ensure the hash fits the field size of a specific curve.
- sign_prehashed: Signs a pre-hashed message $z$ using the private key $d$ and nonce $k$. It is up to the caller to ensure that the nonce is not reused.
- sign_prehashed_rfc6979: Uses RFC 6979's deterministic nonce generation, ensuring that the generated nonce is unique for each combination of truncated message $z$ and private key $d$.
- verify_prehashed: Verifies a pre-hashed signature against the public key and message hash $z$.
Hash Truncation
At step 1, the hash value $H$ is truncated to its leftmost $N$ bits to produce $z$.

The implementation bits2field partially follows the bits2int transform defined in [RFC 6979 § 2.3.2]. The bits2field function requires the hash value to be at least half the field size (e.g., 128 bits for the P-256 curve). This is an additional security check that deviates from the standards, which do not mandate a minimum hash size.
pub fn bits2field<C: EcdsaCurve>(bits: &[u8]) -> Result<FieldBytes<C>> {
// Minimum allowed bits size is half the field size
if bits.len() < C::FieldBytesSize::USIZE / 2 {
return Err(Error::new());
}
let mut field_bytes = FieldBytes::<C>::default();
match bits.len().cmp(&C::FieldBytesSize::USIZE) {
cmp::Ordering::Equal => field_bytes.copy_from_slice(bits),
cmp::Ordering::Less => {
// If bits is smaller than the field size, pad with zeroes on the left
field_bytes[(C::FieldBytesSize::USIZE - bits.len())..].copy_from_slice(bits);
}
cmp::Ordering::Greater => {
// If bits is larger than the field size, truncate
field_bytes.copy_from_slice(&bits[..C::FieldBytesSize::USIZE]);
}
}
Ok(field_bytes)
}
The truncation ensures that the bit length of $z$ is not greater than that of the scalar field of the elliptic curve. But this doesn't entirely avoid hash collisions, because the truncated hash value can still overflow the scalar field, as explained below.

The size of $z$ is determined by the FieldBytesSize defined for a curve. For P-256, the field size in bits happens to be a multiple of 8, so the FieldBytesSize is 32 bytes.
impl elliptic_curve::Curve for NistP256 {
/// 32-byte serialized field elements.
type FieldBytesSize = U32;
TRUNCATED...
}
Even though the truncated hash value $z$ has the same bit length as the scalar modulus $n$, it is not guaranteed that the hash value is in the range $[0, n)$. For example, $z$ is in the range $[0, 2^{256})$, but the scalar modulus $n$ is less than $2^{256}$. So $z$ may overflow the modulus and wrap around after reduction.
This overflow issue can be demonstrated in the following test:
#[test]
fn rfc6979() {
let x = hex!("c9afa9d845ba75166b5c215767b1d6934e50c3db36e89b127b8a622b120f6721");
let signer = SigningKey::from_bytes(&x.into()).unwrap();
let digest1 = hex!("ffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632552");
let digest2 = hex!("0000000000000000000000000000000000000000000000000000000000000001");
let signature1: Signature = signer.sign_prehash(&digest1).unwrap();
let signature2: Signature = signer.sign_prehash(&digest2).unwrap();
assert_eq!(
signature1.to_bytes().as_slice(),
signature2.to_bytes().as_slice()
);
}
For P-256, the set of 256-bit integers that overflow the scalar modulus has size roughly

$$2^{256} - n \approx 2^{224}$$

so about one in $2^{32}$ random SHA-256 outputs will lie above $n$ and "wrap around". Collisions do exist, but for a uniformly random 256-bit hash they're vanishingly rare in practice.

Therefore, the truncation specified in the standards, such as RFC 6979, reduces the chances of hash collisions rather than entirely eliminating them.
The security implication also depends on how the verifier incorporates the hash value z. If the application directly uses the hash value supplied by the prover instead of computing the hash itself, then the prover can take advantage of this overflow issue to mount replay attacks. For example, the prover can reuse a signature with a different hash value if the application relies on the hash value for identity purposes, such as a key for a nullifier.
Furthermore, note that for the curve P-521, the field size is not a multiple of 8, so the FieldBytesSize
is 66 bytes, which is equivalent to 528 bits, leaving 7 extra bits (528 - 521) of room for hash collisions. Besides the higher chance of hash collisions, this does not follow the RFC 6979 standard, which specifies that the hash value be truncated to the bit length of the curve order.
impl elliptic_curve::Curve for NistP521 {
/// 66-byte serialized field elements.
type FieldBytesSize = U66;
TRUNCATED...
}
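The overflow can be illustrated with a toy model (the 64-bit "order" and the names below are hypothetical, not the crate's code): truncation bounds the bit length, but any two hashes that differ by the order still reduce to the same scalar.

```rust
// Toy model: treat a "hash" as a u64 and the scalar order as a 64-bit
// modulus. Truncation bounds the bit length, but any hash h and h + n
// still reduce to the same scalar.
fn reduce(hash: u64, order: u64) -> u64 {
    hash % order
}

fn main() {
    let n: u64 = 0xffff_ffff_0000_0001; // toy "curve order", just below 2^64
    let (h1, h2) = (1u64, n + 1); // both survive truncation to 64 bits
    assert_ne!(h1, h2);
    // identical scalar, hence identical signature input
    assert_eq!(reduce(h1, n), reduce(h2, n));
}
```

This is exactly the relationship between digest1 and digest2 in the test above: they differ by the P-256 order n, so they are indistinguishable after reduction.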
Deterministic Nonce Generation (RFC 6979)
One of the potential problems of directly using the hazmat function sign_prehashed
is the risk of nonce reuse, which can lead to private key recovery. The nonce must be unique for each signature.
The hazmat function sign_prehashed_rfc6979
implements the deterministic nonce method specified in the RFC 6979:
pub fn sign_prehashed_rfc6979<C, D>(
d: &NonZeroScalar<C>,
z: &FieldBytes<C>,
ad: &[u8],
) -> Result<(Signature<C>, RecoveryId)>
where
C: EcdsaCurve + CurveArithmetic,
D: Digest + BlockSizeUser + FixedOutput + FixedOutputReset,
SignatureSize<C>: ArraySize,
{
let z2 = <Scalar<C> as Reduce<C::Uint>>::reduce_bytes(z);
let k = NonZeroScalar::<C>::from_repr(rfc6979::generate_k::<D, _>(
&d.to_repr(),
&C::ORDER.encode_field_bytes(),
&z2.to_repr(),
ad,
))
.unwrap();
sign_prehashed(d, &k, z)
}
The nonce k is generated using HMAC, seeded with the private key d, the curve order n, the reduced hash value z, and optional additional data ad (extra entropy or domain separation). This ensures k is unique per message and private key, eliminating reuse risks.
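The shape of the RFC 6979 HMAC-DRBG loop can be sketched in miniature. Everything below is a toy: std's DefaultHasher stands in for HMAC-SHA256 (it is deterministic but NOT a secure PRF), and the 64-bit values stand in for the byte strings of the real algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for HMAC: deterministic, but NOT cryptographically secure.
// It only serves to show the structure of the RFC 6979 state machine.
fn prf(key: u64, data: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    data.hash(&mut h);
    h.finish()
}

// Deterministic nonce in [1, n), derived only from the private key d and
// the reduced hash z: the same inputs always yield the same k.
fn generate_k(d: u64, z: u64, n: u64) -> u64 {
    let mut v: u64 = 0x0101_0101_0101_0101; // V = 0x01 0x01 ... 0x01
    let mut k: u64 = 0;                     // K = 0x00 0x00 ... 0x00
    k = prf(k, &[v, 0x00, d, z]);           // K = HMAC_K(V || 0x00 || d || z)
    v = prf(k, &[v]);                       // V = HMAC_K(V)
    k = prf(k, &[v, 0x01, d, z]);           // K = HMAC_K(V || 0x01 || d || z)
    v = prf(k, &[v]);                       // V = HMAC_K(V)
    loop {
        v = prf(k, &[v]);                   // T = HMAC_K(V); candidate = bits2int(T)
        let candidate = v % n;              // toy reduction; the RFC retries instead
        if candidate != 0 {
            return candidate;
        }
        k = prf(k, &[v, 0x00]);             // candidate out of range: re-key
        v = prf(k, &[v]);
    }
}
```

Since generate_k is a pure function of (d, z, n), signing the same pre-hash twice with the same key yields the same nonce, and hence the same signature.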
Recovery ID
From sign_prehashed
, a recovery ID is also returned along with the signature. The recovery ID is used to recover the public key from the signature and message. It is particularly useful for checking whether a signature corresponds to a specific public key.
Recall that in the public key recovery algorithm, the potential public key can be calculated as:
Q = r^-1 * (s * R - z * G)
where the x-coordinate of R can be the residue r of x modulo n, and the y-coordinate can be calculated by plugging x into the elliptic curve equation y^2 = x^3 + ax + b, in which a and b are defined for the specific curve.
Because the x-coordinate of R is a field element in the base field F_p, which can be greater than the curve order n, the x-coordinate can be either r or r + n. Meanwhile, the y-coordinate can be either y or p - y.
To obtain the correct public key, the recovery ID indicates which of the possible points corresponds to the signature:
let recovery_id = RecoveryId::new(R.y_is_odd().into(), x_is_reduced);
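The two flags fit in a single byte. The sketch below mirrors the semantics of RecoveryId::new with hypothetical names; it is an illustration, not the crate's type.

```rust
// Sketch of packing the two recovery bits into one byte:
// bit 0 holds the parity of R.y, bit 1 records whether r overflowed past n.
#[derive(Debug, PartialEq)]
struct ToyRecoveryId(u8);

impl ToyRecoveryId {
    fn new(y_is_odd: bool, x_is_reduced: bool) -> Self {
        ToyRecoveryId((x_is_reduced as u8) << 1 | (y_is_odd as u8))
    }
    fn y_is_odd(&self) -> bool {
        self.0 & 1 == 1
    }
    fn x_is_reduced(&self) -> bool {
        self.0 & 2 == 2
    }
}
```

The verifier uses these two bits to pick exactly one of the (up to) four candidate points (r or r + n, y or p - y) during recovery.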
Normalize Signature
ECDSA signatures are malleable because the signatures (r, s) and (r, n - s) are both valid for the same message. To prevent the verifier from accepting both signatures, the sign_prehashed
function normalizes the s value to ensure it is in the lower half of the range. This is done by checking if s is greater than half the curve order and adjusting it accordingly.
The crate provides an interface NORMALIZE_S
for the curve implementation to define whether sign_prehashed
should normalize the s value.
if C::NORMALIZE_S {
recovery_id.0 ^= s.is_high().unwrap_u8();
signature = signature.normalize_s();
}
The is_high()
determines if s > n/2, and normalize_s()
adjusts s to n - s. Since adjusting s to n - s is equivalent to replacing the nonce k by n - k, the new nonce produces the point (n - k) * G = -R, which flips the y-coordinate. The recovery ID needs to be adjusted accordingly to indicate the new point -R.
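Low-s normalization can be shown on a toy group. The order n = 19 below is hypothetical; the function names merely echo the crate's API.

```rust
// Toy illustration of low-s normalization on a group of order n = 19.
const N: u64 = 19;

// Is s in the upper half of [1, n)? (mirrors is_high)
fn is_high(s: u64) -> bool {
    s > N / 2
}

// Map a high-s signature to its low-s equivalent. (mirrors normalize_s)
fn normalize_s(s: u64) -> u64 {
    if is_high(s) { N - s } else { s }
}

fn main() {
    let s = 13; // high-s form: 13 > 19 / 2
    assert!(is_high(s));
    let s2 = normalize_s(s);
    assert_eq!(s2, 6); // n - s = 19 - 13
    assert!(!is_high(s2));
    // the recovery ID's parity bit must flip along with s:
    let mut recid = 0u8;
    recid ^= is_high(s) as u8;
    assert_eq!(recid, 1);
}
```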
Signature Format
The signature is encoded in a format called DER (Distinguished Encoding Rules), ensuring that the signature can be transmitted and stored in a standardized way.
OID (Object Identifier) is a unique identifier used to specify the algorithm used for signing. In the context of ECDSA, the OID indicates the hash function used in the signature, while the curve is identified by its own OID. For example, the OID for ECDSA with SHA-256 is 1.2.840.10045.4.3.2
, and the P-256 curve itself is identified by 1.2.840.10045.3.1.7
.
In the RustCrypto ecdsa crate, the combination of the signature and the OID can be represented by the struct SignatureWithOid<C>
.
pub struct SignatureWithOid<C: EcdsaCurve> {
signature: Signature<C>,
oid: ObjectIdentifier,
}
Verification Algorithm
Given a public key Q, message m (or pre-hash) and signature (r, s), the verification algorithm is as follows:
1. Validate the ranges: 1 <= r < n and 1 <= s < n
2. Hash and truncate
If not pre-hashed, compute e = H(m) and set z to the leftmost bits of e
3. Inverse: w = s^-1 mod n
4. Twin scalars: u1 = z * w mod n and u2 = r * w mod n
5. Joint multiplication: R' = u1 * G + u2 * Q
Reject the point at infinity.
6. Final check
Accept only if x(R') ≡ r (mod n)
Whether to reject signatures where s > n/2 to prevent malleability is up to the NORMALIZE_S
definition of the curve or how the caller uses the verify API.
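The verification steps can be walked through end to end on a tiny textbook curve: y^2 = x^3 + 2x + 2 over F_17 with generator G = (5, 1) of prime order n = 19. These parameters are of course insecure toys, and the code is a sketch, not the crate's implementation.

```rust
// Toy ECDSA over y^2 = x^3 + 2x + 2 (mod 17), G = (5, 1), order n = 19.
const P: i64 = 17; // base field modulus
const A: i64 = 2;  // curve coefficient a
const N: i64 = 19; // order of G

type Pt = Option<(i64, i64)>; // None = point at infinity

// Modular inverse via Fermat's little theorem (m prime).
fn inv_mod(x: i64, m: i64) -> i64 {
    let (mut acc, mut base, mut e) = (1i64, x.rem_euclid(m), m - 2);
    while e > 0 {
        if e & 1 == 1 { acc = acc * base % m; }
        base = base * base % m;
        e >>= 1;
    }
    acc
}

// Affine point addition (handles doubling and inverses).
fn add(p: Pt, q: Pt) -> Pt {
    let (x1, y1) = match p { Some(v) => v, None => return q };
    let (x2, y2) = match q { Some(v) => v, None => return p };
    let lam = if (x1, y1) == (x2, y2) {
        if y1 == 0 { return None; }
        (3 * x1 * x1 + A) % P * inv_mod(2 * y1, P) % P
    } else {
        if x1 == x2 { return None; } // q = -p
        (y2 - y1).rem_euclid(P) * inv_mod((x2 - x1).rem_euclid(P), P) % P
    };
    let x3 = (lam * lam - x1 - x2).rem_euclid(P);
    Some((x3, (lam * (x1 - x3) - y1).rem_euclid(P)))
}

// Double-and-add scalar multiplication.
fn mul(mut k: i64, mut p: Pt) -> Pt {
    let mut acc: Pt = None;
    while k > 0 {
        if k & 1 == 1 { acc = add(acc, p); }
        p = add(p, p);
        k >>= 1;
    }
    acc
}

// Steps 1-6 from the text, for an already-reduced hash z.
fn verify(q: Pt, z: i64, r: i64, s: i64) -> bool {
    if !(1..N).contains(&r) || !(1..N).contains(&s) { return false; } // step 1
    let w = inv_mod(s, N);  // step 3: w = s^-1 mod n
    let u1 = z * w % N;     // step 4: u1 = z * w mod n
    let u2 = r * w % N;     // step 4: u2 = r * w mod n
    let g: Pt = Some((5, 1));
    match add(mul(u1, g), mul(u2, q)) { // step 5: R' = u1*G + u2*Q
        None => false,                  // reject the point at infinity
        Some((x, _)) => x % N == r,     // step 6: x(R') ≡ r (mod n)
    }
}

fn main() {
    let g: Pt = Some((5, 1));
    let d = 7;         // toy private key
    let q = mul(d, g); // public key Q = d*G
    let (z, k) = (5, 10); // toy reduced hash and nonce
    let (rx, _) = mul(k, g).unwrap();
    let r = rx % N;                          // r = x(kG) mod n
    let s = inv_mod(k, N) * (z + r * d) % N; // s = k^-1 (z + r*d) mod n
    assert!(verify(q, z, r, s));
    assert!(!verify(q, z + 1, r, s)); // a different hash must fail
}
```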
Verification Implementations
The verification can be done either by hashing the message and then verifying the signature, or by providing the hash and verifying the signature directly.
With SignatureWithOid
representing both the signature and the hashing algorithm, the message is hashed with the algorithm corresponding to the OID before the signature is verified:
impl<C> Verifier<SignatureWithOid<C>> for VerifyingKey<C>
where
C: EcdsaCurve + CurveArithmetic + DigestPrimitive,
SignatureSize<C>: ArraySize,
{
fn verify(&self, msg: &[u8], sig: &SignatureWithOid<C>) -> Result<()> {
match sig.oid() {
ECDSA_SHA224_OID => self.verify_prehash(&Sha224::digest(msg), sig.signature()),
ECDSA_SHA256_OID => self.verify_prehash(&Sha256::digest(msg), sig.signature()),
ECDSA_SHA384_OID => self.verify_prehash(&Sha384::digest(msg), sig.signature()),
ECDSA_SHA512_OID => self.verify_prehash(&Sha512::digest(msg), sig.signature()),
_ => Err(Error::new()),
}
}
}
Otherwise, it verifies with pre-hashed message:
impl<C> PrehashVerifier<Signature<C>> for VerifyingKey<C>
where
C: EcdsaCurve + CurveArithmetic,
SignatureSize<C>: ArraySize,
{
fn verify_prehash(&self, prehash: &[u8], signature: &Signature<C>) -> Result<()> {
if C::NORMALIZE_S && signature.s().is_high().into() {
return Err(Error::new());
}
hazmat::verify_prehashed::<C>(
&ProjectivePoint::<C>::from(*self.inner.as_affine()),
&bits2field::<C>(prehash)?,
signature,
)
}
}
Performance Optimizations
The arithmetic operations involved in ECDSA signing and verification can be categorized into two types: scalar multiplication and field element operations.
The point additions and doublings, which are performed in calculations like the public key Q = d * G and R' = u1 * G + u2 * Q, can be optimized through Montgomery reduction. This is because projective coordinates involve many more field multiplications, and with Montgomery reduction the overhead of the repeated modular reductions can be amortized.
The scalar field operations, such as s^-1 mod n, can be optimized by Barrett reduction.
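The idea behind Montgomery reduction can be sketched on a single 64-bit limb. The modulus M and all names below are hypothetical toys; the crate applies the same technique to multi-limb big integers.

```rust
// Toy Montgomery reduction for a single-limb odd modulus, with u128
// intermediates. M is kept below 2^63 so every sum fits in u128.
const M: u64 = (1 << 61) - 1;

// Computes -M^-1 mod 2^64 via Newton's iteration (requires M odd).
fn neg_inv(m: u64) -> u64 {
    let mut inv: u64 = 1;
    for _ in 0..6 {
        // each step doubles the number of correct low bits: 1 -> 2 -> ... -> 64
        inv = inv.wrapping_mul(2u64.wrapping_sub(m.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

// REDC: for t < M * 2^64, returns t * 2^-64 mod M using one conditional
// subtraction in place of a full division.
fn redc(t: u128, m: u64, m_neg_inv: u64) -> u64 {
    let k = (t as u64).wrapping_mul(m_neg_inv); // k = t * (-M^-1) mod 2^64
    let t = (t + k as u128 * m as u128) >> 64;  // low 64 bits cancel: exact shift
    if t >= m as u128 { (t - m as u128) as u64 } else { t as u64 }
}

// a * b mod M: convert into Montgomery form, multiply with REDC, convert out.
fn mont_mul_mod(a: u64, b: u64) -> u64 {
    let ninv = neg_inv(M);
    let ar = ((a as u128) << 64) % (M as u128); // a * R mod M
    let br = ((b as u128) << 64) % (M as u128); // b * R mod M
    let abr = redc(ar * br, M, ninv);           // a * b * R mod M
    redc(abr as u128, M, ninv)                  // strip the final factor of R
}
```

The conversion into Montgomery form costs one division each, which is why the technique pays off only when many multiplications share the same modulus, as in repeated point additions. Barrett reduction makes the complementary trade for one-off operations: it replaces the division by a multiplication with a precomputed reciprocal of the modulus.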