Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add library for compile time instantiation of elliptic curves #3979

Open
wants to merge 42 commits into
base: master
Choose a base branch
from

Conversation

randombit
Copy link
Owner

@randombit randombit commented Apr 6, 2024

This new approach has a lot of benefits. It avoids the side channels that plague anything based on BigInt (or any general purpose arbitrary length integer type), it also avoids almost all heap allocations and inlines nicely.

Performance advantage depends on curve and compiler. For smaller curves (eg secp256r1) ECDSA seems be almost twice as fast. For larger such as secp521r1 the advantage is more like 40%. Clang seems to be able to optimize everything much better than GCC here, for reasons I haven't been able to nail down.

Performance can be improved significantly from here; besides P-521 we're not taking advantage of specialized reductions possible with many primes, nor is there a system to express the precomputed addition chains used for inversions. The BigInt based code makes use of both of these already.

One disadvantage is that the code size is quite significant. The worst overhead is actually the symbol names. I've applied various tricks to reduce these but still over half the size of math_pcurves.o is (per Google's bloaty tool) symbols. I'd like to reduce these in the future, but I don't think it's a blocker to merge.

@randombit randombit added this to the Botan 3.5.0 milestone Apr 6, 2024
@randombit randombit requested a review from reneme April 6, 2024 10:24
@reneme
Copy link
Collaborator

reneme commented Apr 6, 2024

Awesome! I'll certainly have a look. Not before tomorrow, though.

@randombit
Copy link
Owner Author

randombit commented Apr 6, 2024

No rush! This is certainly not going into 3.4 so there is plenty of time.

Most interesting thing about the sea of red in CI - GCC 11.4 on x86 fails with constexpr timeout, while same version of GCC on say Aarch64 or S390 accepts. So constexpr limits are architecture dependent [*] :( and thus intrinsically flaky since you can be near some limit without knowing it

[*] And also version dependent since GCC 13 on my machine is fine with the code without increasing the constexpr limit.

@randombit
Copy link
Owner Author

Clang seems to have a nasty bug where std::is_constant_evaluated returns false incorrectly in constexpr context, then it rejects the code because we used asm or volatile in constexpr context in the !std::is_constant_evaluated branch 😡

It works fine for me in Clang 17 on my machines so I'm assuming this was a Clang bug that was fixed subsequently. I'd vote for just bumping the Clang minimum version - that's an absolutely insane bug and nearly impossible to work around - but unfortunately the version in Android NDK also has the bug and we are stuck with that version at least until the next NDK release.

@coveralls
Copy link

coveralls commented Apr 6, 2024

Coverage Status

coverage: 91.923% (-0.1%) from 92.024%
when pulling 43dfb1d on jack/pcurves
into adda2dd on master.

@randombit
Copy link
Owner Author

Downside of this approach is that (due to name mangling being somewhat horribly thought out for this case) object sizes are really big. Even with just 3 curves, on my machine the object file is one of the largest from the whole library, that will get a lot worse with 27. And compile times may prove untenable.

I'm sure 90% of this can be salvaged but I suspect in the end the string based instantiation won't work in practice 😭 which is a shame since it's quite pretty imo

@randombit
Copy link
Owner Author

Clang bug might be llvm/llvm-project#55638

@randombit
Copy link
Owner Author

This error

 call to consteval function 'Botan::p_minus<(unsigned char)'\x01', unsigned long, 4UL>' is not a constant expression

Looks like llvm/llvm-project#51182

@randombit
Copy link
Owner Author

I can repro the std::if_constant_evaluated problem using Clang 16 on my machine so it looks to be a bug fixed in Clang 17, though I don't see anything that looks relevant in the release notes.

@randombit randombit force-pushed the jack/pcurves branch 4 times, most recently from 3496464 to cd40575 Compare April 7, 2024 13:17
Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! 😄

I merely skimmed through the changes and left a few suggestions and comments here and there. No thorough review whatsoever.

*/

#ifndef BOTAN_PCURVES_UTIL_H_
#define BOTAN_PCURVES_UTIL_H_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment for this file (and perhaps other places):

I'd suggest to use std::span<const W, N> instead of std::array<>& for the parameters of those utilities. No need to restrict those functions to arrays, IMO. And a static-length span should provide the same guarantees and optimization opportunities as an array, no?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main problem is for some technical reason I'm not bothering to learn the details of, C++ can't deduce the conversion from a statically sized array to a statically sized span. https://stackoverflow.com/questions/70983595

This can be worked around with some additional noise but the additional flexibility doesn't buy us anything in this context so I'd rather keep things as simple as possible.

src/lib/pubkey/pcurves/pcurves_util.h Outdated Show resolved Hide resolved
src/lib/pubkey/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
Comment on lines 217 to 225
auto v = bigint_monty_redc(m_val, Self::P, Self::P_dash);
std::reverse(v.begin(), v.end());
auto bytes = store_be(v);

if constexpr(Self::BYTES == Self::N * WordInfo<W>::bytes) {
return bytes;
} else {
// Remove leading zero bytes
const size_t extra = Self::N * WordInfo<W>::bytes - Self::BYTES;
std::array<uint8_t, Self::BYTES> out;
copy_mem(out.data(), &bytes[extra], Self::BYTES);
return out;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could get away without the if constexpr and the potential memory copy entirely?

Suggested change
auto v = bigint_monty_redc(m_val, Self::P, Self::P_dash);
std::reverse(v.begin(), v.end());
auto bytes = store_be(v);
if constexpr(Self::BYTES == Self::N * WordInfo<W>::bytes) {
return bytes;
} else {
// Remove leading zero bytes
const size_t extra = Self::N * WordInfo<W>::bytes - Self::BYTES;
std::array<uint8_t, Self::BYTES> out;
copy_mem(out.data(), &bytes[extra], Self::BYTES);
return out;
}
auto v = bigint_monty_redc(m_val, Self::P, Self::P_dash);
std::reverse(v.begin(), v.end());
std::array<uint8_t, Self::BYTES> out = {0};
static_assert(Self::N * WordInfo<W>::bytes >= Self::BYTES); // is this guranteed somehow anyway?
const size_t zero_padding_offset = Self::N * WordInfo<W>::bytes - Self::BYTES;
store_be(std::span{out}.template subspan<zero_padding_offset>(), v);
return out;

... this is untested. Just a sketch of the idea. store_be with a statically-sized out-param will static_assert that the byte buffer matches the input range exactly.

src/lib/pubkey/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
Comment on lines 350 to 353
constexpr std::array<uint8_t, Self::BYTES> serialize() const {
std::array<uint8_t, Self::BYTES> r = {};
BufferStuffer pack(r);
pack.append(0x04);
pack.append(m_x.serialize());
pack.append(m_y.serialize());
return r;
}
Copy link
Collaborator

@reneme reneme Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could use the concat helpers for this. Currently, that doesn't work, because concat requires the output type to have .insert().This could then also (statically) assert that the output array was filled entirely.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3994 is work towards this. It makes concat() constexpr and also automagically inferring the output array size. So this should actually work as:

Suggested change
constexpr std::array<uint8_t, Self::BYTES> serialize() const {
std::array<uint8_t, Self::BYTES> r = {};
BufferStuffer pack(r);
pack.append(0x04);
pack.append(m_x.serialize());
pack.append(m_y.serialize());
return r;
}
constexpr std::array<uint8_t, Self::BYTES> serialize() const {
return concat(
store_le<uint8_t>(0x04), // store_le is just to make it look like a byte-array
m_x.serialize(),
m_y.serialize());
}

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something seems to be missing still

build/include/internal/botan/internal/loadstor.h: In instantiation of ‘constexpr auto Botan::store_le(ParamTs&& ...) [with ModifierT = unsigned char; ParamTs = {int}]’:
build/include/internal/botan/internal/pcurves_impl.h:351:30:   required from here
build/include/internal/botan/internal/loadstor.h:699:67: error: no matching function for call to ‘store_any<Botan::detail::Endianness::Little, unsigned char>(int)’
  699 |    return detail::store_any<detail::Endianness::Little, ModifierT>(std::forward<ParamTs>(params)...);
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

void conditional_assign(bool cond, const Self& pt) {
m_x.conditional_assign(cond, pt.x());
m_y.conditional_assign(cond, pt.y());
m_z.conditional_assign(cond, pt.z());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is needed for a constant-time implementation? So far, I saw it as an unspoken rule that all methods that are "supposed to be constant-time" are prefixed with ct_.... I found this a helpful convention, that might be applicable here as well.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this code I'm going for the opposite convention namely that everything is constant time by default and anything that is intentionally variable time has a _vartime suffix.

(There are some current exceptions eg in the point multiplication that will be fixed before merging.)

@randombit
Copy link
Owner Author

GCC 11 miscompilation, excellent

@randombit randombit force-pushed the jack/pcurves branch 12 times, most recently from 2feb181 to 3bde391 Compare April 14, 2024 09:51
This drastically speeds up the projective->affine conversion
since we can use a batch operation.
Improves point doubling performance by over 20% with Clang
Good enough and saves a lot of binary size
In practice (at least currently) we don't require either single point
precomputed multiplications *or* on the fly mul2.

Removing these operations saves quite a bit of binary space.
@randombit
Copy link
Owner Author

@reneme I think this is ready to go.

Allow setting the scalar blinding bits to zero
Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions/nits. I'm still figuring out the structure of this. So far: really nice!
Currently, my biggest concerns are:

  • Why are the instances of the PrimeOrderCurveImpl template singletons? My concern is that the singletons' states will become an issue in testing (for instance). And can those singletons be avoided? I didn't get deep enough into the implementation to be able to have a decent opinion on that.
  • Should there be a way to select specific curves at compile-time? At the moment, its "all-or-nothing" and, I feel, especially for embedded use cases it might be helpful for binary size to be able to fine-tune which curves should be compiled.

And one general wish:

Some Doxygen comments describing the rough use case of each class and also their relationship would really help to get started with this. 😅

@@ -0,0 +1,304 @@
/*
* (C) 2014,2015,2019 Jack Lloyd
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* (C) 2014,2015,2019 Jack Lloyd
* (C) 2024 Jack Lloyd

... I guess?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy(right)pasta

Comment on lines +860 to +870
if(i->second.starts_with("0x")) {
if(i->second.size() % 2 == 0) {
return Botan::hex_decode(i->second.substr(2));
} else {
std::string z = i->second;
std::swap(z[0], z[1]); // swap 0x to x0 then remove x
return Botan::hex_decode(z.substr(1));
}
} else {
return Botan::hex_decode(i->second);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ecdsa_pcurves.vec is the only test vector that got added in this pull request; and it doesn't seem to include 0x prefixes. Am I missing something, or could that be a left-over that is actually not needed anymore?

(No objection to leave it in regardless, just wondering)

Comment on lines +96 to +97
/// Creates a generic non-optimized version
//static std::shared_ptr<const PrimeOrderCurve> from_params(...);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO or left over?

Comment on lines +99 to +100
typedef std::array<word, StorageWords> StorageUnit;
typedef std::shared_ptr<const PrimeOrderCurve> CurvePtr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move up those definitions to use CurvePtr for from_name() and from_id(), perhaps?

Comment on lines +104 to +107
Scalar(const Scalar& other) = default;
Scalar(Scalar&& other) = default;
Scalar& operator=(const Scalar& other) = default;
Scalar& operator=(Scalar&& other) = default;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule of five (applies also to the following inline classes):

Suggested change
Scalar(const Scalar& other) = default;
Scalar(Scalar&& other) = default;
Scalar& operator=(const Scalar& other) = default;
Scalar& operator=(Scalar&& other) = default;
Scalar(const Scalar& other) = default;
Scalar(Scalar&& other) = default;
Scalar& operator=(const Scalar& other) = default;
Scalar& operator=(Scalar&& other) = default;
~Scalar() = default;

}

template <WordType W, size_t N, size_t L>
inline constexpr auto bytes_to_words(const uint8_t bytes[L]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not:

Suggested change
inline constexpr auto bytes_to_words(const uint8_t bytes[L]) {
inline constexpr auto bytes_to_words(std::span<const uint8_t, L> bytes) {

At the call sites you'd then do bytes_to_words<...>(std::span{padded_bytes}) (which is type safe and conserves the statically known array length) instead of the bytes_to_words<...>(&padded_bytes[0]) dereference.

return Self(Rep::wide_to_rep(bytes_to_words<W, 2 * N, 2 * BYTES>(&padded_bytes[0])));
}

static constexpr Self random(RandomNumberGenerator& rng) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the constexpr here, because of RNG::randomize().

Suggested change
static constexpr Self random(RandomNumberGenerator& rng) {
static Self random(RandomNumberGenerator& rng) {

Comment on lines +475 to +476
for(;;) {
rng.randomize(buf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A broken RNG (e.g. that produces only 1-bits) would send this into an endless loop. Is that something we would want to catch? For instance, with an upper bound of iterations of this for loop? We've had similar instances in the sampling code of variaous PQC algorithms.

Comment on lines +486 to +488
if(!s.value().is_zero()) {
return s.value();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
if(!s.value().is_zero()) {
return s.value();
}
if(!s->is_zero()) {
return s.value();
}

Comment on lines +544 to +572
std::vector<uint8_t> serialize_to_vec(bool compress) const {
if(compress) {
const auto b = this->serialize_compressed();
return std::vector(b.begin(), b.end());
} else {
const auto b = this->serialize();
return std::vector(b.begin(), b.end());
}
}

constexpr std::array<uint8_t, Self::BYTES> serialize() const {
std::array<uint8_t, Self::BYTES> r = {};
BufferStuffer pack(r);
pack.append(0x04);
pack.append(x().serialize());
pack.append(y().serialize());
return r;
}

constexpr std::array<uint8_t, Self::COMPRESSED_BYTES> serialize_compressed() const {
const bool y_is_even = y().is_even();
const uint8_t hdr = y_is_even ? 0x02 : 0x03;

std::array<uint8_t, Self::COMPRESSED_BYTES> r = {};
BufferStuffer pack(r);
pack.append(hdr);
pack.append(x().serialize());
return r;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a similar sentiment as for IntMod, this implements the base serialization into a std::span with static extent. It also uses the refactored serialization to save a few copied bytes.

Suggested change
std::vector<uint8_t> serialize_to_vec(bool compress) const {
if(compress) {
const auto b = this->serialize_compressed();
return std::vector(b.begin(), b.end());
} else {
const auto b = this->serialize();
return std::vector(b.begin(), b.end());
}
}
constexpr std::array<uint8_t, Self::BYTES> serialize() const {
std::array<uint8_t, Self::BYTES> r = {};
BufferStuffer pack(r);
pack.append(0x04);
pack.append(x().serialize());
pack.append(y().serialize());
return r;
}
constexpr std::array<uint8_t, Self::COMPRESSED_BYTES> serialize_compressed() const {
const bool y_is_even = y().is_even();
const uint8_t hdr = y_is_even ? 0x02 : 0x03;
std::array<uint8_t, Self::COMPRESSED_BYTES> r = {};
BufferStuffer pack(r);
pack.append(hdr);
pack.append(x().serialize());
return r;
}
constexpr void serialize_to(std::span<uint8_t, Self::BYTES> bytes) const {
BufferStuffer pack(bytes);
pack.append(0x04);
x().serialize_to(pack.next<FieldElement::BYTES>());
y().serialize_to(pack.next<FieldElement::BYTES>());
BOTAN_DEBUG_ASSERT(pack.full());
}
constexpr void serialize_compressed_to(std::span<uint8_t, Self::COMPRESSED_BYTES> bytes) const {
const bool y_is_even = y().is_even();
const uint8_t hdr = y_is_even ? 0x02 : 0x03;
BufferStuffer pack(bytes);
pack.append(hdr);
x().serialize_to(pack.next<FieldElement::BYTES>());
BOTAN_DEBUG_ASSERT(pack.full());
}
constexpr std::array<uint8_t, Self::BYTES> serialize() const {
std::array<uint8_t, Self::BYTES> r;
serialize_to(std::span{r});
return r;
}
constexpr std::array<uint8_t, Self::COMPRESSED_BYTES> serialize_compressed() const {
std::array<uint8_t, Self::COMPRESSED_BYTES> r;
serialize_compressed_to(std::span{r});
return r;
}
template <concepts::resizable_byte_buffer T>
constexpr T serialize(bool compress) const {
T b;
if(compress) {
b.resize(Self::COMPRESSED_BYTES);
this->serialize_compressed_to(std::span<uint8_t, Self::COMPRESSED_BYTES>{b});
} else {
b.resize(Self::BYTES);
this->serialize_to(std::span<uint8_t, Self::BYTES>{b});
}
return b;
}

@reneme
Copy link
Collaborator

reneme commented May 28, 2024

For the record: I'm working on some high-level overview to structure my reading. Perhaps its helpful for others as well:

image
Excalidraw.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants