Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Code golfing] secp256k1 programming language benchmark #285

Open
9 tasks
mratsim opened this issue Oct 17, 2023 · 0 comments
Open
9 tasks

[Code golfing] secp256k1 programming language benchmark #285

mratsim opened this issue Oct 17, 2023 · 0 comments
Labels
enhancement :shipit: New feature or request performance 🏁

Comments

@mratsim
Copy link
Owner

mratsim commented Oct 17, 2023

Following https://forum.nim-lang.org/t/10550#70556, this outlines step to put Nim at the top of https://programming-language-benchmarks.vercel.app/problem/secp256k1

  • Benchmark the top implementations.
    Rust and Go on your machine. Go is using the standard libsecp256k1 library, which is also wrapped in Nim: https://github.com/status-im/nim-secp256k1
    Difficulty: Very easy
  • Implement addition chain for 21
    Here:
    func `*=`*(a: var FF, b: static int) =
    ## Multiplication by a small integer known at compile-time
    # Implementation:
    # We don't want to go convert the integer to the Montgomery domain (O(n²))
    # and then multiply by ``b`` (another O(n²)
    #
    # So we hardcode addition chains for small integer
    #
    # In terms of cost a doubling/addition is 3 passes over the data:
    # - addition + check if > prime + conditional substraction
    # A full multiplication, assuming b is projected to Montgomery domain beforehand is:
    # - n² passes over the data, each of 5~6 elementary addition/multiplication
    # - a conditional substraction
    #

    This is blocking for projective coordinates benchmarking due to this:
    t2 *= b3 # 21. t₂ <- 3b t₂ t₂ = 3bZ₁Z₂

    For secp256k1, curve equation y² = x³ + ax + b, with a=0 and b=7.
    - Optimal shortest addition chains can be found here: https://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
    Difficulty: Very easy
  • Benchmark Constantine on the same computer to have a baseline.
    CC=clang nimble bench_ec_g1_scalarmul
    Projective coordinates will need to be commented out if previous step has been skipped, endomorphism acceleration will need to be commented out.
    Difficulty: Very easy
  • Add endomorphism acceleration for secp256k1.
    The constants can be generated from https://github.com/mratsim/constantine/blob/master/sage/derive_endomorphisms.sage. Note there is no G1 or G2 for non-pairing-friendly curves like secp256k1 or Banderwagon so the Sage code needs to be modified to handle both case. The constants can then be added to https://github.com/mratsim/constantine/tree/master/constantine/math/constants.
  • Add Fixed-base scalar mul via LSB set encoding #73
  • Add Finite field computation for moduli of special form #11
    • This will require refactoring of the core Fp type to indicate if they are generic or use a special-form. This may use either separate FpGeneric and FpPseudoMersenne or a static enum:
      type
      Fp*[C: static Curve] = object
      ## All operations on a Fp field are modulo P
      ## P being the prime modulus of the Curve C
      ## Internally, data is stored in Montgomery n-residue form
      ## with the magic constant chosen for convenient division (a power of 2 depending on P bitsize)
      # TODO, pseudo mersenne primes like 2²⁵⁵-19 have very fast modular reduction
      # and don't need Montgomery representation
      mres*: matchingBigInt(C)
      Fr*[C: static Curve] = object
      ## All operations on a field are modulo `r`
      ## `r` being the prime curve order or subgroup order
      ## Internally, data is stored in Montgomery n-residue form
      ## with the magic constant chosen for convenient division (a power of 2 depending on P bitsize)
      mres*: matchingOrderBigInt(C)
      FF*[C: static Curve] = Fp[C] or Fr[C]
    • type FpGeneric[C: static Curve] = object
        mres*: matchingBigInt(C)
      type FpPseudoMersenne[C: static Curve] = object
        mres*: matchingBigInt(C)
      
      type Fp[C: static Curve] = FpGeneric[C] or FpPseudoMersenne[C]
      or
      type SpecialPrime = object
        kGeneric
        kGeneralizedMersenne
        kPseudoMersenne
        kGolden
      
      type Fp[C: static Curve, SP: static SpecialPrime] = object
        mres*: matchingBigInt(C)
      Additionally a special pseudo-mersenne reduction need to be added, similar to
      func partialReduce_1305[N1, N2: static int](r: var Limbs[N1], a: Limbs[N2]) =
      ## The prime 2¹³⁰-5 has a special form 2ᵐ-c
      ## called "Crandall prime" or Pseudo-Mersenne Prime
      ## in the litterature
      ## which allows fast reduction from the fact that
      ## 2ᵐ-c ≡ 0 (mod p)
      ## <=> 2ᵐ ≡ c (mod p) [1]
      ## <=> a2ᵐ+b ≡ ac + b (mod p)
      ##
      ## This partially reduces the input in range [0, 2¹³⁰)
    • Difficulty: Medium on math, Hard on refactoring
    • Expected speedup:
      Currently, most implementation splits the field Fp over 5x52-bit limbs or 10x26-bit limbs. This predates the existence of MULX, ADOX, ADCX instructions introduced in 2015 for accelerating bigint multiplication. The issue is that cost scales with the square of number of limbs, with by about 3n² so between 4 and 5 limbs, it's 48 vs 75 operations.
      Constantine already provides fast broadwell multiplication and special prime reductions are significantly faster than Montgomery reductions.
      25% over baseline
    • Reference papers:
  • Fused multiplication + special reduction for secp256k1
    This fuses multiplication and reduction by moduli of special form
    • Difficulty: Very Hard, need to translate algorithm to assembly and debug assembly
    • Expected speedup: 15% to 20% over special reduction.
    • Reference paper:
      Efficient Arithmetic In (Pseudo-)Mersenne Prime Order Fields
      Kaushik Nath and Palash Sarkar
      https://eprint.iacr.org/2018/985
  • Implement the high-level API for private key to public key for secp256k1
    • Difficulty: very easy
    • For the programming language benchmark competition, only private to public key is needed, which is the same as BLS signature, a scalar multiplication:
      func derivePubkey*[Pubkey, SecKey](pubkey: var Pubkey, seckey: SecKey) =
      ## Generates the public key associated with the input secret key.
      ##
      ## The secret key MUST be in range (0, curve order)
      ## 0 is INVALID
      const Group = Pubkey.G
      type Field = Pubkey.F
      const EC = Field.C
      var pk {.noInit.}: ECP_ShortW_Jac[Field, Group]
      pk.fromAffine(EC.getGenerator($Group))
      pk.scalarMul(seckey)
      pubkey.affine(pk)
    • And high-level wrapper:
      func derive_pubkey*(public_key: var PublicKey, secret_key: SecretKey) {.libPrefix: prefix_ffi.} =
      ## Derive the public key matching with a secret key
      ##
      ## The secret_key MUST be validated
      public_key.raw.derivePubkey(secret_key.raw)
  • Stretch, implement the full libsecp256k1 API
    • Difficulty: medium
    • All the primitives of ECDSA are present and the API is well known, so the difficulty lies in combining the building blocks and doing tests.
@mratsim mratsim added enhancement :shipit: New feature or request performance 🏁 labels Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement :shipit: New feature or request performance 🏁
Projects
None yet
Development

No branches or pull requests

1 participant