
Implement initial ReLU/sign function via polynomial approximation #658

Open
j2kun opened this issue Apr 29, 2024 · 10 comments

j2kun commented Apr 29, 2024

Given that ReLU(x) = x (0.5 + 0.5 sgn(x)), this reduces to approximating the sign function, and this paper appears to have the state of the art: https://eprint.iacr.org/2020/834
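
As a quick plain-float sanity check of that identity (not HE code; relu_via_sign is just an illustrative name), any sign approximant can be slotted in:

```python
import numpy as np

def relu_via_sign(x, sign_fn=np.sign):
    """ReLU(x) = x * (0.5 + 0.5 * sgn(x)); sign_fn can be any approximation of sgn."""
    return x * (0.5 + 0.5 * sign_fn(x))

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu_via_sign(xs))                            # exact sign: [0. 0. 0. 0.5 2.]
print(relu_via_sign(xs, lambda t: np.tanh(8 * t)))  # smooth stand-in for a polynomial approximant
```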


j2kun commented Apr 29, 2024

Also cf. https://openreview.net/pdf?id=Hq16Jk2bVlp and https://eprint.iacr.org/2021/1688, which use these approximations.


j2kun commented Apr 29, 2024

The paper linked above (2020/834) does not release source code, but there is a reference implementation in Lattigo: https://github.com/tuneinsight/lattigo/blob/4cce9a48c1daaa2dd122921822f5ad70cd444156/he/hefloat/minimax_composite_polynomial.go#L124


j2kun commented Apr 30, 2024

The paper https://eprint.iacr.org/2019/1234 is a precursor to https://eprint.iacr.org/2020/834, but also seems to explain more of the motivation behind the composite polynomials.


j2kun commented May 1, 2024

An example of generating a well-fitting polynomial using lolremez: samhocevar/lolremez#28 (comment)

Another tool: https://github.com/pychebfun/pychebfun
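
Neither tool is wired up here, but as a rough numpy-only stand-in (a least-squares Chebyshev fit, not the true minimax fit that lolremez computes; eps and deg are arbitrary illustrative choices), fitting sgn away from its discontinuity looks like:

```python
import numpy as np
from numpy.polynomial import Chebyshev

eps = 0.05  # exclude (-eps, eps), where sgn is discontinuous
deg = 15    # odd degree, since sgn is an odd function

# sample points on [-1, -eps] U [eps, 1] and fit a Chebyshev series to sgn
xs = np.concatenate([np.linspace(-1.0, -eps, 500), np.linspace(eps, 1.0, 500)])
p = Chebyshev.fit(xs, np.sign(xs), deg)

err = np.max(np.abs(p(xs) - np.sign(xs)))
print(f"degree-{deg} fit, max error on the sampled domain: {err:.4f}")
```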


j2kun commented May 1, 2024

Outline sketch:

Various improvements based on more recent research that would be worth splitting into separate tickets.

j2kun changed the title from "Implement ReLU/sign function via polynomial approximation" to "Implement initial ReLU/sign function via polynomial approximation" on May 1, 2024

j2kun commented May 2, 2024

Thanks to Seonhong Min for sending me https://eprint.iacr.org/2018/462, which shows that BFV achieves polynomial approximations via a fixed-point representation rather than a floating-point one. I think there is also some complexity there, in that evaluating a fixed-point polynomial approximation also requires a rounding step (though not always; see Sec. 2.5).

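To make the rounding issue concrete, here is a toy plaintext sketch of fixed-point Horner evaluation (SCALE and the helper names are invented for illustration; the rescale-by-SCALE step after each multiplication is the rounding that is cheap on plaintext but costly homomorphically):

```python
SCALE = 1 << 12  # fixed-point scaling factor, chosen arbitrarily for illustration

def encode(x: float) -> int:
    return int(round(x * SCALE))

def decode(v: int) -> float:
    return v / SCALE

def fixed_point_horner(coeffs, x_enc):
    """Evaluate sum_i coeffs[i] * x^i via Horner's rule on encoded integers.

    Each multiplication doubles the scale to SCALE^2, so we divide back down
    by SCALE -- that division is the rounding step discussed above.
    """
    acc = encode(coeffs[-1])
    for c in reversed(coeffs[:-1]):
        acc = (acc * x_enc + SCALE // 2) // SCALE  # rescale: one rounding per level
        acc += encode(c)
    return acc

# toy check against floating point, using a crude odd sign-like cubic
coeffs = [0.0, 1.5, 0.0, -0.5]  # p(x) = (3x - x^3) / 2
for x in (-0.8, -0.1, 0.3, 0.9):
    approx = decode(fixed_point_horner(coeffs, encode(x)))
    exact = sum(c * x**i for i, c in enumerate(coeffs))
    print(f"x={x:+.2f}  fixed-point={approx:+.5f}  float={exact:+.5f}")
```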


Maokami commented May 2, 2024

I also had an interest in this issue a while back, so I know of a paper worth sharing: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10155408
It's a follow-up paper by the authors of https://eprint.iacr.org/2020/834, and there's an implementation: https://github.com/snu-ccl/approxCNN/tree/main.
In the repo, you'll find that they've hardcoded the coefficients of the polynomial approximations of the sign function for alpha = 4 to 14!


Maokami commented May 2, 2024

> The paper linked above (2020/834) does not release source code, but there is a reference implementation in Lattigo: https://github.com/tuneinsight/lattigo/blob/4cce9a48c1daaa2dd122921822f5ad70cd444156/he/hefloat/minimax_composite_polynomial.go#L124

Oh, I didn't know there was an implementation in Lattigo! In that case, these hardcoded coefficients might not be necessary.


j2kun commented May 2, 2024

After starting an implementation in #665 (with a fixed approximation polynomial) and discussing it in the HEIR meeting today, I have a few kinks to work out:

  1. I want to separate the task of choosing a polynomial approximation from the optimizations around evaluating it. This implies:
    1. I need a floating-point representation of a polynomial in the IR, but PolynomialAttr currently only supports integer coefficients
    2. I need a new op, say poly_ext.eval, whose operands are the polynomial to apply and its input
    3. The approximation itself is for sign, but these operations are actually applied to compute max(0, x) = (x + x * sign(x)) / 2, which means we should support some internal polynomial arithmetic to construct these from the checked-in approximation. We meant to do this to support constant folding in the polynomial dialect, but never got around to it.
  2. (1) has a twist in that many more advanced polynomial approximations are not represented literally, but implicitly as a composition of smaller-degree polynomials (see the toy sketch after this list). This implies I will need a polynomial.compose op, or else an attribute that supports composite sub-polynomials, and cannot limit (1.ii) above to a single static polynomial. I think I will start with a single static polynomial but try to avoid making it difficult to upgrade to a composite polynomial.
  3. The approximate polynomial itself has a few quirks, because its coefficients further need to be encoded in a scheme-specific fashion. For CKKS this is relatively straightforward, but introduces additional error. For BGV/BFV this seems much harder, in part because the encodings are fixed-point and hence require rounding during polynomial evaluation, but rounding itself is hard (see above). There is also a question about which basis the polynomial is expressed in, cf. https://discord.com/channels/901152454077452399/1235349479482196049 for more on this
  4. The above points expose a problem with "lowering a ReLU": at the tosa level we don't yet know what scheme will be chosen, so the choice of polynomial approximation can't be scheme-aware or encoding-aware. I think the right solution here will be to include some extra metadata on the polynomial to express what function is being approximated, so that we can re-approximate it at lower levels if necessary.
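
To illustrate point 2, here is a toy composite evaluation in plain Python. The degree-3 building block (3x - x^3)/2 is in the spirit of the f_n polynomials from 2019/1234, not the tuned minimax coefficients from 2020/834, and composite_sign is just an illustrative name:

```python
import numpy as np

# Degree-3 building block f1(x) = (3x - x^3)/2; composing it with itself
# pushes values in [-1, 1] toward sgn(x).
f1 = np.polynomial.Polynomial([0.0, 1.5, 0.0, -0.5])

def composite_sign(x, depth=4):
    """Evaluate f1 o f1 o ... o f1 (depth times), standing in for a
    minimax composite p_k o ... o p_1."""
    y = x
    for _ in range(depth):
        y = f1(y)
    return y

def relu_approx(x, depth=4):
    # max(0, x) = (x + x * sign(x)) / 2, with sgn replaced by the composite approximation
    return (x + x * composite_sign(x, depth)) / 2

xs = np.linspace(-1.0, 1.0, 9)
print(np.round(relu_approx(xs), 3))
```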


j2kun commented May 8, 2024

These folks do something slightly different, which is more holistic with respect to NN training: https://github.com/EfficientFHE/SmartPAF

They pick small-degree polynomial approximations and then do a variety of network fine-tuning to adapt the network to the replaced operations. This seems out of scope for the compiler, since it would require training data to be included.
