Halide provides multiple versions of routines found in the libm math library, such as the transcendental functions. E.g. for `exp` there are:

- Plain `exp`, which is intended to map to the platform's support for e^x on the appropriate data type, generally `float` and `double`.
- `halide_exp`, which is implemented in Halide, supports vectorization, and is intended to be a consistent implementation with a good tradeoff between accuracy and performance.
- `fast_exp`, which is implemented in Halide, supports vectorization, and is intended to be optimized for speed, with somewhat less accuracy and no support for NaNs and Infs.
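To make the speed/accuracy tradeoff concrete, here is a minimal sketch of the kind of approximation a `fast_exp`-style routine uses. This is not Halide's actual implementation; the function name and polynomial degree are illustrative. It range-reduces x to k·ln2 + r, evaluates a short polynomial for e^r, and scales by 2^k directly through the float's exponent bits:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Illustrative sketch (NOT Halide's implementation) of a fast exp:
// write x = k*ln2 + r with |r| <= ln2/2, approximate e^r with a
// degree-4 Taylor polynomial, then multiply by 2^k constructed from
// the IEEE 754 exponent bits. No NaN/Inf handling, matching the
// stated intent of the fast_ variants.
float fast_exp_sketch(float x) {
    const float ln2 = 0.6931471805599453f;
    const float inv_ln2 = 1.4426950408889634f;
    int k = (int)std::lround(x * inv_ln2);
    float r = x - (float)k * ln2;
    // e^r for |r| <= ln2/2; truncation error is roughly r^5/120.
    float p = 1.0f + r * (1.0f + r * (0.5f +
              r * (1.0f / 6.0f + r * (1.0f / 24.0f))));
    // 2^k via the biased exponent field (valid only for moderate k).
    uint32_t bits = (uint32_t)(k + 127) << 23;
    float scale;
    std::memcpy(&scale, &bits, sizeof scale);
    return p * scale;
}
```

Every operation here is branch-free arithmetic on lanes, which is why this style vectorizes well; the polynomial degree is the knob trading accuracy for speed.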
Numerous improvements are needed:
- The above information should be expanded slightly and put into the Halide header documentation (`IROperator.h`) to make the differences between these functions clearer to users.
- The supported types and some indication of accuracy and range should be documented for the `halide_` and `fast_` versions. The unadorned versions should be marked as not vectorized in most situations.
- In `strict_float` mode, the `halide_` versions should likely handle NaNs and Infs. Supporting NaNs and Infs seems counter to the intention of the `fast_` versions.
- We should consider providing access to vectorized math libraries such as Intel's MKL via some mechanism. Possibilities include a library that surfaces them, adding new functions to Halide's `IROperator.h`, a target flag that retargets the existing unadorned function names to such a library, etc.
- We should expand the set of functions supported, specifically the trigonometric and hyperbolic trigonometric functions; regarding the latter, `tanh` is used in ML activation implementations.
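As a sketch of why a `tanh` built on `exp` is attractive for ML activations, the hypothetical function below (not Halide code; the name is illustrative) shows the usual branch-light formulation. Computing on |x| with tanh(x) = 1 − 2/(e^(2x) + 1) saturates cleanly to ±1 for large inputs instead of overflowing the way the naive (e^(2x) − 1)/(e^(2x) + 1) numerator does:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of tanh layered on exp, the shape a vectorizable
// ML-activation tanh typically takes. Using |x| keeps e^(2|x|) in [1, inf)
// so 2/(e^(2|x|) + 1) decays to 0 and the result saturates to +/-1.
float tanh_from_exp(float x) {
    float ax = std::fabs(x);
    float t = 1.0f - 2.0f / (std::exp(2.0f * ax) + 1.0f);
    return x < 0.0f ? -t : t;
}
```

A Halide-side `tanh` would presumably reuse the existing `halide_exp`/`fast_exp` machinery the same way, inheriting its accuracy tier and vectorization.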
Specifics that (will) have subissues of their own:
- The `halide_` and `fast_` operators typically only support 32-bit float. They are slower than they could be for 16-bit floating point and less accurate than they need to be for 64-bit floating point.
- The `halide_` routines should deliver accurate results over the full input range, insofar as current best knowledge allows without a detrimental impact on efficiency.
- New functions need to be provided, such as `sin`, `cos`, `tanh`, etc.
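Accuracy claims of the kind proposed above are conventionally stated in ULPs (units in the last place). A small standalone helper for measuring that, not part of Halide, might look like this; it remaps float bit patterns onto one monotone integer scale so the distance between two finite floats is just an integer difference:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Standalone helper (not part of Halide) for ULP-based accuracy
// measurement of finite floats.
static int64_t ordered_bits(float f) {
    int32_t i;
    std::memcpy(&i, &f, sizeof i);
    // Negative floats order in reverse of their bit patterns; remap both
    // halves onto a single monotonically increasing integer scale, with
    // +0 and -0 both mapping to 0.
    return i >= 0 ? (int64_t)i : (int64_t)INT32_MIN - (int64_t)i;
}

int64_t ulp_distance(float a, float b) {
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

Sweeping an input range and reporting the maximum `ulp_distance` against a higher-precision reference is the usual way to produce the per-function accuracy numbers the documentation work calls for.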