This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Folding integer additions with operands of mixed bit widths #228

Open
bjacob opened this issue May 12, 2020 · 1 comment

Comments

bjacob commented May 12, 2020

ARM NEON has pairwise-folding addition instructions in which pairs of adjacent narrow (e.g. 8-bit) input lanes are added together and the sums are written or accumulated into wider (e.g. 16-bit) integer lanes. Examples are SADDLP and SADALP.
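
As a rough illustration of the lane arithmetic, here is a portable scalar sketch of the 8-bit to 16-bit forms on a 128-bit vector. The function names `saddlp_s8` and `sadalp_s8` are made up for this example and are not real intrinsics; the code only models the semantics and does not use NEON.

```c
#include <stdint.h>

/* Scalar model of SADDLP (8-bit -> 16-bit): each output lane is the sum of
   two adjacent input lanes, sign-extended before the addition. */
void saddlp_s8(const int8_t in[16], int16_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        out[i] = (int16_t)((int16_t)in[2 * i] + (int16_t)in[2 * i + 1]);
    }
}

/* Scalar model of SADALP (8-bit -> 16-bit): the same pairwise sum, added to
   an existing 16-bit accumulator lane. The addition is done on unsigned
   values so that it wraps modulo 2^16 instead of overflowing a signed type;
   the final cast assumes a two's-complement target. */
void sadalp_s8(int16_t acc[8], const int8_t in[16]) {
    for (int i = 0; i < 8; ++i) {
        int32_t pair = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        acc[i] = (int16_t)(uint16_t)((uint16_t)acc[i] + (uint16_t)pair);
    }
}
```

Note that the pairwise sum of two 8-bit lanes always fits in 16 bits (the range is -256 to 254), which is why the widening form cannot overflow on its own.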

This is in addition to plain pairwise-folding additions where all operands have the same bit width, like ADDP.

An extreme case of such folding is the dot-product instructions (SDOT; see PR #127), where the folding addition is performed 4-fold. When one of the source operands has all lanes set to 1, this acts as a 4-fold addition of 8-bit values into 32-bit accumulators.
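
To make the "all lanes set to 1" trick concrete, here is a scalar sketch of the vector form of SDOT on 128-bit registers. The names `sdot_s8` and `add4fold_s8` are hypothetical helpers for this example, and the model assumes the 32-bit accumulators do not overflow.

```c
#include <stdint.h>

/* Scalar model of SDOT (vector form): each 32-bit accumulator lane receives
   the sum of four products of adjacent signed 8-bit lanes.
   Assumes the accumulators stay within the int32_t range. */
void sdot_s8(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 4; ++i) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j) {
            sum += (int32_t)a[4 * i + j] * (int32_t)b[4 * i + j];
        }
        acc[i] += sum;
    }
}

/* With the second operand set to all ones, the dot product degenerates into
   a 4-fold addition of 8-bit values into 32-bit accumulators. */
void add4fold_s8(int32_t acc[4], const int8_t a[16]) {
    int8_t ones[16];
    for (int k = 0; k < 16; ++k) {
        ones[k] = 1;
    }
    sdot_s8(acc, a, ones);
}
```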

This combination of folding behavior and mixed bit widths makes it possible to maximize the number of scalar operations performed per instruction.

This is very widely used in integer arithmetic applications. For example, matrix multiplication kernels using plain NEON without SDOT multiply 8-bit input values into 16-bit local products (see Issue #226), then pairwise-fold those 16-bit products into 32-bit accumulators:
https://github.com/google/ruy/blob/808ff748e0c7dc746a413fe45fa022d63e6253e8/ruy/kernel_arm64.cc#L1233
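
The pattern described above can be sketched in scalar C as follows. This is a simplified model, not the code from the linked kernel: the helper names (`mul_widen_s8`, `sadalp_s16`, `dot_step`) are invented for this example, it handles only 8 lanes at a time, and it assumes the 32-bit accumulators do not overflow.

```c
#include <stdint.h>

/* Widening multiply (in the spirit of SMULL): 8-bit inputs, 16-bit products.
   The largest product, (-128) * (-128) = 16384, fits in int16_t. */
void mul_widen_s8(const int8_t a[8], const int8_t b[8], int16_t prod[8]) {
    for (int i = 0; i < 8; ++i) {
        prod[i] = (int16_t)((int16_t)a[i] * (int16_t)b[i]);
    }
}

/* Pairwise fold (in the spirit of SADALP, 16-bit -> 32-bit): adjacent 16-bit
   products are summed and added to a 32-bit accumulator lane. */
void sadalp_s16(int32_t acc[4], const int16_t in[8]) {
    for (int i = 0; i < 4; ++i) {
        acc[i] += (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
    }
}

/* One accumulation step: multiply into 16-bit products, then fold them
   pairwise into the 32-bit accumulators. */
void dot_step(int32_t acc[4], const int8_t a[8], const int8_t b[8]) {
    int16_t prod[8];
    mul_widen_s8(a, b, prod);
    sadalp_s16(acc, prod);
}
```

Calling `dot_step` repeatedly over successive slices of two 8-bit vectors accumulates partial dot products in `acc`; summing the four lanes at the end gives the full dot product.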

@Maratyszcza
Contributor

This is partially covered by the Extended Pairwise Addition instructions (#380).
