Add support for CPU-specific optimisations #15

Open
pjbgf opened this issue Dec 9, 2022 · 1 comment

Owner

pjbgf commented Dec 9, 2022

The cgo version of sha1cd currently serves two purposes:

  • Provides an easy way to ensure 100% fidelity with the upstream implementation.
  • Provides the best performance for larger payloads.

Once SIMD is implemented, use cases such as go-git can rely solely on the pure Go implementation and will no longer need to be built with CGO_ENABLED=1.

pjbgf self-assigned this Dec 9, 2022
pjbgf changed the title from "Add support for CPU optimisations (SIMD)" to "Add support for CPU-specific optimisations" on Feb 25, 2023
Owner Author

pjbgf commented Feb 25, 2023

The performance improvements introduced in the generic implementation (#34) should make an impact across all platforms. On arm64, the delta is:

goos: linux
goarch: arm64
pkg: github.com/pjbgf/sha1cd/test
                               │       before       │                    after                    │
                               │       sec/op       │       sec/op         vs base                │
Hash8Bytes/sha1cd-4                   23.050µ ±  3%         3.526µ ±   5%  -84.70% (p=0.000 n=10)
Hash320Bytes/sha1cd-4                 111.24µ ±  3%         19.60µ ±   4%  -82.38% (p=0.000 n=10)
Hash1K/sha1cd-4                       293.45µ ±  2%         53.37µ ±   4%  -81.81% (p=0.000 n=10)
Hash8K/sha1cd-4                       2191.2µ ±  3%         402.3µ ±   5%  -81.64% (p=0.000 n=10)
HashWithCollision/sha1cd-4                                  61.41µ ±   4%


                               │     before      │             after            │
                               │       B/s       │      B/s        vs base      │
Hash8Bytes/sha1cd-4                336.9Ki ±  1%   2216.8Ki ±  4%  +557.97% (p=0.000 n=10)
Hash320Bytes/sha1cd-4              2.742Mi ±  3%   15.569Mi ±  3%  +467.83% (p=0.000 n=10)
Hash1K/sha1cd-4                    3.328Mi ±  2%   18.296Mi ±  4%  +449.71% (p=0.000 n=10)
Hash8K/sha1-4                      27.26Mi ±  2%    27.17Mi ±  1%         ~ (p=0.956 n=10)
HashWithCollision/sha1cd-4                          9.937Mi ±  4%
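The "vs base" column and the B/s figures follow directly from the sec/op values. As a worked check on the Hash8Bytes and Hash320Bytes rows, here is a small sketch; `deltaPercent` and `throughputKiB` are illustrative helpers, not part of sha1cd or benchstat.

```go
package main

import "fmt"

// deltaPercent returns the relative change from before to after, in percent,
// matching the sign convention of benchstat's "vs base" column.
func deltaPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

// throughputKiB converts a payload size in bytes and a per-operation time in
// microseconds into KiB/s (1 KiB = 1024 bytes).
func throughputKiB(payloadBytes, microsPerOp float64) float64 {
	return payloadBytes / (microsPerOp * 1e-6) / 1024
}

func main() {
	// Hash8Bytes on arm64: 23.050µs -> 3.526µs per op.
	fmt.Printf("%.2f%%\n", deltaPercent(23.050, 3.526))      // -84.70%
	fmt.Printf("%.1f KiB/s\n", throughputKiB(8, 3.526))      // 2215.7 KiB/s
	// Hash320Bytes: 111.24µs -> 19.60µs per op.
	fmt.Printf("%.2f%%\n", deltaPercent(111.24, 19.60))      // -82.38%
}
```

Throughput derived from the median time lands close to, but not exactly on, the reported 2216.8Ki; benchstat summarises each metric across runs separately, so small differences of this kind are expected.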

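This makes the generic version good enough to be the default: it beats the cgo implementation for small payloads (8 and 320 bytes below) and avoids the cgo version's one allocation per hash, while cgo remains faster for the 1K, 8K and collision benchmarks: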

BenchmarkHash8Bytes/sha1-4                470190      2550 ns/op    3.14 MB/s      0 B/op   0 allocs/op
BenchmarkHash8Bytes/sha1cd-4              345133      3504 ns/op    2.28 MB/s      0 B/op   0 allocs/op
BenchmarkHash8Bytes/sha1cd_cgo-4           84704     14858 ns/op    0.54 MB/s   2688 B/op   1 allocs/op
BenchmarkHash320Bytes/sha1-4               78384     13745 ns/op   23.28 MB/s      0 B/op   0 allocs/op
BenchmarkHash320Bytes/sha1cd-4             63459     19532 ns/op   16.38 MB/s      0 B/op   0 allocs/op
BenchmarkHash320Bytes/sha1cd_cgo-4         51583     25543 ns/op   12.53 MB/s   2688 B/op   1 allocs/op
BenchmarkHash1K/sha1-4                     31378     38622 ns/op   26.51 MB/s      0 B/op   0 allocs/op
BenchmarkHash1K/sha1cd-4                   22035     53162 ns/op   19.26 MB/s      0 B/op   0 allocs/op
BenchmarkHash1K/sha1cd_cgo-4               29412     39308 ns/op   26.05 MB/s   2688 B/op   1 allocs/op
BenchmarkHash8K/sha1-4                      4258    293192 ns/op   27.94 MB/s      0 B/op   0 allocs/op
BenchmarkHash8K/sha1cd-4                    3087    391253 ns/op   20.94 MB/s      0 B/op   0 allocs/op
BenchmarkHash8K/sha1cd_cgo-4                5084    211308 ns/op   38.77 MB/s   2688 B/op   1 allocs/op
BenchmarkHashWithCollision/sha1cd-4        18715     62284 ns/op   10.28 MB/s      0 B/op   0 allocs/op
BenchmarkHashWithCollision/sha1cd_cgo-4    28069     44771 ns/op   14.29 MB/s   2688 B/op   1 allocs/op
