Use aligned loads in get_partial_safe #12

Open
bill-myers opened this issue Nov 17, 2023 · 3 comments
Comments

@bill-myers

In get_partial_safe, aligned loads become possible if the buffer is declared as MaybeUninit<State> instead of a byte array: cast its pointer for std::ptr::copy, then zero the rest of the buffer.

let mut buffer = [0i8; VECTOR_SIZE];
// Copy data into the buffer
std::ptr::copy(data as *const i8, buffer.as_mut_ptr(), len);
// Load the buffer into a __m128i vector (unaligned load)
let partial_vector = _mm_loadu_epi8(buffer.as_ptr());
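The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: `State` is taken to be `__m128i` on x86_64, and the function name `get_partial_aligned` and its `*const u8` parameter are hypothetical, not the crate's actual signature.

```rust
use std::arch::x86_64::{__m128i, _mm_load_si128};
use std::mem::{size_of, MaybeUninit};

// Assumption: `State` is the 128-bit SSE vector type on x86_64.
pub type State = __m128i;
pub const VECTOR_SIZE: usize = size_of::<State>();

/// Hypothetical variant of `get_partial_safe` that uses an aligned load.
///
/// # Safety
/// `data` must be valid for reads of `len` bytes, and `len <= VECTOR_SIZE`.
pub unsafe fn get_partial_aligned(data: *const u8, len: usize) -> State {
    debug_assert!(len <= VECTOR_SIZE);
    // `MaybeUninit<State>` has the size and alignment of `State` (16 bytes).
    let mut buffer = MaybeUninit::<State>::uninit();
    let bytes = buffer.as_mut_ptr() as *mut u8;
    unsafe {
        // Copy the `len` input bytes to the start of the buffer.
        std::ptr::copy_nonoverlapping(data, bytes, len);
        // Zero the remaining bytes so the whole vector is initialized.
        std::ptr::write_bytes(bytes.add(len), 0, VECTOR_SIZE - len);
        // Aligned load: the buffer address is a multiple of 16.
        _mm_load_si128(buffer.as_ptr())
    }
}
```

Whether this is measurably faster than the unaligned load is not something the sketch shows; it would need benchmarking.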

@ogxd
Owner

ogxd commented Nov 17, 2023

From what I can see, [0i8; VECTOR_SIZE] is already allocated on the stack rather than the heap (VECTOR_SIZE is a constant, so the array has a fixed size), so MaybeUninit<State> may not be faster.
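Stack allocation and alignment are separate properties, which a small layout check can illustrate. The sketch below assumes `VECTOR_SIZE` is 16 (one 128-bit vector), and the helper name `byte_buffer_layout` is made up for this example.

```rust
use std::mem::{align_of, size_of};

// Assumption: one 128-bit vector, i.e. 16 bytes.
pub const VECTOR_SIZE: usize = 16;

/// Returns (size, alignment) in bytes of the plain byte buffer type.
pub fn byte_buffer_layout() -> (usize, usize) {
    (
        size_of::<[i8; VECTOR_SIZE]>(),
        // An array has the alignment of its element type, here `i8`.
        align_of::<[i8; VECTOR_SIZE]>(),
    )
}
```

The array type only guarantees 1-byte alignment, so an aligned load on it is not guaranteed to be valid even though the buffer lives on the stack.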

@ogxd
Owner

ogxd commented Dec 23, 2023

I'm about to close this one, unless someone has a snippet to propose?

@notsatvrn
Contributor

I tried using a struct which contains only the byte array and is marked as #[repr(align(16))]. I have not tested the performance yet, but this should still allocate on the stack and force 16-byte alignment.
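A minimal sketch of that wrapper approach might look like the following. The names `AlignedBuffer` and `copy_to_aligned` are hypothetical, `VECTOR_SIZE` is assumed to be 16, and only the alignment and copy steps are shown, not the vector load itself.

```rust
// Assumption: one 128-bit vector, i.e. 16 bytes.
pub const VECTOR_SIZE: usize = 16;

/// Hypothetical wrapper: holds only the byte array, forced to 16-byte alignment.
#[repr(align(16))]
pub struct AlignedBuffer(pub [i8; VECTOR_SIZE]);

/// Copies `len` bytes into a zero-filled, 16-byte-aligned stack buffer.
///
/// # Safety
/// `data` must be valid for reads of `len` bytes, and `len <= VECTOR_SIZE`.
pub unsafe fn copy_to_aligned(data: *const i8, len: usize) -> AlignedBuffer {
    debug_assert!(len <= VECTOR_SIZE);
    // Start from zeros so the bytes past `len` stay zero.
    let mut buffer = AlignedBuffer([0i8; VECTOR_SIZE]);
    unsafe { std::ptr::copy_nonoverlapping(data, buffer.0.as_mut_ptr(), len) };
    buffer
}
```

On x86, the pointer from `buffer.0.as_ptr()` could then be cast and passed to an aligned load such as `_mm_load_si128`, since its address is a multiple of 16.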
