cpufeatures: ARM support #378

Closed
tarcieri opened this issue Apr 21, 2021 · 17 comments · Fixed by #393

Comments

@tarcieri
Member

It would be nice to be able to do ARM feature detection.

A notable use case would be detecting the ARMv8 Cryptography Extensions.

@newpavlov
Member

newpavlov commented Apr 24, 2021

Unfortunately, it's not possible to do this properly without RFC 2725. AFAIK ARM does not have a dedicated CPUID-like instruction (BTW RISC-V is similar in this regard: while it has a CPUID-like instruction, it requires privileged mode). On some CPUs it's possible to detect capabilities by reading system registers, but I am not sure about the portability of such a solution. Currently, the best solution is to rely on platform-dependent feature detection, as we do in sha-1 and sha2. But I am not sure if we should place this code into cpuid-bool.

@tarcieri
Member Author

on some CPUs it's possible to detect capabilities by reading system registers

Curious about this

@newpavlov
Member

See the top answer here.

@tarcieri
Member Author

These seem potentially relevant:

ID_AA64ISAR{0,1}_EL1 have information about which instructions are implemented on the device: CRC, SHA, Atomic, random numbers, etc.
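
For illustration, here is a rough sketch of how the crypto fields of ID_AA64ISAR0_EL1 could be decoded once a raw register value is in hand. The field offsets (AES at bits 7:4, SHA1 at bits 11:8, SHA2 at bits 15:12) follow the Arm architecture reference; the names `Isar0Crypto`, `decode_isar0`, and `read_isar0` are hypothetical and not part of any crate. Whether the `mrs` read is permitted outside privileged mode depends on the OS, so the reader function is marked `unsafe` and should be treated as an assumption rather than something known to work.

```rust
/// Crypto-related fields of ID_AA64ISAR0_EL1 (each field is 4 bits wide).
/// Hypothetical helper type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Isar0Crypto {
    /// Bits [7:4]: 0b0001 = AES instructions, 0b0010 = AES plus PMULL.
    pub aes: u8,
    /// Bits [11:8]: 0b0001 = SHA-1 instructions.
    pub sha1: u8,
    /// Bits [15:12]: 0b0001 = SHA-256, 0b0010 = SHA-256 plus SHA-512.
    pub sha2: u8,
}

/// Extracts the 4-bit field that starts at bit `shift`.
fn field(reg: u64, shift: u32) -> u8 {
    ((reg >> shift) & 0xf) as u8
}

/// Decodes the crypto fields from a raw ID_AA64ISAR0_EL1 value.
pub fn decode_isar0(reg: u64) -> Isar0Crypto {
    Isar0Crypto {
        aes: field(reg, 4),
        sha1: field(reg, 8),
        sha2: field(reg, 12),
    }
}

/// Reads the register directly. This may trap (for example with SIGILL)
/// when the OS does not allow or emulate the access from user space.
#[cfg(target_arch = "aarch64")]
pub unsafe fn read_isar0() -> u64 {
    let value: u64;
    unsafe {
        core::arch::asm!(
            "mrs {}, ID_AA64ISAR0_EL1",
            out(reg) value,
            options(nomem, nostack, preserves_flags)
        );
    }
    value
}
```

A nonzero `aes`, `sha1`, or `sha2` field would indicate that the corresponding instructions are implemented.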

@newpavlov
Member

Can you check if it works in user space on your M1 Mac? Later I can check your snippet on a RPi4 board.

@tarcieri
Member Author

Sure, I can give it a try

@tarcieri
Member Author

tarcieri commented Apr 26, 2021

It seems those registers are not accessible from userspace at all 😢

However, now I realize that we can pretty safely assume that cfg(all(target_arch = "aarch64", target_os = "macos")) means the ARMv8 Cryptography Extensions will be available.

Not sure what to do on Linux. Seems we may need to use something like /proc?

@tarcieri
Member Author

tarcieri commented Apr 26, 2021

It looks like if we were to use the libc crate (or potentially emit our own system call instruction), we could use libc::getauxval(AT_HWCAP) to obtain the data from these registers in userspace on Linux:

https://www.kernel.org/doc/html/latest/arm64/elf_hwcaps.html

Namely:

  • HWCAP_AES: Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001.
  • HWCAP_SHA1: Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001.
  • HWCAP_SHA2: Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001.
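
A minimal sketch of that approach, written without the libc crate by declaring `getauxval` directly. The constant values (`AT_HWCAP = 16`, and bits 3, 5, and 6 for AES, SHA1, and SHA2) are taken from the Linux arm64 headers; `CryptoCaps`, `decode_hwcap`, and `detect` are hypothetical names used only for this example.

```rust
/// Key passed to getauxval(3) to request the hardware capability bitmask.
pub const AT_HWCAP: u64 = 16;
/// arm64 Linux HWCAP bits for the ARMv8 Cryptography Extensions.
pub const HWCAP_AES: u64 = 1 << 3;
pub const HWCAP_SHA1: u64 = 1 << 5;
pub const HWCAP_SHA2: u64 = 1 << 6;

/// Hypothetical result type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CryptoCaps {
    pub aes: bool,
    pub sha1: bool,
    pub sha2: bool,
}

/// Decodes an AT_HWCAP bitmask. Kept separate from the syscall so it can be
/// tested on any host.
pub fn decode_hwcap(hwcap: u64) -> CryptoCaps {
    CryptoCaps {
        aes: hwcap & HWCAP_AES != 0,
        sha1: hwcap & HWCAP_SHA1 != 0,
        sha2: hwcap & HWCAP_SHA2 != 0,
    }
}

/// Queries the kernel on arm64 Linux. The bit layout above is specific to
/// that target, so this function is not compiled anywhere else.
#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
pub fn detect() -> CryptoCaps {
    use std::os::raw::c_ulong;

    unsafe extern "C" {
        fn getauxval(kind: c_ulong) -> c_ulong;
    }

    // getauxval returns 0 when the key is absent, which decodes to "nothing
    // available".
    let hwcap = unsafe { getauxval(AT_HWCAP as c_ulong) };
    decode_hwcap(hwcap as u64)
}
```

Splitting the decode step from the query keeps the bit tests checkable on non-ARM hosts.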

@newpavlov
Member

It looks like if we were to use the libc crate (or potentially emit our own system call instruction), we could use libc::getauxval(AT_HWCAP) to obtain the data from these registers in userspace on Linux

Yes, that's what I meant by "platform-dependent feature detection", and it's exactly what we currently do in sha-1 and sha2.

@tarcieri
Member Author

Aha! Neat!

What do you think about moving that functionality into this crate, and having an abstraction that works across both Linux and macOS?

@newpavlov
Member

newpavlov commented Apr 26, 2021

I think cpuid-bool is a somewhat misleading name for a crate with ARM target feature detection support, but I guess creating a separate crate or renaming it is not worth the trouble. Though I am not sure about hard-coding "ARM + macOS = Crypto extension support". Can you check if the AT_HWCAP code works on macOS or if there is a similar solution? Also I wonder if AT_HWCAP would work on other *NIX targets.

On ARM with unsupported OSes/environments we should probably simply fall back to compile-time target feature detection.
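
A small sketch of what such a fallback could look like. `AES_STATIC` and `has_aes` are hypothetical names; the runtime probe is passed in as a closure that returns `None` when the platform offers no runtime detection, so the compile-time answer is all that remains.

```rust
/// True when the crate was compiled with the `aes` target feature enabled
/// on aarch64 (for example via `-C target-feature=+aes`).
pub const AES_STATIC: bool =
    cfg!(all(target_arch = "aarch64", target_feature = "aes"));

/// Combines compile-time and run-time detection.
///
/// `runtime_probe` is a stand-in for an OS-specific query. It should return
/// `None` on platforms where no run-time detection is implemented.
pub fn has_aes(runtime_probe: impl FnOnce() -> Option<bool>) -> bool {
    // If the feature is statically enabled, the probe is never called.
    AES_STATIC || runtime_probe().unwrap_or(false)
}
```

On an unsupported OS the call reduces to `has_aes(|| None)`, which yields exactly the compile-time answer.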

@tarcieri
Member Author

tarcieri commented Apr 26, 2021

Here's a guide:

https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code

They suggest using sysctl(3) which appears to be available via the libc crate.

It looks like the feature detection flags reside in the hw.optional namespace:

hw.optional.floatingpoint: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.amx_version: 2
hw.optional.ucnormal_mem: 1
hw.optional.arm64: 1

Somewhat curiously AES, SHA-1, and SHA-256 aren't listed on my Mac Mini M1.
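
As a rough sketch, one of these flags could be queried with `sysctlbyname(3)` declared by hand instead of through the libc crate. `sysctl_flag` is a hypothetical helper, and reading the value as a 32-bit integer is an assumption based on the output above.

```rust
/// Reads a boolean-like `hw.optional.*` entry on macOS.
/// Returns `None` if the key does not exist or the call fails.
#[cfg(target_os = "macos")]
pub fn sysctl_flag(name: &str) -> Option<bool> {
    use std::ffi::CString;
    use std::os::raw::{c_char, c_int, c_void};

    unsafe extern "C" {
        fn sysctlbyname(
            name: *const c_char,
            oldp: *mut c_void,
            oldlenp: *mut usize,
            newp: *mut c_void,
            newlen: usize,
        ) -> c_int;
    }

    // A name containing an interior NUL byte cannot be a valid key.
    let cname = CString::new(name).ok()?;
    let mut value: c_int = 0;
    let mut len = std::mem::size_of::<c_int>();

    let rc = unsafe {
        sysctlbyname(
            cname.as_ptr(),
            &mut value as *mut c_int as *mut c_void,
            &mut len,
            std::ptr::null_mut(),
            0,
        )
    };

    if rc == 0 { Some(value != 0) } else { None }
}

/// On other targets there is nothing to query in this sketch.
#[cfg(not(target_os = "macos"))]
pub fn sysctl_flag(_name: &str) -> Option<bool> {
    None
}
```

A call such as `sysctl_flag("hw.optional.armv8_2_sha3")` would then report whether the SHA-3 flag listed above is set.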

Edit: here is the Golang-related discussion about this particular issue.

This comment in particular suggests it is and will always be safe to assume aes, sha1, and sha2 are available on Apple aarch64 targets, citing clang/llvm contributions by Apple engineers as precedent.

There's also a follow-up comment by David Benjamin from the BoringSSL team saying they do the same thing and offering this rationale:

My impression so far (prior to ARM macOS) was that they largely used static armcaps, dispatching with universal binaries and the app store. Now they're on another generation of aarch64 features and macOS has an ARM port, it makes sense that they'd would shift more to sysctlbyname beyond that initial baseline.

@tarcieri
Member Author

tarcieri commented Apr 26, 2021

@newpavlov

I think cpuid-bool is a bit misleading name for a crate with ARM target feature detection support, but I guess creating a separate crate or renaming it is not worth the trouble.

FWIW I just registered the cpufeatures crate if you'd like a more generic name (#381)

tarcieri changed the title from "cpuid-bool: ARM support" to "cpufeatures: ARM support" on Apr 29, 2021

@newpavlov
Member

This comment in particular suggests it is and will always be safe to assume aes, sha1, and sha2 are available on Apple aarch64 targets, citing clang/llvm contributions by Apple engineers as precedent.

In this case I think Rust should enable the relevant target features itself for these targets, similarly to how SSE2 is enabled by default for x86_64-* targets. It means that the proposed fallback to compile-time detection outside of Linux should do the thing for us.

@tarcieri
Member Author

tarcieri commented Apr 30, 2021

It should, and in that case ideally the feature detection macro will optimize away.
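
To illustrate the idea, here is a sketch of a cached check that short-circuits when the feature is already enabled at compile time. `CachedFlag` is a hypothetical type, not this crate's implementation. When `static_enabled` is a compile-time constant `true`, the compiler can drop the rest of the function.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Sentinel meaning "the probe has not run yet".
const UNINIT: u8 = 0xff;

/// Caches the result of a run-time probe in a single byte.
pub struct CachedFlag {
    state: AtomicU8,
}

impl CachedFlag {
    pub const fn new() -> Self {
        Self { state: AtomicU8::new(UNINIT) }
    }

    /// Returns `true` immediately if the feature is statically enabled.
    /// Otherwise runs `probe` on first use and caches its answer.
    pub fn get(&self, static_enabled: bool, probe: impl FnOnce() -> bool) -> bool {
        if static_enabled {
            return true;
        }
        match self.state.load(Ordering::Relaxed) {
            UNINIT => {
                let detected = probe();
                // Racing threads may both probe; they store the same value.
                self.state.store(detected as u8, Ordering::Relaxed);
                detected
            }
            cached => cached == 1,
        }
    }
}
```

In practice `static_enabled` would be something like `cfg!(target_feature = "aes")`, so on targets where the feature is part of the baseline no probing code needs to run.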

However, ideally I think we shouldn't need to gate CPU feature detection on OS, and the OS specifics should be handled in this crate.

That means on macOS there will need to be a set of "armcaps" that are always available, and others that need to be queried via sysctl(3).
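
A sketch of how that split might be modeled. The `Feature` enum and `macos_has` are hypothetical; the baseline set (AES, SHA-1, SHA-2) reflects the assumption discussed above, and the sysctl names are the ones from the `hw.optional` listing. The query is injected as a closure so the logic can be exercised on any host.

```rust
/// Features of interest in this sketch (not an exhaustive list).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Feature {
    Aes,
    Sha1,
    Sha2,
    Sha3,
    Sha512,
}

impl Feature {
    /// Features assumed to be present on all Apple ARM64 hardware.
    pub fn in_apple_baseline(self) -> bool {
        matches!(self, Feature::Aes | Feature::Sha1 | Feature::Sha2)
    }

    /// sysctl key for features that have to be queried at run time.
    pub fn sysctl_name(self) -> Option<&'static str> {
        match self {
            Feature::Sha3 => Some("hw.optional.armv8_2_sha3"),
            Feature::Sha512 => Some("hw.optional.armv8_2_sha512"),
            _ => None,
        }
    }
}

/// Baseline features are always reported as available. Everything else is
/// delegated to `query`, which stands in for a sysctl(3) lookup.
pub fn macos_has(feature: Feature, query: impl Fn(&str) -> Option<bool>) -> bool {
    if feature.in_apple_baseline() {
        return true;
    }
    feature
        .sysctl_name()
        .and_then(|name| query(name))
        .unwrap_or(false)
}
```

Treating a failed or missing lookup as "not available" is the conservative choice here; callers then fall back to a portable implementation.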

@newpavlov
Member

newpavlov commented Apr 30, 2021

ideally I think we shouldn't need to gate CPU feature detection on OS

Yes and it's exactly the purpose of RFC 2725.

and the OS specifics should be handled in this crate

No, ideally it should be handled either by std or in no_std contexts by third-party platform-specific crates à la allocator crates. Unfortunately, Rust is simply underdeveloped in this area at the moment.

@tarcieri
Member Author

tarcieri commented Apr 30, 2021

Well sure, this crate is a stopgap. But it doesn't seem like there has been a whole lot of progress on RFC 2725?

In the meantime it seems like we should be working on something with a similar API shape at least.

tarcieri added a commit that referenced this issue May 26, 2021
All Apple ARM64 hardware has the same baseline set of statically known
capabilities which can be assumed on all iOS (and macOS) platforms.

This commit adds support for those statically known capabilities on iOS.

Unfortunately it does not appear to be possible to access the
`sysctl(3)` namespace on iOS in order to determine the availability of
other CPU features which aren't part of this baseline set the same way
we can on macOS, so a static capability set is the best we can do.

See this issue for more information:

#378