Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv: do not attempt CSE of int constants #10608

Closed
wants to merge 2 commits into from
Closed

Conversation

nojb
Copy link
Contributor

@nojb nojb commented Sep 5, 2021

This PR fixes a performance bug in the RISC-V code generator which can cause long compilation times under certain circumstances. As reported in #10591 this situation is triggered when compiling driver/main_args.ml.

Typically CSE is not attempted for integer constants, but the bit of code removed in this PR was actually directing the CSE pass to apply to integer constants outside the range in which they can be used as immediate operands. The lifetime intervals of temporaries holding such constants were then, in some cases, made significantly longer which taxed the register allocator.

@nojb
Copy link
Contributor Author

nojb commented Sep 5, 2021

For the record: in order to debug this problem I found the RISC-V VM too slow. Instead I copied the riscv architecture-dependent files to asmcomp (proc.ml, arch.ml, etc), then built the bytecode version of ocamlopt and proceeded to build the file I was interested in (driver/main_args.ml) with -stop-after scheduling added to the usual command line, eg

./boot/ocamlrun ./ocamlopt -stop-after scheduling -I stdlib -nostdlib -I utils -I driver -I parsing -c driver/main_args.ml

This allowed me to investigate compilation time, all the intermediate languages, etc in an amd64 machine.

Just in case this is useful in other situations where it is not convenient to debug "in-host".

@xavierleroy
Copy link
Contributor

Congratulations on your detective work!

Let's take a few days to think about the real fix. The problem (of CSE working too hard to factor out integer constants) could happen with target architectures other than RISC-V. It's just that RISC-V has the most restrictive view of what constitutes a "cheap" integer constant of all the target architectures, so it's the first to exhibit the problem, but the solution should take all the target architectures into account.

@edwintorok
Copy link
Contributor

edwintorok commented Sep 5, 2021

On RISC-V another possibility is to put non-immediate integer constants into a special .srodata section that can be loaded by using gp-relative addressing in a single instruction. The space here is limited (just a 4K page IIUC), but constants are deduped, and until it runs out it might be a good alternative (I would assume that most programs would use a small number of unique constants).
Because this page is accessed frequently it has a reasonable chance of being in one of the L* caches too, so should be cheaper than a regular load from a rip-relative address (like the compiler emits for floating-point constants), which on first use would always be a cache miss.
See https://www.sifive.com/blog/all-aboard-part-3-linker-relaxation-in-riscv-toolchain, and https://gcc.godbolt.org/z/xvjM7zEEv for an example of what GCC does (Clang doesn't implement this yet). I think if the linker relaxation method is used then the linker will take care of putting the constant into the appropriate section if possible and just fallback to a regular load otherwise. Not sure whether the added complexity would be worth it.

@xavierleroy
Copy link
Contributor

For RISC-V we currently use the li assembler pseudo-instruction, which expands to 1 to 8 (!) arithmetic instructions. However, 32-bit signed integer constants produce at most 2 instructions, so it's only for very big constants that the gp-relative load is preferable.

@nojb
Copy link
Contributor Author

nojb commented Sep 8, 2021

Closing in favor of #10615

@nojb nojb closed this Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants