Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V] Use common CodeGen::genHomeRegisterParams #101288

Merged
merged 16 commits into from Apr 27, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
14 changes: 8 additions & 6 deletions src/coreclr/jit/abi.cpp
Expand Up @@ -258,14 +258,16 @@ bool ABIPassingInformation::HasExactlyOneStackSegment() const
//
bool ABIPassingInformation::IsSplitAcrossRegistersAndStack() const
{
bool anyReg = false;
bool anyStack = false;
for (unsigned i = 0; i < NumSegments; i++)
if (NumSegments < 2)
return false;

bool isFirstInReg = Segments[0].IsPassedInRegister();
for (unsigned i = 1; i < NumSegments; i++)
{
anyReg |= Segments[i].IsPassedInRegister();
anyStack |= Segments[i].IsPassedOnStack();
if (isFirstInReg != Segments[i].IsPassedInRegister())
return true;
}
return anyReg && anyStack;
return false;
}

//-----------------------------------------------------------------------------
Expand Down
4 changes: 2 additions & 2 deletions src/coreclr/jit/abi.h
Expand Up @@ -51,8 +51,8 @@ struct ABIPassingInformation
// - On loongarch64/riscv64, structs can be passed in two registers or
// can be split out over register and stack, giving
// multiple register segments and a struct segment.
unsigned NumSegments = 0;
ABIPassingSegment* Segments = nullptr;
unsigned NumSegments;
ABIPassingSegment* Segments;

ABIPassingInformation(unsigned numSegments = 0, ABIPassingSegment* segments = nullptr)
: NumSegments(numSegments)
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/codegen.h
Expand Up @@ -274,7 +274,7 @@ class CodeGen final : public CodeGenInterface
void genEnregisterOSRArgsAndLocals();
#endif

void genHomeSwiftStructParameters(bool handleStack);
void genHomeSplitStructParameters(bool handleStack);
tomeksowi marked this conversation as resolved.
Show resolved Hide resolved

void genCheckUseBlockInit();
#if defined(UNIX_AMD64_ABI) && defined(FEATURE_SIMD)
Expand Down
21 changes: 10 additions & 11 deletions src/coreclr/jit/codegencommon.cpp
Expand Up @@ -2802,7 +2802,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
*/

#if !defined(TARGET_RISCV64)
struct RegNode;

struct RegNodeEdge
Expand Down Expand Up @@ -3346,8 +3345,6 @@ void CodeGen::genHomeRegisterParams(regNumber initReg, bool* initRegStillZeroed)
}
}

#endif

// -----------------------------------------------------------------------------
// genGetParameterHomingTempRegisterCandidates: Get the registers that are
// usable during register homing.
Expand Down Expand Up @@ -4122,32 +4119,34 @@ void CodeGen::genEnregisterOSRArgsAndLocals()
}
}

#ifdef SWIFT_SUPPORT
#if defined(SWIFT_SUPPORT) || defined(TARGET_RISCV64)

//-----------------------------------------------------------------------------
// genHomeSwiftStructParameters:
// Reassemble Swift struct parameters if necessary.
// genHomeSplitStructParameters:
// Reassemble split struct parameters if necessary.
//
// Parameters:
// handleStack - If true, reassemble the segments that were passed on the stack.
// If false, reassemble the segments that were passed in registers.
//
void CodeGen::genHomeSwiftStructParameters(bool handleStack)
void CodeGen::genHomeSplitStructParameters(bool handleStack)
{
for (unsigned lclNum = 0; lclNum < compiler->info.compArgsCount; lclNum++)
{
#ifdef SWIFT_SUPPORT
if (lclNum == compiler->lvaSwiftSelfArg)
{
continue;
}
#endif

LclVarDsc* dsc = compiler->lvaGetDesc(lclNum);
if ((dsc->TypeGet() != TYP_STRUCT) || compiler->lvaIsImplicitByRefLocal(lclNum) || !dsc->lvOnFrame)
{
continue;
}

JITDUMP("Homing Swift parameter V%02u: ", lclNum);
JITDUMP("Homing split parameter V%02u: ", lclNum);
const ABIPassingInformation& abiInfo = compiler->lvaGetParameterABIInfo(lclNum);
DBEXEC(VERBOSE, abiInfo.Dump());

Expand Down Expand Up @@ -4177,7 +4176,7 @@ void CodeGen::genHomeSwiftStructParameters(bool handleStack)
else
{
var_types loadType = TYP_UNDEF;
switch (seg.Size)
switch (RISCV64_ONLY(RoundUpToPower2)(seg.Size))
tomeksowi marked this conversation as resolved.
Show resolved Hide resolved
{
case 1:
loadType = TYP_UBYTE;
Expand Down Expand Up @@ -4224,7 +4223,7 @@ void CodeGen::genHomeSwiftStructParameters(bool handleStack)
}
}
}
#endif
#endif // defined(SWIFT_SUPPORT) || defined(TARGET_RISCV64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this going to result in a lot of extra stack stores for RISCV?
At this point it seems like just factoring parts of this helper out and calling it from the same place as genHomeRegisterParams on the split stack segment will end up being a simpler change overall.

Copy link
Contributor Author

@tomeksowi tomeksowi Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't make unnecessary stores from what I examined the diffs but you're right, a RISC-V specific routine (I tried to phrase it so LoongArch can also reuse) turned out much handier.


/*-----------------------------------------------------------------------------
*
Expand Down Expand Up @@ -5523,7 +5522,7 @@ void CodeGen::genFnProlog()
intRegState.rsCalleeRegArgMaskLiveIn &= ~RBM_SWIFT_ERROR;
}

genHomeSwiftStructParameters(/* handleStack */ false);
genHomeSplitStructParameters(/* handleStack */ false);
}
#endif

Expand Down
8 changes: 4 additions & 4 deletions src/coreclr/jit/codegenlinear.cpp
Expand Up @@ -386,13 +386,13 @@ void CodeGen::genCodeForBBlist()
compiler->compCurStmt = nullptr;
compiler->compCurLifeTree = nullptr;

#ifdef SWIFT_SUPPORT
// Reassemble Swift struct parameters on the local stack frame in the
#if defined(SWIFT_SUPPORT) || defined(TARGET_RISCV64)
// Reassemble split struct parameters on the local stack frame in the
// scratch BB right after the prolog. There can be arbitrary amounts of
// codegen related to doing this, so it cannot be done in the prolog.
if (compiler->fgBBisScratch(block) && compiler->lvaHasAnySwiftStackParamToReassemble())
if (compiler->fgBBisScratch(block) && compiler->lvaHasAnyStackParamToReassemble())
{
genHomeSwiftStructParameters(/* handleStack */ true);
genHomeSplitStructParameters(/* handleStack */ true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling genHomeSplitStructParameters only with handleStack=true feels a bit hackish but it does solve the problem and generates the same instructions as previously.

@jakobbotsch Let me know if you think 06e9d65 is the right approach for split structs in general.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling it only for the stack segments makes sense to me. We should be able to do that for Swift as well now that genHomeRegisterParams is sufficiently general (no need to include that change as part of this PR).

I'm surprised you need this very general treatment for RISC-V. This handling is needed for the Swift ABI because we cannot rebuild the struct layout simply by storing parts passed in registers below the parts passed on stack. The layout does not match and there can be intervening return address on the stack for x64.
For our other ABIs (windows arm64 with varargs, arm32, my impression is LA64 too) the stack segments of the split parameters will always be at SP on entry to the function. Spilling the register parts right below that will result in the correct struct layout. So there is no need to move the stack segments around at all. Is it not also the case for RISC-V? The diff in #101340 has some more context that may shed some light.

For example: this test has a split parameter V04 on windows-arm64:

Parameter V00 ABI info: [00..08) reg x0
Parameter V01 ABI info: 2 segments
  [0] [00..08) reg x1
  [1] [08..16) reg x2
Parameter V02 ABI info: 2 segments
  [0] [00..08) reg x3
  [1] [08..16) reg x4
Parameter V03 ABI info: 2 segments
  [0] [00..08) reg x5
  [1] [08..16) reg x6
Parameter V04 ABI info: 2 segments
  [0] [00..08) reg x7
  [1] [08..16) stack @ +00
Parameter V05 ABI info: [00..16) stack @ +08
Parameter V06 ABI info: [00..16) stack @ +24
Parameter V07 ABI info: [00..16) stack @ +40
Parameter V08 ABI info: [00..16) stack @ +56
Parameter V09 ABI info: [00..16) stack @ +72
Parameter V10 ABI info: [00..16) stack @ +88

To reassemble the struct we allocate V04 at the very top of the frame and store just the register part there:

;  V00 VarArgHandle [V00    ] (  1,  1   )    long  ->  [fp+0x20]  do-not-enreg[V] ld-addr-op
;  V01 arg0         [V01    ] (  1,  1   )  struct (16) [fp+0x28]  do-not-enreg[XSA] multireg-arg addr-exposed <NativeVarargTest.TwoLongStruct>
;  V02 arg1         [V02    ] (  1,  1   )  struct (16) [fp+0x38]  do-not-enreg[XSA] multireg-arg addr-exposed <NativeVarargTest.TwoLongStruct>
;  V03 arg2         [V03    ] (  1,  1   )  struct (16) [fp+0x48]  do-not-enreg[XSA] multireg-arg addr-exposed <NativeVarargTest.TwoLongStruct>
;  V04 arg3         [V04    ] (  1,  1   )  struct (16) [fp+0x58]  do-not-enreg[XSA] multireg-arg addr-exposed <NativeVarargTest.TwoLongStruct>
;  V05 arg4         [V05    ] (  1,  1   )  struct (16) [fp+0x68]  do-not-enreg[XS] addr-exposed <NativeVarargTest.TwoLongStruct>
;  V06 arg5         [V06    ] (  1,  1   )  struct (16) [fp+0x78]  do-not-enreg[XS] addr-exposed <NativeVarargTest.TwoLongStruct>
;  V07 arg6         [V07    ] (  1,  1   )  struct (16) [fp+0x88]  do-not-enreg[XS] addr-exposed <NativeVarargTest.TwoLongStruct>
;  V08 arg7         [V08    ] (  1,  1   )  struct (16) [fp+0x98]  do-not-enreg[XS] addr-exposed <NativeVarargTest.TwoLongStruct>
;  V09 arg8         [V09    ] (  1,  1   )  struct (16) [fp+0xA8]  do-not-enreg[XS] addr-exposed <NativeVarargTest.TwoLongStruct>
;  V10 arg9         [V10    ] (  1,  1   )  struct (16) [fp+0xB8]  do-not-enreg[XS] addr-exposed <NativeVarargTest.TwoLongStruct>
;  V11 loc0         [V11    ] (  1,  1   )    long  ->  [fp+0x18]  do-not-enreg[] must-init
;# V12 OutArgs      [V12    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;
; Lcl frame size = 16

G_M48227_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x60]!
            stp     x0, x1, [sp, #0x20]
            stp     x2, x3, [sp, #0x30]
            stp     x4, x5, [sp, #0x40]
            stp     x6, x7, [sp, #0x50]     ; extra gunk generated for varargs that we never bothered optimizing
            mov     fp, sp
            str     xzr, [fp, #0x18]    // [V11 loc0]
            str     x0, [fp, #0x20]     // [V00 VarArgHandle]
            str     x1, [fp, #0x28]     // [V01 arg0]
            str     x2, [fp, #0x30]     // [V01 arg0+0x08]
            str     x3, [fp, #0x38]     // [V02 arg1]
            str     x4, [fp, #0x40]     // [V02 arg1+0x08]
            str     x5, [fp, #0x48]     // [V03 arg2]
            str     x6, [fp, #0x50]     // [V03 arg2+0x08]
            str     x7, [fp, #0x58]     // [V04 arg3]               ; register part of split parameter
                                                ;; size=60 bbWeight=1 PerfScore 14.50
G_M48227_IG02:  ;; offset=0x003C

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised you need this very general treatment for RISC-V. This handling is needed for the Swift ABI because we cannot rebuild the struct layout simply by storing parts passed in registers below the parts passed on stack. The layout does not match and there can be intervening return address on the stack for x64.

I may not need it that general for RISC-V, I went for sharing the Swift code because I'm not that familiar with the common parts of the codegen and it did sth similar to what we had for handling of split structs here in genHomeRegisterParams. For prolog of method_1I4S in test struct16args it used to generate a sequence like this:

G_M40687_IG01:  ;; offset=0x0000
            addi           sp, sp, -224
            sd             ra, 0(sp)
            sd             fp, 8(sp)
            addi           fp, sp, 8
            addi           t0, fp, 72
            sd             zero, 8(t0)
            sd             zero, 0(t0)
            addi           t0, t0, 16
            sd             zero, 8(t0)
            sd             zero, 0(t0)
            addi           t0, t0, 16
            sd             zero, 8(t0)
            sd             zero, 0(t0)
            addi           t0, t0, 16
            sd             zero, 8(t0)
            sd             zero, 0(t0)
            addi           t0, t0, 16
            sd             zero, 0(t0)
            sw             a0, 212(fp)
            sd             a1, 192(fp)
            sd             a2, 200(fp)
            sd             a3, 176(fp)
            sd             a4, 184(fp)
            sd             a5, 160(fp)
            sd             a6, 168(fp)
            sd             a7, 144(fp)  # first part of struct
            ld             t0, 224(sp)  # loading second part of struct
            sd             t0, 152(fp)  # and placing it after a7

The common genHomeRegisterParams doesn't generate the last load-store.

For our other ABIs (windows arm64 with varargs, arm32, my impression is LA64 too) the stack segments of the split parameters will always be at SP on entry to the function. Spilling the register parts right below that will result in the correct struct layout. So there is no need to move the stack segments around at all. Is it not also the case for RISC-V? The diff in #101340 has some more context that may shed some light.

Hm, that would be reconstructing the struct cross-frame, interesting. If I understand correctly, "Spilling the register parts right below" in the above asm example would be sd a7 216(sp) instead of sd a7, 144(fp) (and the following load-store)? If so, I need to check if it's possible to have a contiguous space for the struct that way. Where are the spilling instructions normally generated for other archs?

For example: this test has a split parameter V04 on windows-arm64:

Varargs are now unsupported on RISC-V but the above quoted test is similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, "Spilling the register parts right below" in the above asm example would be sd a7 216(sp) instead of sd a7, 144(fp) (and the following load-store)? If so, I need to check if it's possible to have a contiguous space for the struct that way. Where are the spilling instructions normally generated for other archs?

Right, exactly. It would be possible if the stack segment is always guaranteed to start at sp+0 when the split happens. I think the calling conventions are generally designed to allow this because it makes handling of varargs possible (the callee can spill all parameters and know that everything is laid out contiguously on stack).

arm64 handles this in genHomeRegisterParams; it falls out naturally due to how things work out. arm32 has its own mechanism calls "prespills" where it will push registers on entry to the function to make this the case. It is effectively the same thing but represented a different way in the JIT, so I would like to unify it with arm64 at some point (arm32 was the first non-x86-x64 target, so it has a lot of special behavior. We have been working to get rid of it.)

lvaAssignVirtualFrameOffsetsToArgs is responsible for assigning these offsets. It allocates offsets relative to the "virtual 0", which is the first offset of space passed on the stack by the caller. For your split parameter I think it should already be allocating it at address -8, which is what is expected to handle the split properly without moving the stack segment. I am not sure what ends up overwriting this assignment. (FWIW I am rewriting it in #101340 to be based on the new ABI info, so it may be simpler to read the diff there. That PR will definitely allocate your split parameter at -8.)

Varargs are now unsupported on RISC-V but the above quoted test is similar.

Right, I didn't mean to imply that this was a varargs thing, it was just an example (win-arm64 splits parameters, but only if there is also varargs. arm32 splits parameters always, so I could have used some cases from there as an example too).

Copy link
Contributor Author

@tomeksowi tomeksowi Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lvaAssignVirtualFrameOffsetsToArgs is responsible for assigning these offsets. It allocates offsets relative to the "virtual 0", which is the first offset of space passed on the stack by the caller. For your split parameter I think it should already be allocating it at address -8, which is what is expected to handle the split properly without moving the stack segment. I am not sure what ends up overwriting this assignment.

lvaAssignVirtualFrameOffsetsToArgs looks healthy, V04 is at -8 like you expected. But then looks like lvaAllocLocalAndSetVirtualOffset called here is setting the stack offset to -0x48(-72) which ends up being 144 after delta bump.

*************** In lvaAssignFrameOffsets(FINAL_FRAME_LAYOUT)
Set V04 to offset -8
Assign V00 arg0, size=4, stkOffs=-0x4
Pad V01 arg1, size=16, stkOffs=-0x8, pad=4
Assign V01 arg1, size=16, stkOffs=-0x18
Assign V02 arg2, size=16, stkOffs=-0x28
Assign V03 arg3, size=16, stkOffs=-0x38
Assign V04 arg4, size=16, stkOffs=-0x48
Assign V05 loc0, size=4, stkOffs=-0x4c
Assign V06 loc1, size=4, stkOffs=-0x50
Assign V07 loc2, size=4, stkOffs=-0x54
Assign V08 loc3, size=4, stkOffs=-0x58
Assign V09 loc4, size=4, stkOffs=-0x5c
Assign V10 loc5, size=4, stkOffs=-0x60
Assign V11 loc6, size=4, stkOffs=-0x64
Assign V12 loc7, size=4, stkOffs=-0x68
Assign V13 loc8, size=4, stkOffs=-0x6c
Assign V14 loc9, size=4, stkOffs=-0x70
Assign V15 loc10, size=4, stkOffs=-0x74
Assign V16 loc11, size=4, stkOffs=-0x78
Assign V17 loc12, size=4, stkOffs=-0x7c
Assign V18 loc13, size=4, stkOffs=-0x80
Assign V19 loc14, size=4, stkOffs=-0x84
Assign V20 loc15, size=4, stkOffs=-0x88
Assign V21 loc16, size=4, stkOffs=-0x8c
Assign V22 loc17, size=4, stkOffs=-0x90
Assign V24 tmp1, size=4, stkOffs=-0x94
Assign V25 tmp2, size=4, stkOffs=-0x98
Assign V26 tmp3, size=4, stkOffs=-0x9c
Assign V27 tmp4, size=4, stkOffs=-0xa0
Assign V28 tmp5, size=4, stkOffs=-0xa4
Assign V29 tmp6, size=4, stkOffs=-0xa8
Assign V30 tmp7, size=4, stkOffs=-0xac
Assign V31 tmp8, size=4, stkOffs=-0xb0
Assign V32 tmp9, size=4, stkOffs=-0xb4
Assign V33 tmp10, size=4, stkOffs=-0xb8
Assign V34 tmp11, size=4, stkOffs=-0xbc
Assign V35 tmp12, size=4, stkOffs=-0xc0
Assign V36 tmp13, size=4, stkOffs=-0xc4
Assign V37 tmp14, size=4, stkOffs=-0xc8
Assign V38 tmp15, size=4, stkOffs=-0xcc
Assign V39 tmp16, size=4, stkOffs=-0xd0
--- delta bump 216 for FP frame
--- virtual stack offset to actual stack offset delta is 216
-- V00 was -4, now 212
-- V01 was -24, now 192
-- V02 was -40, now 176
-- V03 was -56, now 160
-- V04 was -72, now 144

Not sure if V04 shouldn't fall into one of the continue statements before lvaAllocLocalAndSetVirtualOffset is called...

EDIT: ... maybe here?

if (varDsc->lvIsParam)
{
#ifdef TARGET_ARM64
if (info.compIsVarArgs && varDsc->lvIsRegArg &&
(varDsc->GetArgReg() != theFixedRetBuffReg(info.compCallConv)))
{
// Stack offset to varargs (parameters) should point to home area which will be preallocated.
const unsigned regArgNum = genMapIntRegNumToRegArgNum(varDsc->GetArgReg(), info.compCallConv);
varDsc->SetStackOffset(-initialStkOffs + regArgNum * REGSIZE_BYTES);
continue;
}
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see other stuff allocated on the frame first, before the parameters, so the frame bump can probably not be done in the loop like I suggest and instead needs handling before we go into the loop over the locals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to add some handling in lvaParamHasLocalStackSpace, to return false for the split case. But you will also need to make sure it allocates space for the register segments in the caller for this case (I think you can use lvaIncrementFrameSize).

Makes sense, thank you. I'll try to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the split argument at the top of the frame requires moving everything down the stack, including stack pointer and return address, basically what arm does for varargs (for RISC-V we could limit that only to the last argument register). This is a bit more involved than I anticipated. Since this is just an optimization, I think it's better to revisit it after #101340 is merged to have a cleaner slate to work on.

Meanwhile, could we review & merge this PR as is (temporarily using Swift code for split arg reconstruction)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's more involved than anticipated then I think it's fine to just go with this approach permanently. No need to spend more effort to make the split parameters contiguous on my behalf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I like the idea but there's still a bit work to do on RISC-V before we get to optimizations like this.

}
#endif

Expand Down