Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT/linux-arm64] AF: codeManager->IsUnwindable(pvAddress) || runtime->IsConservativeStackReportingEnabled() #101896

Closed
jkotas opened this issue May 6, 2024 · 12 comments · Fixed by #101927
Labels

Comments

@jkotas
Copy link
Member

jkotas commented May 6, 2024

Native AOT outer loop tests are hitting this assert very frequently:

Debug Assertion Violation

Expression: 'codeManager->IsUnwindable(pvAddress) || runtime->IsConservativeStackReportingEnabled()'

File: /__w/1/s/src/coreclr/nativeaot/Runtime/thread.cpp, Line: 689

Examples:
https://github.com/dotnet/runtime/pull/85694/checks?check_run_id=24608826010
https://github.com/dotnet/runtime/pull/101858/checks?check_run_id=24608136220
https://github.com/dotnet/runtime/pull/101767/checks?check_run_id=24511487322

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 6, 2024
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member Author

jkotas commented May 6, 2024

@VSadov Could you please take a look? We seem to be hitting it for calls right before epilogs:

Example:

   0xaaaacff3f0f8 <System.Collections.Generic.List`1<System___Canon>___ctor_1+376>:     ldr     x0, [x29, #24]
   0xaaaacff3f0fc <System.Collections.Generic.List`1<System___Canon>___ctor_1+380>:     adrp    x11, 0xaaaad018c000
   0xaaaacff3f100 <System.Collections.Generic.List`1<System___Canon>___ctor_1+384>:     add     x11, x11, #0x300
   0xaaaacff3f104 <System.Collections.Generic.List`1<System___Canon>___ctor_1+388>:     ldr     x30, [x11]
   0xaaaacff3f108 <System.Collections.Generic.List`1<System___Canon>___ctor_1+392>:     blr     x30 // RhpInterfaceDispatch8
   0xaaaacff3f10c <System.Collections.Generic.List`1<System___Canon>___ctor_1+396>:     ldp     x21, x22, [sp, #64] <- IP that we are crashing on
   0xaaaacff3f10c <System.Collections.Generic.List`1<System___Canon>___ctor_1+396>:     ldp     x21, x22, [sp, #64]
   0xaaaacff3f110 <System.Collections.Generic.List`1<System___Canon>___ctor_1+400>:     ldp     x19, x20, [sp, #48]
   0xaaaacff3f114 <System.Collections.Generic.List`1<System___Canon>___ctor_1+404>:     ldp     x29, x30, [sp], #80
   0xaaaacff3f118 <System.Collections.Generic.List`1<System___Canon>___ctor_1+408>:     ret

@jkotas jkotas added the blocking-clean-ci-optional Blocking optional rolling runs label May 6, 2024
@VSadov
Copy link
Member

VSadov commented May 6, 2024

The ip seems to be in epilog. There should not be interruptible points in epilog. That is what could trigger the assert.
I will take a look. Must be some recent change.

@VSadov
Copy link
Member

VSadov commented May 6, 2024

The problem is not with the call being right before an epilog, but with detecting epilogs on linux-ARM64 in general.

The #101647 started using LR as a general purpose register when doing indirect calls and that broke our logic that detects whether a thread is in an epilog on linux-arm64

Using LR as a GPR is not illegal, just something that we did not do before, so once we see LR (or FP) loaded with some value, the epilog detection logic assume that we are in an epilog.

Long-term (and assuming that some platforms will never learn how to unwind in epilogs), I think we should encode the epilog ranges, in GC info, I suppose. It will not be trivial as there could be more than one epilog in a method, so encoding them all and then searching through them could be a bit tricky.
However, it will be more portable approach.

For now we can have a simpler fix - we do not have to use LR for indirect calls, In fact REG_INDIRECT_CALL_TARGET_REG seems like a better fit for this anyways.

@jkotas
Copy link
Member Author

jkotas commented May 6, 2024

The #101647 started using LR as a general purpose register when doing indirect call

IIRC, we have been using LR as a general-purpose register for a long time, it just did not happen very often in practice.

@VSadov
Copy link
Member

VSadov commented May 6, 2024

IIRC, we have been using LR as a general-purpose register for a long time, it just did not happen very often in practice.

We would have seen this assert once in a while then.

I think one of the reasons why #101647 could unconditionally use LR for indirect calls register is that LR is not generally used as GPR. (that is not a case on arm32 though, where registers are in a short supply and LR is used as a scratch)

We need to confirm this until we have a better solution (i.e. #101932)

@VSadov
Copy link
Member

VSadov commented May 6, 2024

It looks like LSRA excludes LR from availableIntRegs

#ifdef TARGET_ARM64
availableIntRegs = (RBM_ALLINT & ~(RBM_PR | RBM_FP | RBM_LR) & ~compiler->codeGen->regSet.rsMaskResvd);
#elif defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)

Not sure if that is sufficient to prevent its use as a GPR. Looks like that is the intent there.
If yes, we may want to add a comment about NativeAOT dependency, while there is a dependency.

@VSadov
Copy link
Member

VSadov commented May 6, 2024

@jakobbotsch @AndyAyersMS - Is the exclusion of LR from availableIntRegs in LSRA sufficient for preventing its use as a GPR?

(just want to make sure the current assumption is safe)

@jakobbotsch
Copy link
Member

I think one of the reasons why #101647 could unconditionally use LR for indirect calls register is that LR is not generally used as GPR.

I picked LR because the upcoming bl is going to unconditionally change it anyway (setting it to the return address), so using it for something is guaranteed to impact nothing (well... that was my thinking). Any other register could potentially be used as an argument register for non-standard calling conventions.

@jakobbotsch @AndyAyersMS - Is the exclusion of LR from availableIntRegs in LSRA sufficient for preventing its use as a GPR?

Yes, that should be sufficient (I verified this as well).

@jakobbotsch
Copy link
Member

I'm not sure why exactly REG_LR is excluded, but I don't think it's deliberately because of this particular issue. It seems like it was done in dotnet/coreclr#14606 at which point REG_LR was not contained in REG_VAR_ORDER, yet #45135 added it into REG_VAR_ORDER without adding it to availableIntRegs.

@VSadov
Copy link
Member

VSadov commented May 6, 2024

I'm not sure why exactly REG_LR is excluded

ABI seems to permit using LR as a scratch, but I would not be surprised if using LR as GPR could break something. Some asm stub could be forgetting to treat is as a scratch or some tooling issue. It could be just a defensive change while dealing with other bugs.

I will add a comment about the dependency that we know about. We can consider allowing using LR as scratch once #101932 is addressed.

@jkotas
Copy link
Member Author

jkotas commented May 6, 2024

if using LR as GPR could break something

It would be a bug that we would want to fix. (I have seen it used as a scratch in C/C++ compiled code too.)

@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants