Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure on switch statement with case label in nested block #522

Open
dkolsen-pgi opened this issue Mar 24, 2024 · 5 comments
Open

Comments

@dkolsen-pgi
Copy link
Collaborator

ClangIR hits an assertion failure when a switch statement contains a case label that is within a nested block statement.

int g(int x) {
  switch (x) {
    case 1:
      return -1;
    {
      case 2:
        return -20;
    }
  }
  return x;
}
clang++: .../clang/lib/CIR/CodeGen/CIRGenStmt.cpp:300: 
mlir::LogicalResult cir::CIRGenFunction::buildSimpleStmt(const clang::Stmt*, bool): 
Assertion `0 && "Should not get here, currently handled directly from SwitchStmt"' failed.

While code like this is unlikely to appear in production and will usually be found in test suites that try to break the compiler, it is legal code in both C and C++ and should not trigger an internal compiler error.

@bcardosolopes
Copy link
Member

While code like this is unlikely to appear in production

Famous "last words" haha, I just hit this in prod! Some "temporary" workaround is using this.

Now that @gitoleg work has landed we probably have the machinery to solve this. Any chance (or any plans) you might work on this soon @gitoleg? (cc. @wenpen)

@gitoleg
Copy link
Collaborator

gitoleg commented May 10, 2024

I will take a look in the next few days, sure

@gitoleg
Copy link
Collaborator

gitoleg commented May 14, 2024

Now that @gitoleg work has landed we probably have the machinery to solve this.

This is all about codegen, so I would say the previous work has nothing to do with it.

Speaking about the current issue, it's still not obvious for me how to handle it properly, though I think I can suggest an approach.
First of all, I'm not sure we do the proper thing when we attach random statements to the previous case (or I miss something important here):

   for (auto *c : compoundStmt->body()) {
      if (auto *switchCase = dyn_cast<SwitchCase>(c)) {
        res = buildSwitchCase(*switchCase, condType, caseAttrs);
      } else if (lastCaseBlock) {  // !!! This clause
        // This means it's a random stmt following up a case, just
        // emit it as part of previous known case.
        mlir::OpBuilder::InsertionGuard guardCase(builder);
        builder.setInsertionPointToEnd(lastCaseBlock);
        res = buildStmt(c, /*useCurrentScope=*/!isa<CompoundStmt>(c));
      } else {
        llvm_unreachable("statement doesn't belong to any case region, NYI");
      }

      lastCaseBlock = builder.getBlock();

      if (res.failed())
        break;
    }

The original codegen even doesn't emit any random code as far I can see, and I think we need to do the same. We could extend the current issue example (will refer it as example 1) to make it a little more interesting (example 2):

int g(int x) {
  int y = 1;
  switch (x) {
    case 1:
      return -1;
    {
      y = 42;
      case 2:
        return y;
    }
  }
  return x;
}

The y = 42 statement is not taken in account and g(2) returns exactly 1. So I would say, we need to ignore this statement as well or at least place it in some unreachable scope.

Now, what I would do. I would create a small class for the switch stmt processing with several fields like case attributes, maybe switch variable type - and maintain a pointer to it in the CIRGenFunction.h. Once we enter buildSwitchStmt in the CIR codegen, we would save the current pointer in the temp var, create a new instance of this class, store the pointer into the member field. Once we leave the buildSwitchStmt we restore the pointer. Almost the same is done in the original codegen.

So we will able to handle nested switch statements and we can fix the example 1 .
The second example is less trivial ( and one need to take it into account when will fix the issue, even if it will be me :) ) . But I believe it's doable as well.

@bcardosolopes what do you think? Is it a good/workable solution? Or I miss some details here?

@bcardosolopes
Copy link
Member

This is all about codegen, so I would say the previous work has nothing to do with it.

Right, I wasn't implying that, sorry for the misunderstanding. My initial thinking is that we could probably map these weird inner cases with synthetic gotos/labels, since they violate some scope control-flow.

Speaking about the current issue, it's still not obvious for me how to handle it properly, though I think I can suggest an approach.

Thanks for looking into this.

First of all, I'm not sure we do the proper thing when we attach random statements to the previous case (or I miss something important here):
...
The original codegen even doesn't emit any random code as far I can see, and I think we need to do the same. We could extend the current issue example (will refer it as example 1) to make it a little more interesting (example 2):
..
The y = 42 statement is not taken in account and g(2) returns exactly 1. So I would say, we need to ignore this statement as well or at least place it in some unreachable scope.

If the original codegen is doing that (I haven't checked), I agree we should do the same - It'd probably be nice to maintain the statement for the sake of providing unrecheable diagnostics later on, e.g. even though we ignore the emission, we could emit an empy scope with source loc on a basic block that is unrecheable. But just initially supporting lowering works for me. I also don't see any current warnings for these types of unrecheable statements in clang.

Now, what I would do. I would create a small class for the switch stmt processing with several fields like case attributes, maybe switch variable type - and maintain a pointer to it in the CIRGenFunction.h. Once we enter buildSwitchStmt in the CIR codegen, we would save the current pointer in the temp var, create a new instance of this class, store the pointer into the member field. Once we leave the buildSwitchStmt we restore the pointer. Almost the same is done in the original codegen.

Yep, we currently track some of it already as part of LexicalScopes, we could increment it with more needed info.

So we will able to handle nested switch statements and we can fix the example 1. The second example is less trivial ( and one need to take it into account when will fix the issue, even if it will be me :) ) . But I believe it's doable as well.
@bcardosolopes what do you think? Is it a good/workable solution? Or I miss some details here?

Sounds great to me, thanks for sharing a solution. Do you have any interest to work on this?

@gitoleg
Copy link
Collaborator

gitoleg commented May 16, 2024

@bcardosolopes

Sounds great to me, thanks for sharing a solution. Do you have any interest to work on this?

I do) But I can't promise I'll do it soon. So if someone wants to do it earlier, just let me know, e.g. right here. Otherwise I'll return to this issue later and keep you informed about the progress/problems etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants