Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an off-by-one error in interpreter's do_invoke #54443

Merged
merged 2 commits into from
May 17, 2024

Conversation

fatteneder
Copy link
Contributor

@fatteneder fatteneder commented May 11, 2024

Fix #54054

@vtjnash vtjnash added status:merge me PR is reviewed. Merge when all tests are passing backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 labels May 11, 2024
src/interpreter.c Outdated Show resolved Hide resolved
@giordano
Copy link
Contributor

Does this need a test?

@giordano
Copy link
Contributor

Especially since there have been multiple pushes after the pull request was accepted, showing that (1) that version wasn't ready and (2) there are no tests making sure the code works as expected.

@DilumAluthge DilumAluthge added needs tests Unit tests are required for this change and removed status:merge me PR is reviewed. Merge when all tests are passing labels May 11, 2024
@fatteneder
Copy link
Contributor Author

fatteneder commented May 11, 2024

The previous version did OOB indexing and so was UB. IDK what kind of test would examine that behavior reliably. Do you have an idea for a test? So there is no reliable test to examine that problem.

@Seelengrab
Copy link
Contributor

One possible test could try to call do_invoke with a very large nargs so that the OOB access would write across a page boundary and cause a segfault.

@fatteneder
Copy link
Contributor Author

Here is an attempt to generate a test case:

# segfault.jl

macro generate_test_case(n)
  as = [ Symbol("a$i") for i in 1:n ]
  compute = Meta.parse(join(as, '+'))
  return esc(quote
    @noinline function sum_54443($(as...))
      return $compute
    end
    @noinline function wrapped_sum_54443($(as...))
      return sum_54443($(as...))
    end
  end)
end

macro call_test_case(n)
  unpack_as = [ :(as[$i]) for i in 1:n ]
  return esc(quote
    as = [ randn() for _ in 1:$n ]
    wrapped_sum_54443($(unpack_as...))
  end)
end

@generate_test_case(10)
@call_test_case(10)

This generates what I think is a problematic invoke call

julia> code_typed(wrapped_sum_54443, NTuple{10,Int64}; optimize=true) # note optimize=true!!
1-element Vector{Any}:
 CodeInfo(
1%1 = invoke Main.sum_54443(a1::Int64, a2::Int64, a3::Int64, a4::Int64, a5::Int64, a6::Int64, a7::Int64, a8::Int64, a9::Int64, a10::Int64)::Int64
└──      return %1
) => Int64

However, I have the problem that I don't even know how to call into do_invoke.
I tested with adding an artificial jl_error(); to do_invoke() and then run the above as julia --compile=min segfault.jl, but without failure.

Adding more printfs tells me that the interpreter is still doing something in jl_eval_value, but just never ends up in do_invoke.
Could it be that the interpreter just sees the CodeInfo generated with optimize=false which contains no explicit :invoke?

julia> code_typed(wrapped_sum_54443, NTuple{10,Int64}; optimize=false)
1-element Vector{Any}:
 CodeInfo(
1 ─      nothing::Core.Const(nothing)
│   %2 = Main.sum_54443(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10)::Int64
└──      return %2
) => Int64

I am really just guessing here ...

@fatteneder
Copy link
Contributor Author

Adding more printfs tells me that the interpreter is still doing something in jl_eval_value, but just never ends up in do_invoke

It actually ends up in do_call doing the computation, which IIUC is the dynamic dispatch and do_invoke would be static dispatch. So is there a way to force static dispatch in the interpreter?

@fatteneder
Copy link
Contributor Author

One possible test could try to call do_invoke with a very large nargs so that the OOB access would write across a page boundary and cause a segfault.

According to the embedding section of the docs the JL_GC_PUSH... macros do no heap allocation, but instead store the elements on the C stack, cf. https://docs.julialang.org/en/v1/manual/embedding/#Memory-Management. So I think this strategy can't be used either to trigger a segfault.


@vtjnash May I ask you for another review and a decision on whether this needs a test or not?
If it needs a test, do you have any suggestions on how to approach this?

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 14, 2024

OOB indexing is not UB. Anyways, this version also looks fine

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 14, 2024

In theory we would use a static analyzer type of test here, but I don't think we provide the bounds information to the analyzer right now, or if it would try to reason about the symbolic bound here if we did

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 14, 2024

The ASAN build tests in CI also should fail if this code ever got used, but we have no tests currently written for it specifically

@fatteneder
Copy link
Contributor Author

OOB indexing is not UB.

Why?
Isn't the Access out of bounds examples from https://en.cppreference.com/w/c/language/behavior equivalent to the problem here?


The ASAN build tests in CI also should fail if this code ever got used, but we have no tests currently written for it specifically

Nice. So I just need to figure out how to call into do_invoke.

@oscardssmith
Copy link
Member

Julia unlike C bounds-checks by default, so it's just an error, not UB.

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 14, 2024

Access is UB, but indexing is generally valid (and, as a special case, indexing one past the last element is always considered valid)

@vtjnash vtjnash added status:merge me PR is reviewed. Merge when all tests are passing and removed needs tests Unit tests are required for this change labels May 14, 2024
@fatteneder
Copy link
Contributor Author

Thanks. Very interesting, I did not know about that.
Here is also a SO post with a quote from the C99 standard that clarifies this: https://stackoverflow.com/a/3145001

@Seelengrab
Copy link
Contributor

Access is UB, but indexing is generally valid

Deref when i is OOB is UB. I don't follow what you mean with "access" and "indexing".

and, as a special case, indexing one past the last element is always considered valid

That's the first I'm hearing of this; could you share the relevant section of the standard? To clarify, are you talking about something like foo = &arr[i] (which is defined not to semantically deref in C, so isn't UB), or are you talking about foo = arr[i], which does semantically deref (and is hence UB if i would be OOB)?

Here is also a SO post with a quote from the C99 standard that clarifies this: https://stackoverflow.com/a/3145001

Note that the SO post is specifically about &arr[i], i.e. taking the address of a would-be array element. The quoted section of the standard clarifies that this is the same as &(*(arr + i)), which is defined to be the same as just arr + i, i.e. there is no dereference taking place semantically (which is what would trigger the UB).

See also this SO post about the situation in C++.

@aviatesk aviatesk merged commit cf94072 into JuliaLang:master May 17, 2024
11 checks passed
@inkydragon inkydragon removed the status:merge me PR is reviewed. Merge when all tests are passing label May 17, 2024
@fatteneder fatteneder deleted the fa/fix-do-invoke branch May 17, 2024 18:36
@gbaraldi
Copy link
Member

This fixed #54054 apparently if you want to use it as a test.

@giordano
Copy link
Contributor

Ah, the test checking #54054 already exists, but no one noticed it because aarch64-linux-gnu is allowed to fail (and it has been failing for way too long).

@aviatesk aviatesk removed the backport 1.11 Change should be backported to release-1.11 label May 18, 2024
@aviatesk aviatesk removed the backport 1.10 Change should be backported to the 1.10 release label May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opaque_closure test segfaulting on aarch64-linux-gnu
9 participants