Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib-dynlink-private/test.ml failing on the debug runtime #11016

Open
abbysmal opened this issue Feb 15, 2022 · 10 comments
Open

lib-dynlink-private/test.ml failing on the debug runtime #11016

abbysmal opened this issue Feb 15, 2022 · 10 comments

Comments

@abbysmal
Copy link
Contributor

Hello,

I am currently investigating the testsuite and I found this case.

The culprit is lib-dynlink-private/test.ml.

To reproduce: OCAMLRUNPARAM=v=0 OCAML_CFLAGS=-g USE_RUNTIME=d make one TEST=tests/lib-dynlink-private/test.ml

Reason for the failure: assertion triggered at memory.c:214: https://github.com/ocaml/ocaml/blob/trunk/runtime/memory.c#L214

After debugging for a little while, I found that this happens during the test_cow_repeated segment of the testcase

Assert is tripped when the second run of test_cow_repeated happens.

Breakdown:

On the second run caml_initialize then fails the assert check, because

  • c is not a minor value
  • it is neither Val_unit nor any uninitialized debug canary. (Debug_uninit_major or Debug_uninit_minor)
  • Rather it is exactly the same value c was initialized with during the previous test_cow_repeated call. (so, 42, which trigger the assert a memory.c:214)

I am not extremely well versed into dynlink's deeper intricacies, so I ended up opening this issue with the following question:

Is this behavior normal (is it expected for c to end up at the same addresse, with an already initialized state from a previous call to Dynlink), or should the assertion be relaxed to account for what could be a specific trait of dynlink?

Thank you,

@xavierleroy
Copy link
Contributor

Thanks for the detective work. This happens in native-code, not in bytecode, right?

We don't fully control what dlopen does. But it seems like if dlopen is called twice to load the same shared library, even in RTLD_LOCAL mode, the actual loading happens only once, so the second time we get the same already-initialized data segment as the first time.

I wonder whether the assertion in caml_initialize is not too clever for its own good... At least there is nothing like that in OCaml 4.

Symmetrically, I wonder whether we can break the GC invariants here. On the second initialization, we're overwriting values computed during the first initialization, but not going through caml_modify, so the old value is not communicated to the GC.

@abbysmal
Copy link
Contributor Author

It's indeed a native code issue, not bytecode.

I'm also investigating similar failures (still in the debug runtime) with the tail modulo cons test directory.

It appears that the same assert triggers (on tmc/partial_application.ml and tmc/tupled_functions) in some conditions when TRMC is used.

I have a rough understanding of TRMC, but looking at #9760, quoting:

This is not well-typed OCaml code (we use in-place mutation on a constructor block), the transformation actually happens on the Lambda intermediate representation rather at source level.

The proper use of "mutable" blocks and "initializing" writes in the generated code makes the TRMC transformation perfectly compatible with Flambda (and Flambda 2, 3, etc.) and Multicore.

I assume this assertion may be a bit too strict in regards to TRMC as well?

@gasche
Copy link
Member

gasche commented Feb 15, 2022

I assume this assertion may be a bit too strict in regards to TRMC as well?

I guess you are correct. I didn't know of this assertion!
I think we could make TRMC compliant by using the correct Debug_uninit_{major,minor} for our mutable holes.

@xavierleroy
Copy link
Contributor

I think we could make TRMC compliant by using the correct Debug_uninit_{major,minor} for our mutable holes.

Val_unit should work fine, if I read the assertion correctly.

The problem with dynlink is that we initialize a location multiple times.

@gasche
Copy link
Member

gasche commented Feb 16, 2022

Indeed, but we used another value than Val_unit to make debugging easier, which also seems to be the purpose of Debug_uninit_.... (But I need to figure out how to describe those values from Lambda, which is doable but work.)

@xavierleroy
Copy link
Contributor

We could also relax the assertion in caml_initialize to check that the old value is tagged 1. It's overwriting a pointer that could cause a problem with the GC. But this is not going to solve the multiple dynlink issue.

@kayceesrk
Copy link
Contributor

Symmetrically, I wonder whether we can break the GC invariants here. On the second initialization, we're overwriting values computed during the first initialization, but not going through caml_modify, so the old value is not communicated to the GC.

Wouldn't this be a problem with OCaml 4 as well, which issues a plain write for initialisation and not caml_initialize or caml_modify?

@xavierleroy
Copy link
Contributor

Wouldn't this be a problem with OCaml 4 as well, which issues a plain write for initialisation and not caml_initialize or caml_modify?

That's very likely. I'm trying to come up with an example (of double dynlinking) that would crash the GC, but I'll probably need @damiendoligez 's help to construct it.

@abbysmal
Copy link
Contributor Author

abbysmal commented Feb 21, 2022

I've been trying for a few days to come up with an example to crash it (at least on 5.0), it is quite tricky to exercise.

@mshinwell
Copy link
Contributor

As per #11527, double dynlinking can crash the GC on OCaml 4 during compaction, because the dynamic global roots list will contain duplicate entries. I think this is especially likely with Flambda which registers a lot of GC roots per compilation unit. The compactor can't cope with duplicate roots because it will traverse its own encoded pointers without realising they are such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants