lib-dynlink-private/test.ml failing on the debug runtime #11016

abbysmal · 2022-02-15T16:36:44Z

Hello,

I am currently investigating the testsuite and I found this case.

The culprit is lib-dynlink-private/test.ml.

To reproduce: OCAMLRUNPARAM=v=0 OCAML_CFLAGS=-g USE_RUNTIME=d make one TEST=tests/lib-dynlink-private/test.ml

Reason for the failure: assertion triggered at memory.c:214: https://github.com/ocaml/ocaml/blob/trunk/runtime/memory.c#L214

After debugging for a little while, I found that this happens during the test_cow_repeated segment of the testcase

Assert is tripped when the second run of test_cow_repeated happens.

Breakdown:

First run of test_cow_repeated, during entry, caml_initialize sets c to 42. (current value, 1 like Val_unit) (https://github.com/ocaml/ocaml/blob/trunk/testsuite/tests/lib-dynlink-private/plugin2/cow.ml#L3)
Second run of test_cow_repeated, c is set to 42, except it share the same address at the previous c. Meaning that the address storing c is changed from 42 to 42.

On the second run caml_initialize then fails the assert check, because

c is not a minor value
it is neither Val_unit nor any uninitialized debug canary. (Debug_uninit_major or Debug_uninit_minor)
Rather it is exactly the same value c was initialized with during the previous test_cow_repeated call. (so, 42, which trigger the assert a memory.c:214)

I am not extremely well versed into dynlink's deeper intricacies, so I ended up opening this issue with the following question:

Is this behavior normal (is it expected for c to end up at the same addresse, with an already initialized state from a previous call to Dynlink), or should the assertion be relaxed to account for what could be a specific trait of dynlink?

Thank you,

The text was updated successfully, but these errors were encountered:

xavierleroy · 2022-02-15T18:15:15Z

Thanks for the detective work. This happens in native-code, not in bytecode, right?

We don't fully control what dlopen does. But it seems like if dlopen is called twice to load the same shared library, even in RTLD_LOCAL mode, the actual loading happens only once, so the second time we get the same already-initialized data segment as the first time.

I wonder whether the assertion in caml_initialize is not too clever for its own good... At least there is nothing like that in OCaml 4.

Symmetrically, I wonder whether we can break the GC invariants here. On the second initialization, we're overwriting values computed during the first initialization, but not going through caml_modify, so the old value is not communicated to the GC.

abbysmal · 2022-02-15T21:35:11Z

It's indeed a native code issue, not bytecode.

I'm also investigating similar failures (still in the debug runtime) with the tail modulo cons test directory.

It appears that the same assert triggers (on tmc/partial_application.ml and tmc/tupled_functions) in some conditions when TRMC is used.

I have a rough understanding of TRMC, but looking at #9760, quoting:

This is not well-typed OCaml code (we use in-place mutation on a constructor block), the transformation actually happens on the Lambda intermediate representation rather at source level.

The proper use of "mutable" blocks and "initializing" writes in the generated code makes the TRMC transformation perfectly compatible with Flambda (and Flambda 2, 3, etc.) and Multicore.

I assume this assertion may be a bit too strict in regards to TRMC as well?

gasche · 2022-02-15T22:15:58Z

I assume this assertion may be a bit too strict in regards to TRMC as well?

I guess you are correct. I didn't know of this assertion!
I think we could make TRMC compliant by using the correct Debug_uninit_{major,minor} for our mutable holes.

xavierleroy · 2022-02-16T08:47:47Z

I think we could make TRMC compliant by using the correct Debug_uninit_{major,minor} for our mutable holes.

Val_unit should work fine, if I read the assertion correctly.

The problem with dynlink is that we initialize a location multiple times.

gasche · 2022-02-16T09:25:22Z

Indeed, but we used another value than Val_unit to make debugging easier, which also seems to be the purpose of Debug_uninit_.... (But I need to figure out how to describe those values from Lambda, which is doable but work.)

xavierleroy · 2022-02-16T18:27:39Z

We could also relax the assertion in caml_initialize to check that the old value is tagged 1. It's overwriting a pointer that could cause a problem with the GC. But this is not going to solve the multiple dynlink issue.

kayceesrk · 2022-02-18T03:25:05Z

Symmetrically, I wonder whether we can break the GC invariants here. On the second initialization, we're overwriting values computed during the first initialization, but not going through caml_modify, so the old value is not communicated to the GC.

Wouldn't this be a problem with OCaml 4 as well, which issues a plain write for initialisation and not caml_initialize or caml_modify?

xavierleroy · 2022-02-18T16:36:51Z

Wouldn't this be a problem with OCaml 4 as well, which issues a plain write for initialisation and not caml_initialize or caml_modify?

That's very likely. I'm trying to come up with an example (of double dynlinking) that would crash the GC, but I'll probably need @damiendoligez 's help to construct it.

abbysmal · 2022-02-21T15:14:09Z

I've been trying for a few days to come up with an example to crash it (at least on 5.0), it is quite tricky to exercise.

mshinwell · 2022-08-31T17:18:00Z

As per #11527, double dynlinking can crash the GC on OCaml 4 during compaction, because the dynamic global roots list will contain duplicate entries. I think this is especially likely with Flambda which registers a lot of GC roots per compilation unit. The compactor can't cope with duplicate roots because it will traverse its own encoded pointers without realising they are such.

This was referenced Feb 23, 2022

runtime: introduce Debug_uninit_tmc in misc.h #11055

Closed

Relax caml_initialize assertion in the debug runtime #11078

Merged

abbysmal mentioned this issue Mar 3, 2022

Disable lib-dynlink-private testcase under the debug runtime #11081

Closed

abbysmal mentioned this issue Aug 31, 2022

Unexpected behaviour when dynlinking the same module twice #11527

Open

damiendoligez added dynlink runtime-system labels Oct 18, 2023

damiendoligez self-assigned this Oct 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lib-dynlink-private/test.ml failing on the debug runtime #11016

lib-dynlink-private/test.ml failing on the debug runtime #11016

abbysmal commented Feb 15, 2022

xavierleroy commented Feb 15, 2022

abbysmal commented Feb 15, 2022

gasche commented Feb 15, 2022

xavierleroy commented Feb 16, 2022

gasche commented Feb 16, 2022

xavierleroy commented Feb 16, 2022

kayceesrk commented Feb 18, 2022

xavierleroy commented Feb 18, 2022

abbysmal commented Feb 21, 2022 •

edited

mshinwell commented Aug 31, 2022

lib-dynlink-private/test.ml failing on the debug runtime #11016

lib-dynlink-private/test.ml failing on the debug runtime #11016

Comments

abbysmal commented Feb 15, 2022

xavierleroy commented Feb 15, 2022

abbysmal commented Feb 15, 2022

gasche commented Feb 15, 2022

xavierleroy commented Feb 16, 2022

gasche commented Feb 16, 2022

xavierleroy commented Feb 16, 2022

kayceesrk commented Feb 18, 2022

xavierleroy commented Feb 18, 2022

abbysmal commented Feb 21, 2022 • edited

mshinwell commented Aug 31, 2022

abbysmal commented Feb 21, 2022 •

edited