Extend ocamlc -output-complete-obj to build .exe #8872

Merged: 5 commits merged into ocaml:trunk on Sep 25, 2019

nojb
Contributor

@nojb nojb commented Aug 13, 2019

This PR extends ocamlc -output-complete-obj to build self-contained .exe's for bytecode programs containing both the runtime and any linked C stubs.

The difference from -custom is that here the bytecode is embedded directly in the generated C file, so the resulting binary is more robust than the one produced by -custom (e.g. it can be stripped).
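As a rough illustration of what "embedding the bytecode in generated C" can look like, the sketch below renders a byte string as a C array definition. The function name `c_array_of_string` and the array name used in the example are hypothetical and are not taken from the compiler sources; the actual code generator in ocamlc may lay the data out differently.

```ocaml
(* Hypothetical helper: render the bytes of [s] as a C array definition.
   Assumes [s] is non-empty, since a zero-length array is not valid ISO C. *)
let c_array_of_string (name : string) (s : string) : string =
  let buf = Buffer.create ((String.length s * 5) + 64) in
  Printf.bprintf buf "static const unsigned char %s[%d] = {" name
    (String.length s);
  String.iteri
    (fun i c ->
      if i > 0 then Buffer.add_string buf ", ";
      (* Each byte is written as a decimal literal. *)
      Printf.bprintf buf "%d" (Char.code c))
    s;
  Buffer.add_string buf "};\n";
  Buffer.contents buf
```

Because the data ends up in an ordinary section of the native executable, strip has no reason to discard it, unlike bytes appended after the end of the executable image.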

Note that, contrary to binaries produced by -custom, the resulting executable cannot be used with tools that need access to the bytecode/symbol table information (e.g. ocamldebug).

The idea is to deprecate -custom in a follow-up PR.

The code in this PR is based on a patch written by @glondu that modified -custom to obtain essentially the same effect, but it was decided to leave -custom alone because of backwards-compatibility issues (see previous version of this issue description for more information).

@nojb nojb requested a review from a user August 13, 2019 15:21
@nojb nojb mentioned this pull request Aug 13, 2019
@alainfrisch
Contributor

This new implementation requires a native toolchain, contrary to the current one. This might exclude some use cases.

@glondu
Contributor

glondu commented Aug 13, 2019

This new implementation requires a native toolchain, contrary to the current one.

What do you mean by "native"? Isn't a toolchain already needed to build the custom runtime, no matter how you "link" it with the bytecode?

@nojb
Contributor Author

nojb commented Aug 14, 2019

This new implementation requires a native toolchain, contrary to the current one. This might exclude some use cases.

As far as I can see, both implementations make the same call to the native toolchain using Ccomp.call_linker (via Bytelink.build_custom_runtime). Am I missing something?

@nojb
Contributor Author

nojb commented Aug 14, 2019

One issue is that thanks to putting the bytecode TOC at the end of the bytecode, tools that work with bytecode could work equally well with -custom and usual bytecode programs, as they can jump over the custom runtime code seamlessly.
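To make the "TOC at the end" point concrete, here is a minimal sketch of how a tool might locate the table of contents by reading backwards from the end of the file. It assumes the layout as I understand it from the runtime's exec.h: a 16-byte trailer (a 4-byte big-endian section count followed by a 12-byte magic string starting with "Caml1999"), preceded by one 8-byte descriptor per section (a 4-byte name and a 4-byte big-endian length). The function names are hypothetical, and the image is taken as an in-memory string to keep the example self-contained.

```ocaml
(* Read a 4-byte big-endian unsigned integer at offset [pos]. *)
let read_u32_be (s : string) (pos : int) : int =
  (Char.code s.[pos] lsl 24)
  lor (Char.code s.[pos + 1] lsl 16)
  lor (Char.code s.[pos + 2] lsl 8)
  lor Char.code s.[pos + 3]

(* Assumed trailer layout: 4-byte section count + 12-byte magic. *)
let trailer_size = 16

(* Return the (section name, length) pairs listed in the TOC, or None if
   the image does not end with something that looks like a trailer. *)
let read_toc (image : string) : (string * int) list option =
  let len = String.length image in
  if len < trailer_size then None
  else
    let magic = String.sub image (len - 12) 12 in
    if String.sub magic 0 8 <> "Caml1999" then None
    else
      let n = read_u32_be image (len - trailer_size) in
      let toc_start = len - trailer_size - (8 * n) in
      if toc_start < 0 then None
      else
        Some
          (List.init n (fun i ->
               let p = toc_start + (8 * i) in
               (String.sub image p 4, read_u32_be image (p + 4))))
```

Since everything is addressed relative to the end of the file, whatever precedes the bytecode (a #! line or an entire custom runtime) is skipped without being parsed. It is also why stripping is a problem for -custom: the lookup only works while the appended bytes are still at the end of the file.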

After this PR, this is no longer the case. For example, ocamldebug, ocamlobjinfo and cmpbyt (used by ocamltest) no longer work with -custom binaries.

@ghost

ghost commented Aug 14, 2019

If we want to keep this feature, one idea would be to put a marker before the bytecode so that tools can extract it.

@alainfrisch
Contributor

Yep, sorry, I was thinking of the mode that allows building a custom runtime, and then concatenating some bytecode program to it (without the need to have a C compiler/linker at that stage).

@glondu
Contributor

glondu commented Aug 16, 2019

I was just reminded of this bug. It might be useful to read in the context of this PR.

@glondu
Contributor

glondu commented Aug 16, 2019

Another Debian bug related to the new behaviour of -custom. I remember now why I made the behaviour Debian-specific...

@glondu
Contributor

glondu commented Aug 18, 2019

I went through test failures.

Many of them compare the outputs of ocamlc.byte and ocamlc.opt using a custom tool called cmpbyt that parses bytecodes and ignores debug sections. This obviously fails with the new -custom behaviour. But it turns out that the outputs are identical, so adding a whole-file comparison to cmpbyt as done in glondu@bf9434d fixes these tests.

Another test fails because of a missing C include: glondu@ad34e4c.

The last three fail because they try to link in a custom runtime C code that defines a main() function. I didn't even know that was possible, and wonder what the semantics could be.

@gasche
Member

gasche commented Aug 18, 2019

I find some of the limitations mentioned on the Debian bugtracker (having ocamlrun and ocamldebug stopping working on -custom bytecode executables) are rather severe (as mentioned in this message, a priori fixing stripping of bytecode executables may not be an aspect that users care about as much). Does making this change upstream allow us to consider fixing these limitations? Maybe ocamlrun/ocamldebug and other bytecode-consuming programs could be changed to work correctly with that new representation?

@glondu
Contributor

glondu commented Aug 18, 2019

I find some of the limitations mentioned on the Debian bugtracker (having ocamlrun and ocamldebug stopping working on -custom bytecode executables) are rather severe

I consider the ability to run ocamlrun and ocamldebug on -custom bytecode executables with the old behaviour a fortunate anomaly: it means -custom was not needed in the first place.

@gasche
Member

gasche commented Aug 18, 2019

Yes, but this anomaly may still be relied upon by an unknown number of users whose workflow would break with the change, which does give me cold feet.

@glondu
Contributor

glondu commented Aug 18, 2019

The workaround in this specific case is easy.

@xavierleroy
Contributor

ocamldebug is perfectly able to work with "true" custom bytecode executables, containing a non-standard runtime:

~/tmp$ cat foo.ml
let _ =
  Printf.printf "%f\n" (Unix.gettimeofday())
~/tmp$ ocamlc -custom -g -o foo.exe unix.cma foo.ml
~/tmp$ ocamldebug foo.exe
	OCaml Debugger version 4.07.1

(ocd) go 100
Loading program... done.
Time: 100 - pc: 140316 - module CamlinternalFormat
272   <|b|>buffer_check_size buf str_len;
(ocd) 

Don't break this just to please Debian's obsession with stripped binaries.

@ghost

ghost commented Aug 19, 2019

It works with ocamldebug but not with ocamlrun. Looking at htop, I'm guessing that ocamldebug simply starts the program and instructs the runtime via an environment variable to run in debug mode and connect to the ocamldebug process via a socket. Is that correct?

In that case, whatever we do here is unlikely to break ocamldebug working with custom built bytecode executables.

@xavierleroy
Contributor

In that case, whatever we do here is unlikely to break ocamldebug working with custom built bytecode executables.

ocamldebug needs to 1- start the program instructing the VM to connect to the debugging socket, and 2- read symbol table and debug information from the bytecode executable. Your current proposal does not break 1, but breaks 2 as far as I understand. Please don't.
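Step 1 can be pictured as preparing the child's environment before launching it. The sketch below assumes the runtime looks for the socket address in the CAML_DEBUG_SOCKET environment variable; that name does not appear in this thread and is stated here from my understanding of the bytecode runtime. The helper `debug_env` is hypothetical and only builds the environment array; it launches nothing.

```ocaml
(* Hypothetical helper: return [env] with CAML_DEBUG_SOCKET set to [socket],
   dropping any earlier binding of that variable. *)
let debug_env ~(socket : string) (env : string array) : string array =
  let prefix = "CAML_DEBUG_SOCKET=" in
  let plen = String.length prefix in
  let is_debug_var s =
    String.length s >= plen && String.sub s 0 plen = prefix
  in
  let kept = List.filter (fun s -> not (is_debug_var s)) (Array.to_list env) in
  Array.of_list ((prefix ^ socket) :: kept)
```

Step 2 is the part at stake in this discussion: once the program is running, the debugger still has to find the symbol table and debug information in the executable file itself.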

@ghost

ghost commented Aug 20, 2019

I suggest to put a marker at the beginning of the symbol table and debug information. Then ocamldebug could scan the binary looking for these markers.
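For readers who want to see what the marker idea amounts to, here is a minimal sketch of such a scan. The marker string used in the example and the function name `find_marker` are hypothetical; nothing like this was adopted in the PR.

```ocaml
(* Hypothetical scan: return the offset just past the first occurrence of
   [marker] in [image], or None if it does not occur. Naive quadratic
   search, kept simple on purpose. *)
let find_marker ~(marker : string) (image : string) : int option =
  let m = String.length marker and n = String.length image in
  let rec go i =
    if m = 0 || i + m > n then None
    else if String.sub image i m = marker then Some (i + m)
    else go (i + 1)
  in
  go 0
```

A scan like this has no way to rule out the same byte sequence occurring by chance elsewhere in the binary, which is one reason the approach can fairly be called a hack.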

@xavierleroy
Contributor

I suggest to put a marker at the beginning of the symbol table and debug information.

Apologies if I sound negative, but I think this is a bigger, uglier hack than the one it replaces.

@glondu
Contributor

glondu commented Aug 20, 2019

I agree with @xavierleroy here.

I think we can cleanly locate the ELF symbols corresponding to the OCaml [symbol table and debug information] using libbfd or such. Or have the runtime transmit [them] over the debugging socket at startup.

@ghost

ghost commented Aug 20, 2019

Both of @glondu's ideas seem good to me. The ELF symbol idea seems more generic if we want other tools to be able to easily extract these values. Plus it's the method already used by objinfo_helper.

@xavierleroy
Contributor

macOS and Windows don't use ELF.

@ghost

ghost commented Aug 20, 2019

I had a feeling that wouldn't be portable... I thought libbfd might but I wasn't sure. That's why I suggested my ugly hack 🙈

Transmitting the info over the debugging socket should be fine, right?

@alainfrisch
Contributor

Another thing that works today is to build a -custom executable, but pass it to an explicit custom runtime (built by ocamlc -make-runtime). The number of use cases that could be broken by the suggested change makes me a bit nervous about it.

IIUC, the proposed new behavior is very similar to using -output-complete-obj foo.{so,dll}. What about extending -output-complete-obj to support producing programs instead of shared libraries (or introducing a new flag with the same effect), and keep -custom unchanged?

@dra27
Member

dra27 commented Aug 22, 2019

Rather than scanning binaries at all: given that we already have a special way of invoking the runtime to attach to a debugger, can’t we embed the bytecode as here, but also have special ways of invoking the custom runtime binary that cause it to emit its bytecode (for ocamlrun) and its symbol and debugging info (for ocamldebug)? Wouldn’t the only challenge then be having both ocamlrun and ocamldebug spot the difference between a bytecode image and a custom runtime image?

@dra27
Member

dra27 commented Aug 22, 2019

(A similar mechanism might allow the case when header.c is used instead of shebang to survive stripping, not that that affects Debian)

@ghost

ghost commented Aug 22, 2019

Another thing that works today is to build a -custom executable, but pass it to an explicit custom runtime (built by ocamlc -make-runtime). The number of use cases that could be broken by the suggested change makes me a bit nervous about it.

@alainfrisch do you expect anyone to rely on this behavior? It's difficult to imagine a use case for constructing both a custom executable and a custom runtime. It seems more natural to choose one or the other.

@alainfrisch
Contributor

One could imagine building a custom bytecode executable, but sometimes executing it with a custom runner built with the debug runtime (or another kind of runtime variant, e.g. with extra hooks). Another use case would be creating programs that could be executed either in a stand-alone way or through a native host program (that would include the OCaml runtime), without having to create a new process.

@gasche
Member

gasche commented Sep 24, 2019

Sorry if the question is obvious, but why not just use -output-complete-exe <any-name> to produce an executable instead of an object file?

@nojb
Contributor Author

nojb commented Sep 24, 2019

Sorry if the question is obvious, but why not just use -output-complete-exe <any-name> to produce an executable instead of an object file?

Why not? Are there any objections?

@dbuenzli
Contributor

At that rate, why not simply add -output-complete-exe and reuse the -o we all know?

@xavierleroy
Contributor

So what's the story now?

There are three properties for ocamlc-generated executables, each being desirable to a subset of OCaml users:

  1. Static linking: a single, native executable contains the bytecode, the virtual machine, and C stub code.
  2. Support for ocamldebug debugging and everything that can be done today with pure bytecode executables.
  3. Resistance to strip, meaning that the generated executable should either be a #! script plus bytecode, or a proper native executable, but not a combination of both.

So far we have two solutions that satisfy two of these properties:

  • ocamlc + dynamic loading of C stub code: 2 and 3
  • ocamlc -custom: 1 and 2.

The proposal in this PR is to have a third solution that satisfies 1 and 3, as a way to please our Dune constituency and our Debian constituency.
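The trade-off described above can be written down as a small lookup, which makes it easy to check that no single option covers all three properties. The type and constructor names are hypothetical labels for the three properties and three build modes listed in this comment; `Output_complete_exe` stands for the solution proposed in this PR.

```ocaml
(* The three properties, in the order listed above. *)
type property = Static_linking | Debuggable | Strip_resistant

(* The three ways of producing an executable discussed in this thread. *)
type mode = Dynamic_stubs | Custom | Output_complete_exe

let all_modes = [ Dynamic_stubs; Custom; Output_complete_exe ]

(* Which properties each mode satisfies, as stated in the comment above. *)
let properties = function
  | Dynamic_stubs -> [ Debuggable; Strip_resistant ]
  | Custom -> [ Static_linking; Debuggable ]
  | Output_complete_exe -> [ Static_linking; Strip_resistant ]

(* Modes that satisfy every property in [wanted]. *)
let modes_with (wanted : property list) : mode list =
  List.filter
    (fun m -> List.for_all (fun p -> List.mem p (properties m)) wanted)
    all_modes
```

Asking for `[Static_linking; Strip_resistant]` selects only the new mode, while asking for all three properties yields the empty list.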

For the record, I still think that dynamic loading of C stub code is the way to go. Static linking is so 20th century... So I don't care about 1 and would gladly kill ocamlc -custom and the proposal in this PR. Others disagree.

@gasche
Member

gasche commented Sep 24, 2019

@dbuenzli: yes of course, let me adopt your proposal as if I had made it in the first place. (In my defense, I've essentially never used any of these options before.)

@nojb
Contributor Author

nojb commented Sep 24, 2019

OK, I changed the implementation to use the new option -output-complete-exe (in combination with -o), as suggested, and updated the manual. This is ready for review again.

@ghost ghost left a comment

Looks good!

@ghost ghost merged commit 3aff514 into ocaml:trunk Sep 25, 2019
@ghost

ghost commented Sep 25, 2019

Thanks @nojb for this work and everybody else for their input!

raspbian-autopush pushed a commit to raspbian-packages/ocaml that referenced this pull request Nov 24, 2019
Origin: ocaml/ocaml#8872

Gbp-Pq: Name 0008-Reimplement-custom-without-hacks.patch
@nojb nojb mentioned this pull request Jan 9, 2020
@nojb nojb deleted the glondu_custom_bytecode branch February 23, 2020 09:29
This pull request was closed.