Fast native toplevel using JIT #15

Open · wants to merge 2 commits into master
Conversation

@ghost commented Apr 6, 2020

Fast native toplevel using JIT

Overview

We (Jane Street + OCL/Tarides) would like to make the native toplevel faster and more self-contained.

At the moment, the native toplevel works by calling the assembler and linker for each phrase. This makes it slow and dependent on an external toolchain, which is not great for deployment.

To address this, we would like to bring the earlier JIT work on ocamlnat (described in the paper referenced below) up to date and merge it into the compiler.

Motivation

This work would provide a simple way to compile and execute OCaml code at runtime, which would open up many new possibilities for tooling.

Coupled with the fact that we can already embed cmi files into an executable, this work would make it possible to distribute a self-contained binary that can evaluate OCaml code at runtime. This would make it simple and straightforward to use OCaml as an extension language.

Verified examples in documentation comments

We are particularly interested in this feature for the mdx tool. More precisely, we are currently working on a feature allowing verified toplevel snippets in mli files. For instance:

(** [chop_prefix s ~prefix] returns [s] without the leading [prefix].

    {[
      # chop_prefix "abc" ~prefix:"a";;
      - : string option = Some "bc"
      # chop_prefix "abc" ~prefix:"x";;
      - : string option = None
    ]}
*)
val chop_prefix : string -> prefix:string -> string option

In the above example, the {[ ... ]} would be kept up to date by mdx to ensure that the document stays in sync when the code changes. In fact, the user would initially only write the # lines and mdx would insert the results just as with expectation tests.

The change in detail

This change would add JIT code generation for x86 architectures as described in the paper. For other architectures, we would still rely on the portable method of calling the assembler and linker. The main additions to the compiler code base would be:

  • some code in the backend to do the assembly in process
  • a few more C functions to glue things together

The paper mentions that it adds 2300 lines of OCaml+C code to the compiler code base.

One detail to mention: IIUC, the JIT ocamlnat from the paper goes directly from the linear form to binary assembly. Now that we have a symbolic representation of the assembly, we could instead start from the symbolic assembly in order to share more logic between normal compilation and the JIT.

We discussed this with @alainfrisch and @nobj, since LexiFi has been using an in-memory assembler in production for a while. They mentioned that they would be happy to open-source the code if they can, which means that we could be reusing code that has been running in production for a long time and is likely to be well tested and correct.

LexiFi's binary emitter is about 1800 lines of code including comments and newlines. This looks a bit smaller than the JIT part of the JIT ocamlnat, so we would still be adding approximately the same amount of code if we went this way.

Drawback

This is one more feature to maintain in the compiler and it comes with a non-negligible amount of code. However, and especially if we can reuse LexiFi's in-memory assembler, most of the additions would come from well-tested code. @alainfrisch and @nobj also mentioned that this code was very low-maintenance and had pretty much not changed in 5 years.

Alternatives

For the mdx case, we considered a few alternatives.

Using a bytecode toplevel

Mdx currently uses a bytecode toplevel where everything is compiled and executed in bytecode. This includes:

  • code coming from user libraries
  • the full compilation of the toplevel phrases

As a result, mdx is currently very slow, and the round-trip time between the user saving a file and seeing the result easily climbs into the tens of seconds.

In the case of Jane Street, we have one more difficulty with this method: a lot of our code doesn't work at all in bytecode because we never use bytecode programs.

Staging the build

Given that mdx is a build tool, one alternative is to redesign the interaction between mdx and the build system. For instance, it could be done in stages, with a first step where mdx generates some code that is then compiled and executed normally by the build system. This is how the cinaps tool works, for example.

However, it is difficult to faithfully reproduce the behavior of the toplevel with this method. What is more, such a design is tedious and requires complex collaboration between the tool and the build system.

Going through this amount of complexity for every build tool that wants to compile OCaml code on the fly doesn't feel right.

Using a mixed native/bytecode mode

One idea we considered is using a mixed mode where a native application can execute bytecode. This would work well for us as the snippets of code we evaluate on the fly are always small and fast.

However, it is completely new work while the native JIT has already been done. What is more, while it would work for us it might not work for people who care about the performance of the code evaluated on the fly.

A native JIT would likely benefit more people.

Signed-off-by: Jeremie Dimino <jeremie@dimino.org>
@gasche (Member) commented Apr 6, 2020

To summarize, your proposal is as follows:

  1. Integrate Lexifi's work on direct binary generation in the compiler upstream.
  2. Add the necessary linker logic to use it from ocamlnat.

This sounds like a very reasonable approach to me. (I had this in mind when I replied to your earlier emails but never formulated it clearly, sorry.)

Minor comment: The way this RFC references earlier work by Marcell Fischbach and Benedikt Meurer is slightly confusing; I'm not sure you would reuse much of their work (except the parts that have already been upstreamed, typically the linear-scan register allocator). In particular their suggestion to have a jit.ml that duplicates each emit.mlp file is not convincing for long-term maintenance but you also don't need it now that the x86_64 backend has an abstract assembler representation: you should be able to call emit, and generate code directly from there (I guess this is what the Lexifi patch does).

@ghost (Author) commented Apr 6, 2020

Indeed. I guess the only part of Marcell Fischbach and Benedikt Meurer's work we would reuse is the C code, which I'm assuming is independent of how the assembly is generated.

@alainfrisch (Contributor) commented:
To clarify: what we have is a way to generate machine code (+ relocation information) from the "x86 assembly AST" (introduced to share code between the two supported assembly syntaxes). Currently, we dump this machine code with a COFF emitter to produce .obj files, but for the use case discussed here, we'd need to write a dynamic code loader that works directly from the generated machine code + relocations (i.e. put the code in executable pages and apply the relocations) and symbol tables. This should be rather simple I think (and is perhaps covered by the "C code" from Marcell Fischbach and Benedikt Meurer's work).
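As a rough illustration, such a loader's interface could look like the .mli-style sketch below; every name here is a hypothetical stand-in for the shape just described, not LexiFi's actual API, and the real work (mapping executable pages, patching relocations) would live in C stubs:

    (* Hypothetical loader interface; all names are illustrative assumptions. *)
    type relocation       (* a site in the emitted code to patch, and how to patch it *)
    type symbol_table     (* symbols defined by this chunk of code *)

    val load :
      code:string ->                  (* raw machine code from the binary emitter *)
      relocations:relocation list ->  (* applied after copying into executable pages *)
      symbols:symbol_table ->
      (string -> nativeint)           (* resolve a loaded symbol to its address *)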

Co-Authored-By: Nicolás Ojeda Bär <n.oje.bar@gmail.com>
@ghost (Author) commented Apr 21, 2020

We discussed this quickly at the last OCaml developer meeting. There are a few questions around the portability of writing to executable memory.

We are now going to build a prototype using LexiFi's binary code emitter and test it on various platforms (Linux, OSX, BSD and Windows) in order to get a clearer picture of the difficulties. Once this is done, we will discuss this proposal further with the rest of the dev team.

@lefessan commented:
To the best of my knowledge, the "LexiFi binary code emitter" was, in large part, written by me at OCamlPro for LexiFi. It extends the COFF linker written by Alain with an x86/amd64 in-memory assembler (i.e. Intel symbolic assembly, 32/64-bit, to binary code) and an ELF linker for Linux. The code emitter was also included in ocp-memprof and ocpwin, to generate OCaml native code in a cross-toolchain way.
It would probably be more efficient to ask all the authors of the original work if the decision is taken to include this work in OCaml.

@ghost (Author) commented Sep 22, 2020

Hi Fabrice, happy to discuss. I'm going to follow up by email to find a time.

@qubit55 commented Mar 11, 2021

Hello, are there any updates on the progress?

@yawaramin commented:
@entrust1234 hi, please don't post 'any update?' comments on issues, it spams everyone who is subscribed. You can subscribe to the issue to receive updates. Thanks!

@ghost (Author) commented Mar 30, 2021

FTR, I'm no longer driving the project. My colleague @mshinwell took over. I'll let him and/or @NathanReb comment, but what I heard from them about the JIT was positive :)

@NathanReb commented:
A quick update on the JIT for the native toplevel:

We have a working prototype, implemented as a library outside of the compiler. It requires a couple of simple hooks to be added to Opttoploop (soon to be the unified Toploop) and some of the existing types and functions defined there to be exposed, but it is all fairly minimal.
Apart from LexiFi's x86 binary emitter, it's about 1-1.5k lines of code at the moment.

The library provides a function Jit.init_top : unit -> unit that uses the above-mentioned hooks to set up the JIT in the native toplevel. You can then use Opttoploop or Opttopmain as you normally would, and benefit from the JIT instead of the external assembler and linker + dynlink.
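For illustration, a minimal sketch of how a driver for this could look, assuming the Jit.init_top hook described above and Opttoploop's usual entry point (the exact signatures are assumptions about the prototype, not upstream code):

    (* Hypothetical driver for the JIT-enabled native toplevel; a sketch only. *)
    let () =
      Jit.init_top ();                      (* install the JIT via the Opttoploop hooks *)
      Opttoploop.loop Format.std_formatter  (* then run the native toplevel as usual *)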

We're now working on a branch of MDX using the JIT so we can test it on real-world use cases such as Real World OCaml or Jane Street's internal code base, making sure it works as intended and that the performance gain is what we expect.

If that goes well, we'll move on and start upstreaming the changes we need in the native toplevel, hopefully making the JIT available for OCaml 4.13!

@gasche (Member) commented Mar 31, 2021

Thanks for the change! I still think this is a very nice project and I'm glad to get the update.

In the interest of starting the bikeshedding early: I'm not sure about the "Jit" name, because (1) today people associate JITs with dynamically-recompiling implementations, not just on-demand code emission, so the name carries a lot of assumptions/associations that are not realized here, and (2) the previous toplevel was already "just in time" in the same sense as your prototype; the main difference is whether you go through external tools or emit binary (encoded assembly) directly. I don't think we should debate this right now, but maybe in the next few weeks/months you may have ideas for alternate names.

@bikallem commented Apr 28, 2021

If that goes well, we'll move on and start upstreaming the changes we need in the native toplevel, hopefully making the JIT available for OCaml 4.13!

Does this mean native toplevel will be as usable as the bytecode one?

@EduardoRFS commented:
@NathanReb is code unload part of the current JIT implementation?

@mshinwell commented:
It won't be, although I've had some thoughts as to how to do it.

@EduardoRFS commented Apr 28, 2021

@mshinwell if you have time, please share; I'm interested in it for Tezos. I got an example working, but only if the code has no data references (I can ensure that by validating the Cmm).

https://github.com/EduardoRFS/ocaml-jit-example

@mshinwell commented:
I haven't thought about this for literally years, so my memory is hazy. However the general idea was the following.

The most difficult problem is probably that, before unloading, you need to make sure there aren't any left-over code pointers into the dynamically-loaded/generated code. I think the problematic places these could occur are on the stack, in live physical registers or in the OCaml heap.

The stack (and all thread stacks) could be scanned to ensure there are no references into the relevant (i.e. dynamically-loaded/generated) code areas before unloading; if a reference is found, the unloading could be tried later. I think there are various different cases here:

  • there might have been a register spill of one of the relevant code pointers (unlikely but has been seen to happen in the past)
  • there might be a return address on the stack pointing into the relevant area
  • the program counter might actually be in one of the functions in the relevant code areas.

Live physical registers could be scanned in a similar way, using the existing liveness information.

For the heap, the places the code pointers might occur (assuming no Obj.magic tricks etc) are in blocks with tag Closure_tag. I was thinking of having some means by which we could determine when these, for the given dynamically-loaded/generated unit, have become unreachable in the heap (discarded at a minor GC, or swept at a major GC). This is tricky but I wondered if it could be done by instrumenting the compiler's code generation for closures in these compilation units, such that there is always an extra environment field pointing at a unique block (one per dynamically-loaded/generated unit), with something like a finaliser on that unique block. The aim would be to arrange that when the finaliser is called, it is safe to unload the relevant code immediately.
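As a minimal sketch of the finaliser idea, here is what it could look like at the library level (in the real scheme the compiler's closure generation would do this); register_unit, unload_code and unit_token are hypothetical names for illustration:

    (* Sketch only: the compiler would make every closure of a dynamically
       loaded/generated unit keep a reference to [token], so the finaliser runs
       only once no such closure is reachable any more. *)
    type unit_token = { unit_name : string }

    let register_unit ~(unload_code : string -> unit) unit_name =
      let token = { unit_name } in
      Gc.finalise (fun t -> unload_code t.unit_name) token;  (* unload when unreachable *)
      token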

The other area of difficulty concerns statically-allocated data, as you mentioned. Maybe we could just not statically allocate anything for these dynamically-loaded modules.

I tend to think there is probably a more general solution involving specific GC regions for each dynamically loaded/generated compilation unit, though the GC doesn't support anything like this at present.

@mshinwell commented:
P.S. In fact the code pointers scheme above relies on all closures in the dynamically-loaded/generated units being dynamically allocated, otherwise the finaliser will never be called.

@gasche (Member) commented Sep 1, 2021

@NathanReb would you by chance have some information on the current status of the native-toplevel revival? The "unify the toploop implementations" part was done (in large part) in #10124. Were people able to test the native toplevel inside mdx?

@NathanReb commented Sep 2, 2021

We indeed tested it. The work is available on GitHub and is briefly documented here.
It relies on a few forks at the moment:

I tried to provide clear information on how to set all this up in the various repos so you should be able to try it fairly easily. Please reach out to me if anything needs to be clarified!

While working on this we also spotted differences between the native and bytecode toplevels that need to be fixed on the native toplevel side. These are showcased in the ocaml-jit test suite.

There also seems to be an issue with how .cmxs are built by default that caused trouble when trying to dynamically load them. We use a patched version of dune to work around it but according to @mshinwell and @jeremiedimino this fix belongs directly in the compiler rather than in dune.

Next steps from here are a few patches to the compiler and toplevel libraries:

  • Build and install the native toplevel libraries (not ocamlnat) by default
  • Add the required hooks to the toplevel
  • Fix .cmxs building
  • Fix the native toplevel to bring it back in line with the bytecode toplevel

I'll be working on those very shortly as we'd very much like to get this into 4.14!

@gasche (Member) commented Jul 16, 2022

I'm curious about the current status of the project. Any news?

@dra27 (Member) commented Jul 17, 2022

The compiler work for this was released with 4.14:

Additionally there were two fixes to bring the behaviour of ocaml and ocamlnat closer together, both of which were identified from testing the new ocaml-jit inside Jane Street:

All this work, I believe, is being used internally at Jane Street (rebased onto OCaml 4.12) with a customised version of the mdx tool. I believe @NathanReb and @Leonidas-from-XIV are aiming to release a version of mdx using ocaml-jit (and so using native mode to interpret mdx documents) in the next couple of months.

@gasche (Member) commented Jul 17, 2022

Thanks for the news.

One side benefit I hoped for from this project is to get a usable ocamlnat toplevel installed for all users (I assume that this means upstreaming the native-code-emission logic at some point, but maybe there is a different way). Is this on the roadmap?

Repository owner closed this by deleting the head repository Aug 22, 2022
@shindere commented Aug 29, 2022 via email

@mshinwell commented:
I think it's because Jérémie's github account has been deleted.

@mshinwell reopened this Aug 29, 2022
@shindere commented Oct 11, 2022 via email

@gasche (Member) commented Oct 11, 2022

My understanding is that currently the project is on hold. @dra27 took care of upstreaming the necessary hooks to be able to implement a JIT outside the compiler, which is enough for the mdx use-case, and there has been no more work intended for upstreaming. (At ICFP last September @dra27 mentioned tweaking the installation status of ocamlnat and support binaries iirc.)

Personally I hope that we will eventually get native binary emission in the compiler upstream (or maybe in a well-identified external library), as we discussed when the RFC was originally written, for example by reusing the LexiFi code -- the discussion of this is a large part of the RFC. I think this would be especially useful in combination with MetaOCaml, and in general an excellent contribution for the whole ecosystem, not just mdx. (It also comes with delicate questions of code maintenance etc.)

But Jérémie is not working on this anymore, and I don't know if the remaining people are interested in doing the extra work to make the project more widely useful.
