Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefix compiler-libs, Take 2 #9694

Closed
wants to merge 16 commits into from
Closed

Conversation

nojb
Copy link
Contributor

@nojb nojb commented Jun 18, 2020

This is Take 2 of my previous attempt to prefix compiler-libs, #2218. Building on what was done in #9211, the present PR is simpler, and shorter, than the previous attempt.

The PR is ready for review. It can be read commit-by-commit (but keep in mind that due to the nature of the PR, the intermediate commits are not fully compilable -- most often due to missing dependency information). Note that the PR has already been submitted to precheck, which it passes successfully.

To get a sense of what the PR is doing, you can see a diff in the list of installed files here and a log of the build of the compiler here.

What?

We modify the compilation scheme of compiler-libs so as to nest every one of its modules within just a few top-level modules, corresponding to the existing libraries:

  • Ocamlcommon
  • Ocamlbytecomp
  • Ocamlmiddleend and Ocamloptcomp
  • Ocamltoplevel and Ocamlopttoplevel

(libraries on the same line share common modules.)

For example, Typecore becomes Ocamlcommon.Typecore, Arch becomes Ocamloptcomp.Arch, etc.

The main advantage is that the global scope is no longer (as) polluted.

How?

Using module aliases.

The approach is the one explained in this message by @garrigue:

  1. do not touch the source code or the source file names, which continue to use only "short" names,
  2. use -open to compile using the "short" names, and -o to produce unit names with "long" names, and
  3. post-process the output of ocamldep to rewrite "short" names into "long" names.

Below we give more details.

Per-file compilation rules

For each module compiler-libs we generate a compilation rule using -o targetting a unit name with the corresponding prefix. For example, taking bytecomp/meta as a running example, it is included in the library ocamlbytecomp and the prefix is Ocamlbytecomp__. For this file, we will generate a compilation rule

bytecomp/ocamlbytecomp__meta.cmo: bytecomp/meta.ml
    $(CAMLC) $(CAMLFLAGS) -o $@ -c $<

Note that some modules are included in more than one library: this is the case notably for the modules in ocamlmiddleend, which are also included in ocamloptcomp. In that case, we simply made a choice of prefix (in that case, ocamloptcomp__ was chosen).

The rules for each file are generated using a make macro, from the variables defined in Makefile.compilerlibs.

Map files

In order to map "short" (unprefixed) names to "long" (prefixed) names we generate "map" files (one per library) of the form

(* ocamlbytecomp.mli *)
module Meta = Ocamlbytecomp__meta
...

These files are generated in compilerlibs/ using an awk script, again based on the variables of Makefile.compilerlibs.

The map files are compiled with -no-alias-deps -w -49, do not depend on any other module and every other module depends on them.

Compilation flags

In order to avoid modifying the source files and to keep the existing workflow, the following compilation flags are added to COMPFLAGS.

  • -open Ocamlcommon -open Ocamlbytecomp -open Ocamloptcomp -open Ocamtoplevel -open Ocamlopttoplevel
  • -no-alias-deps.

and INCLUDES gets modified by adding

  • -I compilerlibs

as this is necessary so that map files above can be found.

Postprocessing of .depend

As ocamldep does not have support for prefixing, we postprocess its output to add the prefixes.

The postprocessing is done by a sed script which is itself automatically generated. Concretely, when setting up compilation rules for each module a script compilerlibs/<library name>.sed is generated that takes care of rewriting, say, bytecomp/meta.cmi to bytecomp/ocamlbytecomp__meta.cmi in .depend.

The same script used to post-process .depend for compiler-libs is used for tools, debugger, ocamldoc, etc.

Backwards compatibility

Switching to a prefixed version of compiler-libs will break all existing code using it. The fix is fairly simple (add some -open flags), but we would like to do better. For that, in addition to the prefixing the libraries as above, we generate, for each module, a compatibility shim of the form:

[@@@ocaml.deprecated "Use Ocamlcommon.Meta instead."]
include Ocamlcommon.Meta

These modules are also linked in the corresponding libraries, so that users can continue to use the unprefixed modules (with a deprecation warning). As compiler-libs does not have an official policy of backwards-compatibility, this shim could be removed within one release of the compiler.

Topdirs

There is one module in compiler-libs which could be considered to be part of the "public API" of the compiler: Topdirs. This module aids interaction with the toplevel, and is used in different ways (by findlib, the different toplevels, the debugger, user .ocamlinit scripts, etc.) Some of those uses we control and we could update to use a "prefixed" Topdirs, but others are not within our control and chaning the name of the compilation unit would be breaking change.

For this reason, Topdirs and Opttopdirs (its analog for ocamlnat) are left unprefixed, preserving backwards compatibility.

A variable DO_NOT_PREFIX in Makefile.compilerlibs to used to control which modules are left unprefixed.

ocamldoc for compiler-libs

ocamldoc does not handle prefixed libraries very well; in this PR the documentation generated for compiler-libs is still good thanks to the compatibility shims. But once those shims are gone, the documentation for compiler-libs will not work anymore. I see a couple of options:

  • Extend the hack made for Stdlib in ocamldoc to compiler-libs
  • Stop distributing the documentation for compiler-libs

Performance

As the build system does more work now, it is important that performance is not overly affected.
Below are some informal timings for make -j1 (starting from clean repository, just after ./configure) in my machine

Time
Pre-PR 5m51
Post-PR 6m45
Post-PR (w/o compatibility shim) 6m

That is, the compatibility shims in this PR add about 15% compilation when compiling from scratch with -j1 (the worst case), which is reasonable since there are double the number of files to compile and as many number of calls to an external script to generate the shims in the first place. This price is only paid when compiling from scratch, as afterwards the shims are already generated and recompiled incrementally, as needed. It doesn't seem too bad for a feature that will be removed in the short-ish term.

Outstanding naming issues

  • Ocamlcommon vs Ocaml_common? I chose the first one, as it matches the library name.
  • Currently all the files continue to be installed to +compiler-libs, as before. But we may want to take advantage of the situation to change to per-library directories? This would break backwards compatibility, though.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an impressive amount of work. Thanks for the very detailed description!

I had a few naive questions while reading the code. Toplevel remarks:

  • I wish we spent more time working on "internal suffixes" for compiler unit names, to support clashes of module names, instead of spending this much effort using module aliases as an ugly workaround. (This is not a criticism of the present work, and indeed it is useful today.)

  • Re Ocamlcommon vs. Ocaml_common, I like the fact that Ocaml_common makes the approach compatible to Janestreet's prefixed re-export, https://github.com/janestreet/ocaml-compiler-libs. (In particular, people could write code with explicit prefixing and use ocaml-compiler-libs for backward compatibility with older compiler releases.)

  • I'm not excited by the shiny new awk and sed scripts.

  • You mention a message of Jacques that explained the best approach available before ocamldep grew support for module aliases, in the form of -open and -as-map (I implemented a debugger-specific use of it in Prefix ocamldebug modules to minimize clashes #9621). Did you try using those? Is there a specific reason for doing it the old-style way?

* expect
*)
[@@@alert "-deprecated"]

module L = Longident

[%%expect {|
module L = Longident
module L = Ocamlcommon.Longident
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the output modified here? (Would passing -short-paths avoid this?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I'm not sure. I will experiment a bit and report back.

@@ -1,6 +1,5 @@
(* TEST
flags = "-I ${ocamlsrcdir}/parsing -I ${ocamlsrcdir}/toplevel"
include ocamlcommon
flags = "-I ${ocamlsrcdir}/parsing -I ${ocamlsrcdir}/compilerlibs -open Ocamlcommon"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you need a change here? I thought that include ocamlcommon would imply -I +compilerlibs -open Ocamlcommon?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the way * expect is currently implemented, it doesn't fully respect up the include ocamlcommon modifier. It could probably be fixed, but I did not want to increase the diff in this PR too much.

# bytecomp/ocamlcommon_meta.cmx: bytecomp/meta.ml
# $(CAMLC) $(COMPFLAGS) $(OPTCOMPFLAGS) -o $@ -c $<
#
# compilerlibs/meta.ml: compilerlibs/Makefile.compilerlibs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this hsould be bytecomp/meta.ml, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so: this is the compatibility/deprecation shim for bytecomp/meta.ml and it is generated inside compilerlibs.

# the following rules:
#
# bytecomp/ocamlcommon__meta.cmo: bytecomp/meta.ml
# $(CAMLC) $(COMPFLAGS) -o $@ -c $<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this does not depend on a .cmi file. Is the idea that the dependency is provided in the .depend file? (A comment could help)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly it. I will add a comment explaining this point.

@nojb
Copy link
Contributor Author

nojb commented Jun 18, 2020

  • Re Ocamlcommon vs. Ocaml_common, I like the fact that Ocaml_common makes the approach compatible to Janestreet's prefixed re-export, https://github.com/janestreet/ocaml-compiler-libs. (In particular, people could write code with explicit prefixing and use ocaml-compiler-libs for backward compatibility with older compiler releases.)

That's a fair argument. On the other hand, I wonder why they chose that name to begin with; those tiny differences between module and library names are a headache to deal with.

  • I'm not excited by the shiny new awk and sed scripts.

Indeed. There are three new scripts in this PR:

  1. compilerlibs/generate_deprecated.awk: used to generate deprecation shims.

  2. compilerlibs/generate_module_aliases.awk: used to generate the "map" files mapping short names to long names, and

  3. compilerlibs/compilerlibs.sed: used to post-process .depend. This one is generated automatically by make rules.

From these, 1. supposedly would be going away soon-ish, so I'm not very concerned about. 3. is only used when doing make depend and could be rewritten in OCaml or some other language easily.

As for 2., it is used during compilation. In principle it could be an OCaml script, but any bootstrapping issues need to be examined (at first sight it seems it should work). Another possibility (just brainstorming) would be to commit these map files into the repository and define the variables in Makefile.compilerlibs in terms of them (as they contain more or less the same information as the variables defining the different libraries).

I will investigate rewriting the scripts in OCaml and report back.

  • You mention a message of Jacques that explained the best approach available before ocamldep grew support for module aliases, in the form of -open and -as-map (I implemented a debugger-specific use of it in Prefix ocamldebug modules to minimize clashes #9621). Did you try using those? Is there a specific reason for doing it the old-style way?

Yes, I did :) The issue is that ocamldep does not support prefixing of the "output" names; ie it assumes that the source files are named the same as the compiled versions, see the issue description in #286, from where I quote:

There is no support for prefixing: one is supposed to have the source files name match the compiled version, using copying or symbolic links.

@xavierleroy
Copy link
Contributor

Two quick remarks:

The issue is that ocamldep does not support prefixing of the "output" names; ie it assumes that the source files are named the same as the compiled versions

Why not add this feature to ocamldep, then?

those tiny differences between module and library names are a headache to deal with.

If we're willing to put up with the headache, why not make it look nicer and have the same prefix (say OCamlInternals or OCamlCompiler or maybe even just OCaml) for all modules of all compiler-libs? That these modules are grouped in 5 libraries is an implementation detail, not something that should necessarily be reflected in the sources of client modules.

@nojb
Copy link
Contributor Author

nojb commented Jun 18, 2020

Two quick remarks:

The issue is that ocamldep does not support prefixing of the "output" names; ie it assumes that the source files are named the same as the compiled versions

Why not add this feature to ocamldep, then?

I haven't looked at what is involved. The original PR #286 had such a feature originally, but it was removed before merging due to a lack of consensus. Maybe this PR is a good time to revisit the issue.

those tiny differences between module and library names are a headache to deal with.

If we're willing to put up with the headache, why not make it look nicer and have the same prefix (say OCamlInternals or OCamlCompiler or maybe even just OCaml) for all modules of all compiler-libs? That these modules are grouped in 5 libraries is an implementation detail, not something that should necessarily be reflected in the sources of client modules.

I agree with having a toplevel OCaml module! I think it would be easy to adapt the PR to do this, and in fact it would actually make the patch a bit simpler, since we would only need a single map file mapping short names to long names, instead of one per library.

@Octachron
Copy link
Member

Concerning the documentation, I would not worry too much on long term compatibility with ocamldoc: odoc has now a man page backend (ocaml/odoc#428) and I have a prototype of a latex backend that can build the full standard library. Thus, It seems quite possible that we could switch to building the compiler-libs documentation with odoc when we remove the compatibility layer.

@nojb
Copy link
Contributor Author

nojb commented Jun 18, 2020

I pushed three new commits:

  1. rewrote the awk/sed scripts in OCaml,
  2. use a common prefix for all "long" names (compilerlibs__). This simplifies some of the Makefile macros, and
  3. implement suggestion by @xavierleroy : put everything inside a single top-level module, OCaml (this also allows a further, considerable, simplification of the Makefile macros). There is something undeniably cool about being able to write open OCaml:

asciicast

On the other hand, maybe OCaml (or should it be Ocaml??) is too short/common and something longer would be better.

I haven't updated the issue description nor the tests. I will wait for the discussion to converge to do that.

@Armael
Copy link
Member

Armael commented Jun 19, 2020

Nice work! With regards to naming, I would slightly weight in favor of OCamlInternals or OCamlCompiler rather than just OCaml, just to make it very clear that this is not part of the standard library.

@nojb
Copy link
Contributor Author

nojb commented Jul 2, 2020

I renamed the top-level module Compilerlibs instead of OCaml.

@nojb
Copy link
Contributor Author

nojb commented Nov 26, 2020

Following discussion in the developer's meeting and in the spirit of trying to exercise a bit of PR discipline, I'm shutting down this one down, since it is not clear that it has a viable path forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants