Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the Standard Library prefixing to match Dune's #10169

Merged
merged 3 commits into from Feb 10, 2021

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Jan 21, 2021

In #1010 (OCaml 4.07), the Standard Library modules were re-packed under the Stdlib module using module aliases. The scheme was based on one already in use at Jane Street and adopted by the (at the time previewed to core developers, but not publicly released!) Jbuilder.

A key difference, however, is that Jbuilder (now, of course, Dune), uses the capitalized form of the module so, for example, the actual module for Stdlib.String is Stdlib__String with the sources in string.ml and string.mli. It is very awkward to mix the cases like this in make, and the prefixing was already supremely complicated so the decision was taken in #1010 to use Stdlib__string. Wind forward a bit and #2093 added developer-only support for building the compiler using Dune.

One of the many reasons that the Dune files are for developers only is that Dune uses the capitalized form of the module name, and so generates a Standard Library which is not compatible with that generated by the primary build system.

As part of the flambda 2.0 work, @mshinwell has been overhauling the Dune rules to provide the option to build an installable compiler using Dune. While there's no intention that this be used in upstream OCaml any time soon, it would be supremely helpful if the flambda 2.0 build system generated a compatible Standard Library.

Hence this PR, which alters the encoding scheme to match Dune's.

stdlib/StdlibModules already contains the unprefixed forms of the
filenames in $(STDLIB_MODS)
Now use stdlib__String instead of stdlib__string. Removes a difference
between the stdlib when compiled with Dune vs make.
@dra27
Copy link
Member Author

dra27 commented Jan 21, 2021

There is one particularly strange part of this PR, for which I have done one implementation, but there are various options.

We end up needing rules of the form: stdlib__String.cmo: string.ml - i.e. where both String and string are needed and you can't do that in a pattern rule. I've elected in this PR to do that by munging the .depend file, since we already have to anyway and ocamldep handily always puts entries for files with no dependencies and also starts a new line after the colon. So all of the dependencies in .depend now get the appropriate .ml or .mli file prepended.

The pattern rules then have no dependencies and I then use $(filter to extract the appropriate filename. $^ is all the dependencies - including ones collected elsewhere. $< still means the first dependency, but because there are no dependencies on the pattern rule, that means the first dependency as encountered in the whole Makefile which is difficult to see. Alternatives:

  • Use even more meta-make (i.e. $(eval ). It would be possible to generate stdlib__String.cma: string.ml using STDLIB_MODULES. Pros: generating .depend already used sed -E so wasn't POSIX, but the present implementation uses \l which is even less POSIX. Cons: the meta-generated rules are no less obscure for where the dependency has come from and the code is longer.
  • Use $< instead of $filter( ..., $^). I started with this, but you have to be certain that .depend is read before anything else (cf. the dependency on the compiler which at present comes first). Pros: looks less weird than the $(filter ...). Cons: it's very brittle; you wouldn't break this and not notice, but the error is very strange to work out - you end up seeing, for example, ocamlc -flags -c ../ocamlc
  • Rename all the stdlib files to have capitals - i.e. have String.ml and String.mli instead.

@dra27
Copy link
Member Author

dra27 commented Jan 21, 2021

I should also highlight from the Changes entry the fact that any code which naughtily uses Stdlib__string will break. There are still instances where the names leak into backtraces (I'm not sure if those are technically bugs?!) so there may also be less naughty instances which are broken by this change.

Overall I haven't marked this change as breaking, because the namespacing trick is supposed to be internal, but that is rather open to discussion.

@mshinwell
Copy link
Contributor

I must thank @dra27 for figuring out how to do this. It would indeed be very useful to make the stdlib's name mangling consistent with the conventions elsewhere (which I think are in the majority).

@gasche
Copy link
Member

gasche commented Jan 21, 2021

Let me summarize what I follow from your second comment.

Before we had a generic makefile rule to build stdlib__%.cmi by compiling %.mli. This does not work anymore given that the case between the two files is different.

Currently stdlib/.depend is generated with a post-processing step after ocamldep, to turn foo.cmi: bar.cmi blah.cmi into stdlib__foo.cmi: stdlib__bar.cmi stdlib__blah.cmi.
You tweak this preprocessing logic to get stdlib__Foo.cmi: foo.mli stdlib__Bar.cmi stdlib__Blah.cmi. This way, you can get what was previously %.mli (but now it's not because the case is different) by filtering on the dependencies of the rules.

Personally I don't like the fact of changing ocamldep's output, and I think that the new postprocessing is worse than the previous postprocessing because it really changes the dependency structure of the result, instead of just translating names -- we are effectively using .depend as a database to store information on interface files. This being said, it's not clear than any of the alternatives you suggest are better:

Use even more meta-make (i.e. $(eval ). It would be possible to generate stdlib__String.cma: string.ml using STDLIB_MODULES. Pros: generating .depend already used sed -E so wasn't POSIX, but the present implementation uses \l which is even less POSIX. Cons: the meta-generated rules are no less obscure for where the dependency has come from and the code is longer.

This avoids the issue with .depend, at the cost of using more complex Make features. I'm not fond of build-system hacks (and at our current complexity level pretty much everything is a hack), but I like even less those that force me to go read the Make documentation to figure out what is going on.

Use $< instead of $filter( ..., $^). I started with this, but you have to be certain that .depend is read before anything else (cf. the dependency on the compiler which at present comes first). Pros: looks less weird than the $(filter ...). Cons: it's very brittle; you wouldn't break this and not notice, but the error is very strange to work out - you end up seeing, for example, ocamlc -flags -c ../ocamlc

I have no problem with filter which reads fairly clearly in the code. I agree that your approach is better than the-other-magic-variable-I-need-to-lookup-documentation-about.

Rename all the stdlib files to have capitals - i.e. have String.ml and String.mli instead.

The main downside is that it will break the muscle habits of maintainers that are used to the current layout. This is not clearly better than your approach -- it's cleaner but more people will complain.

.

I agree that the change is desirable, because it puts us back on the path of least surprise: use the same prefixing layout as everyone else. The patch is never going to look pretty, so it's okay if there are changes I'm not fond of (the comments above should not block inclusion).

Naive question: instead of hacking the .depend, could we use something like

stdlib__%.cmi:
  .... $(shell extract-the-right-stuff.sh $MAGIC-VARIABLE-FOR-THE-TARGET-NAME).mli ...

?

@lpw25
Copy link
Contributor

lpw25 commented Jan 21, 2021

How crazy would this be:

stdlib__A%.cmi: a%.mli
stdlib__B%.cmi: b%.mli
stdlib__C%.cmi: c%.mli
stdlib__D%.cmi: d%.mli
...

Probably too crazy right?

@lthls
Copy link
Contributor

lthls commented Jan 21, 2021

Has anyone considered patching dune to add an option to use the lower-case version ? I don't think there's any particular reason why this wouldn't work, but I couldn't find the place in the dune sources where the object names for wrapped modules are generated, so I couldn't actually try it.

@nojb
Copy link
Contributor

nojb commented Jan 21, 2021

Has anyone considered patching dune to add an option to use the lower-case version ? I don't think there's any particular reason why this wouldn't work, but I couldn't find the place in the dune sources where the object names for wrapped modules are generated, so I couldn't actually try it.

cc @jeremiedimino

@ghost
Copy link

ghost commented Jan 21, 2021

No need to add an option for this. The stdlib already receives special treatment from Dune (you can see it from the (stdlib ...) field in stdlib/dune), so we can just use lower cases for the stdlib specifically.

@mshinwell
Copy link
Contributor

@lthls I hadn't specifically thought about what @jeremiedimino just suggested, but had thought previously about altering Dune; I concluded that it would be better to have consistency everywhere.

@dra27
Copy link
Member Author

dra27 commented Jan 21, 2021

@lpw25 unless - I like the principle, but only one pattern rule can match for a given file, so you have to duplicate the recipe too! You can see this having to be done slightly weirdly for the COMPILER_DEPS part in stdlib/Makefile

@dra27
Copy link
Member Author

dra27 commented Jan 21, 2021

@gasche - I agree with your summary!

Naive question: instead of hacking the .depend, could we use something like

stdlib__%.cmi:
  .... $(shell extract-the-right-stuff.sh $MAGIC-VARIABLE-FOR-THE-TARGET-NAME).mli ...

?

The problem here is that you lose the fact that the .cmi file depends on a source file at all. IIRC macro expansion occurs after dependency resolution, so you can't do stdlib__%.cmi: $(shell do-the-right-thing %) either.

@dra27
Copy link
Member Author

dra27 commented Jan 21, 2021

I'm completely neutral on how this gets "fixed" - the only thing I would say is that if GNU make had a way of writing a pattern which did this, we would have used it in #1010 (i.e. the current behaviour is because of make's limitations, it's not the natural way I think you'd name it)

@dra27
Copy link
Member Author

dra27 commented Jan 21, 2021

This PR is definitely breaking - it will require a minor change to ocamlfind (@kit-ty-kate's tests are ongoing, delayed by three large opam-repository PRs hogging our cluster, one of which I'm responsible for too!). See dra27/ocamlfind@af90f8d

@kit-ty-kate
Copy link
Member

opam-health-check finished checking this PR on the whole of opam-repository (including tests) and the only two failures were:

For my part, this PR looks good to me.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval in principle. See comments for potential change suggestions.

(I agree with @mshinwell's point that using the same convention as everyone else is better than supporting a non-conventional setup.)

stdlib/Makefile Show resolved Hide resolved
stdlib/Makefile Outdated
's#(^| )(${subst ${SPACE},|,${UNPREFIXED_OBJS}})[.]#\1stdlib__\2.#g' \
sed -E \
-e 's#(^| )(${STDLIB_NAMESPACE_MODULES})[.]#\1stdlib__\u\2.#g' \
-e 's/^stdlib__(.)(.*)(\.[^i]*)(i?) :/stdlib__\1\2\3\4 : \l\1\2.ml\4/' \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: here is a rough idea for an alternative:

  • first add the .mli files as first dependencies (this requires no case-change magic)
  • then generate the rest of the sed script by iterating on STDLIB_NAMESPACE_MODULES and producing a textual replacement instead (or we could, if we fill less meta-programming-y, just use a shell loop with sed -i).

I will let you judge whether this can work and is better than the current approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure (yet) about the loop, but changing the order of those two expressions is definitely a good idea, if only to remove the evil of \l\1 which certainly in the fixed fonts in my editor/browser is not immediately clear to be a lowercase "l" and a "1"!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rebased version swaps the two statements, which simplifies the second one quite considerably

@lthls
Copy link
Contributor

lthls commented Jan 22, 2021

No need to add an option for this. The stdlib already receives special treatment from Dune (you can see it from the (stdlib ...) field in stdlib/dune), so we can just use lower cases for the stdlib specifically.

I have a small patch to dune that does just that, although I haven't tested it. Should I make a pull request ?

@lthls
Copy link
Contributor

lthls commented Jan 22, 2021

A friend suggested me that renaming all files that will be wrapped to their capitalized versions (e.g. list.ml -> List.ml) would also make the Makefile and dune builds agree on object names. It could be annoying for people rebasing stdlib changes though.

@dra27
Copy link
Member Author

dra27 commented Jan 22, 2021

A friend suggested me that renaming all files that will be wrapped to their capitalized versions (e.g. list.ml -> List.ml) would also make the Makefile and dune builds agree on object names. It could be annoying for people rebasing stdlib changes though.

So did I in #10169 (comment) 🙂 I think Git's renaming detection would catch that for most/all rebasing but muscle-memory might be more irritating on a more regular basis, as in #10169 (comment)

@gasche
Copy link
Member

gasche commented Jan 23, 2021

I would propose to polish this PR and merge it, and then discuss renaming stdlib files. Here's the thinking. An annoying file renaming just to please Dune-build users? Nah. An annoying file renaming to simplify our build system? Easier sell.

@lthls
Copy link
Contributor

lthls commented Jan 23, 2021

I would propose to polish this PR and merge it, and then discuss renaming stdlib files. Here's the thinking. An annoying file renaming just to please Dune-build users? Nah. An annoying file renaming to simplify our build system? Easier sell.

I would have thought adding complexity to our build system just to please Dune-build users would be harder to sell than renaming files just to please Dune-build users, but I seem to be in the minority here so go ahead.

By the way, if we're renaming files, we might as well rename foo.ml directly to stdlib__Foo.ml. This would probably allow us to remove the hacks around ocamldep's output completely.

@dra27
Copy link
Member Author

dra27 commented Feb 8, 2021

Rebased to remove conflicts and address @gasche's comment.

@lthls - there was already complexity in the awk scripts which was Dune-specific which gets moved and made non-Dune-specific. The complexity of post-processing ocamldep's output was already decided when this was originally done in #1010. At least at present there's post-processing of an internal module name, rather than actually committing the "hacked namespace" prefix to the files themselves.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this PR again and I stand by my initial judgment, which is that the added-complexity costs are relatively small (now with copious documentation wherever it happens, thanks), and I can see clear benefits to the consistency. (Not just for Dune users, but also to everyone who gets to see these mangled name in backtraces etc., and does not have to be surprised at the use of a different mangling technique than what they usually observe.)

@@ -64,6 +64,13 @@ Working version
- #9582: Add Array.{find_opt,find_map,split,combine}.
(Nicolás Ojeda Bär, review by Daniel Bünzli and Gabriel Scherer)

- #10169: Use capitalized module names in the Standard Library prefixing scheme
to match Dune, e.g. Stdlib__String instead of Stdlib__string. This is a
breaking change only to code which attempted to use the internal names before.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change only to code which attempted to use the internal names before.

Is there code like that in the wild? If not, we can remove the mention (it's relatively obvious). If yes, we can use a * marker for the entry.

Maybe we should try to get opam-repository results for the change? (cc @Octachron)
(Hmm, this patchset might be painful to rebase on 4.12.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my full report for opam-repository in my last comment above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry! (It's there.) I had read it before, but on re-reviewing I forgot about this context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops - I have pushed the asterisk! I have also opened a PR against ocamlfind, so it's not a tedious blocker at the start of the 4.13 testing cycle.

@dra27
Copy link
Member Author

dra27 commented Feb 8, 2021

Having checked back through the conversation, I think that this is OK to merge pending CI. It is also going through precheck#540

I rebased only to squash the last edits, as I'd like to preserve the history.

@dra27 dra27 added merge-me and removed merge-me labels Feb 8, 2021
@dra27
Copy link
Member Author

dra27 commented Feb 8, 2021

OK, so now it's going through precheck#541. You see, undefine was added to GNU make in October 2009, 3.5 years after the version which ships with macOS...

@dra27 dra27 merged commit a0f1c39 into ocaml:trunk Feb 10, 2021
@dra27 dra27 deleted the stdlib-file-casing branch February 10, 2021 10:03
@nojb nojb mentioned this pull request Mar 3, 2021
@shindere
Copy link
Contributor

shindere commented Mar 3, 2021

We now have a Make warning about `stdlib/StdlibModules' saying that the ' '
variable is undefined at line 56.

Could this warning have something to do with this PR?

Use make --warn-undefined-variables to reproduce.

@gasche
Copy link
Member

gasche commented Mar 3, 2021

@shindere: I believe that @nojb and @Octachron are working on this issue, see
#10268

@shindere
Copy link
Contributor

shindere commented Mar 3, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants