-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop relying on location to track usage #8934
Conversation
typing/types.ml
Outdated
| Predef of string | ||
|
||
let mk = | ||
let id = ref (-1) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I think the generator should be reset (or otherwise made local to the compilation_unit), so that compiling multiple units in a single invocation of the compiler produces the same .cmi files.
-
It would be better to provide custom equality and hash functions rather than relying on the generic operations(through Hashtbl in Env).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack for reset
, I'll push something for that.
AS for custom operations: I'm fine with providing a custom equality function, but for hashing, I can't see myself doing anything better the generic hash
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, there should be local compare
and equal
functions and an appropriate application of Identifiable.Make
, and use the result of that in Env
.
I personally would prefer if ids were local to a compilation unit.
Do you have an opinion on whether to make a shortcut for Uid.mk ~current_unit:(Env.get_unit_name ())
?
if we ignore wanting the freedom to give custom semantics and a different operation set, why are |
@Drup : there is more than one way in which to understand your question. I'll try to answer some of them.
There are actually several reasons why that doesn't work. A very boring one is because when you use Another way to understand your question could be:
Someone has actually asked me something along these lines offline. And the answer here is: I could do that (use the |
I was thinking about the former, yes. The remark was that The problem then is that we basically have to carry two ids everywhere. I'm not sure how I feel about this. |
I had a brief (offline) discussion with @alainfrisch about this PR, where he suggested using an "abstract location" (IIUC that would reflect a position in the AST rather than in the source) instead of an integer for these uids. What I told him then was roughly:
And I still stand by that last point. |
Quoting @damiendoligez from #8987 (comment):
seems like the right way to revive this PR! Would someone be willing to review this? @alainfrisch, @Drup: would one of you care to volunteer? If you still have doubts / questions, perhaps we could meet in person to discuss them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, another PR that I though we had merged already. :)
I like the general approach and I think it works well. The code is clearly cleaner in many places, and when it isn't, it's orthogonal and will be addressed in later PRs. I'm pretty sure we could also reuse those Uid
s for other purposes as well. I think we should merge this.
@@ -648,7 +647,8 @@ let strengthen = | |||
aliasable:bool -> t -> module_type -> Path.t -> module_type) | |||
|
|||
let md md_type = | |||
{md_type; md_attributes=[]; md_loc=Location.none} | |||
{md_type; md_attributes=[]; md_loc=Location.none | |||
;md_uid = Uid.internal_not_actually_unique} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could remove this function. It doesn't make much sense anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was there before, and is still used in a few places.
I agree it'd be good to get rid of it, but let's perhaps do that in a separate PR?
typing/types.ml
Outdated
| Predef of string | ||
|
||
let mk = | ||
let id = ref (-1) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, there should be local compare
and equal
functions and an appropriate application of Identifiable.Make
, and use the result of that in Env
.
I personally would prefer if ids were local to a compilation unit.
Do you have an opinion on whether to make a shortcut for Uid.mk ~current_unit:(Env.get_unit_name ())
?
exp_extra = []; | ||
exp_type = ty; | ||
exp_env = env } | ||
) body tunpacks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function is 1) not pretty 2) a Frankenstein of the old wrap_unpacks
and the relevant part of type_expect_
. Fortunately, you already fixed that in #8935 ! At least the correctness is rather easy, since it's mostly copy-pasting, so we can push those consideration for later PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said, it would be nice to at least avoid the code duplication if possible.
I just rebased this on top of trunk, there were some minor conflicts on
Done.
Not really, the current code seems fine to me. |
This allows us to give the same uid to the module bound in the guard, and the one bound in the rhs.
Seems I had
Anyway, I've now added a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a cursory look and have a small suggestion, but really I'm approving on behalf of @Drup.
…effects when merging ocaml#8934
…effects when merging ocaml#8934
Back in march, I announced:
It was a very long week, but the non-prototype version is finally appearing as PR!
Motivation
Whilst not relying on location to track usage would allow us to freely update locations to get better error messages (cf. #7859, #1737) without triggering spurious warnings (cf. #7852), it is not the main motivation for this work!
The real reason is that locations are a terrible way to identify declarations: you can declare many different things, with the same name, at the same location! Ppxes do it all the time, and back in the days camlp4 used to do it too (the compiler still bears the scars).
Examples of missing warnings in the ppx era
First an innocuous one:
ppx_bin_prot
generates things like:where
buf
is unused in first function, but shares the location of thebuf
in the second one, which is used.That's not too bad, since the actual fix was to rename the first one to
_buf
.Slightly worse, here is an example from
ppx_variants_conv
which is due to a usingString.lowercase
instead ofString.uncapitalize
:generates:
One can guess which constructor the value
c_a
refers to, but one shouldn't have to guess. Also, notice thatVariants.map
has two parameters with the same name and ignores one of them.Finally, and probably the worse one, it is also possible to make the compiler believe that a value (or constructor is used).
For instance, at Jane Street we have a ppx which given:
generates something that looks like this:
Here both
Foo
s are declared at the same location, and since the second one (The_ppx_result.Foo
) is used, then so is the first one!In this PR
The basic idea is to assign a unique identifier to each declaration, and use this uid to track usage.
However, it is not completely straightforward: there are situations where the typechecker generates several declarations based on the same item in the source (for instance declaring a class will declare a class, a class type, a type). And situations where the typechecker generates declaration for internal use only, never exposing them to the user (when approximating a module type for example).
In the first case, it makes sense to give the same id to all the declarations: they are all generated by the same source declaration, using any of them means that the source declaration is used.
In the second case, we never want to report any warning on these, we don't intend for them to be used by the user, so we mark them as internal.
There are things one can turn to in the existing code when deciding whether to mint a new id or not: what location is attached to the declaration? Do we attach some checks to the declaration?
These aren't necessarily always right (in the sense that even though we might not want to emit a warning, we might still want to properly identify a declaration), but some preliminary PRs (#8885, #8891) should have made the system more consistent, and this PR easier to review.
Third party uses of uids
Using these new ids, it becomes fairly straightforward to build a tool that will do an usage analysis at the level of a library / binary, or even of a workspace.
(Note: I have heard rumors of teams at big tech/advertising companies doing such analyses by concatenating all their ml files into one, and relying on the usual compiler warnings.)
We built a prototype of such a tool at Jane Street (where the "concatenate all the files" approach clearly wouldn't have worked), using the initial version of the patch presented in this PR.
Since then, the Trustworthy Refactoring team published Characterising Renaming within Ocaml's Module System, which is a very nice presentation (and formalisation) of (essentially) the same problem. Their presentation identifies declarations with a location, but it is my impression that these are actually closer to the ids introduced by this PR, than to our
Location.t
(which, as discussed above, can't be trusted to identify a declaration).It also seems fair to assume that their tool (rotor) could be simplified in places by using these ids; this essentially amounts to letting the compiler do the "binding resolution", which they currently have to recompute.
Keeping these use cases in mind can also help decide whether minting a fresh id would make sense even though the location is
Location.none
Note: this is based on #8891 (itself based on #8908).