Stop relying on location to track usage #8934

trefis · 2019-09-11T18:18:14Z

Back in March, I announced:

I wrote a prototype [...] based on 4.07 last week [...] which I plan to rebase
onto trunk (and submit a PR) next week.

It was a very long week, but the non-prototype version is finally appearing as a PR!

Motivation

Whilst not relying on location to track usage would allow us to freely update locations to get better error messages (cf. #7859, #1737) without triggering spurious warnings (cf. #7852), it is not the main motivation for this work!

The real reason is that locations are a terrible way to identify declarations: you can declare many different things, with the same name, at the same location! Ppxes do it all the time, and back in the day camlp4 used to do it too (the compiler still bears the scars).

Examples of missing warnings in the ppx era

First an innocuous one: ppx_bin_prot generates things like:
```
let read_tag buf ~pos_ref:_ vint = vint in
let read_whatever buf ~pos_ref = Bytes.get buf ~pos_ref in
...
```
where buf is unused in the first function, but shares the location of the buf in the second one, which is used.
That's not too bad, since the actual fix was to rename the first one to _buf.
Slightly worse, here is an example from ppx_variants_conv, which is due to using String.lowercase instead of String.uncapitalize:
```
type t = C_a | C_A [@@deriving variants]
```
generates:
```
val c_a : t = C_A

module Variants :
  sig
    val c_a : t Variant.t Variant.t
    (* ... *)
    val map : t -> c_a:'a -> c_a:(t Variant.t Variant.t -> 'b) -> 'b
    (* ... *)
  end
```
One can guess which constructor the value c_a refers to, but one shouldn't have to guess. Also, notice that Variants.map has two parameters with the same name and ignores one of them.
Finally, and probably the worst one, it is also possible to make the compiler believe that a value (or constructor) is used.
For instance, at Jane Street we have a ppx which given:
```
type t = Foo of int [@@deriving the_ppx]
```
generates something that looks like this:
```
module The_ppx_result = struct
  type t = Foo of The_ppx_runtime_lib.some_type

  let something_something =
    Foo (The_ppx_runtime_lib.some_value)
end
```
Here both Foos are declared at the same location, and since the second one (The_ppx_result.Foo) is used, then so is the first one!
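
To make the failure mode concrete, here is a minimal sketch of a usage table keyed by location and name. The types and function names (`loc`, `declare`, `mark_used`, `is_used`) are hypothetical stand-ins, not the compiler's actual definitions; the point is only that two declarations sharing a key also share a "used" flag.

```ocaml
(* Hypothetical, simplified stand-in for a source location. *)
type loc = { file : string; line : int; col : int }

(* Usage table keyed by (location, name): one flag per key. *)
let used : (loc * string, bool ref) Hashtbl.t = Hashtbl.create 16

(* Registering a second declaration under an existing key reuses its flag. *)
let declare loc name =
  if not (Hashtbl.mem used (loc, name)) then
    Hashtbl.add used (loc, name) (ref false)

let mark_used loc name =
  match Hashtbl.find_opt used (loc, name) with
  | Some flag -> flag := true
  | None -> ()

let is_used loc name =
  match Hashtbl.find_opt used (loc, name) with
  | Some flag -> !flag
  | None -> false

(* The ppx gives the generated constructor the location of the source one. *)
let ppx_loc = { file = "a.ml"; line = 1; col = 9 }

let () =
  declare ppx_loc "Foo";    (* the user-written Foo *)
  declare ppx_loc "Foo";    (* The_ppx_result.Foo, same location and name *)
  mark_used ppx_loc "Foo"   (* only the generated Foo is actually used *)
```

After this runs, `is_used ppx_loc "Foo"` is true, so no unused-constructor warning could be reported for the user-written `Foo` either.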

In this PR

The basic idea is to assign a unique identifier to each declaration, and use this uid to track usage.

However, it is not completely straightforward. There are situations where the typechecker generates several declarations based on the same item in the source (for instance, declaring a class will declare a class, a class type, and a type), and situations where the typechecker generates declarations for internal use only, never exposing them to the user (when approximating a module type, for example).

In the first case, it makes sense to give the same id to all the declarations: they are all generated by the same source declaration, using any of them means that the source declaration is used.
In the second case, we never want to report any warning on these, we don't intend for them to be used by the user, so we mark them as internal.
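
The two cases can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the `Uid` representation, `decl` record, and `should_warn` function are all hypothetical names chosen for the example.

```ocaml
(* Hypothetical sketch; the representation is an assumption. *)
module Uid = struct
  type t =
    | Item of int   (* fresh id minted for one source declaration *)
    | Internal      (* typechecker-internal declaration, never reported *)

  let counter = ref (-1)
  let mk () = incr counter; Item !counter
  let internal = Internal
end

type decl = { name : string; uid : Uid.t }

(* Usage is recorded per uid, not per location. *)
let used : (Uid.t, unit) Hashtbl.t = Hashtbl.create 16
let mark_used d = Hashtbl.replace used d.uid ()

let should_warn d =
  match d.uid with
  | Uid.Internal -> false
  | Uid.Item _ -> not (Hashtbl.mem used d.uid)

(* First case: one source class yields three declarations sharing a uid. *)
let class_uid = Uid.mk ()
let cls = { name = "c (class)"; uid = class_uid }
let cls_type = { name = "c (class type)"; uid = class_uid }
let cls_ty = { name = "c (type)"; uid = class_uid }

(* An unrelated declaration gets its own uid. *)
let other = { name = "unused_value"; uid = Uid.mk () }

(* Second case: an internal declaration is marked as such. *)
let approx = { name = "approximated module type"; uid = Uid.internal }

(* Using any one of the class's declarations marks the source item as used. *)
let () = mark_used cls_ty
```

With this setup, `should_warn` is false for all three class declarations and for `approx`, and true only for `other`.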

There are things one can turn to in the existing code when deciding whether to mint a new id or not: what location is attached to the declaration? Do we attach some checks to the declaration?

These aren't necessarily always right (in the sense that even though we might not want to emit a warning, we might still want to properly identify a declaration), but some preliminary PRs (#8885, #8891) should have made the system more consistent, and this PR easier to review.

Third party uses of uids

Using these new ids, it becomes fairly straightforward to build a tool that will do a usage analysis at the level of a library / binary, or even of a workspace.
(Note: I have heard rumors of teams at big tech/advertising companies doing such analyses by concatenating all their ml files into one, and relying on the usual compiler warnings.)

We built a prototype of such a tool at Jane Street (where the "concatenate all the files" approach clearly wouldn't have worked), using the initial version of the patch presented in this PR.
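
As a rough idea of what such an analysis could look like (this is not the Jane Street prototype, whose design is not described here), the sketch below assumes each compilation unit exposes the uids it declares and the uids it references. The `uid` and `unit_info` types are hypothetical.

```ocaml
(* Hypothetical data model: a uid is (compilation unit name, counter). *)
type uid = string * int

type unit_info = {
  declared : (uid * string) list;  (* uid and printable name of each declaration *)
  uses : uid list;                 (* uids referenced from this unit *)
}

module Uid_set = Set.Make (struct
  type t = uid
  let compare (a : t) (b : t) = Stdlib.compare a b
end)

(* Names declared somewhere in the workspace but referenced nowhere in it. *)
let unused_in_workspace (units : unit_info list) : string list =
  let used =
    List.fold_left
      (fun acc u -> List.fold_left (fun acc id -> Uid_set.add id acc) acc u.uses)
      Uid_set.empty units
  in
  let unused_in u =
    List.filter (fun (id, _) -> not (Uid_set.mem id used)) u.declared
    |> List.map snd
  in
  List.concat (List.map unused_in units)

(* Example workspace: Main uses Lib.f, nothing uses Lib.g or Main.run. *)
let lib =
  { declared = [ (("Lib", 0), "Lib.f"); (("Lib", 1), "Lib.g") ]; uses = [] }

let main =
  { declared = [ (("Main", 0), "Main.run") ]; uses = [ ("Lib", 0) ] }

let report = unused_in_workspace [ lib; main ]
```

A real tool would also need to treat entry points and exported interfaces as roots; that is left out of the sketch.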

Since then, the Trustworthy Refactoring team published Characterising Renaming within OCaml's Module System, which is a very nice presentation (and formalisation) of (essentially) the same problem. Their presentation identifies declarations with a location, but it is my impression that these are actually closer to the ids introduced by this PR than to our Location.t (which, as discussed above, can't be trusted to identify a declaration).
It also seems fair to assume that their tool (rotor) could be simplified in places by using these ids; this essentially amounts to letting the compiler do the "binding resolution", which they currently have to recompute.

Keeping these use cases in mind can also help decide whether minting a fresh id would make sense even though the location is Location.none

Note: this is based on #8891 (itself based on #8908).

alainfrisch · 2019-09-13T15:18:19Z

typing/types.ml

+    | Predef of string
+
+  let mk =
+    let id = ref (-1) in


I think the generator should be reset (or otherwise made local to the compilation_unit), so that compiling multiple units in a single invocation of the compiler produces the same .cmi files.

It would be better to provide custom equality and hash functions rather than relying on the generic operations (through Hashtbl in Env).

Ack for reset, I'll push something for that.

As for custom operations: I'm fine with providing a custom equality function, but for hashing, I can't see myself doing anything better than the generic hash function.

I agree, there should be local compare and equal functions and an appropriate application of Identifiable.Make, and use the result of that in Env.

I personally would prefer if ids were local to a compilation unit.

Do you have an opinion on whether to make a shortcut for Uid.mk ~current_unit:(Env.get_unit_name ()) ?
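
The suggestion in this thread (local `equal` and `compare`, generic hashing, and functor-generated tables) might look roughly like the sketch below. It uses the standard library's `Hashtbl.Make` and `Map.Make` as stand-ins for the compiler's `Identifiable.Make`, and the `Item` and `Internal` constructors are assumptions; only `Predef of string` appears in the diff above.

```ocaml
(* Hypothetical uid type; the PR's actual definition may differ. *)
module Uid = struct
  type t =
    | Item of { comp_unit : string; id : int }
    | Internal
    | Predef of string

  (* Custom equality, written out per constructor. *)
  let equal a b =
    match a, b with
    | Item { comp_unit = u1; id = i1 }, Item { comp_unit = u2; id = i2 } ->
      i1 = i2 && String.equal u1 u2
    | Internal, Internal -> true
    | Predef a, Predef b -> String.equal a b
    | (Item _ | Internal | Predef _), _ -> false

  let compare (a : t) (b : t) = Stdlib.compare a b

  (* Hashing falls back to the generic hash function. *)
  let hash (t : t) = Hashtbl.hash t
end

(* Stdlib functors standing in for Identifiable.Make. *)
module Uid_tbl = Hashtbl.Make (Uid)
module Uid_map = Map.Make (Uid)

let tbl : string Uid_tbl.t = Uid_tbl.create 16
let () = Uid_tbl.replace tbl (Uid.Item { comp_unit = "Foo"; id = 0 }) "Foo.x"
```

Tables built this way no longer depend on polymorphic equality, so the representation of `Uid.t` can change later without touching the call sites.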

Drup · 2019-10-09T09:47:20Z

If we ignore wanting the freedom to give custom semantics and a different operation set, why is Ident.t insufficient for your purposes? They are supposed to be unique per compilation unit (and they already have some handling of predef/persistent ids).

trefis · 2019-10-09T13:24:35Z

@Drup : there is more than one way in which to understand your question. I'll try to answer some of them.
One is:

Why aren't we just using Ident.t as keys for the "usage tables" instead of Location.t * string?

There are actually several reasons why that doesn't work. A very boring one is that when you use Foo.Bar.baz, which you want to count as a use of baz, you don't have baz's ident at hand, you have Foo's.
More fundamentally: there isn't such a thing as "baz's ident".
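
A heavily simplified model illustrates the point. The types below are hypothetical reductions of the compiler's paths and environments: a dotted path carries an ident only at its root, so the thing that identifies `baz` has to live in the declaration that lookup returns.

```ocaml
(* Hypothetical, heavily simplified model of idents, paths and signatures. *)
type ident = { name : string; stamp : int }

type path =
  | Pident of ident        (* Foo: the only place an ident appears *)
  | Pdot of path * string  (* p.Bar, p.baz: just a name under a path *)

type uid = int

type item =
  | Value of uid
  | Module of uid * (string * item) list

(* Resolve a path to the item it denotes; [env] binds root idents only. *)
let rec lookup (env : (ident * item) list) (p : path) : item option =
  match p with
  | Pident id -> List.assoc_opt id env
  | Pdot (p, field) ->
    (match lookup env p with
     | Some (Module (_, items)) -> List.assoc_opt field items
     | Some (Value _) | None -> None)

let uid_of = function Value u | Module (u, _) -> u

(* Foo.Bar.baz: the path mentions Foo's ident and two plain strings. *)
let foo = { name = "Foo"; stamp = 1 }
let env = [ (foo, Module (0, [ ("Bar", Module (1, [ ("baz", Value 2) ])) ])) ]
let baz_path = Pdot (Pdot (Pident foo, "Bar"), "baz")
```

Looking up `baz_path` yields the declaration carrying uid 2, which is what a usage table can be keyed on.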

Another way to understand your question could be:

Why do you need a different uid, why can't you just take the ident that was first used to enter the declaration in the environment, and put it inside the declaration (like you're currently putting Uid.ts)?

Someone has actually asked me something along these lines offline. And the answer here is: I could do that (use the Ident.t that is), it would indeed work but... why would I do that?
I don't think it'd come out as cleanly. Also, one would have to be careful that Subst doesn't touch these particular idents.

Drup · 2019-10-09T13:33:42Z

I was thinking about the former, yes. The remark was that Uid.t and Ident.t are basically the same data structure, and we already have some machinery for the idents. Your point about subst (and other related ident-modifying passes) is valid though. Having to deal with two "uses" of idents, and making sure we only touch the right ones, sounds like a nightmare.

The problem then is that we basically have to carry two ids everywhere. I'm not sure how I feel about this.

trefis · 2019-10-09T13:48:49Z

I had a brief (offline) discussion with @alainfrisch about this PR, where he suggested using an "abstract location" (IIUC that would reflect a position in the AST rather than in the source) instead of an integer for these uids.
The idea being that uids might be more stable (for instance in the context of merlin) with such a representation.

What I told him then was roughly:

Perhaps. I would have to think about it. However, given that the type is abstract, it's a change that we can make after the fact, and I don't think we should block the PR for this question.

And I still stand by that last point.
(I have however given it a bit more thought, and I don't think that these "abstract locations" would be more stable than an integer. But they will be more costly to build)

trefis · 2020-02-26T13:12:20Z

Quoting @damiendoligez from #8987 (comment):

I would say your choice was the right one and indeed using locations for computing unused variables was a mistake.

seems like the right way to revive this PR!

Would someone be willing to review this? @alainfrisch, @Drup: would one of you care to volunteer? If you still have doubts / questions, perhaps we could meet in person to discuss them.

Drup

Ah, another PR that I thought we had merged already. :)

I like the general approach and I think it works well. The code is clearly cleaner in many places, and when it isn't, it's orthogonal and will be addressed in later PRs. I'm pretty sure we could also reuse those Uids for other purposes as well. I think we should merge this.

Drup · 2020-02-26T16:49:40Z

typing/env.ml

@@ -648,7 +647,8 @@ let strengthen =
         aliasable:bool -> t -> module_type -> Path.t -> module_type)

 let md md_type =
-  {md_type; md_attributes=[]; md_loc=Location.none}
+  {md_type; md_attributes=[]; md_loc=Location.none
+  ;md_uid = Uid.internal_not_actually_unique}


I think you could remove this function. It doesn't make much sense anyway.

It was there before, and is still used in a few places.
I agree it'd be good to get rid of it, but let's perhaps do that in a separate PR?

Drup · 2020-02-26T16:56:28Z

typing/types.ml

+    | Predef of string
+
+  let mk =
+    let id = ref (-1) in


I agree, there should be local compare and equal functions and an appropriate application of Identifiable.Make, and use the result of that in Env.

I personally would prefer if ids were local to a compilation unit.

Do you have an opinion on whether to make a shortcut for Uid.mk ~current_unit:(Env.get_unit_name ()) ?

Drup · 2020-02-26T17:00:37Z

typing/typecore.ml

+      exp_extra = [];
+      exp_type = ty;
+      exp_env = env }
+  ) body tunpacks


That function is 1) not pretty and 2) a Frankenstein of the old wrap_unpacks and the relevant part of type_expect_. Fortunately, you already fixed that in #8935! At least the correctness is rather easy to check, since it's mostly copy-pasting, so we can push those considerations to later PRs.

That being said, it would be nice to at least avoid the code duplication if possible.

trefis · 2020-02-27T16:53:28Z

I just rebased this on top of trunk, there were some minor conflicts on typecore, and some bigger conflicts on predef.
The testsuite passes so I think I resolved everything correctly, but perhaps it would be sensible to double check the diff on predef.

I agree, there should be local compare and equal functions and an appropriate application of Identifiable.Make, and use the result of that in Env.

Done.

I personally would prefer if ids were local to a compilation unit.

Ident.ts are not, and there should be a lot more Ident.ts than Uid.ts, so is this really necessary?

Do you have an opinion on whether to make a shortcut for Uid.mk ~current_unit:(Env.get_unit_name ()) ?

Not really, the current code seems fine to me.

This allows us to give the same uid to the module bound in the guard, and the one bound in the rhs.

trefis · 2020-03-05T13:26:06Z

Seems I had

- misunderstood the comment about making uids local to a compilation unit
- misunderstood Ident.reinit when I read it last week.

Anyway, I've now added a reinit function in Types.Uid, and I think this is ready for merging.
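
The effect of resetting the generator can be sketched like this. The module below is a hypothetical reduction, not the code added to Types.Uid, and `compile_unit` is a made-up stand-in for typechecking one unit.

```ocaml
(* Hypothetical sketch of a resettable uid generator. *)
module Uid = struct
  type t = { comp_unit : string; id : int }

  let counter = ref (-1)

  (* Restart numbering, to be called at the start of each compilation unit. *)
  let reinit () = counter := (-1)

  let mk ~current_unit =
    incr counter;
    { comp_unit = current_unit; id = !counter }
end

(* Stand-in for typechecking one unit: mint one uid per declaration name. *)
let compile_unit name decls =
  Uid.reinit ();
  List.map (fun d -> (d, Uid.mk ~current_unit:name)) decls

(* Compiling A alone, or after B in the same invocation, gives the same uids. *)
let alone = compile_unit "A" [ "x"; "y" ]
let _ = compile_unit "B" [ "u"; "v"; "w" ]
let after_b = compile_unit "A" [ "x"; "y" ]
```

Without the call to `reinit`, the uids for A in the second run would start at 5 instead of 0, and anything serialised from them would differ between invocations.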

damiendoligez

I had a cursory look and have a small suggestion, but really I'm approving on behalf of @Drup.

typing/env.ml


trefis mentioned this pull request Sep 11, 2019

An explicit representation for implicit unpacks #8935

Draft

alainfrisch reviewed Sep 13, 2019


trefis mentioned this pull request Sep 30, 2019

Make some locations more accurate #8987

Merged

trefis changed the base branch from pr8891 to trunk October 14, 2019 10:13

trefis force-pushed the usage branch from 480a732 to 15813f2 Compare October 14, 2019 10:22

trefis marked this pull request as ready for review October 14, 2019 10:28

Drup approved these changes Feb 26, 2020


trefis force-pushed the usage branch from 15813f2 to 56b17bc Compare February 27, 2020 15:50

trefis force-pushed the usage branch from 56b17bc to d719e63 Compare February 27, 2020 16:54

trefis added 8 commits March 5, 2020 13:34

typedtree.mli: un-open Types

9fb4b05

Add a unique id to every signature item

d52dd5c

use uid for usage warnings

e4de6c1

update testsuite

7e37000

typecore: wrap_unpacks => type_unpacks

dacf8b5

This allows us to give the same uid to the module bound in the guard, and the one bound in the rhs.

set_value_used_callback: simplify implementation

8abb519

Changes

6063e73

Types.Uid: include Identifiable

3fe2538

trefis force-pushed the usage branch from d719e63 to 1dd7c4e Compare March 5, 2020 12:40

trefis added 2 commits March 5, 2020 14:20

Uid.reinit

86b33bc

bootstrap

dec26d4

trefis force-pushed the usage branch from 1dd7c4e to dec26d4 Compare March 5, 2020 13:23

damiendoligez approved these changes Mar 5, 2020


damiendoligez self-assigned this Mar 5, 2020

add forgotten word in comment

b67779b

trefis merged commit c323d11 into ocaml:trunk Mar 6, 2020

trefis deleted the usage branch March 6, 2020 15:49

kit-ty-kate mentioned this pull request May 18, 2020

Support OCaml 4.11 realworldocaml/mdx#261

Merged

emillon mentioned this pull request Mar 29, 2021

Support "Find occurrences" at whole project scale, not just inside the current file ocaml/merlin#377

Open

trefis mentioned this pull request May 10, 2021

Update locations during destructive substitutions #10405

Merged

gasche mentioned this pull request May 10, 2021

Spurious unused value warning with destructive substitution #7852

Closed

EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request May 17, 2021

correct registration of ident_unhandled for Unhandled exception with …

6bea280

…effects when merging ocaml#8934

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022

correct registration of ident_unhandled for Unhandled exception with …

7ab5aef

…effects when merging ocaml#8934

trefis mentioned this pull request Feb 24, 2023

Store uids' declarations instead of node locations #11782

Closed
