Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn on polymorphic comparisons #9928

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

kit-ty-kate
Copy link
Member

@kit-ty-kate kit-ty-kate commented Sep 19, 2020

This is a new take on #1642 which is a PR @dra27 pointed me towards after explaining a similar idea I had over the holiday.

This PR adds a new builtin alert [@@ocaml.alert polymorphic_comparison] and tag polymorphic comparison functions and operators in the standard library with it. This attribute, much like [@@deprecated] or similar attribute will display a warning when the function tagged is used.
This warning is disabled by default and it won't be enabled by default in dune either.

Incentives

Generic comparison functions are great but they are:

  • Prone to introducing bugs when the type given to them change (for instance adding a new case to a variant). There are countless examples of this in the OCaml ecosystems (for instance here is the more recent example I know of: mcclure/ppx_const@a0e7a67) and I've personally been hit by this multiple times before I made the monomorphic library and later used the containers library that does the same thing since version 2.0. In It is to be noted that base has also the same system where all the polymorphic/generic functions are shadowed.
    • To summarize: Two of the big external libraries already ""banned"" by default the use of generic comparison functions.
  • Slower than specialized functions. The generic functions have an performance overhead by detecting the type of the value at runtime.

Future

In an ideal world, this PR should be the first of four parts:

  1. Add the warning (this is the one)
  2. Add missing modules and functions to use instead of the generic ones.
  3. Add Infix modules to some of the modules of the standard library. Such modules (e.g. Int, Float, …) would have (=), (>=) and so on.
  4. Either wait for modular implicits, or introduce a builtin extension point [%ocaml.compare t] to create comparison functions for big and supposedly straightforward types such as Unix.error or similar. (I'm not totally sold on this idea yet myself)

Regardless of the future plans, I personally think this PR can be merged on its own.

utils/warnings.ml Outdated Show resolved Hide resolved
@kit-ty-kate
Copy link
Member Author

kit-ty-kate commented Sep 19, 2020

Also does anybody have any pointer on the compiler failure? I spent almost 2 hours trying to make it compile unsuccessfully (removing -warn-error A worked to test it), it seems the -w -69 is not taken into account and I'm not sure where to fix this. I didn't see much different compared to the previous PR that added a new warning.

./boot/ocamlrun ./ocamlopt -g -nostdlib -I stdlib -I otherlibs/dynlink -strict-sequence -principal -absname \
    -w +a-4-9-40-41-42-44-45-48-66-69 -warn-error A -bin-annot -safe-string -strict-formats \
    -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure \
    -I middle_end/flambda -I middle_end/flambda/base_types -I asmcomp -I asmcomp/debug -I driver -I toplevel \
    -function-sections -c bytecomp/dll.ml
File "/home/runner/work/ocaml/ocaml/utils/profile.ml", line 77, characters 22-23:
77 |   if !initial_measure = None then initial_measure := Some start_measure;
                           ^
Error (warning 69 [use-generic-comparison]): Use of generic comparison functions are discouraged.

EDIT: @dra27 helped me found the issue. A large number of modules use [@@@ocaml.warning "+w-..."], they are now individually fixed.

@kit-ty-kate kit-ty-kate force-pushed the use-generic-comparison branch 2 times, most recently from ba1dfbe to 3376507 Compare September 19, 2020 14:12
@alainfrisch
Copy link
Contributor

Cannot we directly use alerts instead?

@kit-ty-kate
Copy link
Member Author

Cannot we directly use alerts instead?

Mh, interesting. I hadn't realized this could be used like this while looking at the code for alerts.

Indeed this could be used instead, I'll look into it sooner rather than later.

@kit-ty-kate
Copy link
Member Author

kit-ty-kate commented Sep 19, 2020

@alainfrisch Does that sound better with f5a4024?
If that sounds good I'll rebase the branch and will remove the first two commits.

The only downside I can see so far is that it's not going to allow people to use it out-of-the-box if they want to keep backward compatibility with OCaml < 4.08 and use ocamlc -alert +generic_comparison.
One thing that can be done though to mitigate that would be to add a field in Dune (e.g. (alerts +generic_comparison)) that would add the parameter when OCaml >= 4.08 only and ignore it before. But maybe just generating a dune file is enough, I guess that's not that big of an issue anyway.

@craigfe
Copy link
Contributor

craigfe commented Sep 24, 2020

@kit-ty-kate: Thanks for this work 🙂

Please forgive the drive-by bikeshedding, but I'm wary of the term "generic comparison" to describe =. My understanding is that the term "generic" in functional programming typically refers to functions that are parametrised by type representations at runtime, e.g.

val equal : 'a typ -> 'a -> 'a -> bool

(These are distinct from parametrically-polymorphic functions because they are not restricted to behaving uniformly on all types.) See for instance, @yallop's writings on the subject. Perhaps the feature could be named polymorphic_comparison or structural_comparison instead?

@kit-ty-kate
Copy link
Member Author

Perhaps the feature could be named polymorphic_comparison or structural_comparison instead?

I used to call them polymorphic comparison operators/functions in kit-ty-kate/ocaml-monomorphic, so polymophic_comparison would be good. I was using generic_comparison because I felt like "polymorphic" was a bit too generic and did not really convey the meaning. I was not sure what to use so I used what #1642 was going for.

Now that I think of it maybe dynamic_comparison would be more descriptive? This would more explicitly say that these functions detect the type of their argument dynamically at runtime.

I'm really not sure what's the best choice here. I was kind of waiting for some better propositions from other people here (you're the first one!) as well as for a better alert message, maybe something more discretive as well.

@dbuenzli
Copy link
Contributor

Now that I think of it maybe dynamic_comparison would be more descriptive?

I don't think it would be. These functions are known as being polymorphic equality/comparison. Searching the interwebs for that will lead you to the controversies that explain you why it's good or bad. I think using another name would be counter productive for the end user.

@jberdine
Copy link
Contributor

FWIW, the manual uses the term "structural equality":

e1 = e2 tests for structural equality of e1 and e2.

I'd default to using the same terminology. I would hesitate to use "polymorphic" since I think it usually is implicitly understood as parametric polymorphic, which is not technically applicable in this case. I have no objection to "generic" as it tends to indicate "not specialized" to me, which is applicable in this case.

@xavierleroy
Copy link
Contributor

"Generic" has been widely used since the 1980's to mean both parametric polymorphism and ad-hoc polymorphism. ADA had "generics"; C++ templates are "generic"; and Cardelli and Wegner's 1985 survey On Understanding Types, Data Abstraction, and Polymorphism writes

We refer to functions like id which require a type parameter before they can be applied to functions of a specific type as generic functions.

In this broad sense, the use of "generic" in the warning discussed here is perfectly appropriate.

The narrower meaning of "generic" as "taking a run-time type description / function dictionary as argument" is more recent and not incompatible, just narrower. That doesn't make it the only meaning.

I would avoid "dynamic comparison", because most comparisons are "dynamic" (take place at run-time rather than compile-time).

"Polymorphic comparisons" is OK, I guess.

"Structural comparison" is by opposition to "pointer comparison", a.k.a. ==, and means recursion over a data structure. A monomorphic, type-specialized comparison function for, say, type int list would still be "structural".

@craigfe
Copy link
Contributor

craigfe commented Sep 24, 2020

In this broad sense, the use of "generic" in the warning discussed here is perfectly appropriate.

The narrower meaning of "generic" as "taking a run-time type description / function dictionary as argument" is more recent and not incompatible, just narrower. That doesn't make it the only meaning.

@xavierleroy: In my defense – and without wishing to be contrarian – in a field in which much terminology has at some point been overloaded by one paper or another, I think it's worth aiming for more than being correct in a "broad sense", particularly given the wide selection of terms to choose from.

Uses of "generic" in the sense of run-time type representations often make an explicit effort to differentiate their usage from the genericity implied by parametric polymorphism. This is especially true in Haskell-oriented literature (e.g. SYB and its descendants) and libraries (e.g. GHC.Generics). In this context, the term "generic" is incompatible with parametric polymorphism and is used for precisely this reason; it creates a meaningful distinction, admittedly using words that are overlapping in their non-technical meanings.

While I agree with you that there are other interpretations of the term "generic" that make it a superset of "parametrically polymorphic", I think it is precisely the more narrow meaning of "polymorphic" that is pertinent w.r.t. incorrect use of = (i.e. that it behaves uniformly for all types and therefore cannot capture semantic equality). There is no inherent problem with equalities that are "generic" in the broad sense of the word (e.g. Java equals()) which – in my opinion – makes this broader notion of "generic" not helpful in this case.

@gasche
Copy link
Member

gasche commented Sep 25, 2020

I am more used to "polymorphic comparison operator" than "generic comparison operator".

(But I don't think either clearly express the fact that the operator is non-parametric.)

The reason why we disallow them is that the notion of equality obtained is imprecise, approximate, compared to the "real" equality predicate that the user has in mind. Maybe we could also say that the parametric (=) is "imprecise", while using, say, String.equal, is precise.

@yallop
Copy link
Member

yallop commented Sep 25, 2020

I'd call these 'representational' comparisons, because they operate on the representation of values rather than on the type structure:

# type t = T : _ -> t;;
type t = T : 'a -> t
# T (Some 3) = T (Ok 3);;
- : bool = true
# T Complex.{re=3.0; im=4.0} = T [|3.0; 4.0|];;
- : bool = true

I think that 'representational' makes it clear both why these operations are useful (because they don't need to be defined afresh for each type) and why they come with a warning (because they don't respect type structure or abstraction boundaries).

@kit-ty-kate
Copy link
Member Author

kit-ty-kate commented Sep 25, 2020

I renamed to polymorphic_comparison as it seems to be more of the consensus (I agree it is more used to describe these).

EDIT: updated the main description of this PR above and its title

@kit-ty-kate kit-ty-kate changed the title Warn on generic comparisons Warn on polymorphic comparisons Sep 25, 2020
@damiendoligez
Copy link
Member

I note that you didn't carry over the compiler change from #1642, which would silence the alert for the built-in operations whenever they are specialized (i.e. replaced by non-generic operations at compile time). Are there any plans to do it? Is there any consensus on this point?

@Octachron
Copy link
Member

It might be simpler to start without this specialization refinement? That would require the use [@alert "-polymorphic_comparison"] to disable this warning when using a polymorphic-but-specialized comparison. But keeping the polymorphic_comparison alert, a simple (user-definable without special compiler support) also has its charm.

@kit-ty-kate kit-ty-kate force-pushed the use-generic-comparison branch 2 times, most recently from d115652 to 312ad56 Compare May 11, 2021 18:37
@kit-ty-kate
Copy link
Member Author

I've rebased the PR. CI is green and this is ready to review.

I've also removed the numerous changes to Makefiles to disable the new alert in favour of regenerating boot/ocamlc as previous versions of the compiler would enable all alerts by default. I feel like this is much simpler and would cause less trouble/merge-conflicts for other PRs.

@damiendoligez
Copy link
Member

It might be simpler to start without this specialization refinement?

I agree, let's start with the simple thing and see how it fares. We can make it more complex later if we decide it's worth the work.

@@ -92,6 +92,16 @@ let alert_attr x =
Some (x, "deprecated", string_of_opt_payload x.attr_payload)
| "ocaml.alert"|"alert" ->
begin match kind_and_message x.attr_payload with
| Some ("polymorphic_comparison" as kind, message) ->
let message = match message with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really worth special casing this new alert here? I guess this is to avoid duplicating the more verbose message everywhere. But someone explicitly enabling the alert would know what it means, just seeing its name, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think it is worth the special case here to avoid duplication.

But someone explicitly enabling the alert would know what it means, just seeing its name, no?

To me, the rational is that if this is enabled in a medium-size or bigger project with several people and someone is contributing for example, they might not be familiar with what the alert means and might be discouraged by only seeing

File "alert_polymorphic_comparison.ml", line 12, characters 9-16:
12 | let _a = compare
              ^^^^^^^
Alert polymorphic_comparison: Stdlib.compare

for example, if it there was no explanation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. What about only replacing an empty payload with the full message? If one later introduce functions parametrized over a comparison function, one might want to provide explicit messages to the "legacy" version, and adding the generic message would feel weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not completely convinced by the argument for the special casing. Copying the alert message is certainly boring when writing the PR, but this is an one time issue. I would rather keep the alert as vanilla as possible. If this ends up creating too much maintenance pain, it is probably a sign that we should improve the interface for all alerts (and not only the polymorphic_comparison one).

@alainfrisch
Copy link
Contributor

  • What about the "generic interface" in Hashtbl? The doc doesn't say it explicitly, but it does of course rely on the polymorphic equality. Perhaps just annotating Hashtbl.create is enough?

  • In Expose {Int,Int32,Int64,Nativeint}.{min,max} #10389 (comment), @xavierleroy noted that one should first introduce replacement functions, wait for a few versions, then only deprecate the "dangerous ones". The difference here is that a new alert, disabled by default, is introduced, instead of using the "deprecated" one. But still, it seems useful to introduce rather soon the replacement functions (comparison function in various modules, and variantsof, say, List.assoc taking an explicit comparison function argument).

  • Even when an explicit comparison function exists, say, Int.compare or String.compare, how can we use them to check that an element is, say, less than or equal to another one? The alert would prevent writing String.compare s1 s2 <= 0... Is the plan to expose not only String.compare functions, but also String.leq; or force users to use String.Infix.(s1 <= s2) or String.Infix.( <= ) s1 s2?

@Octachron
Copy link
Member

For comparison functions, the solution used by base or containers is to restrict the type of the binary comparison operators (=,<>,<,<=,>=,>) to int -> int -> bool. We could add a Monomorphic_comparison module or submodule to emulate this behavior.

@kit-ty-kate
Copy link
Member Author

kit-ty-kate commented May 31, 2021

What about the "generic interface" in Hashtbl? The doc doesn't say it explicitly, but it does of course rely on the polymorphic equality. Perhaps just annotating Hashtbl.create is enough?

Thanks a lot, I missed these ones. I've tagged the functions that use polymorphic compare. I hope I haven't missed any.
Tagging Hashtbl.create wouldn't be enough in case the hashtbl is created elsewhere in a library that doesn't use this alert so I preferred to tag every functions according to their implementation.

In #10389 (comment), @xavierleroy noted that one should first introduce replacement functions, wait for a few versions, then only deprecate the "dangerous ones". The difference here is that a new alert, disabled by default, is introduced, instead of using the "deprecated" one. But still, it seems useful to introduce rather soon the replacement functions (comparison function in various modules, and variantsof, say, List.assoc taking an explicit comparison function argument).

I agree, I also said that in the PR description. This PR is still useful as-is and could get the ball rolling for people who want to experiment with it, so we can find where the best place or name is to add those new functions/operators.

Even when an explicit comparison function exists, say, Int.compare or String.compare, how can we use them to check that an element is, say, less than or equal to another one? The alert would prevent writing String.compare s1 s2 <= 0... Is the plan to expose not only String.compare functions, but also String.leq; or force users to use String.Infix.(s1 <= s2) or String.Infix.( <= ) s1 s2?

yes that would be ideal. For the moment people usually default their implementation of the comparison operators to int. This is the case in containers and base/core. In containers some extra operators for float are also available: see CCMonomorphic.ml. In my own ocaml-monomorphic library I have several base module to open, depending on what people want too: see monomorphic.mli

Fair enough. What about only replacing an empty payload with the full message? If one later introduce functions parametrized over a comparison function, one might want to provide explicit messages to the "legacy" version, and adding the generic message would feel weird.

I've updated the PR to not have any special case. I wasn't sure about the default message to use, so I didn't put anything. This can always be improved in the future, but just in case this PR can be merged before the 4.13 branching I prefer not to have to make more choices to avoid points of contention.

I'm not sure what's wrong with the hygiene script, I've added the alert to all the functions in MoreLabels too but it seems to still complain. Does anyone have any clue?

let current =
ref
{
active = Array.make (last_warning_number + 1) true;
error = Array.make (last_warning_number + 1) false;
alerts = (Misc.Stdlib.String.Set.empty, false); (* all enabled *)
alerts = (disabled_alerts, false); (* all enabled expt disabled_alerts *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expt disabled_alertsexcept [disabled_alerts] ?

@@ -88,29 +88,35 @@ val add : ('a, 'b) t -> 'a -> 'b -> unit
(Same behavior as with association lists.) *)

val find : ('a, 'b) t -> 'a -> 'b
[@@ocaml.alert polymorphic_comparison]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source file for this interface is stdlib/templates/hashtbl.template.mli .

@alainfrisch
Copy link
Contributor

yes that would be ideal. For the moment people usually default their implementation of the comparison operators to int. This is the case in containers and base/core. In containers some extra operators for float are also available: see CCMonomorphic.ml. In my own ocaml-monomorphic library I have several base module to open, depending on what people want too: see monomorphic.mli

My concern is that the PR would leave the stdlib in a state which is not really usable when the alert is enabled. People who use a library that shadows the the comparison operator in order to restrict it to type int would benefit less from the PR anyway (only for functions like List.assoc, etc) and we don't have a compelling story for others.

This PR is still useful as-is and could get the ball rolling for people who want to experiment with it, so we can find where the best place or name is to add those new functions/operators.

I see the argument, but I'm not sure people will actually spend time rewriting significant code bases to avoid the marked functions until we provide the "official" target solution (and I'm not sure we want people to create their own solutions, leading to more style fragmentation across the community).

In #1642, there was the idea of allowing some uses of the existing polymorphic functions, with various variants of that idea. I was not personally a big fan, but there seemed to be a rather wide support for the idea, meaning that a lot of code would not need to be rewritten at all (and people knowing that this might happen will not enable the alert). I can see why people are happy continuing writing if x = 0 || y <= 0. then ..., and they would dislike it if we encourage them to avoid that now and tell them later that it's actually ok.

@bobzhang
Copy link
Member

bobzhang commented Jul 6, 2021

In ReScript(was BuckleScript), we had a similar warning, but it only warns when the type specialisation does not happen, so that we can still use "=" for most cases, it seems less intrusive than this proposal

@kit-ty-kate
Copy link
Member Author

Would one of the upstream dev be interested in doing an audio call to get this over the line before 5.01 ? I’ve tried several times to integrate the ideas of #1642 to this PR but I don’t know enough about the compiler internals to get it to work and I need some guidance.

@dbuenzli
Copy link
Contributor

dbuenzli commented Jul 21, 2022

It seems I'm not as much against polymorphic comparison as most people tend to be, but there is one thing I would like to be done, which that the unqualified use of compare (I guess that implies Stdlid.compare too) to be simply deprecated.

Of course that entails that Sys.compare should first be added for this to happen.

I just lost a few hours tracking down a regression in a non-trivial computational geometry algorithm I'm implementing because of the following. I had that:

module M = struct
  let compare v0 v1 =(* M.t specific [compare] *)
  let f=(* uses [compare] *)
end

Now that f wasn't really at the right place so I moved it to another module and you guess what happened.

I'm pretty sure it's not the first time this happens to me.

@c-cube
Copy link
Contributor

c-cube commented Jul 21, 2022

I like @dbuenzli 's idea! After all, polymorphic operators are magic things living in the runtime, Sys might be a good place for them and I think it's legit to reach for them regularly (even if just to define let equal = Sys.(=) locally).

@dbuenzli
Copy link
Contributor

This of course is not on par with what this PR is trying to propose.

But given the M.compare convention established by Set.S and Map.S, unqualified compare is a footgun for everyone, whether you like polymorphic comparison or not.

I think we should remove its use by basic identifier deprecation (this PR wouldn't help since the proposed warning would be disabled by default).

Nothing more is needed for this except to introduce Sys.compare now and wait for a few versions.

@kit-ty-kate
Copy link
Member Author

I’m not convinced that Sys is the right place for that. Why not having a new module Poly like base does?
This way we could also do the same thing for all the polymorphic functions defined in Stdlib ((=), (<), …) and for people who really want to keep using those functions without changing large parts of their code or indroducing conflict it would be as easy as an open Stdlib.Poly at the beginning of the modules who need it.

If you think the name is bad I’m not overly attached to it but i strongly think it should be moved in a module containing only those functions and nothing else, if we’re going this way.

@c-cube
Copy link
Contributor

c-cube commented Jul 21, 2022

@kit-ty-kate how about Sys.Poly ? Sys is already a bit of a natural place for magical functions (such as opaque_identity). I don't know if coopting toplevel Poly is worth it — I could see it used in programs that use polynomials…

@xavierleroy
Copy link
Contributor

For the record: I do not support the proposed deprecation of generic comparisons. But @kit-ty-kate is right that Sys is not the place to put them. Sys is intended as an OS-independent API for system calls, such as listing directory contents or getting timing information, nothing more.

@dbuenzli
Copy link
Contributor

For the record: I do not support the proposed deprecation of generic comparisons.

Note that I'm not proposing to deprecate generic comparisons. The only thing I'm proposing to deprecate is the bare compare identifier which is a footgun.

Sys is intended as an OS-independent API for system calls, such as listing directory contents or getting timing information, nothing more.

I'd say there's already more than that there Sys.opaque_identity, Immediate64, runtime system warnings and information. Personally I rather started to equate it with runtime system (and think that OS independent API system calls belong to Unix (sic)). But in any case I agree it may not be the best place.

To echo @c-cube, Poly could also be a polygon :-). Stdlib.Generic.compare could be an option. Unless people may see another use for such a generic name.

@c-cube
Copy link
Contributor

c-cube commented Jul 21, 2022

To echo @c-cube, Poly could also be a polygon :-). Stdlib.Generic.compare could be an option.

Stdlib.Adhoc.compare 😜

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet