Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Mp_present for module aliases #10673

Draft
wants to merge 36 commits into
base: trunk
Choose a base branch
from

Conversation

mrmr1993
Copy link
Contributor

@mrmr1993 mrmr1993 commented Oct 3, 2021

This PR changes the default behaviour of module aliases to make all modules present (Mp_present) by default. This will allow us to support functor aliases, and will simplify the implementation of transparent ascription (previous implementation in PR #10612).

The changes here are based on the discussion in the comments of #10435.

With this PR odule aliases (module Foo = Bar) are now always 'present', in the sense that Bar can be accessed via the alias Foo at runtime, instead of transparently rewriting Foo to Bar in the codegen logic.

Breaking changes

  • This breaks compatibility between module types which had reordering of their module aliases.
    • Arguably, this should never have worked (and I believe it didn't used to in e.g. 4.07). By doing this, we were inadvertently exposing module presence to the user. For example:
module type S = sig
  module M : sig type t end
  module N = M
  module O = M
end
module type S' = sig
  module M : sig type t end
  module O = M
  module N = M
end
type t = (module S)
type t' = (module S')
let compare (t : t) (t' : t') = t = t';; (* Now a unification error *)
  • This changes the behaviour of module recs which contain module aliases. For example:
module rec M : sig
  module N : sig type t end
  module O = N
end = struct
  module N = struct type t end
  module O = N
end;; (* Still works *)
module rec M : sig
  module N : sig type t end
  module O = N
end = M;; (* No longer works, O is an 'unsafe module'. *)
  • This causes external modules referred to via module aliases to be linked into the output binary, even when they are only re-exported and are not used for code within the current module.
    • Aliases to the module names of external modules (e.g. Stdlib, Stdlib__Int, Parsetree, etc.) are still treated as absent aliases when --no-alias-deps is true. This carve-out is deliberately limited to avoid doing anything overly complex, but still allowing dune-style 'library' modules, which declare these aliases before the aliased modules are available and would be broken otherwise.
  • This breaks most uses of aliasing into 'modules without implementation'.
    • Aliases to the toplevel module can avoid this by using --no-alias-deps , but it is no longer possible to alias any submodule.
    • This decision is based on @lpw25's comment here.
  • Named module types always use a present module, so 'modules without implementation' cannot ever be aliased inside them. For example,
module type S = sig
  module P = Parsetree (* From Compiler_libs *)
end
module M : S = struct
  module P = Parsetree
end (* Linker will be unable to find an implementation for Parsetree and will fail *)
let (m : (module S)) = (module struct module P = Parsetree end) (* Linker failure, as above *)

Notes for reviewers

  • The majority of the changes are in typemod.ml, focused on passing around the absent_globals state, and using this state to determine whether aliases should be present or absent.
  • Mtype.make_aliases_absent has been replaced with make_aliases_present, used to convert the module type returned by module type of to one with all present aliases, if it is being used as part of a named signature.
  • Includemod.modtypes now places both modtypes in the environment, so that two Mty_aliases can be expanded correctly to build a coercion between them, now that they may have incompatible runtime layouts.
  • The new testcase in typing-modules/aliases.ml has been added to test the above, extending the previous test to make sure that child modules also behave correctly.
  • Some other tests were broken, and have been tweaked to make them work again.

typing/mtype.ml Outdated Show resolved Hide resolved
@xavierleroy
Copy link
Contributor

This causes external modules referred to via module aliases to be linked into the output binary, even when they are only re-exported and are not used for code within the current module.

Isn't this a show-stopper? The initial motivation for module aliases was precisely to avoid forcing linking of compilation units that would otherwise be unreferenced. I'm confused.

@mrmr1993
Copy link
Contributor Author

mrmr1993 commented Oct 4, 2021

Isn't this a show-stopper? The initial motivation for module aliases was precisely to avoid forcing linking of compilation units that would otherwise be unreferenced. I'm confused.

@xavierleroy I understood the same, and have a version of this PR described here that only makes modules Mp_present when we absolutely have to (in packed modules).

I don't have the data to support a decision either way. @lpw25 in your reply on #10612 it sounded like you might have some numbers on the tradeoffs we'd expect in executable size/runtime cost here vs compilation time with the alternative?

@nojb
Copy link
Contributor

nojb commented Oct 4, 2021

I'm confused.

I was too, but if I understand this correctly the "old" module alias behaviour is unchanged when using --no-alias-deps, so that Dune-style namespacing continues to work.

Copy link
Contributor

@lpw25 lpw25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your decision to restrict the absent aliases to global identifiers is interesting, and probably a good idea. However, I think that it is a little too restrictive in practice. A common practice is to have:

(* a.ml *)
module Bar = Foo__Bar

and:

(* b.ml *)
open! A
module Bar = Bar

where using B is not supposed to cause Foo__Bar to get linked unless you actually use B.Bar. With this patch A.Bar would be absent, since Foo__Bar is a global identifier, but B.Bar would be present since A.Bar isn't global.

Additionally, if you are going to have a restriction based on the destination of the alias, then you can actually allow absent_globals to be true inside the body of functors. The restriction from the body of functors is only needed if you allow all aliases to be absent.

I think there are two reasonable approaches:

  1. Allow absent aliases for all paths when absent_globals is true. Don't allow absent aliases in functor bodies or module types.
  2. Allow absent aliases for paths which normalize to a global identifier. Allow absent aliases in functor bodies. Don't allow absent aliases in module types.

1 was what I was proposing in my previous comments, but I think 2 is probably the better design. I think the restriction will be less surprising for users, and it will work better if we add an explicit syntax for absent aliases.

typing/typemod.ml Outdated Show resolved Hide resolved
typing/typemod.ml Outdated Show resolved Hide resolved
typing/typemod.ml Outdated Show resolved Hide resolved
typing/typemod.ml Outdated Show resolved Hide resolved
typing/includemod.ml Outdated Show resolved Hide resolved
typing/includemod.ml Outdated Show resolved Hide resolved
typing/includemod.ml Outdated Show resolved Hide resolved
Ok Tcoerce_none
else
(* This path contains local references, we may need to build a
coercion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, I think we could avoid this problem if we had transparent ascription. The underlying issue is that given:

struct module A : S module B = A end
: sig module A' : S' module B' = A' end

(where ' is just used to distinguish the two sides) we produce a substitution like:

A' :-> A

but the correct substitution is really:

A' :-> (A <: S')

which would automatically give the right coercion in these cases.

Once we have transparent ascription that may be a better approach, although it requires some care around recursive modules (S' might refer to A', but not in aliases, so those needs to be substituted out first using the current substitution) and also some care to ensure efficiency.

@lpw25
Copy link
Contributor

lpw25 commented Oct 4, 2021

The initial motivation for module aliases was precisely to avoid forcing linking of compilation units that would otherwise be unreferenced.

They only have that behaviour in -no-alias-deps mode, and after this patch they would still mostly have that behaviour in -no-alias-deps-mode.

@garrigue
Copy link
Contributor

garrigue commented Oct 5, 2021

I think there are two reasonable approaches:

  1. Allow absent aliases for all paths when absent_globals is true. Don't allow absent aliases in functor bodies or module types.
  2. Allow absent aliases for paths which normalize to a global identifier. Allow absent aliases in functor bodies. Don't allow absent aliases in module types.

1 was what I was proposing in my previous comments, but I think 2 is probably the better design. I think the restriction will be less surprising for users, and it will work better if we add an explicit syntax for absent aliases.

I agree that 2 was the real goal for which absent aliases were introduced. The normal form is always known statically.
I'm not sure of what you mean by "don't allow absent aliases in module types". You certainly want to allow them in mli's. The need is less clear for module type definitions, but the borderline is not so clear. What would be broken by keeping them for all aliases resolving to a global identifier? If compatibility is a problem, we could switch to a format in which absent aliases use a physical (empty) slot.

@lpw25
Copy link
Contributor

lpw25 commented Oct 5, 2021

If you allow them in module type definitions then the difference will be observable by the user because, whilst absent and present aliases are both subtypes of each other, they are not equal types. Everywhere else you are either covariant or contravariant and so the difference is not detectable.

As discussed at the meeting, if you add an actual syntax for absent aliases (I would suggest module M == Foo) then there is nothing wrong with exposing the difference between them. I would still advocate for no-alias-deps not silently producing absent aliases in module type definition though -- the user can just write an absent alias if that is what they really want. I'd like to propose this kind of syntax in a second PR after this one.

Copy link
Contributor

@lpw25 lpw25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small fixes needed. Once they're addressed I think this is ready to go.

@@ -58,11 +58,3 @@ File "w53.ml", line 46, characters 23-36:
46 | module I' = Set.Make [@ocaml.inlined]
^^^^^^^^^^^^^
Warning 53 [misplaced-attribute]: the "ocaml.inlined" attribute cannot appear in this context
File "w53.ml", line 48, characters 23-30:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is revealing an existing issue with detecting incorrectly placed [@@inlined] attributes. The following should warn but doesn't:

# module M = struct type t end[@@inlined];;
module M : sig type t end

Module aliases with -no-alias-deps were are now handled by the same code as this case, and so they have lost their warning.

typing/typemod.ml Outdated Show resolved Hide resolved
typing/typemod.ml Outdated Show resolved Hide resolved
typing/typemod.ml Outdated Show resolved Hide resolved
typing/typemod.ml Outdated Show resolved Hide resolved
typing/typemod.ml Outdated Show resolved Hide resolved
typing/includemod.ml Show resolved Hide resolved
@gasche
Copy link
Member

gasche commented Oct 12, 2021

Once they're addressed I think this is ready to go.

The confusion of @xavierleroy and @nojb suggests that it would be worth taking a step back and:

  • writing some short informal-prose description of the change, that justifies its usefulness but also details the compatibility story
  • writing a couple short examples of things that were possible before with -no-alias-deps, and discussing how they are affected by the change

@mrmr1993
Copy link
Contributor Author

I think that you are attempting to define a semantics along the lines of "if the expression contains no functor applications then the module is physically equal to the result of the expression, otherwise it was merely produced by the given expression". But this semantics is not stable under substitution. For example, you can define a functor like:

I've pushed an implementation that handles this case properly (tested using this tweaked branch, which allows functor aliases in signatures). Using the template:

module G
  (Inputs : sig
    module type S
    module type Res
    module F : functor (X : S) -> Res
  end)
  (X : Inputs.S)
  (Arg : sig
    module A = Inputs.F(X)
    module B = Inputs.F(X)
    module C = B
    module D == B
    module Bar (Y : sig module M = A end) : sig type t end
    module Baz : sig module M = B end
    type t = Bar(Baz).t
  end) = Arg;;

and modifying Bar.Y.M and Baz.M, this implementation gives:

Bar.Y.M Baz.M Typechecks (Baz.M is an instance of Bar.Y.M)
Inputs.F(X) Inputs.F(X) yes
Inputs.F(X) A yes
Inputs.F(X) B yes
Inputs.F(X) C yes
Inputs.F(X) D yes
A Inputs.F(X) no
B Inputs.F(X) no
C Inputs.F(X) no
D Inputs.F(X) no
A B no
A C no
A D no
B C yes
B D yes
C B yes
C D yes
D B yes
D C yes

Of course, once we have this, we can use this second bullet to uplift a = M to an == M, because we know that it really does refer to the same module whenever we check the subtyping.

Are you really sure of that: to uplift, you also need to be sure that M is not the result of a functor application, because otherwise it could have been used as an alias for an another instance of the same functor application (see my example above).

@garrigue I'm fairly confident: if we have an absent alias to a (non-functor) path M, either M is present, or M is an absent alias (and we recurse). This does make the assumption that functor arguments also count as functor paths, but doing so fixes the substitution issues that would otherwise cause.

I'm not sure which example you're referring to, so I'm not clear on whether there are still edge-cases we're missing.

This ensures that we correctly reject e.g.

M = G(A)
((_ : sig module N = F(G(A)) end) : sig module N = F(M))
@mrmr1993
Copy link
Contributor Author

mrmr1993 commented Oct 20, 2021

Pushed another small tweak to make sure that

module G
  (Inputs : sig
    module type S
    module type S2
    module type Res
    module F : functor (X : S) -> Res
    module G : functor (X : S2) -> S
  end)
  (Y : Inputs.S2)
  (Arg : sig
    module X = Inputs.G(Y)
    module A = Inputs.F(X)
    module B = Inputs.F(X)
    module C = B
    module D == B
    module Bar (Y : sig module M = Inputs.F(X) end) : sig type t end
    module Baz : sig module M = Inputs.F(Inputs.G(Y)) end
    type t = Bar(Baz).t
  end) = Arg;;

also doesn't typecheck.

Edit:

Then the semantics for present aliases is not that they indicate the modules are physically equal to each other, but that the modules were produced by the same expression. This results in some behaviour that is a little surprising, but that is also a natural consequence of allowing applicative functors to have side-effects.

With this, the semantics for present aliases is that they indicate that the modules have the same provenance -- ie. they were created by combining the same underlying modules, optionally including functor applications (although not enabled in this PR), and the subtyping rule lets any of the underlying modules be weakened to the functor application that constructed it.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2021

Your tweak is just trying to go back to the approach of attempting to keep track of functor arguments and treat them specially. It will hit all the same issues as that system, and is the very thing we are trying to move away from. The issue here is a fundamental one, not an artifact of the particular choice of normalization algorithm. Whenever we try to use typing rules that are not stable under substitution we end up making something that is fragile and complex, I'm really keen to not make that mistake again.

@mrmr1993
Copy link
Contributor Author

Your tweak is just trying to go back to the approach of attempting to keep track of functor arguments and treat them specially. It will hit all the same issues as that system, and is the very thing we are trying to move away from.

I suspect we have different goals here: I want to support functor aliases, and I want a module type system that still makes sense once they're there. Personally, I'm not trying to move away from anything.

Perhaps we're at an impasse. For me as a user of OCaml, it's not acceptable to have a module type system that confuses the locations of mutable data. I strongly don't want anything that will cause this to compile:

module F(A :S) = struct let x = ref A.x end
module M = F(A)
module N = F(A)
module Do_thing_F (A : sig module N = N end) = struct
  let do_thing () = incr A.N.x
end
module Do_thing = Do_thing_F(struct module N = M end)
let res = Do_thing.do_thing (); !N.x

Whenever we try to use typing rules that are not stable under substitution we end up making something that is fragile and complex, I'm really keen to not make that mistake again.

I have sympathy for this, but I think building a slightly more complex system that supports functor aliases and (ascribed) functor arguments, will end up more maintainable. Concretely: once we have them, we can add tests for the interactions of these in all the various substitution kinds, and then the tests will let us know if something is awry. At the moment, things feel far more fragile because we don't have good ways to test the interactions, and so we have to be cautious that they don't leak in through some edge case.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2021

Having said that the issue with that approach is fundamental, there is a fundamental change which Jacques has previously suggested that would allow it to work. It would also fix the other surprising behaviours that you can get from impure applicative functors like my Choose_an_ordering example above.

The change would be to disallow functor applications from the arguments of other functor applications in paths. So types like Set.Make(Choose_an_ordering(X)).t would be prohibited. If you wrote something like:

module M = Set.Make(Choose_an_ordering(X))

it would be type-checked the same as:

module M = Set.Make(struct include Choose_an_ordering(X) end)

is now.

Similarly, module type expressions of the form:

S with module Foo := Bar(Baz)

would be banned.

Using nested functor applications in paths like this is probably quite rare, but it would definitely require checking how much of opam was broken by the change. I think it might also technically decrease the theoretical expressivity of the module system. For example, it is not clear that my implementation of Girard's paradox would still be possible. In an ideal world, I think we would instead require that applicative functors be pure, and then not have to worry about any of this. Possibly that will be feasible once there is an effect system.

@mrmr1993
Copy link
Contributor Author

The change would be to disallow functor applications from the arguments of other functor applications in paths. So types like Set.Make(Choose_an_ordering(X)).t would be prohibited. If you wrote something like:

module M = Set.Make(Choose_an_ordering(X))

it would be type-checked the same as:

module M = Set.Make(struct include Choose_an_ordering(X) end)

is now.

I think your example above says more about the hazards of un-signalled mutability than anything about the module system. I'm not sure it's something that we need to fundamentally alter the module system to fix. To my mind

  • the value has type M.t, which really is exactly the same as N.t and Set.Make(Choose_an_ordering(X)).t
  • the function M.mem really did come from an instance of Set.Make(Choose_an_ordering(X)), as did N.mem, but they are different code blocks and can't be de-facto expected to do the same thing
  • really what you seem to be after is types parameterised by modules, and a way to make them 'strict' (ie. require a module path, or have no side-effects). If we had type (module M : S) t, and a strict type (module !M : S) t, we could reject the functor application as having no concrete type, as desired.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2021

I think building a slightly more complex system that supports functor aliases and (ascribed) functor arguments, will end up more maintainable.

It's not really about making a slightly more complex system. The issue is that I don't believe it is possible to build a system that distinguishes functor parameters and functor applications that isn't broken. Certainly all previous attempts have been. The idea runs so counter to the underlying ideas behind the module system that there just always ends up being some way around it.

To avoid this I think that whatever rules that we use for functor applications will need to be the rules for all present aliases. What happens if you apply the rules you are proposing for aliases containing functor applications to all present aliases? My understanding is that it would basically mean that this would be an error:

module M = struct end
module N = M
module Foo : sig
  module O = N
end = struct
  module O = M
end

whilst this would work fine:

module M = struct end
module N = M
module Foo : sig
  module O = M
end = struct
  module O = N
end

That's a bit odd, but I think it would probably work fine. It would avoid all the issues with distinguishing functor applications, whilst also ensuring that module M = N means that M is physically equal to N.

@mrmr1993
Copy link
Contributor Author

The issue is that I don't believe it is possible to build a system that distinguishes functor parameters and functor applications that isn't broken. Certainly all previous attempts have been. The idea runs so counter to the underlying ideas behind the module system that there just always ends up being some way around it.

I believe that it ought to be possible, and that we're getting closer to that system with the current version of this PR. I realise that this isn't a particularly useful sentiment, so perhaps I can propose a way forward:

If I can

  • implement functor aliases, transparent ascription, and ascribed aliases to functor arguments on top of this
  • add an extensive testsuite for each
  • add tests for each interaction of the kinds of alias with each of the kinds of substitution
  • justify why the tests cover all of the cases, by describing how each other instance reduces to one or more of the existing tests

does this leave us in a place where we can consider merging this?

Of course, I would rather avoid 'completing the journey' before moving forward with this PR, but maybe it's what needed to figure out if this additional complexity in service of an impossible goal.

@lpw25
Copy link
Contributor

lpw25 commented Oct 20, 2021

I think I would prefer exploring the suggestion in my previous message first. It's a much simpler approach and I think it addresses all the concerns raised so far.

In some sense it is the correct approach if you interpret module M = E as meaning that M was produced by the expression E, as you clearly must do for the case where E is impure. It can seem odd to treat module M = N as if the expression N was impure, but since functor arguments are allowed to be impure expressions, we are actually in a system where module identifiers can stand for impure module expressions and should be treated as such.

@mrmr1993
Copy link
Contributor Author

I'm not keen on this suggestion, on the grounds that it breaks existing code if we can only 'go forward' through aliases. If we were starting from scratch, it might be less objectionable. It also leads to some awkwardness with short aliases, e.g.

module F (X : sig module M = Foo.Bar.Baz (* ... *) end) = struct (* ... *) end
(* ... *)
module Baz = Foo.Bar.Baz
(* ... *)
module Res = F(struct module M = Baz (* ... *) end) (* :'( *)

For now, I'm going to continue to work through the direction I proposed above.

@lpw25 lpw25 dismissed their stale review October 20, 2021 13:52

Design is still in flux, not going to merge the version that was reviewed.

@garrigue
Copy link
Contributor

@mrmr1993
I'm not sure to follow everything here, as you're talking of tweaks without defining exactly what they are doing (other than trying to look at the code, which would probably take too much time, and be error-prone).

On way to understand what you are saying is that you define the subtyping and equivalence relations in terms of expansion.

  • a path is a subtype of another path only if it expands to a path that is equivalent to that path (i.e, it is not sufficient to have a common expansion):
    p1 <= p2 iff p1 ->* p1' and p1' ~ p2.
  • two paths are equivalent if they are identical or if they both expand to a common path that does not contain applications (and does not start with a functor argument); this relation is extended by structural congruence:
    p1 ~ p1
    p1 ~ p2 if p1 ->* p3, p2 ->* p3, p3 noapps
    p1(p2) ~ p1'(p2') if p1 ~ p1' and p2 ~ p2'
    p1.s ~ p2.s if p1 ~ p2

With this definition, my counter-example does not apply anymore, and I think that you get more or less what you want. Building the correct coercions can still be complicated, and the subtyping relation may be restrictive and hard to check.

@lpw25
Copy link
Contributor

lpw25 commented Oct 24, 2021

At this point I think it would be useful for me to summarise my
understanding of the situation.

OCaml allows impure applicative functors. This means that paths can be
impure. In turn, this means that some module identifiers stand for impure
module expressions rather than standing for concrete module values.

As long as you're only projecting types out of these impure module paths
there are no issues, because the side-effects are guaranteed not to
affect the types parts of modules. However, there are two places in the
type system where module expressions are used to represent the whole
module, including its dynamic components:

  1. The arguments to functors refer to the whole module. Allowing impure
    expressions as these arguments makes some types equal that should not
    be considered equal. As in this example:

    let compare = Stdlib.compare
    let rev_compare x y = - (Stdlib.compare x y)
    
    module Choose_an_ordering(X : sig end) = struct
      type t = int
      let compare = if Random.bool () then compare else rev_compare
    end
    
    module X = struct end
    module M = Set.Make(Choose_an_ordering(X))
    module N = Set.Make(Choose_an_ordering(X))
    
    (* Only sometimes true *)
    assert ((N.mem 1 (M.add 1 (M.add 2 (M.add 3 M.empty)))))
  2. Module aliases refer to the whole module. Currently we use absent
    aliases, which cannot support aliases to impure expressions. This
    results in quite simple examples that crash the compiler:

    module Foo (X : sig end) = struct
      module M = struct let x = Random.int 10 end
      module type S = sig module N = M end
    end
    
    module Bar = struct end
    
    module Baz (X : Foo(Bar).S) = struct X.N.x end;;
    
    >> Fatal error: Cannot find address for: X.N.x
    Fatal error: exception Misc.Fatal_error

I can see three basic solutions here:

  1. Force applicative functors to be pure. This makes all paths pure and
    means all module identifiers stand for a pure module expression. From
    a theoretical perspective this is the ideal solution. Currently the
    only real tool we have to do this would be a value restriction, which
    would be very restrictive. We may at some point have an effect system,
    but even then it would be a very breaking change.

  2. Use present module aliases and treat all module paths as impure when
    checking equality of module aliases. This would mean that this code
    would produce a type error:

    module M = struct end
    
    module N = M
    
    module Foo : sig
      module O = N
    end = struct
      module O = M
    end

    because if M stands for an impure module expression then two
    aliases to it (N and Foo.O) may not actually be equal.

    This would also be a breaking change, but possibly it wouldn't break
    very much code.

  3. Use present module aliases, treat all module paths as pure and just
    ignore the bad behaviour that you get when they aren't. In particular,
    this would allow code such as:

    module F(A :S) = struct let x = ref A.x end
    
    module M = F(A)
    module N = F(A)
    
    module Increment_alias_to_N (A : sig module N = N end) = struct
      let increment () = incr A.N.x
    end
    
    module Increment_M = Increment_alias_to_N(struct module N = M end)
    
    let res = Increment_M.increment (); !N.x

    Where a module of type module M = N is not physically equal to N,
    such that !M.x and !N.x produce different values.

    I personally think this is the most practical solution. I do not
    really expect users to hit these corner cases in practice, and
    whenever they do there will always be an impure applicative functor
    you can point at and say: "This functor is impure, it should be
    generative".

Note that 2 and 3 do not address the issue with functor arguments, which
we would continue to ignore.

Any solution other than those 3 will amount to a sort of effect system
that tracks the purity of module identifiers and paths. Note that the
approach taken for absent aliases in this PR is one such system: all
compilation units and projections from them are necessarily pure.

The existing checks for functor arguments when making absent aliases is
also fundamentally an attempt at such a system but, as shown above, it
doesn't actually work.

Personally, I suspect that anything beyond the "global paths" approach
is going to be quite tricky to get right and will probably require
exposing the notion of purity to the user.

@gasche
Copy link
Member

gasche commented Oct 24, 2021

The summary of @lpw25 is very clear, and its problem analysis corresponds to my non-expert understanding of the situation as I read this surprisingly-intricate thread. I have the same general sentiment that impure applicative functors are known to introduce issues in the rest of the system, and that the fact that they interact badly with a new feature is something we should be willing to accept (instead of making the other features more complex to try to work around impure-applicative problems).

@mrmr1993 has an example in #10673 (comment) that he finds "not acceptable", but to me it is not clearly more or less acceptable than other known-issues with impure applicative functors, which do weaken some expected abstraction properties when they are used. In either case, type-safety is preserved, some other form of reasoning is made invalid, and it's not clear that this is hurting users in practice -- somehow people tend to not define functors that generate observable mutable state and apply it several times and do strange things with the resulting modules.


Thinking out loud: when I read @lpw25 description, one relatively natural idea is to ask for (2), with some approximate analysis of module expression purity: struct end is pure (and in general modules that only define values), so the example of (2) could be made to work by tracking this information in the module expression and its aliases. I guess that this idea does not work well in practice, because the information is not carried across abstraction boundaries.

@mrmr1993
Copy link
Contributor Author

I can see three basic solutions here:

I'm still not keen on any of these, I'm afraid. I've been imagining a slightly more complex solution, with a few basic rules, in order of applicability:

  • absent module aliases to functor applications are not allowed, unless the applications can be removed by normalisation
  • present module aliases may contain a functor application (constructive alias)
  • an absent module alias to a path within a functor argument is allowed (canonical alias)
  • a present module alias containing a functor argument on its 'spine' (ie. ignoring functor arguments in Papplys) must be ascribed (ascribed constructive/canonical alias)
  • a present module alias to a 'local' module is allowed (canonical/constructive alias)
  • an absent module alias to an absent 'local' module follows the rules for an alias to its target
  • a present module alias to a 'global' module is allowed, or made absent if -no-alias-deps and allowed by other rules (canonical/constructive alias)
  • an absent module alias to a 'global' module is allowed (canonical alias)

Then,

  • in functor definitions, we always have a present alias to the argument within the functor, so either kind of alias is allowed when otherwise allowed by the above rules
    • I think we allow preprocessors to give Pmty_alias in this position -- presumably in error -- which could cause an issue. I propose making this an error, and similarly so for module type S = ....
  • when applying the functor to an argument, we need to
    • susbsitute for the ascribed path in present aliases, and the un-ascribed path everywhere else
    • check the above rules for all subtitution points in aliases (sufficient to check the functor rule in all modified absent aliases, by transitivity all other absent aliases are fine, and all present aliases are always fine)
    • (optionally) for substitutions that would create an absent functor alias, we could replace those with present aliases, since we're in 'structure' position.
  • when applying a with module M = ... substitution, we need to
    • check inclusion of the RHS with the module type
    • replace the target module's mty with the RHS (ascribed in present aliases)
    • check the above rules for all modified absent aliases, if the target module was absent (as above)
  • when applying a with module M := ... substitution, we need to
    • check inclusion of the RHS with the module type
    • remove the target module
    • substitute the target module for the RHS (ascribed in present aliases present)
    • check the above rules for all modified absent aliases (as above)
  • module M := ... in signatures behaves as with module M := ... above
  • for consistency, perhaps we ought to have with module M == ..., and then = makes the alias present and == makes it absent, regardless of its previous status

With these rules and checks, we always have well-behaved module aliases AFAICT. This does, of course, require ascription to be included in this PR.

However, there are two places in the
type system where module expressions are used to represent the whole
module, including its dynamic components

@lpw25 for your first example, I still maintain that the typing is correct, just that the behaviour is confusing. The types really are the same. That we want the types to depend on the value of compare is a limitation of the value type system, and orthogonal to module types.

To your second, I think the proposal above should handle this correctly.

@lpw25
Copy link
Contributor

lpw25 commented Oct 26, 2021

Your description seems to be relying on checking properties during functor application and similar. In general, you can't rely on such checks to preserve well-formedness. Abstraction allows the issues with a particular functor application to become apparent only later on. For example, something like the following appears to be allowed by your system:

module Foo (X : Set.S) = struct
  module type T = sig
    module A == X
  end
end

module Bar (F : S -> sig module type T end) = struct
  module type S = sig
    module N : F(Set.Make(Int)).T
  end
end

module Baz(Y : Bar(Foo).S) = struct
  let x = Y.N.A.empty
end

The fundamental issue is that the type of a functor's argument needs to contain all information for deciding if a particular functor application is valid or not. Trying to use additional information from the return type of the functor doesn't really work.

The types really are the same. That we want the types to depend on the value of compare is a limitation of the value type system, and orthogonal to module types.

I disagree: the whole point of applicative functors is to allow types to depend on values. That's why we have Set.Make(Int).t and not just int Set.t. That impure applicative functors break this is essentially a bug. Luckily it's a bug that doesn't really bite people in practice.

Thinking out loud: when I read @lpw25 description, one relatively natural idea is to ask for (2), with some approximate analysis of module expression purity: struct end is pure (and in general modules that only define values), so the example of (2) could be made to work by tracking this information in the module expression and its aliases. I guess that this idea does not work well in practice, because the information is not carried across abstraction boundaries.

Yeah, I think if you are going to treat some module definitions as pure and some as impure then you will need to reflect that purity into their module types or their bindings so that users can abstract them -- maybe something along the lines of this paper. I think that the complexity of such a system is probably not worth it.

@mrmr1993
Copy link
Contributor Author

mrmr1993 commented Oct 26, 2021

For example, something like the following appears to be allowed by your system:

The functor application Bar(Foo).S is also checked in my system, and will be rejected because it violates the rules, as you desire.

The types really are the same. That we want the types to depend on the value of compare is a limitation of the value type system, and orthogonal to module types.

I disagree: the whole point of applicative functors is to allow types to depend on values. That's why we have Set.Make(Int).t and not just int Set.t. That impure applicative functors break this is essentially a bug. Luckily it's a bug that doesn't really bite people in practice.

I model functors differently: they're a tool for abstracting code over some interface (edit: and types over types in that interface). If we want the same for types -- which, as you say, we don't have in practice -- we should amend the type system to allow modules as the arguments to types.

As a point of information: I use impure functors to manage some state which depends on the types in the functor arguments, and to perform some type-dependent initialisation. I would strongly prefer that we don't break that production code by making all functors pure.

I have also suffered with mixing of values from different instantiations of these functors, but it's not useful for me to parameterise the type over the arguments, because the backing state may still differ. If, however, we allow the type system to take arguments from the module system, then I can correctly express that the types depend on state from the particular functor application.

Yeah, I think if you are going to treat some module definitions as pure and some as impure then you will need to reflect that purity into their module types or their bindings so that users can abstract them -- maybe something along the lines of this paper. I think that the complexity of such a system is probably not worth it.

I would love to revisit this when we have some form of typed effects, but I agree that it's not worth pursuing for now.

@lpw25
Copy link
Contributor

lpw25 commented Oct 26, 2021

The functor application Bar(Foo).S is also checked in my system, and will be rejected because it violates the rules, as you desire.

In order to reject this application you need to re-expand and recheck the functor application F(Set.Make(Int)).T when the F is replaced by the Foo. Indeed you will need to recheck every functor application that is affected by any substitution. At this point you have a system similar to C++ templates and other untyped forms of parameterisation: you can't tell in advance what values a parameter can be replaced by so you need to eagerly expand everything and then give the error to the user of the functor rather than the author of the functor. This is both expensive and a bad user experience.

It is not clear from your description whether the following is allowed.

module M = struct let x = Random.int 10 end
module N = M
module Res : sig
  module O = N
end = struct
  module O = M
end

Since I can't see any mention of functor bodies in your description, I assume that either the above example isn't allowed, or the following example is:

module Func (X : sig end) : sig
  module M : sig val x : int end
  module type S
  module N = M
  module Foo (Y : S) : sig module O = N end
  module Bar : S
end = struct
  module M = struct let x = Random.int 10 end
  module N = M
  module type S = sig module O = M end
  module Foo (Y : S) : sig
    module O = N
  end = struct include Y end
  module Bar = struct
    module O = M
  end
end

module Arg = struct end
module A = Func(Arg)
module B = Func(Arg)
module C : sig module O = A.N end = A.Foo(B.Bar)
assert (C.O.x == A.N.x) (* sometimes fails *)

Of course, if the first example isn't allowed then we are basically back to option 2 from my list.

@mrmr1993
Copy link
Contributor Author

Since I can't see any mention of functor bodies in your description, I assume that either the above example isn't allowed, or the following example is

module C : sig module O = A.N end = A.Foo(B.Bar)

Apologies for not detailing this in the description. My intention is to use a two-stage normalisation procedure, where first you normalise to a concrete module (if possible), and then you continue expanding to find the constructive path. Then, after normalisation, equal concrete paths are equal, equal constructive paths are equal, and a concrete path is included in its constructive path (if there was no corresponding concrete path). You can find that implemented in this PR here.

Under that scheme, A.Foo(B.Bar) is illegal because B.Bar.O is different from the concrete A.Bar.O that A.S demands.

In order to reject this application you need to re-expand and recheck the functor application F(Set.Make(Int)).T when the F is replaced by the Foo. Indeed you will need to recheck every functor application that is affected by any substitution. At this point you have a system similar to C++ templates and other untyped forms of parameterisation: you can't tell in advance what values a parameter can be replaced by so you need to eagerly expand everything and then give the error to the user of the functor rather than the author of the functor.

While this is true, we can cheaply mark functor arguments as May_constructive or Must_concrete while we're typing the functor body, and applications of paths from within the argument as Must_constructive or May_concrete, based on their usages. Then, we need only check the compatibility of these at application time.

In fact, you could imagine making these user-visible, and then you could declare Set.Make to be Must_concrete in its argument. This would let you ensure that Set.Make(A).t and Set.Make(B).t are the same type only when A and B really are the same.

@lpw25
Copy link
Contributor

lpw25 commented Oct 26, 2021

Under that scheme, A.Foo(B.Bar) is illegal because B.Bar.O is different from the concrete A.Bar.O that A.S demands.

A.S is just Func(Arg).S which is abstract, as is B.S, so A.Foo(B.Bar) must be legal.

@mrmr1993
Copy link
Contributor Author

A.S is just Func(Arg).S which is abstract, as is B.S, so A.Foo(B.Bar) must be legal.

Ah, yes.

I'm sure it sounds like I keep proposing language features to fix issues, but perhaps we need a concept of private signatures, so that Func has type

module Func (X : sig end) : sig
  module M : sig val x : int end
  module type S = private
  module N = M
  module Foo (Y : S) : sig module O = N end
  module Bar : S
end

and its instantiations get types

module A : functor (X : sig end) -> sig
  module M : sig val x : int end
  module type S = private Func(X).S
  module N = M
  module Foo (Y : S) : sig module O = N end
  module Bar : S
end

where private Foo is an instance Foo, but Foo isn't an instance of private Foo. Then, any module type inside a functor with local present aliases becomes private.

I could imagine this might be useful more generally, to say that e.g. a functor only accepts modules that were output by another functor, or only from an explicit set of modules, but still revealing the module contents.

@lpw25
Copy link
Contributor

lpw25 commented Oct 29, 2021

Private module types are potentially useful. Note that the more general version that allows any subtype of the specified type is quite difficult to implement because the private module type needs to contain the required coercion at runtime.

I don't really see how it helps with my example. I don't think you can just treat:

module type S = E

as if they had written:

module type S = private E

based on the content of E. That would be very confusing.

After further consideration, I'm no longer sure that my option 2, and by extension your proposal, can really work for preventing non-physically-equal module aliases. It tries to work by constraining normalization in the definitions of module aliases, but aliases are not the only form of equality available to modules. For example, the following example would create non-physically-equal module aliases without doing any normalization in the definitions of module aliases:

(* Make an impure functor and two aliases to its output *)
module type Int = struct val x : int end
module Impure (X : sig end) : Int = struct let x = Random.int 100 end
module Unit = struct end

module A = Impure(Unit)
module B = Impure(Unit)

(* Make a Leibniz equality between A and B *)
module type Type = sig module type T end
module type Pred_Int = Int -> Type
module Equal_A_B (P : Pred_Int) (X : P(A).T) : P(B).T = X

module Alias_to (X : Int) = struct
  module type T = sig module M = X end
end
module Alias_to_A : Alias_to(A).T = struct
  module M = A
end
module M : Alias_to(B).T = Equal_A_B(Alias_to)(Alias_to_A)

At this point I think that option 3 is the only viable option (with option 1 only possible in some distant future).

@mrmr1993
Copy link
Contributor Author

I don't really see how it helps with my example. I don't think you can just treat:

module type S = E

as if they had written:

module type S = private E

based on the content of E. That would be very confusing.

That wasn't my intention; instead, if they write module type S where we demand module type S = private (with no concrete signature/expansion), I thought it would be enough to throw an error. This module type depends on modules defined within the functor application. It must be private to avoid mixing these modules between implementations. Hint: add = private or some such.

For example, the following example would create non-physically-equal module aliases without doing any normalization in the definitions of module aliases:

Your example would have been prevented by this rule: casting Alias_to to Int -> Type would no longer be valid, and making the module type private in Type would have prevented Equal_A_B from being defined.

@lpw25
Copy link
Contributor

lpw25 commented Oct 29, 2021

Ok, so you are proposing to ban abstraction for module types that meet some criteria. What is the criteria precisely?

Preventing module type abstraction isn't sufficient. Here's the same example without using module type abstraction:

(* Make a Leibniz equality between A and B *)
module type Type = sig type t end
module type Pred_Int = Int -> Type
module Equal_A_B (P : Pred_Int) = struct
  let coerce (x : P(A).t) : P(B).t = x
end

module Alias_to (X : Int) = struct
  module type T = sig module M = X end
  type t = (module T)
end
module Coerce = Equal(Alias_to)
module M = (val Coerce.coerce (module struct module M = A end) : Alias_to(B).T)

@lpw25
Copy link
Contributor

lpw25 commented Oct 29, 2021

Here's another example that causes trouble for option 2:

module type Int = struct val x : int end
module Impure (X : sig end) : Int = struct let x = Random.int 100 end
module Unit = struct end

module A = Impure(Unit)
module B = Impure(Unit)

module Coerce(F : Int -> sig module N : Int end) : sig
  module M = F(A)
end = struct
  module M = F(B)
end
module Id (X : Int) = struct
  module N = X
end

module O : sig
  module M : sig
    module N = A
  end
end = Coerce(Id)

which only requires normalization of "constructive" paths in the parameters of aliases to other constructive paths.

@mrmr1993
Copy link
Contributor Author

Ok, so you are proposing to ban abstraction for module types that meet some criteria. What is the criteria precisely?

That they contain a present alias to some module instantiated within the functor that they also are a part of.

The rule is slightly more subtle than banning abstraction; I'd like to let them be abstract, but require an explicit coercion to expand them to their constructive path. Specifically, these = private modules would get strengthened to = private F(A).T on functor application, so they can be weakened to the constructive path if needed, but don't alias through it between instances.

Preventing module type abstraction isn't sufficient. Here's the same example without using module type abstraction:

I should have thought of this 🤦‍♂️ This is effectively the same but using a type; we could do the same for types, although this has some weirder interactions that I need to think through properly.

Here's another example that causes trouble for option 2:

It seems you're still considering these options, so I'll detail some concerns

  1. Force applicative functors to be pure.

This is a step backwards in functionality, and breaks existing programs. The 'correct' way to do this is with effects, but we seem not to be close to having a system that can track these in the ways we would need.

2. Use present module aliases and treat all module paths as impure when
checking equality of module aliases.

This is a disaster for code that uses e.g. an Imports module, or otherwise uses aliases to get around shadowing. For example,

module Foo_ = Foo
open Bar
let x : Baz.F(Foo_).t = Bax.x

3. Use present module aliases, treat all module paths as pure and just
ignore the bad behaviour that you get when they aren't.

This causes the module system to confuse the origin of values, allowing the user to write code that does

module T : sig module M = N end = ...
let () = T.M.x := 1

where the value modified isn't N.x. I find this wholely unacceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants