feat: Protocols #1292

scolsen · 2021-07-29T23:38:39Z

This PR adds initial support for Protocols, a sort of lightweight form of typeclasses. Given a collection of interfaces, protocols allow users to enforce type restrictions around the implementation of those interfaces. Before we get into the boring minutiae, it might be easier to see an example of how they work.

To define a protocol, use defprotocol

;; to define a new protocol, we use `defprotocol`
;; defprotocol takes a name and one or more interfaces
(defprotocol Num + -)
:i Num
=> Num : (Protocol Num)
    Defined at line 1, column 14 in 'REPL'
    Required Definitions: +, -

Once we have a protocol defined, we can mark types as members of it using instance

(instance Num Int)
:i Num
=> Num : (Protocol Num: Int)
    Defined at line 1, column 14 in 'REPL'
    Required Definitions: +, -

Of course, if the type doesn't implement the required interfaces, it won't be added to the protocol:

(instance Num Char)
=> The type Char does not implement the following interfaces: +, - which are required by the protocol. at Core Primitives:0:0.

  Traceback:
    (instance Num Char) at REPL:5:1.

We can reference protocols in types to limit the acceptable types in otherwise fully polymorphic positions. To refer to a protocol in a type signature, prefix it with !. For example:

(sig math (Fn [!Num !Num] !Num))
(defn math [x y] (- x (+ x y)))

Protocols leverage the same polymorphism resolution as our regular old variables, so nothing is resolved until we call this function.

Let's try it with a valid Num:

(math 2 3)
=> -3

and what about a type that doesn't implement the protocol, even though it implements the interfaces?

(math 3.0 2.0)
=> I can’t match the types `Double` and `(Protocol Num: Int)`. within `(math 3.0 2.0)`

  3.0 : Double
  At line 12, column 7 in 'REPL'

  Expected first argument to 'math' : (Protocol Num: Int)
  At line 12, column 2 in 'REPL' at REPL:12:1.

oops. Lucky for us, Double already implements our interfaces:

(instance Num Double)
(math 3.0 2.0)
=> -2

Finally, protocols can also be used for arbitrary "type tagging" or type unions simply by omitting any interfaces. One can use this to constrain the polymorphism of functions where desired without requiring additional interface implementations:

(defprotocol Foo)
(sig foo (Fn [!Foo] String))
(defn foo [_] @"foo")
(instance Foo Int)
:i Foo
=> Foo : (Protocol Foo: Int)
    Defined at line 15, column 14 in 'REPL'
    Required Definitions:
(foo 3)
=> "foo"
(foo \w)
=> I can’t match the types `Char` and `(Protocol Foo: Int)`. within `(foo \w)`

  \w : Char
  At line 22, column 6 in 'REPL'

  Expected first argument to 'foo' : (Protocol Foo: Int)
  At line 22, column 2 in 'REPL' at REPL:22:1.

Now that we know how it works, here are a couple of important implementation notes:

Protocols have two representations, XObjs, and a corresponding Type. Only the XObj contains a list of the interfaces required, the type only contains a list of types known to be members of the protocol.
The implementation of the protocols interfaces for a given type is checked at instance call time. If an implementation of a required interface doesn't yet exist in the environment when instance is called, adding the type to the protocol will fail.
Interface implementations are checked by simple subtyping. One can't yet for instance, specify the required position of the type in the interface function implementation. Likewise, protocols are more like unions than typeclasses insofar as they stand in for a variable, rather than take one as an argument.
Protocols are kept track of via the state of the type environment. When a new instance is added, the corresponding protocol is updated in the type env. Unfortunately this means parts of the code base that care about protocols need to call some functions resolveProtocols and updateProtocols (depending on whether or not the have access to a full context or just the type environment) in order to ensure they have the latest version of the protocol's memberships.
They don't yet work for higher kinded types. I'll need to add support for this.
Protocols are always added to the top level of the type environment.
In type signatures protocols must always be differentiated using !. They can share names with structs. When the ! is absent, the type system will try to find a struct. Think of this as similar to haskell's => delimiter for type classes vs types.

This commit adds frontend support for two new primitives, `defprotocol` and `instance`. defprotocol defines a new protocol, a collection of zero or more interfaces. Protocols implement a form of subtyping whereby one can assert that a given type implements a number of interfaces. Types that implement a protocol are called `instances` and are added using the `instance` primitive. This commit does not yet add any logic for enforcing protocols, only the frontend code for adding them to a context. Initially, we will only support protocols in the specification of interfaces. A protocol may be used in place of any type or type variable.

This commit finalizes an initial implementation of protocols by updating the type system to treat them as instances of constrained polymorphism (that is, like var types but more restrictive). Protocols are a form of type constraint whereby only types that implement a set of interfaces may be used in the place of a protocol. Here's the gist of how they work: To define a protocol, a user enters `(defprotocol <name> [interfaces...])`. `defprotocol` adds a fresh protocol to the type environment, inhabited by no types. Users then register types with the protocol using `(instance <protocol> <type>)`. The type will only be registered if it appears in at least one position of the signatures of the protocol's interface's implementations. That is, if there is no existing implementation of all <interfaces> for the type, it cannot be added to the protocol. Users can then reference protocols in type signatures by prefixing their names with `!`. For example, `(sig foo (Fn [!Stringable] String))`. When we type check protocols, we update the state of types based on the current state of the type environment (since a protocol's members may have changed). If the type being passed in the place of a protocol is a member of the protocol, it successfully type checks. Otherwise, it fails to type check. Protocols can also be used for arbitrary "tagging" or "subtyping" over types by defining empty protocols. If a user wants to simply specify a type union, they can call: `(defprotocol Empty)` which will define a protocol that has no interfaces. Any type may be added to this protocol.

eriksvedang · 2021-07-30T07:19:14Z

Very interesting PR! Some feedback:

Being able to constraint parameters is really nice and important. This should be enabled for regular old interfaces too though, or you'd have to define an interface and a protocol each time you want to constraint a parameter of a function. Maybe this should be solved with a "typeclass" macro that does it all in one go? Having to think about whether to use an interface or a protocol seems like it's making the language harder to use, and a bit less elegant. One way around that is to have the user only interact with the protocol construct and let interface be more of an implementation detail, if that makes sense?

Having to have two concepts could also be a sign that the current interface design is a bit too simplistic and the correct solution is to redesign them so that they actually support the things that this PR adds onto protocol (mainly constraints and grouping of functions). Keeping things separated has some appeal though, and there's probably some implementation insight that I'm missing.

My other concern is regarding the syntax and semantics of constraining type variables. First, I didn't catch why the ! is needed. Second, there seems to be no way to constraint a type variable that appears multiple times in a signature? (e.g. Num a => a -> a -> a in Haskell). Without that it seems hard to make use of the constraints in the implementation of the function since each parameter could belong to a different instance of the protocol.

hellerve · 2021-07-30T09:02:00Z

Scott coming in with the hot features, I dig it 🏅

I’ll take a look at it over the weekend, I’m impressed by how little code this is!

Similarly to Erik, I was asking myself whether for the unification algorithm, !Num has to be unified to the same concrete type in t he context of a type definition (so like one variable) or whether it just has to be any type that inhabits the protocol (so like one variable per occurence of !Num). I think both of these would make sense, and both of these should exist. Maybe something like binding a protocol to a variable might therefore be necessary?

eriksvedang · 2021-07-30T10:31:55Z

One additional thing: I think that constraining a single type variable to multiple interfaces/protocols is also an important feature. And if we have that, protocols is very similar to a shorthand for constraining the type variable to a set of interfaces. So that's something to consider for making this feature as simple as possible (conceptually)

scolsen · 2021-07-30T11:59:48Z

Some great points about variable constraints. I was just thinking about this last night actually, and an alternative way to implement this is to do something like:

(defprotocol Stringable-Addable [a b] (str (Fn [a] String)) (add [a a] a))
(instance Stringable-Addable [Int] Int.str Int.+)

Which is kind of along the lines of what you mentioned, Erik, where the protocol and interfaces are kept separate while still allowing us greater control over polymorphism. One downside to this approach is that it kind of "reimplements" interfaces in some ways. One nice thing about the current approach is that you can just enforce subtyping based on pre-existing implementations of the interfaces you care about.

Another downside to this approach is that we'd need to call special protocol functions instead of interfaces in function bodies: e.g.

(sig foo (Fn [Stringable-Addable Stringable-Addable] (Stringable-Addable)))
(defn foo [x y] (Stringable-Addable.add))

So perhaps it really is best to align closer with Haskell for maximum control. At which point we'd instead have something like:

(defprotocol Stringable str)
(defprotocol Fooable)
(sig with-foo [(protocol Stringable Fooable a) ...] (Fn [a] String))
(defn with-foo [x] (concat (str x) "foo"))

One advantage to this approach is that it totally does away with the need for an instance statement, and it keeps our old implementation of highly flexible interfaces.

First, I didn't catch why the ! is needed.

This is an unfortunate side-effect of the way xobjToTy is implemented. By default it translates capitalized symbols to structs, so we need some way to differentiate those from protocols in type signatures, similar to how we check if the first letter is lowercase to cast to Var:

xobjToTy (XObj (Sym spath@(SymPath _ s@(firstLetter : rest)) _) _ _)
  | isLower firstLetter = Just (VarTy s)
  | firstLetter == '!' =
      if (not (null rest))
        then Just (ProtocolTy (SymPath [] rest) [])
        else Nothing

TimDeve · 2021-07-30T13:27:50Z

First of all this is awesome, I've been wanting this in the language for a while.

(sig with-foo [(protocol Stringable Fooable a) ...] (Fn [a] String))

For what it's worth I like this syntax:

(sig my-fun [(where a Stringable Fooable)] (Fn [a] String)

I agree with Erik that having both interfaces and protocol might be a bit confusing, after all we could still keep the current interfaces short names by declaring generic functions:

(sig str [(where a Stringable)] (Fn [a] String)
(defn str [a] (Stringable.str a))

scolsen · 2021-07-30T13:43:49Z

Nice, I like the where style syntax too and you make a great point about generics.

Before I go any further, maybe we can all agree on the best implementation, it sounds like maybe we want to do the following:

replace interface with a new form defprotocol which might look like the following:

(defprotocol (<Name> <params>)
  (<fn-name> <signature>)
)
;;e .g.
(defprotocol (Monoid a)
  (combine (Fn [a a] a))
  (zero a)
)

add an instance form that designates implementations of a protocol for some type t

(instance (Monoid Int)
  (combine [x x] (+ x x))
  (zero 0)
)

add a where form that allows users to constrain type variables to protocols:

(sig plus-zero (where (Monoid a)) (Fn [a] a))
(defn plus-zero [x] (Monoid.combine x (Monoid.zero)))

Or with multiple forms:

(defprotocol (Foo a))
(sig plus-zero (where (Monoid a) (Foo a)) (Fn [a] a))

And supporting multi-parameter protocols should probably be a thing too

(defprotocol (Natrual a b)
  (transform (Fn [a] b))
)

(sig transform-then-plus (where (Natural a b) (Monoid b)) (Fn [a b] b))

And we should also support parameterizing over higher kinds application like haskell does too:

(defprotocol (Functor f)
  (fmap (Fn [(Fn [a] b) (f a)] (f b)))
)

(instance (Functor Maybe)
  (fmap [f m] (match m) (Nothing) (Nothing) (Just x) (Just (f x)))
)

(sig fmap-and-plus (where (Functor f) (Monoid b)) (Fn [(Fn [a] b) (f a)] b))

Sound good? This would be a pretty big breaking change so we'll have to try to be certain we can provide those generics for all existing interfaces.

eriksvedang · 2021-07-30T17:36:59Z

@scolsen Looks great 👍

f-f · 2021-07-30T23:09:12Z

Hi! I usually hang out more in PureScript land than here, but Veit got my attention on this PR, so I'll leave a few throughts (of course feel free to ignore all of this if not relevant at all 😊):

AFAIK Carp doesn't have typeclasses, but already has interfaces - why not go to typeclasses instead of protocols? It doesn't feel that far from what you have here, and if adding them at a later time would be hard as they'd be a third way to do almost the same thing (polymorphism)
something that I don't see mentioned here but that is the single most controversial point about typeclass implementations is the "orphan instance" issue: if you define a class Foo in module A, and define an instance Foo Bar in module B and instance Foo Bar in module C, then building A+B might yield a different result from building A+C, and building A+B+C is not possible at all.
Clojure-style protocols do allow orphan instances, so you might want to do the same - however, allowing them doesn't play well with things like build determinism and a package ecosystem, and in fact some languages (such as PureScript) entirely disable them.
I'd warmly recommend reading what I think is the best resource on typeclasses in language design out there to best inform the design of all of this. (and if you do, I'd be curious to hear what your thoughts on this are! It's a hard problem 🙂)

eriksvedang · 2021-07-31T15:21:20Z

@f-f thanks a lot for the feedback! I think this might be more of a naming issue than anything else – maybe we should stick with the "interface" name to avoid the confusion (or switch to "type class").

scolsen · 2021-07-31T15:21:43Z

@f-f Thanks for the pointers and checking this out! There are definitely some bits to mull over. I'll let you know where we end up!

scolsen · 2021-07-31T15:22:59Z

@f-f thanks a lot for the feedback! I think this might be more of a naming issue than anything else – maybe we should stick with the "interface" name to avoid the confusion (or switch to "type class").

That's not a bad idea. We could just keep interface and change the implementation to this more "typeclass-esque" style where the interface denotes a set of functions (rather than just one function).

scolsen · 2021-08-09T14:36:58Z

I won't get a chance to take a stab at this until Aug 27th or so, so if anyone wants to try implementing it before then, please do!

esseswann · 2021-08-16T18:35:19Z

Would this support composing abstractions in a way rust does:

trait A: B { // A supersets trait B
  fn method(arg: C) -> void // Variable arg must implement trait C but can be absolutely whatever in terms of memory layout and other things
}

where trait is roughly = protocol
This is a form of functional composition constraints that made rust a predictable language

ornamentist · 2022-09-07T03:52:45Z

Has the thinking around ad-hoc polymorphism in Carp changed since this PR or is it still relevant?

scolsen · 2022-09-07T14:14:14Z

Has the thinking around ad-hoc polymorphism in Carp changed since this PR or is it still relevant?

Not in any significant way, we haven't touched this topic in a while.

I have experimented with making the type system more closely resemble GHC's approach to polymorphism, kinds, and type classes (at least as it stood when "Typing Haskell in Haskell" was written) . After that experiment I think I'd at least like to land somewhere in between the GHC approach and the approach we have in Carp currently. Not sure how others feel about it.

ornamentist · 2022-09-08T03:34:01Z

Has the thinking around ad-hoc polymorphism in Carp changed since this PR or is it still relevant?

Not in any significant way, we haven't touched this topic in a while.

I have experimented with making the type system more closely resemble GHC's approach to polymorphism, kinds, and type classes (at least as it stood when "Typing Haskell in Haskell" was written) . After that experiment I think I'd at least like to land somewhere in between the GHC approach and the approach we have in Carp currently. Not sure how others feel about it.

Sounds fascinating— thanks.

scolsen added 3 commits July 29, 2021 14:50

feat: print interfaces in protocol info strings

13777ad

scolsen requested a review from a team July 29, 2021 23:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Protocols #1292

feat: Protocols #1292

scolsen commented Jul 29, 2021

eriksvedang commented Jul 30, 2021 •

edited

hellerve commented Jul 30, 2021

eriksvedang commented Jul 30, 2021 •

edited

scolsen commented Jul 30, 2021

TimDeve commented Jul 30, 2021

scolsen commented Jul 30, 2021

eriksvedang commented Jul 30, 2021

f-f commented Jul 30, 2021

eriksvedang commented Jul 31, 2021

scolsen commented Jul 31, 2021

scolsen commented Jul 31, 2021

scolsen commented Aug 9, 2021

esseswann commented Aug 16, 2021 •

edited

ornamentist commented Sep 7, 2022

scolsen commented Sep 7, 2022

ornamentist commented Sep 8, 2022

feat: Protocols #1292

Are you sure you want to change the base?

feat: Protocols #1292

Conversation

scolsen commented Jul 29, 2021

eriksvedang commented Jul 30, 2021 • edited

hellerve commented Jul 30, 2021

eriksvedang commented Jul 30, 2021 • edited

scolsen commented Jul 30, 2021

TimDeve commented Jul 30, 2021

scolsen commented Jul 30, 2021

eriksvedang commented Jul 30, 2021

f-f commented Jul 30, 2021

eriksvedang commented Jul 31, 2021

scolsen commented Jul 31, 2021

scolsen commented Jul 31, 2021

scolsen commented Aug 9, 2021

esseswann commented Aug 16, 2021 • edited

ornamentist commented Sep 7, 2022

scolsen commented Sep 7, 2022

ornamentist commented Sep 8, 2022

eriksvedang commented Jul 30, 2021 •

edited

eriksvedang commented Jul 30, 2021 •

edited

esseswann commented Aug 16, 2021 •

edited