feat: Recursive datatypes #1337

Draft: wants to merge 15 commits into master

@scolsen (Contributor) commented Oct 14, 2021

This PR is a WIP implementation of recursive data type support in Carp. At the moment, it only supports recursive product types, which are backed by structs with fields that point to values of the same struct type.

Currently, the following sample code works:

```clojure
(deftype IntList [head Int tail IntList])

(defn main []
  (let [is (IntList.init 3 (IntList.init 2 (IntList.make 1)))]
    (do (IO.println &(str (IntList.head &is)))
        (IO.println &(str (IntList.head (IntList.tail &is))))
        (IO.println &(str (IntList.head (IntList.tail (IntList.tail &is)))))
        0)))
```

Here's what's been implemented thus far:

  • Support for referring to type names in member fields
  • Support for concrete recursive product types
  • make function for initializing a recursive type with a null recursive part (the end of the recursion chain)

And what remains:

  • Support for recursive sum types
  • Support for generic (type variables in the head of the type) recursive data types

This makes it easier to work with validation functions at call sites and
paves the way for permitting recursive types (we pass the type name along
to validation procedures).

This commit implements initial support for recursive product data types.
In C, they're represented as structs that have a field that is a pointer
to the same struct type.

In Carp, we currently substitute recursive references with pointers to
the type, and users must provide a pointer argument during
instantiation. To make creating initial values of these types easier, we
define a make function, which initializes a value of the type with its
recursive part set to the null pointer.
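
A minimal C sketch of that representation, assuming the `IntList` type from the PR description. The struct layout and the names `IntList_init`, `IntList_make`, and `IntList_length` are hypothetical illustrations, not the code the compiler actually emits; the values live on the stack here only to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical C shape of (deftype IntList [head Int tail IntList]):
// the recursive field becomes a pointer to the same struct type.
typedef struct IntList {
  int head;
  struct IntList *tail;  // NULL marks the end of the recursion chain
} IntList;

// Initializer as described above: the caller supplies the pointer
// for the recursive part explicitly.
IntList IntList_init(int head, IntList *tail) {
  IntList l = {head, tail};
  return l;
}

// `make`: a value whose recursive part is the null pointer.
IntList IntList_make(int head) {
  return IntList_init(head, NULL);
}

// Counts the nodes reachable from `l` by following `tail`.
size_t IntList_length(const IntList *l) {
  size_t n = 0;
  for (; l != NULL; l = l->tail) {
    n++;
  }
  return n;
}
```

Because `IntList_init` takes a raw pointer, each caller has to decide where the tail lives, which is the burden the value-taking getters and initers are meant to hide.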

This commit adds a number of alternative type getters/initers for
recursive product types. These are primarily needed to hide the
underlying pointer implementation from the user (otherwise, users need
to deal with pointers explicitly).

This permits one to write:

```clojure
(deftype IntList [head Int tail IntList])

(IntList.tail &(IntList.init 2 (IntList.make 1)))
```

Instead of writing:

```clojure
(IntList.tail (Pointer.to-ref &(IntList.init 2 (Pointer.to-value (IntList.make 1)))))
```
This is in keeping with the way we handle other structs in Carp.
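
One way such a value-taking initializer could hide the pointer, sketched in C under the same kind of assumptions (hypothetical names, `malloc` standing in for Carp's `CARP_MALLOC`): the initializer moves its tail argument into a fresh heap slot, and the getter hands back a pointer to it, roughly what a `(Ref IntList)` result corresponds to.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct IntList {
  int head;
  struct IntList *tail;  // NULL ends the chain
} IntList;

// `make`: a list with no recursive part.
IntList IntList_make(int head) {
  IntList l = {head, NULL};
  return l;
}

// Value-taking initializer: the user passes an IntList, not a pointer.
// The tail is moved into a fresh heap allocation owned by the new node.
IntList IntList_init(int head, IntList tail) {
  IntList *boxed = malloc(sizeof *boxed);
  if (boxed == NULL) abort();
  *boxed = tail;  // shallow copy: the tail's own chain moves along with it
  IntList l = {head, boxed};
  return l;
}

// Getter taking a reference and returning a reference to the tail.
const IntList *IntList_tail(const IntList *l) {
  return l->tail;
}
```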

Before, we attempted to free some memory that was never allocated (since
we just print type string literals for recursive portions of a type).

Previously, we did not delete the pointers of children of recursive
structs, only their immediate member pointers. This commit fixes that
issue.

Note that this is currently handled as a special case and should be made
general.
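
A C sketch of what deleting the whole chain involves, assuming each node owns a heap-allocated tail (hypothetical names; `free` stands in for `CARP_FREE`). The loop is iterative to avoid deep C recursion on long lists, and the deleter returns a count purely so its behavior can be checked; both are choices made for this sketch, not details taken from the PR.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct IntList {
  int head;
  struct IntList *tail;  // heap-allocated and owned by this node; NULL ends it
} IntList;

IntList IntList_make(int head) {
  IntList l = {head, NULL};
  return l;
}

// Builds a node whose tail is a fresh heap copy of `tail`.
IntList IntList_init(int head, IntList tail) {
  IntList *boxed = malloc(sizeof *boxed);
  if (boxed == NULL) abort();
  *boxed = tail;
  IntList l = {head, boxed};
  return l;
}

// Frees every heap node reachable from `l`, not just `l.tail`.
// `l` itself is a value (e.g. on the stack), so only descendants are freed.
size_t IntList_delete(IntList l) {
  size_t freed = 0;
  IntList *node = l.tail;
  while (node != NULL) {
    IntList *next = node->tail;  // read before freeing the node
    free(node);
    node = next;
    freed++;
  }
  return freed;
}
```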

This commit is bigger than it should be, for which I apologize, but it
bundles a couple of changes that all work toward supporting recursive
data types:

- It makes type candidates their own module and additionally allows them
  to specify interface constraints -- that one or more member types must
  implement some set of interfaces.
- Updates recursive type handling to allow for "indirect" recursion.
  This permits using types that implement two interfaces, `alloc` and
  `indirect`, as containers for the recursive part.
- We now forward declare recursive types to support the case above.
- Adds a (currently unsafe) Box type for supporting heap allocated,
  memory managed indirection.
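
Roughly how the forward declaration and a Box container fit together in C. This is a sketch with hypothetical names (`Box_IntList`, `Box_IntList_init`, `Box_IntList_nil`), not the compiler's actual output: declaring `struct IntList` up front lets the Box type mention it before its definition (the Box only stores a pointer), and the recursive struct then holds the Box by value. `Box_IntList_nil` mirrors the null box that `Box.nil` produces in this branch.

```c
#include <assert.h>
#include <stdlib.h>

// Forward declaration: the name exists before its definition.
typedef struct IntList IntList;

// Box over IntList: a struct wrapping a heap pointer. This is legal
// while IntList is still incomplete because only a pointer is stored.
typedef struct {
  IntList *data;
} Box_IntList;

// The recursive type uses the Box as the container for its recursive part.
struct IntList {
  int head;
  Box_IntList tail;
};

// Moves `value` onto the heap and wraps it.
Box_IntList Box_IntList_init(IntList value) {
  Box_IntList box;
  box.data = malloc(sizeof *box.data);
  if (box.data == NULL) abort();
  *box.data = value;
  return box;
}

// A box holding NULL, ending the chain (unsafe to unbox).
Box_IntList Box_IntList_nil(void) {
  Box_IntList box = {NULL};
  return box;
}
```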

Enables users to use "direct" recursion on sum types and abstracts away
pointers in function signatures such as case initers.

We need to dereference implicitly cast pointers to structs back to their
values so that match works for recursive types the same way it does for
non-recursive types.
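
For a sum type such as `(deftype IntList (Nil []) (Cons [Int IntList]))`, a tagged union is a common C encoding. The sketch below assumes that encoding; the tag names, `IntList_Cons`, and `IntList_sum` are hypothetical, and freeing is omitted for brevity. It shows why match has to dereference: the `Cons` payload stores a pointer, but the code consuming it wants an `IntList` value, just as it would for a non-recursive member.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { IntList_Nil_tag, IntList_Cons_tag } IntList_tag;

typedef struct IntList {
  IntList_tag tag;
  union {
    struct {
      int head;
      struct IntList *tail;  // direct recursion becomes a pointer
    } Cons;
  } u;
} IntList;

IntList IntList_Nil(void) {
  IntList l = {.tag = IntList_Nil_tag};
  return l;
}

// Case initer: takes the tail by value, hiding the pointer.
IntList IntList_Cons(int head, IntList tail) {
  IntList *p = malloc(sizeof *p);
  if (p == NULL) abort();
  *p = tail;
  IntList l = {.tag = IntList_Cons_tag};
  l.u.Cons.head = head;
  l.u.Cons.tail = p;
  return l;
}

// What a match on Cons might lower to: the stored pointer is
// dereferenced so the next step sees an IntList value.
int IntList_sum(IntList l) {
  int total = 0;
  while (l.tag == IntList_Cons_tag) {
    total += l.u.Cons.head;
    IntList next = *l.u.Cons.tail;  // dereference back to a value
    l = next;
  }
  return total;
}
```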

@scolsen (Contributor, Author) commented Oct 25, 2021

@carp-lang/maintainers, in case you want an early look. So far we have:

  • Support for "direct" recursion in data types, like one gets in functional languages such as Haskell. This is just sugar for indirect recursion.
  • Support for indirect recursion when the wrapping type implements a function that indicates it heap-allocates.
  • A (currently unsafe) Box type that can be used for indirect recursion

I need to clean things up, add some tests and add support for generics, but we're getting there.

@scolsen (Contributor, Author) commented Oct 25, 2021

E.g., you can run this:

```clojure
(deftype IntList (Nil []) (Cons [Int IntList]))

(defn main []
  (let [is (IntList.Cons 2 (IntList.Cons 1 (IntList.Nil)))]
    (match is
      (IntList.Cons x next)
        (match next
          (IntList.Nil) 0
          (IntList.Cons _ rest)
            (match rest
              (IntList.Nil) 0
              _ 1))
      _ 1)))
```

Or this, but note that the final call in printit will cause a segfault, because it unboxes a nil box (a dereference of a null pointer).

```clojure
(deftype IntList [head Int tail (Box IntList)])

(defn printit []
  (let [is (IntList.init 3 (Box.init (IntList.init 2 (Box.init (IntList.init 1 (Box.nil))))))]
    (do (IO.println &(str (IntList.head &is)))
        (IO.println &(str (IntList.tail &is)))
        (IO.println &(str (IntList.head (Box.unbox (IntList.tail &is)))))
        (IO.println &(str (IntList.tail &(Box.deref @(IntList.tail (Box.unbox (IntList.tail &is)))))))
        (IO.println &(str (Box.unbox (IntList.tail &(Box.deref @(IntList.tail (Box.unbox (IntList.tail &is)))))))))))

(defn main []
  (do (printit)
      0))
```

@scolsen (Contributor, Author) commented Oct 25, 2021

Of course, I'll need to fix the newly introduced errors as well. But you get the idea.

For empty structs, we generate a dummy field for ANSI C compatibility.
This field needs to be included in initializers for the struct, but
should not be emitted in any other functions. I erroneously included it
in other functions in a previous merge. This commit fixes the issue by
ensuring the dummy field is only included in the struct initializer.
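
Why the placeholder is needed at all: ISO C does not permit a struct with no members (GCC accepts one only as an extension). A small C sketch with hypothetical names (`Nil`, `dummy`, `Nil_init`, `Nil_str`; the real generated field name may differ) showing the split the commit describes, where only the initializer touches the placeholder.

```c
#include <assert.h>
#include <string.h>

// A payload-less type (e.g. a Nil case) still needs one member in C.
typedef struct {
  char dummy;  // placeholder, never read
} Nil;

// Only the initializer mentions the placeholder...
Nil Nil_init(void) {
  Nil n;
  n.dummy = 0;
  return n;
}

// ...while other generated functions, such as a string conversion,
// ignore it because the type has no real members.
const char *Nil_str(Nil n) {
  (void)n;
  return "(Nil)";
}
```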

Our current recursion check introduced a bug whereby generic types
receiving instances of themselves, e.g. `(Trivial t)`, would be identified
as recursive and generate incorrect type emissions.

For now, we simply don't consider generic types as recursive, though a
future change will add recursivity support for these types as well.
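
The temporary rule, sketched in C purely for illustration (the actual check lives in the compiler's Haskell sources; `TypeCandidate` and `is_recursive` here are hypothetical, simplified stand-ins): a type counts as recursive when one of its member types is the type itself, and any type with type variables is skipped for now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Simplified view of a type candidate: its name, the names of its
// member types, and how many type variables appear in its head.
typedef struct {
  const char *name;
  const char **member_types;
  size_t member_count;
  size_t type_var_count;
} TypeCandidate;

// Recursive if some member type is the type itself; generic types
// are never treated as recursive under the temporary rule.
bool is_recursive(const TypeCandidate *t) {
  if (t->type_var_count > 0) {
    return false;
  }
  for (size_t i = 0; i < t->member_count; i++) {
    if (strcmp(t->member_types[i], t->name) == 0) {
      return true;
    }
  }
  return false;
}
```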

Instead of passing types and members separately to routines, we use type
candidates as input to recursivity checks. This simplifies both
validation and recursiveness checking on types and abstracts away
differences in structure between sum type and product type members.

I also had to adjust some test output; I will restore it in a future
commit.

@TimDeve (Contributor) commented Oct 28, 2021

Looks real useful, you know I've been wanting this for a long time :)

I would much prefer it if we disallowed creating a NULL Box; it feels like a big footgun for something that would be used quite often. Aside from that, I'm not a big fan of the implicit tail boxing; it feels a bit at odds with "allocation and copying are explicit", but I feel a lot less strongly about it than about a nullable Box 😄

@scolsen (Contributor, Author) commented Oct 28, 2021

> I would much prefer it if we disallowed creating a NULL Box; it feels like a big footgun for something that would be used quite often. Aside from that, I'm not a big fan of the implicit tail boxing; it feels a bit at odds with "allocation and copying are explicit", but I feel a lot less strongly about it than about a nullable Box 😄

Sounds good! In terms of removing Box.nil, what are your thoughts on making the signature of unbox `Box T -> Maybe T` vs. doing something else like `Box T -> Pointer T`?

> Aside from that, I'm not a big fan of the implicit tail boxing

Yeah, I go back and forth on this too. On the one hand, it's nice for things like continuous signal applications, since the "bottom" of the recursion can be a fixpoint of the data type. For example, you can modulate a recursive value representing some signal every N clicks and return the fixpoint every M clicks, and so on (e.g. accessing the recursive part of a value created with make could return the value itself; maybe we should call that "fix" instead?). On the other hand, it's a fairly unintuitive concept, and to my knowledge no other language provides it.
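
A C sketch of the fixpoint idea, with hypothetical names and caller-owned storage: the bottom value's recursive part points back at itself, so following the tail from the end of the chain any number of times stays at the fixpoint rather than reaching NULL. This illustrates the proposal under discussion; it is not something the PR implements.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Signal {
  int sample;
  struct Signal *next;
} Signal;

// `fix`: the recursive part refers back to the value itself.
void Signal_fix(Signal *s, int sample) {
  s->sample = sample;
  s->next = s;
}

// Puts a sample in front of an existing chain (caller owns storage).
void Signal_cons(Signal *s, int sample, Signal *next) {
  s->sample = sample;
  s->next = next;
}

// Returns the sample reached after following `next` `steps` times.
int Signal_sample_at(const Signal *s, size_t steps) {
  while (steps-- > 0) {
    s = s->next;
  }
  return s->sample;
}
```

One cost of this design is that any deleter or printer for such values has to recognize the self-reference, or it will loop forever.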

@eriksvedang (Collaborator)

I think @TimDeve suggests that the Box would always contain a live pointer, right? So you can safely unwrap (Box T -> T). Is there a particular reason for it not to behave this way?

In regards to the infinite data structures, how would you construct those? A code example would be great!

Like we said in the chat, I think a good way to prevent surprises would be a combination of metadata and/or interfaces on the types, so that you can't accidentally create a heap-allocated struct without knowing it.

@TimDeve (Contributor) commented Oct 28, 2021

> I think @TimDeve suggests that the Box would always contain a live pointer, right? So you can safely unwrap (Box T -> T)

Yes, that's what I mean; a nullable pointer would be (Maybe (Box T)). We could have some take-ownership function to do (Ptr T) -> (Maybe (Box T)) for dealing with C interop.

Was the main reason you wanted nullable Box because we don't have Maybe in the compiler and you wanted to do some stuff with nullable pointers in the compiler?

If we did have Maybe in the compiler we could do some magic Rust-like stuff where (Maybe (Box T)) compiles down to only a nullable pointer.
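
A C sketch of that representation, assuming a pointer-backed box (all names hypothetical): `Nothing` is the null pointer and `Just` is any non-null pointer, so `(Maybe (Box T))` takes no more space than `(Ptr T)`, while a plain Box can be unboxed without a check because it never holds NULL. Rust documents the same guarantee for `Option<Box<T>>`. C typedefs cannot enforce the non-null invariant; in Carp that would be the type checker's job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

// A Box is a non-null owning pointer (by convention only, in C).
typedef int *Box_int;

// (Maybe (Box Int)) encoded as a nullable pointer: NULL means Nothing.
typedef int *Maybe_Box_int;

bool Maybe_Box_int_is_just(Maybe_Box_int m) { return m != NULL; }

// Hypothetical take-ownership helper for C interop:
// (Ptr Int) -> (Maybe (Box Int)); no conversion is needed.
Maybe_Box_int Box_int_take_ownership(int *p) { return p; }

// Unboxing needs no null check: a Box always holds a live pointer.
int Box_int_unbox(Box_int b) {
  int v = *b;
  free(b);
  return v;
}
```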

@scolsen (Contributor, Author) commented Oct 28, 2021

> Was the main reason you wanted nullable Box because we don't have Maybe in the compiler and you wanted to do some stuff with nullable pointers in the compiler?

Yes, this was the only reason. Box is defined in the compiler but Maybe isn't, so if we relied on Maybe and someone used --no-core, we'd end up in an awkward state of affairs where Maybe was needed for Box sigs but didn't exist :/

The bulletproof impl would be conversions from Ptr ts or Ref ts to Box ts. Then the only real semantics would be performing heap allocation (since a Ptr t might be either stack or heap allocated) and making the t value managed. The corresponding unbox would then be Box t -> Ptr t or Box t -> Ref t. This is a lot sounder from the compiler's perspective, but less convenient than having a Maybe. We could make this the compiler-level impl and then define wrappers in core that use Maybe for convenience.

Actually, I quite like the following, wdyt?

In the compiler:

  • Box.init (Fn [(Ref t)] (Box t)): copies t, heap-allocates the copy, and returns a box containing the pointer to the heap-allocated copy
  • Box.unbox (Fn [(Box t)] t): copies the heap-allocated value t, deletes the box (freeing the heap-allocated t), and returns the copy of t
  • Box.allocate (Fn [] (Box t)): allocates a t value on the heap (by calling zero?) and returns a box pointing to the heap value
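
The three proposed operations, sketched in C for `t = int`. The names are hypothetical, `malloc`/`free` stand in for `CARP_MALLOC`/`CARP_FREE`, and zero-filling with `calloc` is only a stand-in for "calling zero", which the proposal leaves open.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int *data;  // always points to live heap memory
} Box_int;

// Box.init (Fn [(Ref t)] (Box t)): copy the referenced value to the heap.
Box_int Box_int_init(const int *ref) {
  Box_int b;
  b.data = malloc(sizeof *b.data);
  if (b.data == NULL) abort();
  *b.data = *ref;
  return b;
}

// Box.unbox (Fn [(Box t)] t): copy the value out, then free the box.
int Box_int_unbox(Box_int b) {
  int v = *b.data;
  free(b.data);
  return v;
}

// Box.allocate (Fn [] (Box t)): heap-allocate a zero-filled t.
Box_int Box_int_allocate(void) {
  Box_int b;
  b.data = calloc(1, sizeof *b.data);
  if (b.data == NULL) abort();
  return b;
}
```

Because `Box_int_init` copies, later changes to the original value do not affect the boxed one.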

Maybe there's some magical way to do init without copying but idk.

This would mean you can't use Box for recursive indirection in product types, but that doesn't make much sense in product types anyway, since you need a bottom value (which a sum type case can provide).

@TimDeve (Contributor) commented Oct 28, 2021

Personally, I prefer init taking ownership, so init (Fn [T] (Box T)), as it forces an explicit copy if that's what the user wants. I'm not sure I see why init taking a ref makes it easier to implement, if that's what you're saying.

Are you saying that (Maybe (Box T)) would be represented by a (Ptr T) internally?

@eriksvedang (Collaborator) commented Oct 28, 2021

I like it. Box.init could also just be (Fn [t] (Box t)), right? That would only require a shallow copy, I think.

@scolsen (Contributor, Author) commented Oct 28, 2021

You’re both right! I don’t really know why I selected a ref. I think conceptually it made sense to me to take something that’s already a pointer and transform it into another kind of pointer, but actually taking a value is better!

> That would only require a shallow copy, I think.

Ohhh, I hadn’t thought of that. It would require altering the memory management a bit, then. Before, I was working under the assumption that a box was necessarily a newly heap-allocated value, but it could totally be a shallow copy too. But I think then its delete function would need to perform a page/nullness check before calling delete, and I guess vice versa for the other side of the copy.

@eriksvedang (Collaborator)

> but I think then its delete function would need to perform a page/nullness check before calling delete, and I guess vice versa for the other side of the copy

We're talking about the Box's deleter here, right? If it always points to live memory, I think it shouldn't have to check. The contained struct's deleter only deletes the members, not the "shallow" part, so as long as the Box frees the malloced part, it should (could?) work. I haven't played around with your code, so I'm probably missing something :)

@TimDeve (Contributor) commented Oct 29, 2021

That is how I implemented the deleter. Likewise, for boxing, a shallow copy should be enough, because you would be taking ownership of the data structure.
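
A C sketch of the shallow-copy approach, using a hypothetical managed type `Name` that owns a heap string. `Box_Name_init` copies only the struct's bytes into its heap slot, so the box and the moved-from value share the same string; this is only correct because the caller gives up ownership. The box deleter then runs the member deleter and frees the slot it allocated itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// A managed type: owns a heap-allocated string.
typedef struct {
  char *text;
} Name;

Name Name_init(const char *s) {
  Name n;
  n.text = malloc(strlen(s) + 1);
  if (n.text == NULL) abort();
  strcpy(n.text, s);
  return n;
}

// Member deleter: frees what a Name owns, not the Name itself.
void Name_delete(Name n) { free(n.text); }

typedef struct {
  Name *data;
} Box_Name;

// Takes `value` by value (ownership moves in); a shallow copy suffices.
Box_Name Box_Name_init(Name value) {
  Box_Name b;
  b.data = malloc(sizeof *b.data);
  if (b.data == NULL) abort();
  *b.data = value;  // copies the pointer, not the string
  return b;
}

// Delete the contents, then the heap slot the box allocated.
void Box_Name_delete(Box_Name b) {
  Name_delete(*b.data);
  free(b.data);
}
```

Calling `Name_delete(n)` after boxing `n` would free the string twice, which is why `init` taking ownership matters.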

@scolsen (Contributor, Author) commented Oct 29, 2021

@TimDeve thanks! Re the delete implementation: what we have for init right now is similar, except that the box is a struct rather than a pointer cast:

```c
// pseudo C: what we have in the recursive type branch
// (`t` stands for the boxed type, `Box_t` for the generated Box struct)
Box_t box_init(t value) {
  Box_t box;
  box.data = CARP_MALLOC(sizeof(t));
  *box.data = value;
  return box;
}
```

For delete we currently have:

```c
// if t has a delete function (managed type)
void box_delete(Box_t box) {
  t_delete(*box.data);
  CARP_FREE(box.data);
}

// if t doesn't have a delete function (non-managed type)
void box_delete(Box_t box) {
  /* Ignore non-managed type inside box */
  CARP_FREE(box.data);
}
```

How should we change these, if at all? Or can we remove nil and call it a day?

@TimDeve (Contributor) commented Oct 29, 2021

That looks good to me. Do we need the struct, or would it work with a plain pointer? The benefit of the pointer is that you can register a type with a field that's a managed pointer without converting from (Ptr T) to (Box T); not sure how common that is in C.

@scolsen (Contributor, Author) commented Oct 29, 2021

I think in theory there's really no difference. It would only matter if we planned to augment the Box with other fields, like reference counts, but I don't think we need to do that, right? Not sure whether there are other potential benefits.
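
The trade-off TimDeve mentions, sketched in C with hypothetical names: with a bare-pointer Box, a field that an existing C struct already declares as `int *` can be used as a Box directly, while the struct representation needs an explicit wrap. In exchange, the struct is a distinct type that cannot be silently mixed up with a raw pointer, and it leaves room for extra fields later.

```c
#include <assert.h>
#include <stdlib.h>

// Representation A: Box is just an owning pointer.
typedef int *PtrBox_int;

// Representation B: Box is a single-field struct.
typedef struct {
  int *data;
} StructBox_int;

// Some existing C struct with a heap-owned field.
typedef struct {
  int *value;
} CThing;

// A: the field already has the Box type; no conversion needed.
PtrBox_int thing_value_as_ptr_box(CThing t) { return t.value; }

// B: the field has to be wrapped first.
StructBox_int thing_value_as_struct_box(CThing t) {
  StructBox_int b = {t.value};
  return b;
}
```

On mainstream ABIs both representations occupy a single pointer, consistent with the point above that there is little practical difference unless the Box gains more fields.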

@scolsen (Contributor, Author) commented Nov 19, 2021

@eriksvedang and I chatted about this a bit. I'm going to break some of these changes out into separate PRs so that they're easier to review and merge cleanly! This'll stick around in draft form as a reference until all the pieces are in master.

@scolsen mentioned this pull request Nov 19, 2021

Labels: None yet
Projects: None yet
Development: no linked issues yet
4 participants