Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler signature re-work #1116

Draft
wants to merge 53 commits into
base: main
Choose a base branch
from

Conversation

mhermier
Copy link
Contributor

@mhermier mhermier commented Oct 21, 2022

This patch set propose to rework how Signature works.

The key changes in behavior:

  • SignatureType is removed, so it simplify a lot of logic by storing the subscriptArity, parenArity, isInitializer and isSetter instead.
    It contains some simplifications to signatureToString that should be very handy to implement [RFC] Support argument labels as part of method signatures #1112.
  • Compilation errors for setters should provide better error messages.
    class Foo {
      static foo = (a) { a }
    }
    
    if (!Foo.foo = 42) {
    }
    
    outputs:
    Error at '=': The setter syntax cannot be used in this context.
    
    instead of:
    Error at '=': Expect ')' after if condition.
    Error at ')': Expect end of file.                      (or any other trailing garbage error)
    
  • The last (optional) 8 patches, reduces the code size in a one liner and reduce some frictions in the languages:
    • Allow trailing commas for any argument/parameter lists (add symmetry with lists and maps):
      class Foo {
        [a,] { ... }
        bar(a,) { ... }
      }
      foo.bar(a,)
      foo[a,]
    • Allow function of arity 0:
      Fn.new {|| null }
    • Allow subscript of arity 0:
      foo[]
      foo[] = 42
    • Allow setters of any arity:
      class Foo {
        static [i] = () { ... }
        static [i] = (a) { ... }
        static [i] = (a, b) { ... }
        static [i, j,] = (a, b,) { ... } // support trailing in subscript and value parameter list
      }
      Foo[a] = ()
      Foo[a] = 42
      Foo[a] = (42)
      Foo[a] = (42, 69)
      Foo[a, b,] = (42, 69,) // and support trailing in subscript and value argument list

@mhermier mhermier force-pushed the compiler_signature branch 2 times, most recently from f3512ec to d31af8d Compare October 21, 2022 23:01
@mhermier
Copy link
Contributor Author

Updated, for functions of 0 arity that where also enabled as Fn.new {| | ....}. Added a small fixup for the compact version Fn.new {|| ....}.

@mhermier
Copy link
Contributor Author

Updated: found another leaked symbol from the compiler, while tinkering on other things.

@mhermier
Copy link
Contributor Author

Updated: Some minor renaming of functions.

@mhermier mhermier marked this pull request as draft October 24, 2022 12:59
@PureFox48
Copy link
Contributor

PureFox48 commented Oct 24, 2022

As far as allowing functions with zero arity is concerned, am I right in thinking that there would be no essential difference between:

var f = Fn.new { || null }

and

var f = Fn.new { null } 

In effect the first would be a variation of the second and you'd call them both with f.call().

However, I'm struggling to get my head around an indexed property with zero arity such as Foo[] which would be a distinct entity from anything we have already. I can't think of any obvious use for this - you might as well just use an ordinary property. Do you have some practical usage in mind or are you just trying to make indexed properties completely general?

@mhermier
Copy link
Contributor Author

For your first point, it is equivalent (for now). Thought with the new system, it would be possible to distinguish between them and all Fn.call as a getter. That said I don't think it is wise to do so unless there is a compelling need for it and break existing code.

For the subscript change, I already proposed it a while ago, but is now a easy by product of the simplification of how all the list are handled. It simplifies the syntax, by making objects callable as functors (without the strange arity restriction, or the need to use the .call() convention). The idea was revived to me because of the Array implementation, and I thought it was a bummer that we could not instantiate a template class using [] instead of requiring to use an helper method... If name generation and memoïzation of class become a thing, we can somehow simulate a template class as with the neat syntax:

class Set {
  static [] { this[Object] }
  static [clazz] { ... }
}
var ObjectSet = Set[]
var NumSet = Set[Num]

One neat idea would be to move some of the boiler plate to the compiler, so we can have the syntax:

class Set[clazz = Object] {
}

but I don't know how hard it would be to add default arguments without a proper AST. And in addition, we would need to support it done for functions and methods in the same time, so for now it is a no go but we can produce at least an ersatz of it.

@mhermier
Copy link
Contributor Author

Updated: Make it finishList also handle list and map litterals. I will probably add an optionalList abstraction so an element list can automatically be parsed based on the presence of the left token type.

@PureFox48
Copy link
Contributor

Thanks for clarifying those two points.

I agree that it would not be a good idea to have a function getter since, unlike classes, functions don't have any persistent internal state (they just use global state) and it would break tons of exsting code in any case.

It hadn't occurred to me that one could use an indexed property with zero arity as a default for other indexed properties but, yes, I can see the sense in that now. Your idea of using static indexed properties to create a sort of template class is also interesting.

@mhermier
Copy link
Contributor Author

Well technically functions have a state because they are technically closure. So they can have private states, hidden from the global environment. So technically, I think it would make sense to have a getter function that would honor the singleton pattern. That said, unless there is more solid ground I don't consider it an ultimate reason to break code.

Well this rule is stupid in the first place (even if C++23 only adopted more than one operator). While semantically it has a different meaning, it makes absolutely no reason to differentiate it from a function call operator. I simple term, it is a language rule for the sake of a rule because of some historical usage and connotation. Nothing in the grammar rules forbids the extension, if we are willing to ignore the connotation. I mean what ever the language, you can't prevent users from inventing DSL. If using [] makes sense to them why put some arbitrary restriction and make life harder for the user. I mean a user can always act on one argument and using to pass the list of arguments, and spread it manually at the list allocation cost:

class Foo {
  static call()        { "0-spread()" }
  static call(a)       { "1-spread(%(a))" }
  static call(a, b)    { "2-spread(%(a), %(b))" }
  static call(a, b, c) { "3-spread(%(a), %(b), %(c))" }
}

class SubcriptSpreader {
  construct bind(callable) {
    _callable = callable
  }

  [args] {
    var count = args.count
    if (count == 0) return _callable.call()
    if (count == 1) return _callable.call(args[0])
    if (count == 2) return _callable.call(args[0], args[1])
    if (count == 3) return _callable.call(args[0], args[1], args[2])
  }

}

var s = SubcriptSpreader.bind(Foo)
System.print(s[[]])              // expect: 0-spread()
System.print(s[["a"]])           // expect: 1-spread(a)
System.print(s[["a", "b"]])      // expect: 2-spread(a, b)
System.print(s[["a", "b", "c"]]) // expect: 3-spread(a, b, c)

And this is true for every function/method overload in existence...

@PureFox48
Copy link
Contributor

Whilst functions in Wren are closures, they close over variables defined outside their scope. Even if all a parameterless function did was to return a closed over variable, I don't think a user would regard this as analogous to a 'getter' method returning a field value from a class.

But, minor quibbles aside, I do think this proposal as a whole hangs together well and I would therefore be in favor of it even if (as I said at the time) I'm not keen on #1112, preferring the simplicity of overloading methods purely on arity.

@mhermier
Copy link
Contributor Author

In some near future this change set will also relax a little the grammar of attributes (because of code reuse). It means attribute group will be able to be empty, and attributes group will be able to be ',' terminated like every other lists in the language with this changeset.

@mhermier
Copy link
Contributor Author

Updated, changes are a little bit more atomic, added type hinting support.

@PureFox48
Copy link
Contributor

As far as 'type hinting' is concerned, if I'm understanding that correctly, you've just laid the groundwork for:

  1. including a colon and then a type after method or function parameters; and

  2. including the 'crow's foot' operator and then a type after the parameter section (if any) to represent the return type of a method or function.

These hints will be ignored by the compiler and we'll need to agree 'best practice' for what type should be shown particularly as far as sum types, lists, maps etc. are concerned.

Although it's not a built-in type, one thing I'd personally like to see in such hints is the use of 'Int' to represent a Num which is an integer given how common this situation is.

@mhermier
Copy link
Contributor Author

For your last concern, we can always add var Int = Num somewhere to satisfy the fact that the type hint even if discarded should be a valid expression to something. (it is the same case as subclassing where it should be computed to something, and handled/discarded at runtime)

@PureFox48
Copy link
Contributor

Just a further point on the type hinting.

All the discussion I've seen so far has used a colon (rather than a crow's foot) to introduce the return type as well as the parameter type(s).

Having said that, Python uses the crow's foot for the return type though that may be because they use a colon to introduce the function block.

Any particular reason why you prefer the crow's foot? I'd have thought myself that it would be better to keep the number of new symbols needed to a minimum.

@mhermier
Copy link
Contributor Author

I was not able to reuse : because of the setter syntax (with the new relaxed grammar):

class Foo {
  foo = b : Int {}
}

Does it means b is Int or foo = (b) returns Int, so another operator was needed. I looked at what other languages do quickly (C++ and Jai) and mimicked what was existing. But I'm open for suggestions.

On the side note, I'm having some small difficulties to finish patch-set, so it will take me some time to produce some good quality changes. I had to fix some errors related to the GC with WREN_DEBUG_GC_STRESS to test memory issues. This one is almost properly sorted, thought some cruft is left related to attributes. But that lead me too look at attributes, and oh boy... The attribute implementation tries too hard and produce/use a lot of unnecessary junk. Instead, I think we should bypass the constant system responsible of the memory usage reduction (since we can't store a Map as a key of a Map). That way we can simply store the compiler attribute maps directly in the constant pool, instead of serializing them in opcode to recreate them at execution. I have done a changeset for that but I need to clean/polish it.

@PureFox48
Copy link
Contributor

TBH I'd missed that a relaxed grammar for setters was part of the proposal.

Wouldn't it be better for the parameter list, however many there are, to always be enclosed in parentheses? That would be consistent with the current syntax, easy to remember and would mean that we could then use a colon to introduce the return type hint.

@mhermier
Copy link
Contributor Author

From a grammatical point of view, only the = is required to distinguish it is a setter. So the parenthesis are only syntaxic sugar to make it looks like other method declarations. Enforcing them make the code less fluent and makes only really sense in the generalization of arbitrary number of arguments. That consideration allowed some code factorization, so I went for it.

The same could be said for binary operators, and could have the same treatment of optional parenthesis (That makes me realize I missed type hinting for them, but this trivial to implement since the infrastructure is now available).

Here are some file size numbers on my machine:

       text    data     bss     dec     hex filename
vanilla:
-O2   42591    3088       0   45679    b26f wren_compiler.o (ex lib/libwren.a)
-Os   28343    3088       0   31431    7ac7 wren_compiler.o (ex lib/libwren.a)
-O0   40391    3104       0   43495    a9e7 wren_compiler.o (ex lib/libwren.a)

local branch without type hinting and attribute optimization:
-O2   42617    3344       0   45961    b389 wren_compiler.o (ex lib/libwren.a)

local branch without attribute optimization:
-O2   43001    3408       0   46409    b549 wren_compiler.o (ex lib/libwren.a)

local branch with attribute optimization:
-O2   42097    3408       0   45505    b1c1 wren_compiler.o (ex lib/libwren.a)
-Os   28336    3408       0   31744    7c00 wren_compiler.o (ex lib/libwren.a)
-O0   41475    3400       0   44875    af4b wren_compiler.o (ex lib/libwren.a)

src/vm/wren_compiler.c | 1006
  1 file changed, 584 insertions(+), 422 deletions(-)

Considering all the moving parts my change set brings, some of security/language improvement it brings (and probably some missed optimization opportunities), I consider that non explosion of size by factorization quite successful for now.

@PureFox48
Copy link
Contributor

Yep, the file size numbers look good but would presumably be even better if you didn't have to support two syntaxes (with and without parentheses) for setters.

Even if they're not really needed and it's more typing, I'd personally prefer parentheses to be mandatory both on consistency grounds and because of the type hint issue.

There's also #849 to consider which I wouldn't be surprised to see implemented in some shape or form. If it is, then the most likely syntax (as it's short and sweet) would be: foo = _foo though that might not sit too well with setters which have no parentheses after the =.

@mhermier
Copy link
Contributor Author

Updated the header, to mention the relaxed setter declaration syntax.

In the facts, I would go the other way around and would consider generalizing Foo.foo bar as Foo.foo(bar) (in definition and usage). It creates a friction point between function and maps, but it should be nice. This is far from the point of this change-set, but I would Introduce the '' token back so we can use it to declare a closure/lambda/function and line continuation (because the . line continuation does not scale to all binary operator, I think line continuation or a statement separator are better in that sense).

About the #489 I don't think the syntax foo = _foo is viable. Thinking at it again, it does not allow independent setter or getter definitions.

@PureFox48
Copy link
Contributor

Well, I think the point of #849 was to be able to condense:

class Entity {
  construct (pos) {
    _pos = pos
  }

  pos { _pos }
  pos=(v) { _pos = v }
}

into just:

class Entity {
  construct (pos) {
    _pos = pos
  }

  pos = _pos
}

The current way of doing things would still be available for read only or write only properties or where you needed to do more than just get or set a field. So I think it is a viable proposal even if I personally would probably not use it much.

@PureFox48
Copy link
Contributor

Incidentally (and going slightly off topic), I made an even more condensed proposal in #912 for simple classes which only had read-only fields (tuples in effect) where you'd get a default toString method thrown in:

class Entity(pos)

though it wasn't met with much enthusiasm :(

Nowadays, I just create named tuples dynamically using the Meta module.

@mhermier
Copy link
Contributor Author

While still beeing off topic, I'm really considering to borrow most of the code you made public in rosettacode and making it a proper standard library of some sort...

@PureFox48
Copy link
Contributor

Well, you're very welcome to do that :)

I'm afraid there's no separate documentation for the various modules though I've tried to describe each method etc. within the code itself.

@mhermier
Copy link
Contributor Author

While doing some crazy things with "template" programming, I felt into a trap while trying to emulate some method overload. And I think, I need to also allow something that would be named like subscript caller. Basically it means to allow:

class Foo {
  static [a, b, ...](c, d, ...) { }
}

Luckily, it should be pretty simple to implement, with my current change set. Thought, there is a little friction with the function syntax.

Basically, by making = optional, it makes the following calls to [a, b, ...](c) a little confusing:

Foo[a, b, ...] 42     // c is 42
Foo[a, b, ...] { }    // c is a map or a function depending how we consider things
Foo[a, b, ...] () { } // c is a function

I think it should be a map as per Foo[a, b, ...] = { } is a map, but I need to think deeper about it. And likewise, trailing function argument Foo[a, b, ...]=(c, d, ...) { } should allowed for setter arguments...

This simplification rabbit hole is so deep... lets sip some tea at the very bottom Alice.

@mhermier
Copy link
Contributor Author

Now, that I wrote it maybe it is non sense. I need to sort things and maybe simply ban Foo[a, b, ...] 42 and make Foo[a, b, ...] { } a function argument. Maybe it would make the things simpler to understand.

@PureFox48
Copy link
Contributor

If I were you, I wouldn't try and make = optional. Even if it could be made to work, it will make Wren look like a pseudo-functional language which I don't think will go down well with a lot of people.

@mhermier
Copy link
Contributor Author

Can you expand your thought. I'm not sure to understand what you mean. In particular, by:

I wouldn't try and make = optional.

Do you mean, I should not allow the [a, b, ...](c, d, ...) syntax entirely ?

@PureFox48
Copy link
Contributor

I mean that I'd allow foo[a, b, ...] = (c, d, ...) as you originally intended in your opening post but I'd make the = mandatory.

@mhermier
Copy link
Contributor Author

Well it is not nice in the case of what I'm trying to achieve. So maybe it will go in a late patch-set to make it not mandatory.

Anyway a lot of people are fools to not think the language is not pseudo-functional already. It is only that instead of using () it use[].

@PureFox48
Copy link
Contributor

Well, I'll grant you that it's pseudo-functional already in the sense that functions are first class and we have map, where and reduce in the standard library. This seems to be the trend nowadays with what are basically procedural or OO languages and is a good thing IMO.

But I think you can go too far down that path and, when you start making things optional, it means you have to make a decision each time you use them which option to go for.

@mhermier
Copy link
Contributor Author

I think it is more than a trend. If you look at smalltalk, there are lambda/closures every where hidden in plain sight in the form of innocent looking blocks. And if we look more carefully to the smalltalk syntax, the notion of block only exist at the top level of a script, the method body declaration or the lambda body declaration. All the control flow does not use block at all, it is a straight line from the begin of the block to its end, with control flow being simulated via method dispatch. So trying to ban functional style is waste of time...

It is more trending today, probably because the popular kids in C++ town, made it more accessible via the lambda syntax. But the men where already using it with dedicated functor objects (which was a lot of boiler plate). An considering how C++ is influential, it snow balled from there. Even in C, the call back function (re-)learning and its wide adoption showed how important that notion is. (And guess what I did exactly that to implement lists handling in the change-set)

@mhermier
Copy link
Contributor Author

Do you have any idea of what the following line mean?

// TODO: Allow Grace-style mixfix methods?

Was it meant to allow to do stuff like;

Foo.foo(a, b,...) bar (c, d,...) baz (e, f, ...) ..
Foo.foo { /* a function */ } bar { /* a function */ } baz { /* a function */ } ...

@PureFox48
Copy link
Contributor

Had to Google it as I'm not familiar with the Grace programming language but apparently it supports multi-part (or 'mixfix') methods.

I found this example:
if (condition) then { foo(5) } else { bar(6) }

which is a call of the if(_)then(_)else(_) method with the placeholders replaced by actual arguments.

So I think your guess at what it means is correct though why we would want to support something like this in a simple language such as Wren I don't know.

@mhermier
Copy link
Contributor Author

The way I presented it, as composed signature should be less problematic, a nice tribute to Smalltalk invocation style, and probably an alternative solution to #1112. I'll probably give it a try, but it is not my highest priority. For now I only want to have [...](...) with optional late block working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants