Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[asdl/runtime] Add deriving clause / Use ASDL for the ControlFlow exception type #1681

Draft
wants to merge 3 commits into
base: soil-staging
Choose a base branch
from

Conversation

PossiblyAShrub
Copy link
Collaborator

We can now make asdl types derive arbitrarily like so:

module runtime {
  control_flow =
    Break(int levels)
  | Return(int code)
  deriving [Exception]
}

Now we are able to declare the ControlFlow exception type in ASDL. This will make it easier to add more control flow types such as error ('...'). It also makes ControlFlow variant handling logic more uniform with other ASDL types.

We can now make asdl types derive arbitrarily like so:

```asdl
module runtime {
  control_flow =
    Break(int levels)
  | Return(int code)
  deriving [Exception]
}
```
@PossiblyAShrub PossiblyAShrub changed the base branch from master to soil-staging July 21, 2023 15:29
@andychu
Copy link
Contributor

andychu commented Jul 22, 2023

Hm very interesting

I have been wondering about core/error.py too, many of those exceptions are very simple and arguably redundant

Though I think we have to be careful to tease out these 2 issues

  1. the inheritance hierarchy (types) - Python and C++ catch exceptions using the subtype, and we do use that. (It would be nice to write a summary of that -- can't do it off the top of my head)
  2. tags, sum types, composition. An error of a single subtype could hold multiple types of data

So it's sort of another types vs. tags issue

Or maybe composition vs. inheritance

@andychu
Copy link
Contributor

andychu commented Jul 22, 2023

Another thing I'm thinking about is whether error is a builtin or keyword

  • to be consistent, it would be a keyword like return
  • practically speaking, it probably should be a builtin, because error is likely defined by many shell programs as functions

A keyword can't be redefined, but a builtin can

We could introduce an option to parse error, but I think it's better to minimize the number of shopt options

https://github.com/oilshell/oil/wiki/Language-Design-Principles

@PossiblyAShrub
Copy link
Collaborator Author

Though I think we have to be careful to tease out these 2 issues

  1. the inheritance hierarchy (types) - Python and C++ catch exceptions using the subtype, and we do use that. (It would be nice to write a summary of that -- can't do it off the top of my head)

Perhaps a deriving clause is too general of a solution. If we instead added an exception_type generate option, then we could implement more specialized code generation to support features like this.

In retrospect, I am not sure what else we'd want a deriving clause for. Maybe a generate option just makes more sense regardless.

  1. tags, sum types, composition. An error of a single subtype could hold multiple types of data

I am not sure what the issue is here exactly? Isn't this what we were trying to achieve with ControlFlow/IntControlFlow/ValueControlFlow?


Another thing I'm thinking about is whether error is a builtin or keyword

Really good point. That's a tricky problem. If I were to approach this entirely from a UX perspective, it might make sense that:

  • OSH -- error is not a keyword or a builtin, it doesn't even exist
  • YSH:UPGRADE -- error is a builtin, it can be redefined
  • YSH -- error is a keyword, it cannot be redefined

Now we would need to draw the line on a more concrete shopt. Is there anything already that fits this behavior?

But, to be honest, I am not sure how much I like the fact that error then gets so many different ways to be interpreted. Especially for a core control flow structure. Maybe the best way is to just use our hammer and enforce that error is a keyword like break, continue or return?

One alternative would be to shove more behavior into return. The idea that errors are values would allow us to use return instead of introducing error. This already matches the behavior of shell functions where return 0 is like return and return -1 (or some other non-zero number) is like error.

@andychu
Copy link
Contributor

andychu commented Jul 22, 2023

The interesting thing is that the way the code is now, we're not using vm.ControlFlow() at all, it's not a base class

It looks like we could just have

IntControlFlow = (Token token, int arg) generate [is_exception]
ValueControlFlow = (Token token, value value) generate [is_exception]

There's no inheritance at all, and no sum type!

Although I guess we were thinking about rewriting some of the except() clauses too? I guess catching exactly what you want is perhaps more efficient than catching and then rethrowing ?

Not sure to be honest, I haven't looked at all the call sites


It looks like the main thing we're using is if e.IsReturn(): e.StatusCode(), which could arguably be free functions

@andychu
Copy link
Contributor

andychu commented Jul 22, 2023

As far as error, I think what you proposed is reasonable/possible, but I think a bit complex (produces a big test matrix and lots of if statements!)

I was thinking

  • error is a builtin in ALL THREE modes

Generally we don't shy away from adding builtins in plain OSH, because you can always redefine them. It doesn't really break anything; bash adds its own builtins too (e.g. mapfile, readarray)

And actually we don't shy away from keywords -- var const = are all in plain OSH -- no shopt needed

(The reason proc is disallowed in OSH is because I didn't want proc to have legacy errexit behavior, it's confusing if there are 2 readings of the same code. This doesn't really apply to var or const)

However, an error keyword unconditionally seems like it goes too far -- seems like it would definitely break a bunch of programs

I understand why we'd want it to be a keyword in pure YSH, but I also think people generally won't notice (??)

@andychu
Copy link
Contributor

andychu commented Jul 22, 2023

To clarify, this comment which I guess I wrote quite awhile ago, before implementing anything

https://github.com/oilshell/oil/blob/master/core/vm.py#L38

suggests that we might want to have cflow.Shell and cflow.OilReturn() have a common base class (cflow_t)

But it appears we don't actually need that anywhere?

@PossiblyAShrub
Copy link
Collaborator Author

The interesting thing is that the way the code is now, we're not using vm.ControlFlow() at all, it's not a base class

Wow! That was actually a mistake on my part, I had meant for {Int,Value}ControlFlow to derive from ControlFlow. But, I see that it was unnecessary for the {Int,Value}ControlFlow exceptions. If there was another control flow exception type, I would have noticed the lack of a common base class (more on this below.)

It looks like the main thing we're using is if e.IsReturn(): e.StatusCode()

Yes, this is the advantage to having separate control flow exceptions for each keyword. It's technically wasteful because, as we did before, it is possible to determine control flow behavior from the keyword tag. But then we can lift the burden of selection from the user by leaning on language features. ie.

except ControlFlow as e:
  if e.IsReturn(): e.StatusCode()
  else: raise

can then more concisely be written as:

except control_flow.Return as e:
  e.StatusCode()

Now, if we wanted to raise an assertion error for other control_flow.* exceptions, we'd probably want to have a base control_flow_t class as a sort of wild card:

except control_flow.Return as e:
  e.StatusCode()
except control_flow_t:
  raise AssertionError()

The need for this "wild card" case is missing when, as in {Int,Value}ControlFlow, there is only one other case to check against. That is why we don't really need those classes to derive a common ControlFlow. Regardless of this PR, I think I should at least update those exceptions to derive from ControlFlow. When I add error support, in order to store the error message and error code, I think that I may need another control flow exception type. If that is true, then I would probably want a ControlFlow base class to act as a wild card.


As far as error, I think what you proposed is reasonable/possible, but I think a bit complex (produces a big test matrix and lots of if statements!)

You are right. So many interpretations of error would be difficult to test. And, ironically, it might introduce greater confusion than we set out to avoid.

I can see there being confusion over the identity of error. In scripts with an error proc, calling error 'some string' could have a very different meaning from scripts without an error proc. So by making error a builtin, we've introduced a hidden, yet common, footgun. But you are right, making error a keyword is likely going to be disproportionately obnoxious.

Making error a builtin might be the most reasonable choice. Besides, this is probably a false dichotomy, we can always introduce a more strict mode that will raise warnings when:

  • procs are named error, or
  • programs called error are called using the bare word error and not command error

And, if we make sure to document that error is a builtin, noting the consequences of that fact, then making error a builtin is maybe not so bad.

@andychu
Copy link
Contributor

andychu commented Jul 23, 2023

Regardless of this PR, I think I should at least update those exceptions to derive from ControlFlow. When I add error support, in order to store the error message and error code, I think that I may need another control flow exception type. If that is true, then I would probably want a ControlFlow base class to act as a wild card.

I would start from the raw error functionality -- what's the minimum we need to do to get it work?

If it ends up being 3 exceptions that have no common base class, that's OK

Or if it ends up with a common base class, that's also OK

(generally I avoid inheritance unless it adds something, like the fallthrough -- and there is some stuff we might want to revisit in core/error.py -- that is where the deepest inheritance is. We may be confusing the types for catching exceptions vs. the schema of what's inside -- int, value, int and string, etc. )

The class relationships should be derived from the behavior. I think error is the last control flow ever? let's see how it looks first


I can definitely see that removing IsReturn() could be cleaner, but it's also true that subclassing is limited in what it can model. We are stuck with a certain kind of subclassing because C++ and Python support it (in much the same way, which is nice)

But there's also the problem that in Python you can do this:

except (cflow.Break, cflow.Continue):
  log('1')
  log('2')

but in C++ , there's no way to do that -- you have to duplicate the blocks for each case

catch (cflow.Break) {
  log('1')
  log('2')
} catch (cflow.Continue) {
  log('1')
  log('2')
}

which is bad

So if statements around are actually more expressive than what C++ provides

(BTW this is what the Go language designers mean when they say "errors logic is just program logic" and things like that. They don't like the way error handling in Python and C++ is tied to inheritance, which is a fair point.)

@andychu
Copy link
Contributor

andychu commented Jul 23, 2023

  1. tags, sum types, composition. An error of a single subtype could hold multiple types of data

I am not sure what the issue is here exactly? Isn't this what we were trying to achieve with ControlFlow/IntControlFlow/ValueControlFlow?

The issue I'm thinking about is if we do:

class ControlFlow(Exception)
class IntControlFlow(ControlFlow) { Token keyword, int arg }
class ValueControlFlow(ControlFlow) { Token keyword, value val }

(which I suggested, maybe wrongly! Since we didn't need the type relationship apparently!)

OR we do this:

class ControlFlow(Exception) { Token keyword, error_payload p }

And then have a non-exception type

error_payload =
  Int(int i)
| Value(value v)
| ...

This is basically using composition over inheritance, and that seems good -- if we're not getting mileage out of catching exceptions on subtype / subertype

It's a bit complex because in some cases like redirects (which you brought on Zulip), I would avoid the extra indirection

We could have an extra layer like (Redir? redir, command cmd) and simplify one function (although it doesn't necessarily simply others)

Here the extra layer of composition doesn't seem bad

@PossiblyAShrub
Copy link
Collaborator Author

The class relationships should be derived from the behavior. I think error is the last control flow ever? let's see how it looks first

You are right. This feels so backwards from what I am used to, I need to do it more. Traditionally I've been taught that behavior is to be molded around some "best" (usually the existing) structure. But I suppose the resulting hacks to make it work often are an issue, not with the behavior, but the structure around it.

There are some good ideas here, I am going to start on actually implementing an error builtin. Then we can evaluate which structure best fits the solution.

@andychu
Copy link
Contributor

andychu commented Jul 23, 2023

I'm looking over the existing code, and I think there has always been some confusion of catching based on types and types as the shape of the error (what data it holds, e.g. int or value). Those 2 things aren't necessarily the same, but Python and C++ kind of encourage you to make them the same

So yeah I think it makes sense to do the minimum thing for error, and then maybe we do a cleanup based on how that looks


Brainstorming, we have 6 kinds of control flow!

  • break -- affects for/while loops (ignorant of whether it's in a proc or func; it's a non-issue)
  • continue -- ditto
  • return 1 -- affects proc bodies, not funcs
  • return (expr) -- affects func bodies, not procs
  • exit -- affects a stack of procs (or funcs!)
  • error -- affects a stack of procs or funcs.
    • I think it aborts a whole stack of interior funcs, and it causes the statement containing the expression to have status equal to the one in the error ? (default is 1)

Hm I actually think we need to design error a bit more -- with respect to proc and func, and try. I don't think we have a complete spec for what it does ?

Maybe you can start a thread on Zulip with a bunch of cases to put in the spec tests? And I will add some as well


I actually think it was a happy accident that we discovered the 2 kinds of control flow don't need to inherit from each other. It does make sense when I look at those relationships above

At the very least, I think break/continue belong together under the same type, but return 1 should not be the same ControlFlow or IntControlFlow -- that was always a bit of a mistake

@andychu
Copy link
Contributor

andychu commented Jul 23, 2023

Oh and it's also valid to just plow ahead with max() and abs() and all that

And defer error until we actually need it!

Popping up another level, the YSH design should be driven by all the YSH code we write -- that's the only way to know if it's right. We just have to write all the kinds of code that users will write, and see how it looks / if it's comprehensible :)

We have at least 14 different use cases on the blog, etc.

Remember when we did max(), we figured out we need ...args, since the signature is nontrivial, and a few other issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants