Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISLE: eliminate distinction between constructors and extractors #8599

Open
jameysharp opened this issue May 12, 2024 · 2 comments
Open

ISLE: eliminate distinction between constructors and extractors #8599

jameysharp opened this issue May 12, 2024 · 2 comments
Labels
isle Related to the ISLE domain-specific language

Comments

@jameysharp
Copy link
Contributor

Feature

ISLE terms should just be functions, from any number of arguments to any number of results. Internal terms should all be defined using rule. Terms with exactly one argument can be used in pattern matches like extractors are today; all terms can be used on the right-hand side of a rule or if-let, like constructors are today. Possibly a form of type-directed overloading should be added to the language to allow the current pattern of using the same term name for both constructor and extractor.

Benefit

  • Simplify the trie-again and serialized representations of ISLE rules. The current distinction between constructor and extractor makes no difference to code generation.
  • Using rule to define "extractors" allows a form of "or-patterns" (ISLE: Support commutative matches #6128), without requiring us to figure out how priorities should interact when these patterns are inlined as internal extractors are today.

Implementation

I'd start by unifying constructors and extractors in the trie-again representation and ensuring that the generated code doesn't change.

Then maybe introduce something like (overload iadd extract_iadd construct_iadd) to declare that when the term iadd is used, whichever of extract_iadd or construct_iadd has the right type for that context should be used instead. I'm less sure about this.

Next, extend constructor terms to allow multiple results. A pattern binding the result of such a term needs some extra syntactic care to handle tuples. Semantically, though, we can approximate this today by wrapping up multiple results in an enum with a single variant, so I think we should just make the language support it directly.

Finally, remove the restriction that patterns can only use extractors and expressions can only use constructors, and replace (extractor ...) with (rule ...). Optionally, we could add an inline flag on terms and use it for all current extractors to ensure we still generate the same code, since internal extractors are currently inlined automatically. It is difficult to define inlining for all current combinations of ISLE features, though.

Alternatives

There are so many alternatives. Let's talk about it…

@cfallin
Copy link
Member

cfallin commented May 12, 2024

I definitely think this would be a welcome change! As you know I've been saying similar things for a while; my take is that extractors can be lowered somehow to a sugar on constructors and if-lets, now that we have them. This kind of fulfills the original dream (that I had had, anyway!) of a simple core Prolog-like semantics of matching steps in the backtracking phase (originally LHS) prior to committing.

My take on a core calculus would be:

  • Rewrite any mention of a term in the LHS into a dual (that can be specified) that flips args and returns -- "apply" to "unapply".
  • Turn this application into an if-let clause.
  • Build surface syntax-level desugaring of the extractor definition form into a definition of an ordinary rule and the "dual" declaration.

Then we can accept the existing language as-is; and as a next step, devise additional surface syntax for or-patterns and other things as needed.

Concretely I imagine the existing surface syntax example (slight liberties assumed in unifying inst-helper signatures to include ty everywhere)

(decl inst_data (InstructionData) Inst)
(extern extractor inst_data inst_data)
(decl iadd (Type Value Value) Inst)
(extractor (iadd ty x y) (inst_data (InstructionData.BinaryOp (Opcode.Iadd) ty x y)))

(rule (lower (iadd ty x y))
  (do_the_isa_thing x y))

would desugar into core syntax something like:

(decl inst_data (InstructionData) Inst)
(decl inst_data_dual (Inst) InstructionData)
(dual inst_data inst_data_dual)
(extern inst_data_dual inst_data)  ;; note that extern no longer specifies etor/ctor
(decl iadd (Type Value Value) Inst)
(decl iadd_dual (Inst) (Type Value Value)
(dual iadd iadd_dual)

(rule (iadd_dual inst)  ;; all rules have simple bindings only in argument positions
  (if-let instdata (inst_data_dual inst)) ;; inst_data_dual used here because original pattern used `inst_data` and in etor position we rewrite by looking up the dual
  (if-let (InstructionData.BinaryOp (Opcode.Iadd) ty x y) instdata)
  (tuple ty x y))

(rule (lower inst)
  (if-let (tuple ty x y) (iadd_dual inst))
  (do_the_isa_thing ty x y))

It looks a little verbose but it has really nice properties: simple args only on toplevel rules; only "structural destructuring" (as it were?) of tuples and true enums in left-hand sides of if-lets; all invocations to other rules that can fail in right-hand sides of if-lets; and after the prelude of if-lets, a sequence of constructor calls as normal. And finally and most importantly: all forms in a rule that are invocations are invocations to a "constructor", so we can stop calling them that and just call them terms :-)

(To be maximally clear, I see my sketch here as adding onto your points above; strong +1 to tuple-returns, and the implementation details in the trie-again IR; this is mostly sketching my ideas in the place where you noted "less sure about [overload]".)

@jameysharp
Copy link
Contributor Author

Ah, you're filling in two things I was unsure of: term overloading and a syntax for terms with multiple return values.

Regarding dual, that works for me. I gather it's purely syntactic: If a term is used in a pattern, then its dual is looked up instead. I find some appeal in type-directed overloading instead of syntax-directed, especially in conjunction with the existing support for implicit conversions, but this does the job and I can run with it.

I'd like to tweak it a little bit though: If a term currently has only an extractor, like inst_data or the lowering version of iadd, then I don't want to have a dual declaration for it. I think we can just allow terms that have only one parameter to be used in patterns. If a dual declaration exists for that term, then rewrite to it, but otherwise use the original term itself. Then the main sugar here is that rewriting an extractor-only term into the core version of decl has the parameters swapped with the results.

I think the "core" form should be a subset of the full language, so that we can use it directly when we want to. That might mean picking a different name for the core version of decl so the existing version's syntax can coexist with it. (Bikeshedding the name: how about term instead of decl?) We might choose to incrementally migrate existing ISLE source to parts of the core syntax; I'd certainly find term/dual less confusing than the current syntax for extractors, which always feel "backwards" to me.

As for tuple syntax: I had noticed that we need some sort of special form for matching tuples, like your suggested (tuple ...). I like your suggestion better than my idea, which was that you'd use the term name again: (if-let (iadd_dual ty x y) (iadd_dual inst)). And syntactically I was thinking we could get away without a special form for constructing tuples, because we could allow multiple expressions on the right-hand side of a rule, but I like the consistency of your approach. In particular, explicitly constructing a tuple as an expression means it can be on the right-hand side of a let expression, and that allows us to use the same impure binding in multiple tuple fields, which we could not do with if-let.

Also while we're at it I'd like to make our Unit type and unit external constructor into simply 0-tuples, () and (tuple) respectively.

I'm intrigued that you want the core syntax to prohibit function calls in patterns. We have to do that internally when building up the "serialized" representation just before generating Rust source, but the "trie-again" representation doesn't make that distinction, and it's not too hard to build that representation from our current surface syntax. So I'm not sure what that restriction on the core syntax would buy us.

By the way, we absolutely need some kind of inlining for some "extractor" terms or our pattern-matching efficiency will be completely shot. We have to be able to see that iadd and imul both match different enum variants on the result of the same inst_data call or we'll test all variants in some sequence on every instruction. We might gain something by inlining some existing constructors that are used in if-let right now too.

To summarize, I think a reasonable synthesis of our proposals is:

  • Add (tuple ...) syntax in both expressions and patterns, and allow the result type in decl to be a tuple type, though such terms can't declare either internal or external extractors.
  • Add term declarations, which are like decl but only support "constructors". When extern is used with term, it does not support any flags, which have equivalents specified on the term instead, or the constructor/extractor keywords. Allow term declarations to be used in patterns if they have exactly one parameter.
  • Add an inline flag to term. Not all terms can be inlined right now; we'll have to figure out the exact rules for when we want to allow this flag. As an alternative, we could add a pub flag indicating that a term is meant to be called externally, or just a noinline flag, and for everything else, inline anything we can. Or something in between?
  • Add dual declarations; if a pattern uses a term that has a dual declaration then invoke the associated dual term instead. Otherwise if the term was declared with decl then use its extractor, or if it was declared with term then just use it.
  • Turn decl, extractor, and extern constructor/extractor into syntactic sugar for various combinations of term, dual, extern, and rule. Or perhaps just rewrite these at the source level and delete support for them.

Introducing term alone would allow us to convert a variety of simple extractor predicates from Rust to ISLE; more if we also add tuple. Adding inline as well and implementing it when constructing the trie-again representation would be semantically visible because it would make the overlap check more precise and allow us to remove a bunch of rule priorities. Lots of incremental benefits from doing different parts of this work.

@jameysharp jameysharp added the isle Related to the ISLE domain-specific language label May 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
isle Related to the ISLE domain-specific language
Projects
None yet
Development

No branches or pull requests

2 participants