Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autogenerate IR tree #4560

Merged
merged 5 commits into from Jun 13, 2022
Merged

Autogenerate IR tree #4560

merged 5 commits into from Jun 13, 2022

Conversation

mcpiroman
Copy link
Contributor

@mcpiroman mcpiroman commented Aug 30, 2021

Discussion: https://youtrack.jetbrains.com/issue/KT-47294

The autogeneration is analogous to the one of the FIR tree, although, as suggested, it does not reuse much of the FIR's generator.
It covers expressions (except for IrMemberAccessExpression), declarations and visitors (except for IrElementTransformerVoid).

List of changes (I tried to keep them to minimum):

  • New module: compiler:ir.tree:tree-generator, which generates files into compiler/ir/ir.tree/gen.
  • It has a dependency on a KotlinPoet library used to generate source code.
  • New Idea configuration Generate IR tree, analogous to the Generate FIR tree.
  • Most of the files from compiler/ir/ir.tree/src/(declarations|expressions) removed, except where they had some extension members on the corresponding element.
  • Changed value and type argument members of IrMemberAccessExpression to make it possible to autogen its child elements (see explanation: Autogenerate IR tree #4560 (comment)).
  • Made IrStatementContainer an IrElement. It looks like it was just forgotten to mark it so before.
  • ObsoleteDescriptorBasedAPI is now applied consistently, which means I also had to add a bunch of @OptIn(ObsoleteDescriptorBasedAPI::class) here and there throughout the codebase.
  • accept, transform, acceptChildren and transformChildren are now all in the base (ir.tree) layer, instead of being spread between base and impl - just as in FIR.
  • Changed fun <T> visitConst(expression: IrConst<T>) to fun visitConst(expression: IrConst<*>). The T parameter was not used anywhere.
  • Corrected visitVariableAccess(expression: IrValueAccessExpression) to visitValueAccess to match the element's name.
  • A few accept methods now return more concrete types, so I removed now unnecessary casts.
  • On IrScript changed providedProperties: List<Pair<IrValueParameter, IrPropertySymbol>> to a parallel lists of providedProperties and providedPropertiesParameters.
  • Moved some minor stuff from FIR tree generator to the common source generator module

@max-kammerer
Copy link
Contributor

@mcpiroman Thank you for your amazing work! Frankly speaking I'm not responsible to merge this commits but there are some technical suggestion that I hope will speed up further process:

  1. If it's possible to devide commit into 2 parts (and it's not requires many effort), please split them onto 2 parts: related to adding new infrastructure and related to switching compiler to it. I hope it will speed up landing of first part
  2. I've also noticed that some unrelated files affected in commits, e.g. under .idea folder. Please remove them

@mcpiroman
Copy link
Contributor Author

mcpiroman commented Oct 15, 2021

@max-kammerer I may make the first commit only generate the tree and not use it, but then there will be duplicated classes - one generated and one hand-written - so it won't compile. Assuming you don't want commits that don't compile I would have not to commit those in the former, spliting them into 'everything which does not affect the existing behavior' and 'everything else' fwiw.

Secondly, I have left the IrMemberAccessExpression for now because it requires some less straight forward solution. Would you like to fix it in this PR or a followup?

@mcpiroman mcpiroman closed this Oct 15, 2021
@mcpiroman
Copy link
Contributor Author

Ahh, missclick

@mcpiroman mcpiroman reopened this Oct 15, 2021
@max-kammerer
Copy link
Contributor

max-kammerer commented Oct 18, 2021

@mcpiroman If you suggest to perform splitting in

'everything which does not affect the existing behavior' and 'everything else'

then it would be much better.

Regarding of approach you will take I should note that there could be some clients that could be affected by changes in first (it's depends how would be kept existing behaviour) or/and second parts. So I would suggest to keep as much as possible transparency and clearancy in part that affects current ir declarations and visitors/transfomers/etc (maybe combining some commits, or making actual substitution to generated ones in last commit) and keep classes/files histories (that already done in current approach). Please avoid merge commits and use rebase.

Secondly, I have left the IrMemberAccessExpression for now because it requires some less straight forward solution

It looks ok for me. Could you explain what difficulties are here and what is estimated plan here?

Would you like to fix it in this PR or a followup?

If you mean switching current pr with 'everything which does not affect the existing behavior' I would prefer to keep current solution so other team folks could participate in discussion and add notes having current full view

@mcpiroman
Copy link
Contributor Author

mcpiroman commented Oct 18, 2021

@max-kammerer I don't quite get it overall. As for the review I suggest you just look at the accumulated difference, not commits, even after dividing them anyhow. I feel like in this case it will be more helpful if I describe any changes as needed rather than doing something on the commit layer. I will write the per folder change overview below. Therefore merge commits won't cause any problems, as I expect you will squash everything on merge, right?

  1. The problem with IrMemberAccessExpression is that it uses an unusual pattern for type arguments and value arguments child elements.
    For value arguments it has methods like getTypeArgument() that wrap the private array typeArgumentsByIndex. It has to accept the array's size in the constructor, which is not (and should not be) supported by the codegen. For value arguments it is similar in child elements.
    What's worse is that some of the descendant elements do not have the arguments and therefore override these methods with throw Exception. So it is not only incompatible with my solution but also essentially not type-safe.
    I have hacked some way to make it work but it's not great either and the IrMemberAccessExpression itself cannot be autogenerated. Rather, I'm after making the typeArguments and valueArguments MutableList members, just like with everything else. The only problem is it would require to update a lot of use-sides of calls like getTypeArgument and replace them with the direct access to the actual collections; maybe also some casts or alike.

  2. By the way, I have left some of the extension functions like:

fun IrStringConcatenation.addArgument(argument: IrExpression) {
    arguments.add(argument)
}

which I think are no needed anymore, but to remove them I'd have to modify some use-sides to use the direct collection access. So, should I also commit 1. or 2. with this PR or a follow-up one?

Edit: I've updated the list of changes in the PR's header and I think with those everything should be clear to review as is.

@max-kammerer
Copy link
Contributor

max-kammerer commented Oct 18, 2021

Therefore merge commits won't cause any problems, as I expect you will squash everything on merge, right?

Usually we don't squash commits on merge (if they are not explicitly marked for that) keeping original structure and making it's through rebase pr merge (https://github.com/JetBrains/kotlin/blob/master/docs/contributing.md).
In current commits (at least in polish tree, replace src- with gen-tree) git doesn't recognize many files as moved but marks them as removed and newly added that breaks class/file history (I was wrong here in previous message) that also make changes difficult to review. It's best to split pr on logical pieces by commits, in current approach some minor ones could be merged with bigger one (such structure would be easier to read after non-squash merging, that also will require better commit naming).
It's best to postpone 1 and 2 for separate prs

I will review pr deeper in few day to give non-technical part of feedback.
Please ask me for clarification if something not clear!

And once again, final decision here would required review from other folks too


public abstract var body: IrBlockBody

public override fun <R, D> accept(visitor: IrElementVisitor<R, D>, `data`: D): R =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why data with ``?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library for code generation I use, https://github.com/square/kotlinpoet, seemingly tries to be super safe and always escapes even soft keywords. If you have some other concerns about formatting of the generated code, like line a wrapping, it is also because of it (and that particular case is also by design).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid such artifacts in code, maybe it's worth to file issue in kotlinpoet about that (if there are no way to switch this behaviour off). I didn't compare other changes deeply in generated code as I wrote many generated files lost their history and it's quite complicated to review them

Copy link
Contributor Author

@mcpiroman mcpiroman Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The policy of this library is to generate possibly ugly but firmly working code. I've filed an issue about an option to disable (or control) line wrapping and they declined. I may try with the unneeded escaping but I guess without success. Note that It would be quite hard, sometimes impossible, to track the context of the code and follow the rules of where a particular soft keyword is allowed or not (especially given the language may change). Here is a very relevant example on why this is tricky.

But on the other hand note the generated code won't be seen much either, I expect, mostly from intellisence where this doesn't matter. Or, to get an overview of a given element (like its properties), just go to the DSL (IrTree.kt) as it's much more conscience. (I've just got an idea to for each generated class autogenerate a link in a comment pointing to the line the element is defined in the DSL, don't know if that's possible tho).

many generated files lost their history and it's quite complicated to review them

My advice is to not rely on git here, I don't see how can it help. If you just want to see the generated elements then go to the gen folder. Or if you want to compere them with their hand written original, manually find its class by the name. I recall only one file that is really moved, other than that just treat them as separate entities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to add postprocessor here: with replacing all 'data' with data. And removing public modifiers from generated code. Maybe it's also solve issue with file moving (it will require reset and recommit of switch commit)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcpiroman Thank you!

Additonal suggestions:

  1. Avoid : Unit return type generation. By postprocessing or via kotlinpoet if it's possible:
    override fun <D> acceptChildren(visitor: IrElementVisitor<Unit, D>, data: D): Unit {
  2. Avoid additional indentation for function with expressions body:
  3. While not keep old scheme for override fun visitConst(expression: IrConst<*>): IrExpression with type parameter? If we can avoid unneccessery source level breakage I would prefer to not do them
  4. I would also prefer to support some migration for visitVariableAccess: have deprecated one delegating to new 'visitValueAccess'. And moving lower switching to visitValueAccess in separate commit

There is still problem with file movement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@demiurg906
Well, in the IR tree source there are also no parent members. That's why I proposed listing them in the KDoc, in some sane format so that they don't take too much space. But isn't that also an oversight in the (or in the usage of) Intellij? I'd assume it is frequent enough case that it should be possible to list all available members.

@max-kammerer

  1. Can strip that
  2. Don't quite know how to do this, but see below
  3. It is easier for generator, makes more sense (when overridden, the function doesn't visit some consts of type T, it visits all consts), IIRC it also allowed to remove some @Suppress("UNSAFE_CAST").
    But why do you care about the breakage? At least regarding compiler plugins I was told not to worry too much about them as their still in alpha, I'd also expect their authors do look at some git diff of the tree structure between compiler versions anyway.
  4. Same as above. This is generated code so it's not super easy to do such exceptions. Also note there is only one usage of this function in the entire codebase.

But since you want the generated code to be pretty, it should be easier to run some linter over it rather than applying manual corrections, which I don't think would help that much either (I've also dig onto some issue at kotlinpoet and they suggest the same approach). Preferably the same linter as for the rest of the project, right now there isn't one though, right? I've heard about some new service of IntelliJ to lint code outside of the IDE, might be helpful here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also expect their authors do look at some git diff of the tree structure between compiler versions anyway.

Yes, it's true. But in current approach it's quite hard to understand whats going on as file history is broken. Same thing makes full review complicated. That blocks me (and I assume other folks) to perform it in proper way.
Change for 'visitConst' might be ok (as migration here is quite straightforward and it even keeps binary compatibility:)).

At least regarding compiler plugins I was told not to worry too much about them as their still in alpha,

It could be OK in many cases. But here (from my side) core part of IR lowering infrastructure is touched and I would prefer to have some migration mechanism to allow smooth migration (that would be much more important in future but not critical for this PR). I can be wrong but look like in this particular case with 'visitValueAccess' it will require 2-3 additional if in code generator.

But since you want the generated code to be pretty, it should be easier to run some linter over it rather than applying manual corrections, which I don't think would help that much either (I've also dig onto some issue at kotlinpoet and they suggest the same approach).

Most likely you speak about something like code formatter. We used built in one in IDEA and most likely it could be used separately (but it could require some big dependencies that I'm not sure is good to have here). As was already pointed by @demiurg906 IR base declaration assumed to be read in different scenarios and "it's important to have this generated code written in general code style"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely you speak about something like code formatter.

Yes, in particular the newly announced qodana, though now that I looked it doesn't quite seem to support formatting.
So how about postprocessing with ktlint? It would most probably require a few new simple rules for best effect which can be easily plugged-in by creating new module alongside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, same code style is important as was written before, but it's last priority concern that I have for this PR //cc @bashor

@max-kammerer
Copy link
Contributor

max-kammerer commented Oct 21, 2021

Once again there are a lot of files that is not recognized by git as modified and moved, but as deleted and newly added (it's breaks file history and most likely would cause problems with commits cherrypicking to 1.6.0/10 release branch that would applied over pr ones). You can manually move these files as is to new folders in master branch and commit them (0 commit), after that you can rebase (or apply patch from pr changes) your pr over 0 commit. (I don't know how you are familiar with git, maybe this article also would be helpful: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History)

@max-kammerer
Copy link
Contributor

I would prefer to have changes for non-generated files with @OptIn(ObsoleteDescriptorBasedAPI::class) annotating in separate commit as they are not directly correspond to main change, but it's minor

@mcpiroman
Copy link
Contributor Author

Assuming you talk about moving the IrElement files from src to gen, then I think the behavior of delete-add is more adequate here because their relationship is not so straightforward:

  • many of the src files contained multiple classes in one file, now every class has it's own
  • some src files had additional declarations like extension functions and therefore they remain, although they core meaning, the class, has been moved
  • the generated classes are quite different (mainly they have more methods) and aren't quite compatible with the src once. Thus it is more logical to consider them as recreated rather than moved.

So here it may make more sense to actually unmark some of the files git marked as moved for the sake of consistency.

@mcpiroman
Copy link
Contributor Author

mcpiroman commented Oct 21, 2021

I would prefer to have changes for non-generated files with @OptIn(ObsoleteDescriptorBasedAPI::class) annotating in separate commit as they are not directly correspond to main change, but it's minor

Ok, but that will produce a not compiling commit

@max-kammerer
Copy link
Contributor

max-kammerer commented Oct 21, 2021

Assuming you talk about moving the IrElement files from src to gen, then I think the behavior of delete-add is more adequate here because their relationship is not so straightforward:

many of the src files contained multiple classes in one file, now every class has it's own
some src files had additional declarations like extension functions and therefore they remain, although they core meaning, the class, has been moved
the generated classes are quite different (mainly they have more methods) and aren't quite compatible with the src once. Thus it is more logical to consider them as recreated rather than moved.

So here it may make more sense to actually unmark some of the files git marked as moved for the sake of consistency.

It's a good points but I'm afraid that such approach will give a lot frustration during review and many problems for this code clients that we have now (I haven't yet taken into account compatibility problems with changes in code itself). So I would prefer to keep matching at least for identically named files.

@max-kammerer
Copy link
Contributor

max-kammerer commented Oct 21, 2021

Just for note: Ideal solution for me would be in splitting all changes into

'everything which does not affect the existing behavior' and 'everything else'

and keep file (and class) history in more proper way

@mcpiroman
Copy link
Contributor Author

mcpiroman commented Oct 21, 2021

I've divided the commits, but I can only guarantee the last will compile. I also don't know how to mark files as moved in git - as far as I know git recognizes moves by itself and the only thing one can do is to change the threshold of the differences to count as a move.

@max-kammerer
Copy link
Contributor

max-kammerer commented Oct 26, 2021

I've divided the commits, but I can only guarantee the last will compile.

Thank you for commit splitting, now logical structure of changes much more clearer! And thank you for additional notes in commit message! Why you think that smth will affect compilation (do you mean last commit or new commit structure)?

I also don't know how to mark files as moved in git - as far as I know git recognizes moves by itself and the only thing one can do is to change the threshold of the differences to count as a move.

I know just straightforward approach with manual file movement to new folder that I've wrote above

@mcpiroman
Copy link
Contributor Author

d532c7d won't compile for sure because the missing @OptIns are added in the next one. The lack of this annotation produces a warning but this projects has warnings-are-errors policy.
Other commits may compile if I've done everything correctly, I just haven't checked.

@max-kammerer
Copy link
Contributor

max-kammerer commented Oct 27, 2021

The lack of this annotation produces a warning but this projects has warnings-are-errors policy.

Yes, it would be compilation fail on warnings (though there are still several modules where this check is not yet enabled).
But it doesn't restrict you to have all commits in your pr without them. At least last one should pass this check

@udalov
Copy link
Member

udalov commented Dec 30, 2021

@mcpiroman Sorry for the enormous delay, and thank you for the great work.

I agree with my colleagues that it'd be better to keep git history. I have a few ideas and can help you with that. I would also like to separate some changes to make the code easier to review. So if you rebase your branch on the current master, and enable "allow edits from maintainers" in this PR (might already be enabled, I don't know), I'd like to make a few changes to your branch to make it mergeable to the current Kotlin master.

@udalov
Copy link
Member

udalov commented Jan 17, 2022

@bashor Thanks for the review!
@mcpiroman Just to clarify, I'm still working on my branch, so I'll address @bashor's comments and upload a new version soon.

@udalov
Copy link
Member

udalov commented Jan 19, 2022

I've pushed the first part to master, and rebased the PR on top of it. I'm still trying to figure out whether the aforementioned performance regression exists, so this is still in progress by me.

In addition to things listed above, I've changed generator a bit following the internal feedback:

  • Do not generate accept in non-leaf elements
    • They are basically useless, and might cause issues if you for some reason add a new IR element manually (not autogenerated) and forget to add accept
  • Do not generate acceptChildren/transformChildren in non-leaf elements, and super calls to it
    • Instead, do it as was done before the PR in master: inline fields of every parent into acceptChildren/transformChildren. This simplifies reading the code (convenient to have acceptChildren/transformChildren declared where you're looking for them, and not in one of supertypes, not obvious which one), and also somewhat simplifies the generator

I've made these changes as separate commits just to make it clear what I changed, but will probably squash them later into the main commit. The whole tree is regenerated only once in the main commit.

@udalov
Copy link
Member

udalov commented Feb 7, 2022

  • Fixes to the suggestions above
  • Make Element.config non-nullable
  • Specify all declaration types in IrTree explicitly -- otherwise IDE plugin in my case still reports multiple errors in random places, and the file is red basically all the time

@udalov
Copy link
Member

udalov commented Feb 16, 2022

I've made a few more changes. As before, I'm going to push the commits which are before the main one separately, and fixup the ones starting with "(fixup)" into the main one.

Most notably, I've made IrElement's accept/transform/acceptChildren/transformChildren methods auto-generated, which allowed to get rid of IrElementBase and to rename IrAbstractElement back to IrElementBase. The new IrElementBase now has default implementations of transform, acceptChildren, transformChildren

Also, I've deduplicated the interface/abstract-class solver copied from FIR tree generator.

@mcpiroman
Copy link
Contributor Author

Should I rebase somehow soon in case I have free time?

@udalov
Copy link
Member

udalov commented May 3, 2022

@mcpiroman Sorry, this has taken a lot longer than it needed and it's completely my fault. I was hoping to do a few remaining changes before merging the PR, but then got caught up with other urgent stuff.

I will rebase it myself this week, take another look at it, and hopefully proceed with merging the PR if there are no objections from the team.

@mcpiroman
Copy link
Contributor Author

No warries, just thought that rebasing might be a blocker to you.

@udalov
Copy link
Member

udalov commented Jun 10, 2022

Changelog after internal review (hopefully the last one):

  • Move files around in a separate commit, otherwise Git was still getting confused by so many simultaneous changes and moves
  • Add a small readme and link it from generated files
  • Restore children visit order in IrWhileLoop: first condition, then body
  • Restore strict cast in IrProperty.transformChildren

@udalov udalov merged commit f003f19 into JetBrains:master Jun 13, 2022
@udalov
Copy link
Member

udalov commented Jun 13, 2022

Thank you for the contribution!

@mcpiroman
Copy link
Contributor Author

So now what about IrMemberAccessExpression?
For me it should also be made autogenerated, but the problem is with these properties:

protected abstract val typeArgumentsByIndex: Array<IrType?>
protected abstract val argumentsByParameterIndex: Array<IrExpression?>

To support autogeneration, those should be made a regular public 'MutableList's (via listField in the generator), but:

  • Being a MutableList, although presized, may be somewhat less efficient than a plain Array , so maybe generate them as arrays too.
  • The current API for those properties becomes just unnecessary wrappers:
abstract class IrMemberAccessExpression<S : IrSymbol> : IrDeclarationReference() {
    ...
    fun getValueArgument(index: Int) = valueArguments[index]
    fun getTypeArgument(index: Int) = typeArguments[index]

    fun putValueArgument(index: Int, valueArgument: IrExpression?) {
        valueArguments[index] = valueArgument
    }
    fun putTypeArgument(index: Int, type: IrType?) {
        typeArguments[index] = type
    }

    val valueArgumentsCount get() = valueArguments.size
    val typeArgumentsCount get() = typeArguments.size
}
  • We can either keep it, or remove (inline) these methods to directly use the lists. IJ provides a nice refactoring for that, however there are some problems:

    • It is a huge amount of callsides and therefore one-line changes.
    • That refactoring won't work for kotlin-native because it is in different gradle build. Maybe there is a walkaround for that?
    • All extensions will need to do the same (if we do care).
  • Current out of range error messages such as "No such value argument slot: $index" are lost and replaced by generic ones provided by the list/array.

  • IrPropertyReferenceImpl and IrLocalDelegatedPropertyReferenceImpl currently have:

override val argumentsByParameterIndex: Array<IrExpression?>
        get() = throw UnsupportedOperationException("Property reference $symbol has no value arguments")

To support walking, these throws now need to be replaced by empty list/array. But because those elements just don't have value arguments, I think it is better (more type-safe) to move that list from IrMemberAccessExpression to some other type (IrMemberAccessWithValueArgumentsExpression?).

  • I found some extension properties in the form of
val IrMemberAccessExpression<*>.typeArguments get() = List(typeArgumentsCount) { getTypeArgument(it) }`

Those could be eliminated to use the now exposed list/array.

  • Probably also change names to just valueArguments and typeArguments.

Now, it could be just left hand written or made dirty-special-cased in the generator, but for me its better long-term to have all the tree elements constrained in structure, as enforced by the generator ,especially that it is already done for all but this one. It would also much help with exploration of the ideas I wrote about here: https://youtrack.jetbrains.com/issue/KT-51427. WDYT?

@mcpiroman mcpiroman deleted the autogen-ir-tree branch October 31, 2022 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants