Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support merging JSON arrays, objects #148

Closed
wants to merge 2 commits into from
Closed

Conversation

heaths
Copy link
Contributor

@heaths heaths commented Jan 25, 2024

Partly resolves cli/cli#1268 and replaces cli/cli#5652

heaths added a commit to heaths/cli that referenced this pull request Jan 25, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
@heaths
Copy link
Contributor Author

heaths commented Jan 25, 2024

@samcoe the idea here is to retain the basic - and intentionally rather minimal - design Nate put into place for JSON arrays and write them to stdout as we read them. This should have no practical impact to existing JSON array responses. Using the same interface - again, to retain the same basic design and not require near the level of refactoring cli/cli#5652 did - we can merge JSON objects using v1.0.0 of mergo, cache the merged object in memory, and write the merged object in the end. Given there can be multiple arrays and arrays will always be nested within at least the root data element, this seems the only way.

@andyfeller
Copy link
Contributor

@heaths : Thank you for this! ❤️ One thing I want to share is that @samcoe has left GitHub though hopefully will still be a contributor and advocate. 😞 Wanting to make sure we demonstrate our side of the commitment on this topic, I'm bringing this up with @williammartin so we can avoid the same stagnation from last time.

@heaths
Copy link
Contributor Author

heaths commented Jan 25, 2024

Good luck to you, @samcoe, in your new venture! Thanks for all the support over the years!

@andyfeller
Copy link
Contributor

andyfeller commented Jan 25, 2024

@heaths : having reviewed these changes a few times, I think I have a mental picture of how these changes fit into gh api but would like to confirm a few assumptions I'm making:

  1. Moving the JSON merging logic here should primarily minimize the gh api changes to the original mergeJSON logic

  2. mergeJSON will still need to determine whether it is merging arrays or objects and use the appropriate Merger

Beyond that, is there anything else to consider regarding these changes and the original PR?

@heaths
Copy link
Contributor Author

heaths commented Jan 25, 2024

No. In fact, I would pretty much forget the old PR exists. That was a huge refactor whereas you can see the new one on draft status is tiny. The logic of REST vs GraphQL was already determined, so I built on that. The goal was to keep the same basic flow that is currently in release: streaming for REST array responses (in case anyone took a dependency on that behavior), at least.

Copy link
Member

@williammartin williammartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this @heaths, I have left a few comments and some thoughts inline here.

Firstly, I think you've done a really good job at minimising the changes required in the current api command code so kudos for that.

Secondly, though I don't think it should necessarily be done as part of this work, do you think it's worth exploring an earlier branching of graphql and rest in the API command? Things like isFirstPage and the streaming vs writing on Close behaviours are screaming that there's too much going on here. Similarly, when I look at the api command there's so many conditionals predicated on whether we're hitting graphql or not and it seems likely that would only increase. If we branched earlier, much of this PRs code would seem to become much simpler and idiomatic.

Finally, I see you put this in go-gh, is there a reason for that? Particularly since it seems tailored for our use case in the api command (e.g. objectMergerPage.Read not actually reading into the provided byte slice) I'm not sure that it makes sense for this to live as a consumable semvered package.

pkg/jsonmerge/object.go Outdated Show resolved Hide resolved
pkg/jsonmerge/object.go Outdated Show resolved Hide resolved
pkg/jsonmerge/object.go Show resolved Hide resolved
pkg/jsonmerge/object.go Show resolved Hide resolved
pkg/jsonmerge/object.go Show resolved Hide resolved
@heaths
Copy link
Contributor Author

heaths commented Jan 27, 2024

Secondly, though I don't think it should necessarily be done as part of this work, do you think it's worth exploring an earlier branching of graphql and rest in the API command? Things like isFirstPage and the streaming vs writing on Close behaviours are screaming that there's too much going on here. Similarly, when I look at the api command there's so many conditionals predicated on whether we're hitting graphql or not and it seems likely that would only increase. If we branched earlier, much of this PRs code would seem to become much simpler and idiomatic.

If you look at the original PR #5652 where I did refactor how a lot of this worked, it's huge. I'm not opposed to a major refactor similar to that, but I spent countless hours maintaining a clean merge since the PR was ignored for so long. I can't afford that again.

However, I don't think the idea here is wrong. Only when we need to paginate do we even need to consider merging, which is what these related changes do. Yes, the cli/cli does have a lot of branching on "/graphql" but the behavior for REST vs. GraphQL is very different, so that's not surprising. Paging is different, merging is different, output is different.

Finally, I see you put this in go-gh, is there a reason for that? Particularly since it seems tailored for our use case in the api command (e.g. objectMergerPage.Read not actually reading into the provided byte slice) I'm not sure that it makes sense for this to live as a consumable semvered package.

The idea I discussed with @andyfeller and Sam was that because this was effectively a thin wrapper around mergo - while retaining Nate's "streaming" REST response merger - it could be useful to others. It's not unreasonable for extension authors to query for responses and may need to merge them as well. Having written several extensions myself - including a project V2 extension that called a lot of GraphQL queries and mutations - having utility functions like this in go-gh is very useful. That's why I've helped move a bunch of functionality or add some here. We could move it to cli/cli but I think it would be more valuable here.

I'll respond to individual comments.

@williammartin
Copy link
Member

williammartin commented Jan 27, 2024

I'm not opposed to a major refactor similar to that, but I spent countless hours maintaining a clean merge since the PR was ignored for so long. I can't afford that again.

Just to be super clear, I have no expectations for this to happen:

  • As part of this work
  • On your back

I'm asking for your opinion as someone who has more familiarity with the code than Andy or me.

However, I don't think the idea here is wrong

I also would not use the word wrong here. As in my comment above, you've done a remarkable job at providing something that explicitly doesn't require wholesale changes in the command layer. It's crafty!

Yes, the cli/cli does have a lot of branching on "/graphql" but the behavior for REST vs. GraphQL is very different, so that's not surprising. Paging is different, merging is different, output is different.

The point I'm failing to make is that the API command branches 7 times (at a cursory count) based on whether this is a graphql request or not. That makes for a lot of state that needs to be carried around mentally when reading the code. Additionally, it forces other abstractions to contort into supporting the shared code path (see isLastPage, and objectMergerPage which says it satisfies the io.Reader interface but isn't really a valid implementation as per the interface doc comments because it never reads bytes into p).

If instead, we branched once, early, I'm curious how that tradeoff would play out. Yes, there would be some duplication but I wonder if we're in a case of "duplication is far cheaper than the wrong abstraction". The more code that gets written to support this style, the more we will feel the pressure to continue to invest in it over time.

Again, I'm not asking for you to do any work to implement this, or even to validate it, I'm just interested in whether it resonates with you given your experience poking around in here. If you were to say "I don't know and I don't have the time to investigate it" that's totally understandable.

Having written several extensions myself - including a project V2 extension that called a lot of GraphQL queries and mutations - having utility functions like this in go-gh is very useful.

Thanks for the insight. I'm not opposed to keeping it here though I have some reservations about the existing abstraction. Having it here doesn't require that we maintain it beyond keeping it exposed and not breaking it, and if it's useful in its current state so be it.

I've responded to individual comments and unresolved those that I would like a response to.

@williammartin
Copy link
Member

Also, just to set expectations, I don't think there is anything about this PR that is really blocking. I just want to make sure we all get on the same page about its usage and limitations. Right now I fully expect that the PR will not change significantly, and will get merged in shortly. I certainly don't intend to drag this out especially given the experience we talked about before on #5652.

@heaths
Copy link
Contributor Author

heaths commented Jan 29, 2024

If instead, we branched once, early, I'm curious how that tradeoff would play out. Yes, there would be some duplication but I wonder if we're in a case of "duplication is far cheaper than the wrong abstraction". The more code that gets written to support this style, the more we will feel the pressure to continue to invest in it over time.

Again, I'm not asking for you to do any work to implement this, or even to validate it, I'm just interested in whether it resonates with you given your experience poking around in here. If you were to say "I don't know and I don't have the time to investigate it" that's totally understandable.

I see. Thanks for clarifying. Yes, bifurcating early might actually be better. That's why keeping my original PR clean against your upstream trunk was so expensive. As other PRs came in - often just changing a line or two - the branching was so complicated it would throw everything off. It was also hard to understand initially, which made my PR refactoring much of the code even harder to understand - even for me when cleaning up merge conflicts.

I'm open to doing this work. What is your timeline for the next release, though? I have a couple thoughts depending on that:

  1. Close this PR and move it to the internal package in cli/cli. Make some of the changes we've discussed above (I haven't tried the CopyN suggestion yet but will once we settle on some other feedback).
  2. Merge that for an initial implementation.
  3. As a separate PR, refactor.

If we split the work, an initial release to merge JSON arrays and objects could ship. Refactoring later should have no user impact. That said, I'd also have less incentive to finish 3, but still interested. That said, if one of your team wants to work on that, I wouldn't be offended. 😄

@williammartin
Copy link
Member

williammartin commented Jan 30, 2024

I see. Thanks for clarifying. Yes, bifurcating early might actually be better. That's why keeping my original PR clean against your upstream trunk was so expensive. As other PRs came in - often just changing a line or two - the branching was so complicated it would throw everything off.

Yeh understood. Total nightmare for everyone once that starts happening.

What is your timeline for the next release, though?

We release every 2 weeks or so, sometimes more frequently. I would anticipate we ship by next Tuesday at the latest. I don't think any large scale refactor would need to make it into the next release though.

I think we should proceed with getting this work into cli/cli pronto. This is a useful feature and we absolutely don't want to block it on some nebulous refactor that provides uncertain value and has an unknown time-bound. It may end up being more toil and risk than we are comfortable with.

That said, I'd also have less incentive to finish 3, but still interested. That said, if one of your team wants to work on that, I wouldn't be offended. 😄

So just for context right now I'm the only full time maintainer on the core CLI. @andyfeller is a machine and jumps back and forth between this and copilot CLI work. We are looking to add more engineers but it's a process.

That said, I do think that I would find it valuable to get my hands grubby on this one since the refactor isn't time sensitive. There's a lot I still need to learn about the nuances of this command. I don't think I'd looked at the implementation of gh api before this PR and trying to offer constructive review without experience in the code is challenging.

@heaths
Copy link
Contributor Author

heaths commented Jan 31, 2024

Closing per discussion above and in cli/cli#8620.

@heaths heaths closed this Jan 31, 2024
@heaths heaths deleted the merge-json branch January 31, 2024 07:33
heaths added a commit to heaths/cli that referenced this pull request Jan 31, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Feb 2, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Feb 15, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Mar 12, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Apr 4, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Output a single JSON document from api --paginate
3 participants