Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge JSON responses from gh api #5652

Closed
wants to merge 1 commit into from
Closed

Conversation

heaths
Copy link
Contributor

@heaths heaths commented May 16, 2022

Fixes #1268

@heaths
Copy link
Contributor Author

heaths commented May 16, 2022

A WIP that still needs some refactoring to avoid duplicate code. Also depends on darccio/mergo#210 getting merged or, if we want to stick with that library, vendoring or forking it. This was more a proof of concept but was hoping for some early feedback on the concept. With the mergo change this would handle both top-level JSON arrays or objects, so both REST and GraphQL responses.

@heaths
Copy link
Contributor Author

heaths commented May 17, 2022

@mislav I need to add some specific tests, but wanted to see - if you have time to review - if you feel this is on the right track. I maintained writing responses as they come without pagination so that the user can see something happening. I do ponder adding a progress indicator if paginating, though, because it's pretty long. The main concern with that is that, even writing to stderr, it's likely callers have redirected at least stdout if not stderr using . Writing progress could, therefore, be a breaking change.

EDIT: Actually, considering we'd only show progress if stderr wasn't redirected, this would probably be fine. I'll make the change, but certainly can pull it if there's concern.

@heaths heaths force-pushed the issue1268 branch 2 times, most recently from e042de9 to a7ca779 Compare May 17, 2022 07:29
@heaths heaths marked this pull request as ready for review May 17, 2022 12:45
@heaths heaths requested a review from a team as a code owner May 17, 2022 12:45
@heaths heaths requested review from mislav and removed request for a team May 17, 2022 12:45
@cliAutomation cliAutomation added the external pull request originating outside of the CLI core team label May 17, 2022
@cliAutomation cliAutomation added this to Needs review 🤔 in The GitHub CLI May 17, 2022
@heaths
Copy link
Contributor Author

heaths commented May 19, 2022

@mislav how much do we care about keeping errors printed in the same order. This is what we do currently:

{
  "message":"Not Found",
  "documentation_url":"https://docs.github.com/rest"
}
gh: Not Found (HTTP 404)
exit status 1

I'm trying to better refactor this so the code is cleaner and easier to maintain (basically, fewer branches in the code and having to pass fewer parameters) but trying to keep this error format is making that more difficult unless I just inline all the code into the for loop.

But, if the error format could be the following, there's a lot cleaner refactorings possible:

gh: Not Found (HTTP 404)
{
  "message":"Not Found",
  "documentation_url":"https://docs.github.com/rest"
}
exit status 1

Frankly, I like this format better anyway. That an error occurred is shown immediately, followed by more details from the server.

go.mod Outdated Show resolved Hide resolved
@heaths
Copy link
Contributor Author

heaths commented May 24, 2022

@mislav I've been using this in a few instances and not exhibited any problems. What do you think?

@heaths
Copy link
Contributor Author

heaths commented May 25, 2022

darccio/mergo#210 was merged so we can actually pick up v0.3.13 and not need a redirect.

@heaths
Copy link
Contributor Author

heaths commented Jun 1, 2022

@mislav @samcoe what do you think of this? Makes the output of paginated responses much easier to use.

Copy link
Contributor

@samcoe samcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heaths Thanks for your patients on this review. Overall, I like this functionality and it feels straight forward with the mergo package. Just left a couple comments for your consideration.

pkg/cmd/api/api.go Outdated Show resolved Hide resolved
pkg/cmd/api/api.go Outdated Show resolved Hide resolved
@heaths
Copy link
Contributor Author

heaths commented Jun 18, 2022

@mislav @samcoe this is ready to review. Even before the refactoring, the response body would be read completely into a buffer anyway - and in some cases, multiple times - so I ended up allocating a single bytes.Reader that implements io.ReadSeeker so I could seek to the beginning as needed without additional allocations, which was possible before.

pkg/cmd/api/api.go Outdated Show resolved Hide resolved
Copy link
Contributor

@samcoe samcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heaths This looks good to me. Thanks for the thorough work here.

The GitHub CLI automation moved this from Needs review 🤔 to Needs to be merged 🎉 Jun 27, 2022
@eddiebergman
Copy link

Hi, thanks for the work! If the pr is approved, can we get it merged to fix some downstream issues at pwntester/octo.nvim#302 :)

@heaths
Copy link
Contributor Author

heaths commented Aug 4, 2022

@samcoe @mislav is this good to merge for next release? It would sure be helpful.

@vilmibm
Copy link
Contributor

vilmibm commented Aug 22, 2022

We're still interested in this but wanted consensus; waiting on @mislav 's review

@heaths
Copy link
Contributor Author

heaths commented Mar 13, 2023

@mislav you made mention of this the other day for someone else that logged a dup bug. I'm glad you're still interested in it, but can you help me understand what you'd like me to do to get it over the finish line? I can resolve the merge conflicts, but I've been doing that for many months and it takes time, though less after I rebased and squashed. I'm happy to resolve the conflicts if you're going to take this or otherwise leave comments so I can get this in suitable shape.

For anyone else that comes across this PR and not issue #1268, I used basically the same code here in an extension that will merge after the fact* without having to install utilities like jq that might be harder to acquire for your system e.g., Windows. Run:

gh ext install heaths/gh-merge-json
gh api "{endpoint}" --paginate | gh merge-json # returns valid JSON

* albeit slower, since gh already stringified the response(s), which the extension has jsonify again, merge, and re-stringify.

/cc @samcoe

Copy link
Contributor

@mislav mislav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry for the late review. I tried multiple times to come up with suggestions on how to potentially make this PR better, but I didn't have good ideas, and I've put it off until later. A lot of time has passed since and here we are. I do really want to see this feature in, but even though I care about this a lot, I also want us to approach the implementation carefully and with as little disruption to the gh api command as possible.

My qualms with the implementation you've proposed here are:

  • Code size: quite a lot of existing code and non-pagination supporting code paths needed to be modified in order to support this feature.
  • No streaming support: we used to stream API responses directly from the network to stdout, but this implementation relies on buffering all the responses until all pages are fetched. That means that the client receiving the output of gh api cannot start processing JSON until everything has finished fetching.
  • Passing entire API responses through a 3rd-party "mergo" library: we used to return API responses unchanged, and now we're significantly massaging them through something which to me is a black box. I feel this is risky, since a bug in mergo can potentially corrupt API responses.
  • Deep-merging of GraphQL responses: GraphQL queries can also be paginated, but GraphQL queries can also contain multiple arrays in response data. The "mergo" library doesn't know which array is the one that is being paginated, so it will merge them all according to its deep merge strategy.

As part of this review, I wanted to explore an alternate implementation to see if it's even possible to address these points. I came up with #7190, where I'd value your thoughts when you find time. Specifically, the code spike tries to keep the implementation small, have the modifications only affect the --pagination code path, avoid a 3rd party library, and continue streaming API responses to stdout. This is done by wrapping resp.Body in a Reader that strategically omits the leading [ or trailing ] of JSON arrays. This seems to work on my testing, so please try it out! GraphQL pagination is intentionally non-addressed in this exploration since solving it would be non-trivial.

The GitHub CLI automation moved this from Needs to be merged 🎉 to Needs review 🤔 Mar 17, 2023
@heaths
Copy link
Contributor Author

heaths commented Mar 17, 2023

  1. I'll take a look at your PR soon. Conceptually, I like the approach and tried to come up with similar that would handle GraphQL, but see my thoughts below.
  2. I appreciate that this buffers responses and that's a behavioral change, but don't understand why a new third-party library is a concern when the vast majority of modules in go.mod also are. It seems to be a solid library with a lot of importers - over 3,200 - on pkg.go.dev. Again, though I appreciate the concern about buffering responses. As a mere suggestion, what about allowing people to opt into paging internally with the caveat responses will no longer be buffered? For the majority of use cases I imagine it would be of little consequence. Or maybe only do it for GraphQL paged responses.

I honestly can't think of any way to do this when an array is nested and you could have multiple arrays. With any nested arrays in a REST response, those would be properties of a resource (an array of resources or not) so only a top-level array would need "merging" and your approach (based on your description) would work fine there.

Alternatively, what about considering a built-in command to optionally buffer responses a la https://github.com/heaths/gh-merge-json? Ideally, output from gh api just works but at least not requiring a separate install may be nice. That said, #7173 simplifies that scenario so it may be moot.

@heaths
Copy link
Contributor Author

heaths commented Mar 17, 2023

Thinking about this more, given the GraphQL paginated code was invalid JSON anyway, does the fact it was streaming output really mattered? Either you're piping to something that is consuming JSON somehow (jq, PowerShell's ConvertFrom-Json, whatever) and would err on invalid JSON, or you're redirecting to a file in which case streaming doesn't really matter anyway.

For simple arrays that only REST could return, your approach in #7173 would seem to solve the problem but what about limiting this to just paginated GraphQL responses? Looking through your PR, I could use a similar approach that would require less refactoring and only impact a limited set of scenarios e.g., a reader (though, since instanced - perhaps passed in, which yours could be too) that under the right conditions does an internal read, unmarshals the JSON, merges, and when done, marshals back out to the original reader on last page.

@mislav
Copy link
Contributor

mislav commented Mar 28, 2023

2. but don't understand why a new third-party library is a concern when the vast majority of modules in go.mod also are. It seems to be a solid library with a lot of importers

I'm not insinuating that something is wrong with the library: it's just that gh api used to stream the actual, unmodified GitHub API results directly to stdout, but with this change, all --paginate results will have been deserialized into Go values, proxied through a 3rd-party library not maintained by us, and then serialized back into JSON. That, I feel, is a too radical of a departure from that gh api used to do. Imagine if curl did any of this before returning you the results! So if we could avoid that, I'd be super happy since it would mean that we are not compromising too much of the command's design.

I could use a similar approach that would require less refactoring and only impact a limited set of scenarios e.g., a reader (though, since instanced - perhaps passed in, which yours could be too) that under the right conditions does an internal read, unmarshals the JSON, merges, and when done, marshals back out to the original reader on last page.

That sounds good! I'd be down with that. Would you explore this approach when you have time?

@heaths
Copy link
Contributor Author

heaths commented Mar 29, 2023

That sounds good! I'd be down with that. Would you explore this approach when you have time?

@mislav sure I can explore it if there's an active interest in taking it (I'm certainly interested in solving it), but it might be easier to pick up from your changes in #7190 if you're planning to go ahead with that; and I can either close this PR and open anew, or rebase it entirely so the history is here, but I would then recommend treating it as a completely new PR i.e., don't review the diff.

@mislav mislav removed their assignment Jun 9, 2023
@andyfeller
Copy link
Contributor

andyfeller commented Nov 30, 2023

@heaths: what makes the most sense regarding this PR? Are you okay with closing it until such a time that either you or the CLI team can prioritize the work?

Given the age and triage efforts to ensure PRs don't grow stale, I appreciate you understanding the desire to ensure the CLI team is staying engaged while having a up to date view of work in flight.

If you want my $0.02, I think closing this and creating a new PR when that work can be prioritized while linking it back here for posterity makes the most sense. 🤷

By the way, nice to meet you and thank you for being a consistent CLI friend and contributor! 👋 ❤️ 🙇

@heaths
Copy link
Contributor Author

heaths commented Nov 30, 2023

What work do you need to prioritize? The PR has been done for ages. Seems the outstanding concern was using a third party module, which is sort of absurd given how many are already in use. This module has decent usage and it's straight forward. It could even be reimplimented, but that's not very idiomatic.

There's no rush of breaking changes here. The output is invalid JSON that would be valid. Any work around like slurping wouldn't break because there'd be nothing to slurp. I see no downside.

I could rebase this and make it conditional for GraphQL.

@andyfeller
Copy link
Contributor

Thanks for the follow up, @heaths 🙇

It seems this PR needs prioritization from both sides:

  1. core maintainers end to discuss these changes and how to move forward following Mislav's departure
  2. yours on addressing the conflicting files

There is a lot of history and mentions of the related issue in other places. I admit there is a lot to catch up on here.

@heaths
Copy link
Contributor Author

heaths commented Dec 4, 2023

Would a quick sync over Teams be acceptable (we're on the same network)? Most of the history is just me keeping this up to date (which gets harder with the PR age, hence trying to get buy-in from Mislav) and not getting it reviewed with requested changes for many months, but we could maybe go through this PR and recap quickly.

The quick gist is that Mislav wasn't keen on the mergo dependency I took after some investigation. I think concern about this being a breaking change - given it was already broken - was resolved, but there was still some concern about a behavioral change when logging headers/logs via --verbose since it'd only happen with the first or last (I forget which we settled on off hand) now. Personally, I think that likelihood is low. Given this doesn't emit valid JSON and everyone had to do work arounds (I even wrote an extension to make it easier on non-*nix) I'd be surprised if any were doing verbose logging in automation. Seems all that would likely have been quelled or, at the very least, difficult to associate with any particular request unless people used something like tee. I feel having this emit valid JSON so it's useable "out of the box" would be worth the potential for unlikely but possible breaking behavior changes with corner case behaviors (verbose logging).

EDIT: I sent you a message over Teams in case you wanted to chat briefly tomorrow morning.

@andyfeller
Copy link
Contributor

@heaths : sorry for being slow to respond, I'm going to raise this up with @samcoe and @williammartin, seeing if we can make time next week to get on the same page and get this settled 👍

@andyfeller andyfeller added the discuss Feature changes that require discussion primarily among the GitHub CLI team label Dec 8, 2023
@andyfeller
Copy link
Contributor

@heaths : Happy new year! Just making sure to deliver on pre-holidays conversation about keeping an eye on this PR so we can prioritize it and get it shipped. No pressure or anything, just wanting to deliver on the commitment on our end 🫂

@heaths
Copy link
Contributor Author

heaths commented Jan 16, 2024

@andyfeller @samcoe I had a thought. Building on Nate's writer that merges top-level JSON arrays and a discussion we had about something similar for GraphQL, could I create a REST and GraphQL response merger (separate implementations) in github.com/cli/go-gh? Could be generally useful. To maintain streaming for REST responses, I'd more or less keep

type paginatedArrayReader struct {
. A GraphQL would cache the map until the last page is merged and then flush it to whatever writer was passed in.

I'd develop these in conjunction to test them out - make sure the design is good - but github.com/cli/go-gh would need to get merged first, of course. Thoughts?

@samcoe
Copy link
Contributor

samcoe commented Jan 16, 2024

@heaths That seems like a good idea to me, I don't know if we will get around to fully automatic pagination but having a helper(s) to merge responses is definitely a step forward for cli/go-gh#23.

heaths added a commit to heaths/go-gh that referenced this pull request Jan 25, 2024
heaths added a commit to heaths/cli that referenced this pull request Jan 25, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
@heaths heaths mentioned this pull request Jan 25, 2024
4 tasks
@heaths
Copy link
Contributor Author

heaths commented Jan 25, 2024

Closing in lieu of #8620. @samcoe @andyfeller that and cli/go-gh#148 are ready for review.

@heaths heaths closed this Jan 25, 2024
heaths added a commit to heaths/cli that referenced this pull request Jan 31, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Feb 2, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Feb 15, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Mar 12, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this pull request Apr 4, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Feature changes that require discussion primarily among the GitHub CLI team external pull request originating outside of the CLI core team
Projects
No open projects
The GitHub CLI
  
Needs review 🤔
Development

Successfully merging this pull request may close these issues.

Output a single JSON document from api --paginate
8 participants