New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge JSON responses from gh api
#5652
Conversation
A WIP that still needs some refactoring to avoid duplicate code. Also depends on darccio/mergo#210 getting merged or, if we want to stick with that library, vendoring or forking it. This was more a proof of concept but was hoping for some early feedback on the concept. With the mergo change this would handle both top-level JSON arrays or objects, so both REST and GraphQL responses. |
@mislav I need to add some specific tests, but wanted to see - if you have time to review - if you feel this is on the right track. I maintained writing responses as they come without pagination so that the user can see something happening. I do ponder adding a progress indicator if paginating, though, because it's pretty long. The main concern with that is that, even writing to stderr, it's likely callers have redirected at least stdout if not stderr using . Writing progress could, therefore, be a breaking change. EDIT: Actually, considering we'd only show progress if stderr wasn't redirected, this would probably be fine. I'll make the change, but certainly can pull it if there's concern. |
e042de9
to
a7ca779
Compare
@mislav how much do we care about keeping errors printed in the same order. This is what we do currently:
I'm trying to better refactor this so the code is cleaner and easier to maintain (basically, fewer branches in the code and having to pass fewer parameters) but trying to keep this error format is making that more difficult unless I just inline all the code into the But, if the error format could be the following, there's a lot cleaner refactorings possible:
Frankly, I like this format better anyway. That an error occurred is shown immediately, followed by more details from the server. |
@mislav I've been using this in a few instances and not exhibited any problems. What do you think? |
darccio/mergo#210 was merged so we can actually pick up v0.3.13 and not need a redirect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heaths Thanks for your patients on this review. Overall, I like this functionality and it feels straight forward with the mergo
package. Just left a couple comments for your consideration.
@mislav @samcoe this is ready to review. Even before the refactoring, the response body would be read completely into a buffer anyway - and in some cases, multiple times - so I ended up allocating a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heaths This looks good to me. Thanks for the thorough work here.
Hi, thanks for the work! If the pr is approved, can we get it merged to fix some downstream issues at pwntester/octo.nvim#302 :) |
We're still interested in this but wanted consensus; waiting on @mislav 's review |
@mislav you made mention of this the other day for someone else that logged a dup bug. I'm glad you're still interested in it, but can you help me understand what you'd like me to do to get it over the finish line? I can resolve the merge conflicts, but I've been doing that for many months and it takes time, though less after I rebased and squashed. I'm happy to resolve the conflicts if you're going to take this or otherwise leave comments so I can get this in suitable shape. For anyone else that comes across this PR and not issue #1268, I used basically the same code here in an extension that will merge after the fact* without having to install utilities like gh ext install heaths/gh-merge-json
gh api "{endpoint}" --paginate | gh merge-json # returns valid JSON * albeit slower, since /cc @samcoe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, sorry for the late review. I tried multiple times to come up with suggestions on how to potentially make this PR better, but I didn't have good ideas, and I've put it off until later. A lot of time has passed since and here we are. I do really want to see this feature in, but even though I care about this a lot, I also want us to approach the implementation carefully and with as little disruption to the gh api
command as possible.
My qualms with the implementation you've proposed here are:
- Code size: quite a lot of existing code and non-pagination supporting code paths needed to be modified in order to support this feature.
- No streaming support: we used to stream API responses directly from the network to stdout, but this implementation relies on buffering all the responses until all pages are fetched. That means that the client receiving the output of
gh api
cannot start processing JSON until everything has finished fetching. - Passing entire API responses through a 3rd-party "mergo" library: we used to return API responses unchanged, and now we're significantly massaging them through something which to me is a black box. I feel this is risky, since a bug in mergo can potentially corrupt API responses.
- Deep-merging of GraphQL responses: GraphQL queries can also be paginated, but GraphQL queries can also contain multiple arrays in response data. The "mergo" library doesn't know which array is the one that is being paginated, so it will merge them all according to its deep merge strategy.
As part of this review, I wanted to explore an alternate implementation to see if it's even possible to address these points. I came up with #7190, where I'd value your thoughts when you find time. Specifically, the code spike tries to keep the implementation small, have the modifications only affect the --pagination
code path, avoid a 3rd party library, and continue streaming API responses to stdout. This is done by wrapping resp.Body
in a Reader that strategically omits the leading [
or trailing ]
of JSON arrays. This seems to work on my testing, so please try it out! GraphQL pagination is intentionally non-addressed in this exploration since solving it would be non-trivial.
I honestly can't think of any way to do this when an array is nested and you could have multiple arrays. With any nested arrays in a REST response, those would be properties of a resource (an array of resources or not) so only a top-level array would need "merging" and your approach (based on your description) would work fine there. Alternatively, what about considering a built-in command to optionally buffer responses a la https://github.com/heaths/gh-merge-json? Ideally, output from |
Thinking about this more, given the GraphQL paginated code was invalid JSON anyway, does the fact it was streaming output really mattered? Either you're piping to something that is consuming JSON somehow ( For simple arrays that only REST could return, your approach in #7173 would seem to solve the problem but what about limiting this to just paginated GraphQL responses? Looking through your PR, I could use a similar approach that would require less refactoring and only impact a limited set of scenarios e.g., a reader (though, since instanced - perhaps passed in, which yours could be too) that under the right conditions does an internal read, unmarshals the JSON, merges, and when done, marshals back out to the original reader on last page. |
I'm not insinuating that something is wrong with the library: it's just that
That sounds good! I'd be down with that. Would you explore this approach when you have time? |
@mislav sure I can explore it if there's an active interest in taking it (I'm certainly interested in solving it), but it might be easier to pick up from your changes in #7190 if you're planning to go ahead with that; and I can either close this PR and open anew, or rebase it entirely so the history is here, but I would then recommend treating it as a completely new PR i.e., don't review the diff. |
@heaths: what makes the most sense regarding this PR? Are you okay with closing it until such a time that either you or the CLI team can prioritize the work? Given the age and triage efforts to ensure PRs don't grow stale, I appreciate you understanding the desire to ensure the CLI team is staying engaged while having a up to date view of work in flight. If you want my $0.02, I think closing this and creating a new PR when that work can be prioritized while linking it back here for posterity makes the most sense. 🤷 By the way, nice to meet you and thank you for being a consistent CLI friend and contributor! 👋 ❤️ 🙇 |
What work do you need to prioritize? The PR has been done for ages. Seems the outstanding concern was using a third party module, which is sort of absurd given how many are already in use. This module has decent usage and it's straight forward. It could even be reimplimented, but that's not very idiomatic. There's no rush of breaking changes here. The output is invalid JSON that would be valid. Any work around like slurping wouldn't break because there'd be nothing to slurp. I see no downside. I could rebase this and make it conditional for GraphQL. |
Thanks for the follow up, @heaths 🙇 It seems this PR needs prioritization from both sides:
There is a lot of history and mentions of the related issue in other places. I admit there is a lot to catch up on here. |
Would a quick sync over Teams be acceptable (we're on the same network)? Most of the history is just me keeping this up to date (which gets harder with the PR age, hence trying to get buy-in from Mislav) and not getting it reviewed with requested changes for many months, but we could maybe go through this PR and recap quickly. The quick gist is that Mislav wasn't keen on the mergo dependency I took after some investigation. I think concern about this being a breaking change - given it was already broken - was resolved, but there was still some concern about a behavioral change when logging headers/logs via EDIT: I sent you a message over Teams in case you wanted to chat briefly tomorrow morning. |
@heaths : sorry for being slow to respond, I'm going to raise this up with @samcoe and @williammartin, seeing if we can make time next week to get on the same page and get this settled 👍 |
@heaths : Happy new year! Just making sure to deliver on pre-holidays conversation about keeping an eye on this PR so we can prioritize it and get it shipped. No pressure or anything, just wanting to deliver on the commitment on our end 🫂 |
@andyfeller @samcoe I had a thought. Building on Nate's writer that merges top-level JSON arrays and a discussion we had about something similar for GraphQL, could I create a REST and GraphQL response merger (separate implementations) in Line 112 in 57f6787
I'd develop these in conjunction to test them out - make sure the design is good - but |
@heaths That seems like a good idea to me, I don't know if we will get around to fully automatic pagination but having a helper(s) to merge responses is definitely a step forward for cli/go-gh#23. |
Partly resolves cli/cli#1268 and replaces cli/cli#5652
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Closing in lieu of #8620. @samcoe @andyfeller that and cli/go-gh#148 are ready for review. |
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
Fixes #1268