Merge JSON responses from `gh api` #5652

heaths · 2022-05-16T09:53:53Z

heaths · 2022-05-16T09:55:58Z

A WIP that still needs some refactoring to avoid duplicate code. Also depends on darccio/mergo#210 getting merged or, if we want to stick with that library, vendoring or forking it. This was more a proof of concept but was hoping for some early feedback on the concept. With the mergo change this would handle both top-level JSON arrays or objects, so both REST and GraphQL responses.

heaths · 2022-05-17T07:10:27Z

@mislav I need to add some specific tests, but wanted to see - if you have time to review - if you feel this is on the right track. I maintained writing responses as they come without pagination so that the user can see something happening. I do ponder adding a progress indicator if paginating, though, because it's pretty long. The main concern with that is that, even writing to stderr, it's likely callers have redirected at least stdout if not stderr using . Writing progress could, therefore, be a breaking change.

EDIT: Actually, considering we'd only show progress if stderr wasn't redirected, this would probably be fine. I'll make the change, but certainly can pull it if there's concern.

heaths · 2022-05-19T02:29:26Z

@mislav how much do we care about keeping errors printed in the same order. This is what we do currently:

{
  "message":"Not Found",
  "documentation_url":"https://docs.github.com/rest"
}
gh: Not Found (HTTP 404)
exit status 1

I'm trying to better refactor this so the code is cleaner and easier to maintain (basically, fewer branches in the code and having to pass fewer parameters) but trying to keep this error format is making that more difficult unless I just inline all the code into the for loop.

But, if the error format could be the following, there's a lot cleaner refactorings possible:

gh: Not Found (HTTP 404)
{
  "message":"Not Found",
  "documentation_url":"https://docs.github.com/rest"
}
exit status 1

Frankly, I like this format better anyway. That an error occurred is shown immediately, followed by more details from the server.

go.mod

heaths · 2022-05-24T05:35:30Z

@mislav I've been using this in a few instances and not exhibited any problems. What do you think?

heaths · 2022-05-25T21:12:07Z

darccio/mergo#210 was merged so we can actually pick up v0.3.13 and not need a redirect.

heaths · 2022-06-01T08:16:48Z

@mislav @samcoe what do you think of this? Makes the output of paginated responses much easier to use.

samcoe

@heaths Thanks for your patients on this review. Overall, I like this functionality and it feels straight forward with the mergo package. Just left a couple comments for your consideration.

pkg/cmd/api/api.go

heaths · 2022-06-18T07:16:43Z

@mislav @samcoe this is ready to review. Even before the refactoring, the response body would be read completely into a buffer anyway - and in some cases, multiple times - so I ended up allocating a single bytes.Reader that implements io.ReadSeeker so I could seek to the beginning as needed without additional allocations, which was possible before.

pkg/cmd/api/api.go

samcoe

@heaths This looks good to me. Thanks for the thorough work here.

eddiebergman · 2022-07-16T18:50:46Z

Hi, thanks for the work! If the pr is approved, can we get it merged to fix some downstream issues at pwntester/octo.nvim#302 :)

heaths · 2022-08-04T21:13:35Z

@samcoe @mislav is this good to merge for next release? It would sure be helpful.

vilmibm · 2022-08-22T16:23:55Z

We're still interested in this but wanted consensus; waiting on @mislav 's review

heaths · 2023-03-13T21:22:42Z

@mislav you made mention of this the other day for someone else that logged a dup bug. I'm glad you're still interested in it, but can you help me understand what you'd like me to do to get it over the finish line? I can resolve the merge conflicts, but I've been doing that for many months and it takes time, though less after I rebased and squashed. I'm happy to resolve the conflicts if you're going to take this or otherwise leave comments so I can get this in suitable shape.

For anyone else that comes across this PR and not issue #1268, I used basically the same code here in an extension that will merge after the fact* without having to install utilities like jq that might be harder to acquire for your system e.g., Windows. Run:

gh ext install heaths/gh-merge-json
gh api "{endpoint}" --paginate | gh merge-json # returns valid JSON

* albeit slower, since gh already stringified the response(s), which the extension has jsonify again, merge, and re-stringify.

/cc @samcoe

mislav

Hi, sorry for the late review. I tried multiple times to come up with suggestions on how to potentially make this PR better, but I didn't have good ideas, and I've put it off until later. A lot of time has passed since and here we are. I do really want to see this feature in, but even though I care about this a lot, I also want us to approach the implementation carefully and with as little disruption to the gh api command as possible.

My qualms with the implementation you've proposed here are:

Code size: quite a lot of existing code and non-pagination supporting code paths needed to be modified in order to support this feature.
No streaming support: we used to stream API responses directly from the network to stdout, but this implementation relies on buffering all the responses until all pages are fetched. That means that the client receiving the output of gh api cannot start processing JSON until everything has finished fetching.
Passing entire API responses through a 3rd-party "mergo" library: we used to return API responses unchanged, and now we're significantly massaging them through something which to me is a black box. I feel this is risky, since a bug in mergo can potentially corrupt API responses.
Deep-merging of GraphQL responses: GraphQL queries can also be paginated, but GraphQL queries can also contain multiple arrays in response data. The "mergo" library doesn't know which array is the one that is being paginated, so it will merge them all according to its deep merge strategy.

As part of this review, I wanted to explore an alternate implementation to see if it's even possible to address these points. I came up with #7190, where I'd value your thoughts when you find time. Specifically, the code spike tries to keep the implementation small, have the modifications only affect the --pagination code path, avoid a 3rd party library, and continue streaming API responses to stdout. This is done by wrapping resp.Body in a Reader that strategically omits the leading [ or trailing ] of JSON arrays. This seems to work on my testing, so please try it out! GraphQL pagination is intentionally non-addressed in this exploration since solving it would be non-trivial.

heaths · 2023-03-17T21:17:12Z

I'll take a look at your PR soon. Conceptually, I like the approach and tried to come up with similar that would handle GraphQL, but see my thoughts below.
I appreciate that this buffers responses and that's a behavioral change, but don't understand why a new third-party library is a concern when the vast majority of modules in go.mod also are. It seems to be a solid library with a lot of importers - over 3,200 - on pkg.go.dev. Again, though I appreciate the concern about buffering responses. As a mere suggestion, what about allowing people to opt into paging internally with the caveat responses will no longer be buffered? For the majority of use cases I imagine it would be of little consequence. Or maybe only do it for GraphQL paged responses.

I honestly can't think of any way to do this when an array is nested and you could have multiple arrays. With any nested arrays in a REST response, those would be properties of a resource (an array of resources or not) so only a top-level array would need "merging" and your approach (based on your description) would work fine there.

Alternatively, what about considering a built-in command to optionally buffer responses a la https://github.com/heaths/gh-merge-json? Ideally, output from gh api just works but at least not requiring a separate install may be nice. That said, #7173 simplifies that scenario so it may be moot.

heaths · 2023-03-17T22:19:29Z

Thinking about this more, given the GraphQL paginated code was invalid JSON anyway, does the fact it was streaming output really mattered? Either you're piping to something that is consuming JSON somehow (jq, PowerShell's ConvertFrom-Json, whatever) and would err on invalid JSON, or you're redirecting to a file in which case streaming doesn't really matter anyway.

For simple arrays that only REST could return, your approach in #7173 would seem to solve the problem but what about limiting this to just paginated GraphQL responses? Looking through your PR, I could use a similar approach that would require less refactoring and only impact a limited set of scenarios e.g., a reader (though, since instanced - perhaps passed in, which yours could be too) that under the right conditions does an internal read, unmarshals the JSON, merges, and when done, marshals back out to the original reader on last page.

mislav · 2023-03-28T21:33:39Z

2. but don't understand why a new third-party library is a concern when the vast majority of modules in go.mod also are. It seems to be a solid library with a lot of importers

I'm not insinuating that something is wrong with the library: it's just that gh api used to stream the actual, unmodified GitHub API results directly to stdout, but with this change, all --paginate results will have been deserialized into Go values, proxied through a 3rd-party library not maintained by us, and then serialized back into JSON. That, I feel, is a too radical of a departure from that gh api used to do. Imagine if curl did any of this before returning you the results! So if we could avoid that, I'd be super happy since it would mean that we are not compromising too much of the command's design.

I could use a similar approach that would require less refactoring and only impact a limited set of scenarios e.g., a reader (though, since instanced - perhaps passed in, which yours could be too) that under the right conditions does an internal read, unmarshals the JSON, merges, and when done, marshals back out to the original reader on last page.

That sounds good! I'd be down with that. Would you explore this approach when you have time?

heaths · 2023-03-29T18:21:29Z

That sounds good! I'd be down with that. Would you explore this approach when you have time?

@mislav sure I can explore it if there's an active interest in taking it (I'm certainly interested in solving it), but it might be easier to pick up from your changes in #7190 if you're planning to go ahead with that; and I can either close this PR and open anew, or rebase it entirely so the history is here, but I would then recommend treating it as a completely new PR i.e., don't review the diff.

andyfeller · 2023-11-30T13:46:02Z

@heaths: what makes the most sense regarding this PR? Are you okay with closing it until such a time that either you or the CLI team can prioritize the work?

Given the age and triage efforts to ensure PRs don't grow stale, I appreciate you understanding the desire to ensure the CLI team is staying engaged while having a up to date view of work in flight.

If you want my $0.02, I think closing this and creating a new PR when that work can be prioritized while linking it back here for posterity makes the most sense. 🤷

By the way, nice to meet you and thank you for being a consistent CLI friend and contributor! 👋 ❤️ 🙇

heaths · 2023-11-30T15:26:34Z

What work do you need to prioritize? The PR has been done for ages. Seems the outstanding concern was using a third party module, which is sort of absurd given how many are already in use. This module has decent usage and it's straight forward. It could even be reimplimented, but that's not very idiomatic.

There's no rush of breaking changes here. The output is invalid JSON that would be valid. Any work around like slurping wouldn't break because there'd be nothing to slurp. I see no downside.

I could rebase this and make it conditional for GraphQL.

andyfeller · 2023-12-04T21:56:45Z

Thanks for the follow up, @heaths 🙇

It seems this PR needs prioritization from both sides:

core maintainers end to discuss these changes and how to move forward following Mislav's departure
yours on addressing the conflicting files

There is a lot of history and mentions of the related issue in other places. I admit there is a lot to catch up on here.

heaths · 2023-12-04T22:19:08Z

Would a quick sync over Teams be acceptable (we're on the same network)? Most of the history is just me keeping this up to date (which gets harder with the PR age, hence trying to get buy-in from Mislav) and not getting it reviewed with requested changes for many months, but we could maybe go through this PR and recap quickly.

The quick gist is that Mislav wasn't keen on the mergo dependency I took after some investigation. I think concern about this being a breaking change - given it was already broken - was resolved, but there was still some concern about a behavioral change when logging headers/logs via --verbose since it'd only happen with the first or last (I forget which we settled on off hand) now. Personally, I think that likelihood is low. Given this doesn't emit valid JSON and everyone had to do work arounds (I even wrote an extension to make it easier on non-*nix) I'd be surprised if any were doing verbose logging in automation. Seems all that would likely have been quelled or, at the very least, difficult to associate with any particular request unless people used something like tee. I feel having this emit valid JSON so it's useable "out of the box" would be worth the potential for unlikely but possible breaking behavior changes with corner case behaviors (verbose logging).

EDIT: I sent you a message over Teams in case you wanted to chat briefly tomorrow morning.

andyfeller · 2023-12-08T15:06:31Z

@heaths : sorry for being slow to respond, I'm going to raise this up with @samcoe and @williammartin, seeing if we can make time next week to get on the same page and get this settled 👍

andyfeller · 2024-01-04T16:47:19Z

@heaths : Happy new year! Just making sure to deliver on pre-holidays conversation about keeping an eye on this PR so we can prioritize it and get it shipped. No pressure or anything, just wanting to deliver on the commitment on our end 🫂

heaths · 2024-01-16T06:44:49Z

@andyfeller @samcoe I had a thought. Building on Nate's writer that merges top-level JSON arrays and a discussion we had about something similar for GraphQL, could I create a REST and GraphQL response merger (separate implementations) in github.com/cli/go-gh? Could be generally useful. To maintain streaming for REST responses, I'd more or less keep

cli/pkg/cmd/api/pagination.go

Line 112 in 57f6787

type paginatedArrayReader struct {

. A GraphQL would cache the map until the last page is merged and then flush it to whatever writer was passed in.

I'd develop these in conjunction to test them out - make sure the design is good - but github.com/cli/go-gh would need to get merged first, of course. Thoughts?

samcoe · 2024-01-16T22:04:51Z

@heaths That seems like a good idea to me, I don't know if we will get around to fully automatic pagination but having a helper(s) to merge responses is definitely a step forward for cli/go-gh#23.

Partly resolves cli/cli#1268 and replaces cli/cli#5652

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

heaths · 2024-01-25T10:28:06Z

Closing in lieu of #8620. @samcoe @andyfeller that and cli/go-gh#148 are ready for review.

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

heaths force-pushed the issue1268 branch from c0ee199 to 3c20993 Compare May 17, 2022 07:07

heaths force-pushed the issue1268 branch 2 times, most recently from e042de9 to a7ca779 Compare May 17, 2022 07:29

heaths marked this pull request as ready for review May 17, 2022 12:45

heaths requested a review from a team as a code owner May 17, 2022 12:45

heaths requested review from mislav and removed request for a team May 17, 2022 12:45

cliAutomation added the external pull request originating outside of the CLI core team label May 17, 2022

cliAutomation added this to Needs review 🤔 in The GitHub CLI May 17, 2022

heaths commented May 19, 2022

View reviewed changes

go.mod Outdated Show resolved Hide resolved

samcoe assigned mislav May 24, 2022

heaths force-pushed the issue1268 branch from 9c5bee9 to 0156091 Compare May 26, 2022 04:43

heaths force-pushed the issue1268 branch from 0156091 to c93e483 Compare June 4, 2022 05:58

samcoe reviewed Jun 7, 2022

View reviewed changes

pkg/cmd/api/api.go Outdated Show resolved Hide resolved

pkg/cmd/api/api.go Outdated Show resolved Hide resolved

pwntester mentioned this pull request Jun 17, 2022

Octo review start fails with large pull requests pwntester/octo.nvim#302

Open

samcoe reviewed Jun 22, 2022

View reviewed changes

pkg/cmd/api/api.go Outdated Show resolved Hide resolved

samcoe approved these changes Jun 27, 2022

View reviewed changes

The GitHub CLI automation moved this from Needs review 🤔 to Needs to be merged 🎉 Jun 27, 2022

laughedelic mentioned this pull request Jul 28, 2022

GraphQL pagination breaks on partial results #6013

Open

vilmibm approved these changes Aug 22, 2022

View reviewed changes

heaths mentioned this pull request Mar 7, 2023

Output a single JSON document from api --paginate #1268

Closed

mislav suggested changes Mar 17, 2023

View reviewed changes

The GitHub CLI automation moved this from Needs to be merged 🎉 to Needs review 🤔 Mar 17, 2023

mislav removed their assignment Jun 9, 2023

andyfeller added the discuss Feature changes that require discussion primarily among the GitHub CLI team label Dec 8, 2023

samcoe mentioned this pull request Dec 11, 2023

--paginate doesn't work well with /enterprises/{enterprise}/consumed-licenses and ?page= #8419

Closed

heaths added a commit to heaths/go-gh that referenced this pull request Jan 25, 2024

Support merging JSON arrays, objects

3df15ac

Partly resolves cli/cli#1268 and replaces cli/cli#5652

heaths mentioned this pull request Jan 25, 2024

Support merging JSON arrays, objects cli/go-gh#148

Closed

heaths added a commit to heaths/cli that referenced this pull request Jan 25, 2024

Merge JSON responses from gh api

543b36e

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

heaths mentioned this pull request Jan 25, 2024

Merge JSON responses from gh api #8620

Merged

4 tasks

heaths closed this Jan 25, 2024

heaths added a commit to heaths/cli that referenced this pull request Jan 31, 2024

Merge JSON responses from gh api

238c75b

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

heaths added a commit to heaths/cli that referenced this pull request Feb 2, 2024

Merge JSON responses from gh api

a237afc

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

heaths added a commit to heaths/cli that referenced this pull request Feb 15, 2024

Merge JSON responses from gh api

523d524

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

heaths added a commit to heaths/cli that referenced this pull request Mar 12, 2024

Merge JSON responses from gh api

cad62d3

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

heaths added a commit to heaths/cli that referenced this pull request Apr 4, 2024

Merge JSON responses from gh api

8e27e9a

Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge JSON responses from `gh api` #5652

Merge JSON responses from `gh api` #5652

heaths commented May 16, 2022

heaths commented May 16, 2022

heaths commented May 17, 2022 •

edited

heaths commented May 19, 2022

heaths commented May 24, 2022

heaths commented May 25, 2022

heaths commented Jun 1, 2022

samcoe left a comment

heaths commented Jun 18, 2022

samcoe left a comment

eddiebergman commented Jul 16, 2022

heaths commented Aug 4, 2022

vilmibm commented Aug 22, 2022

heaths commented Mar 13, 2023 •

edited

mislav left a comment

heaths commented Mar 17, 2023

heaths commented Mar 17, 2023

mislav commented Mar 28, 2023

heaths commented Mar 29, 2023

andyfeller commented Nov 30, 2023 •

edited

heaths commented Nov 30, 2023 •

edited

andyfeller commented Dec 4, 2023

heaths commented Dec 4, 2023 •

edited

andyfeller commented Dec 8, 2023

andyfeller commented Jan 4, 2024

heaths commented Jan 16, 2024

samcoe commented Jan 16, 2024

heaths commented Jan 25, 2024

Merge JSON responses from gh api #5652

Merge JSON responses from gh api #5652

Conversation

heaths commented May 16, 2022

heaths commented May 16, 2022

heaths commented May 17, 2022 • edited

heaths commented May 19, 2022

heaths commented May 24, 2022

heaths commented May 25, 2022

heaths commented Jun 1, 2022

samcoe left a comment

Choose a reason for hiding this comment

heaths commented Jun 18, 2022

samcoe left a comment

Choose a reason for hiding this comment

eddiebergman commented Jul 16, 2022

heaths commented Aug 4, 2022

vilmibm commented Aug 22, 2022

heaths commented Mar 13, 2023 • edited

mislav left a comment

Choose a reason for hiding this comment

heaths commented Mar 17, 2023

heaths commented Mar 17, 2023

mislav commented Mar 28, 2023

heaths commented Mar 29, 2023

andyfeller commented Nov 30, 2023 • edited

heaths commented Nov 30, 2023 • edited

andyfeller commented Dec 4, 2023

heaths commented Dec 4, 2023 • edited

andyfeller commented Dec 8, 2023

andyfeller commented Jan 4, 2024

heaths commented Jan 16, 2024

samcoe commented Jan 16, 2024

heaths commented Jan 25, 2024

Merge JSON responses from `gh api` #5652

Merge JSON responses from `gh api` #5652

heaths commented May 17, 2022 •

edited

heaths commented Mar 13, 2023 •

edited

andyfeller commented Nov 30, 2023 •

edited

heaths commented Nov 30, 2023 •

edited

heaths commented Dec 4, 2023 •

edited