Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.10 dag API changes breaking HTTP API (discussion for resolution) #8415

Closed
rvagg opened this issue Sep 8, 2021 · 10 comments · Fixed by #8439
Closed

0.10 dag API changes breaking HTTP API (discussion for resolution) #8415

rvagg opened this issue Sep 8, 2021 · 10 comments · Fixed by #8439
Assignees
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization
Milestone

Comments

@rvagg
Copy link
Member

rvagg commented Sep 8, 2021

Surfaced via @tabcat's work @ orbitdb/orbitdb#909, picked up in failing tests @ https://github.com/ipfs/js-ipfs/blob/870d446f/packages/ipfs-http-client/test/dag.spec.js when running against the go-ipfs@next package which is the 0.10.0-rc1 binary.

In summary - the changes we made are fine, there's no error, but we may have gone harder than we should have, or intended to and there are ways we could bridge the breakage for users. @tabcat says they're moving to the block API exclusively, which is fine and probably suits the workflow there, but the dag API is where we've put all the new goodness and users avoiding it suggests that maybe we could do better.

  1. We've lost cid-version for dag put, or at least it doesn't work.

Looking through history I'm wondering if we ever had a cid-version for the dag API? It looks like it's only valid for the add and files API. Does this mean that https://github.com/ipfs/js-ipfs/blob/870d446f/packages/ipfs-http-client/test/dag.spec.js#L33 is incorrect and is only working because it's defaulting to v0 for a dag-pb block?

Maybe the resolution here is to be clear in the CHANGELOG that the CIDs for the dag API are always v1? I don't see a note for that anywhere. Alternatively we could just introduce it there and allow it to be used for the case where format=dag-pb so people can get their beloved CIDv0's.

  1. The git-raw codec is included by default now.

This test https://github.com/ipfs/js-ipfs/blob/870d446f/packages/ipfs-http-client/test/dag.spec.js#L77 assumes it's not available, but it gets an answer. Are we intending to enable the plugin by default now?

  1. The changes in meaning of input-enc and format might make more impact than we intend.

We now have the very clear and crisp meaning that input-enc means the codec used to decode the bytes that you send in, and format is the codec that it should use to re-encode the node after it's been decoded. The two arguments are independent variables now. Prior to this update input-enc was more subtle and tied to the format argument. It meant something like "I'm giving you data that I want stored in format codec, and I'm either giving you the raw bytes of the data already in that codec, or a json form that will unmarshall into a node that you can then encode in that codec". That's about it. Each of the raw, cbor and protobuf options for input-enc simply say "I'm giving you raw bytes" and then narrow down your format options to match (except raw which has the full flexibility). The json input-enc option does its unmarshall routine, with its pseudo-dag-json scheme.

So, if you look at https://github.com/ipfs/js-ipfs/blob/870d446f/packages/ipfs-http-client/test/dag.spec.js again you'll see that behaviour in action. input-enc=raw format=dag-cbor means "I'm giving you some raw dag-cbor encoded bytes that I expect you to store". But now we're saying that it means "I'm giving you raw data, which is just a single byte array, that you should encode as a dag-cbor, i.e. a single byte array prefixed with a CBOR tag.".

Possible actions with this could be:

a. Be much more clear in our documentation about this change and what it means now.
b. Deprecate input-enc and introduce a new input-codec or input-format to do what we're now doing with it. We just assume that the bytes being provided can be passes straight to a decoder for input-codec unless input-enc is json, in which case we have a conflict and either override with input-codec if supplied, or if not supplied, set input-codec to dag-json. I think that would give a pretty clean upgrade path without breaking existing use-cases.

@aschmahmann @achingbrain @warpfork @willscott

@rvagg rvagg added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Sep 8, 2021
@ipfs ipfs deleted a comment from welcome bot Sep 8, 2021
@rvagg
Copy link
Member Author

rvagg commented Sep 8, 2021

#8416 has option (b) for issue (3)

@willscott
Copy link
Contributor

For (3) I'm not convinced that the changed names is great either, because it introduces a second breaking change of that API. It seems easier to explain the 1 change, rather than making people who are using this functionality to migrate twice.

For (1) I think we should continue pushing for cid v1 as the standard.

@rvagg
Copy link
Member Author

rvagg commented Sep 9, 2021

it introduces a second breaking change of that API

I think the approach I'm suggesting with #8416 reduces the breakage surface area here to nearly nothing rather than introducing something else that breaks. Keep the old behaviours in place where they are exclusively used but add affordances for new behaviour and encourage (via docs and perhaps hiding the CLI option) use of the new pattern. The question of removing deprecated APIs I think is separate and I see a lot of them in here already so I don't imagine that they're going to be gone any time soon.

I don't have a very strong preference for #8416, it'd just be nice to avoid impacting users so overtly and now we know that there are common call paths that are going to break, maybe we have the opportunity to be a bit more gentle. Not my call, just presenting options.

@willscott
Copy link
Contributor

(i think you can ignore this point towards (3). I had incorrectly assumed v0.10 was already final, but it sounds like it's still in release candidate mode. I'm okay either way given that.)

@aschmahmann
Copy link
Contributor

@tabcat says they're moving to the block API exclusively

Something to note here is that we will end up making some breaking changes to the block API at some point having to do with the names being incorrect/inconsistent.

e.g. protobuf instead of dag-pb, or more terribly cbor instead of dag-cbor and json instead of dag-json.

We've lost cid-version for dag put, or at least it doesn't work.

I don't see a good reason to keep the CID version. If someone really wants to convert a CIDv1 to a CIDv0 they can use the ipfs cid command. If someone asks for this we can consider adding it back, but I don't think it's worth doing preemptively.

The git-raw codec is included by default now.

Yep. We now allow all IPLD codecs to be used as input and output formats. Should be documented as a new feature.

The changes in meaning of input-enc and format might make more impact than we intend.

I see the argument here, although I'm a little concerned that while we could make things "work" it'll end up being more confusing.

For example, we previously had ipfs dag put --input-enc=json --format=cbor taking json-ish data and interpretting them as dag-cbor. This command still fails since 1) we've switched from json to dag-json 2) we've changed cbor to mean cbor and dag-cbor to mean dag-cbor.

So IIUC this change only helps if:

  1. The input encoding was not json (since otherwise an error is better as the user is hopefully encouraged to change their inputs to be dag-json if they switch to typing in dag-json)
  2. The format is not cbor or protobuf, since we've changed the names to be dag-cbor and dag-pb

So is this only helping people who previously had ipfs dag put --input-enc=raw --format=dag-cbor (since we previously had cbor and dag-cbor both mean dag-cbor)?

Basically, what I'm thinking is that given all the changes made in the ipfs dag commands during the release it's better to be as loud/erroring as possible so that users check out the release notes. We may also want to make the release notes have some more prescriptive "if you were doing X, do Y instead" as opposed to only describing the changes more abstractly.

On that note, one of the commands I'm currently worried about is ipfs dag put --input-enc=json --format=cbor which will change how the data is ingested without causing any noise. I'm not sure if there's anything reasonable to be done about that change though.

WDYT @rvagg

@rvagg
Copy link
Member Author

rvagg commented Sep 10, 2021

WDIT? Diabolical choices! I think this really depends on expected user behaviour currently in the wild, which I have no data on, nor much experience with. It's just interesting to see OribitDB running into this and our own js-ipfs-http-client doing it too, suggesting that maybe it's not that rare?

Given the patterns I've seen in both of those, I'm imagining --input-enc=raw to be more common. It's even the default in js-ipfs-http-client because it's expected that users are constructing their own data to be ingested. But that's the one where the breakage will hurt the most. The json encoding will break if you were previously writing bytes the plain base64 Go way, or were writing dag-pb data in the older ProtoNode way. Again, my gut feel there is that it's not going to be as common that users are conforming to these standards just to write json data and would default to client-side encoding. (Again, I really have no idea about what's in the wild here!) So if raw is the most common, then we're on the path of maximal blind breakage. We've changed from saying "here are my bytes for the --format block" to "here's a single Bytes IPLD node that I want you to encode as --format"—which is really not something that most users are going to be wanting unless it's --format=raw. But if you're ingesting a raw block are you really reaching for the dag API and not the block API?

Of course one option here is to just properly break it and rename --input-enc entirely so users notice the break and have to fix, and in doing so have to figure out the how and why. That might be better than blind breakage that's only going to be picked up if you have good enough integration tests.

@aschmahmann
Copy link
Contributor

@rvagg that's a good point about the breaking nature of --input-enc=raw in addition to the JSON breakages.

It seems like it might be a better idea to just replace --input-enc with --input-codec and --format with --store-codec. This will break all existing users except for people who are using ipfs dag put with no arguments and using the older JSON-ish format.

@BigLep
Copy link
Contributor

BigLep commented Sep 14, 2021

@aschmahmann : I agree with your proposal as it causes folks to fail loud/fast/hard.

@rvagg
Copy link
Member Author

rvagg commented Sep 14, 2021

+1 to --input-codec and --store-codec, avoid most of the hidden failure cases

This will break all existing users except for people who are using ipfs dag put with no arguments and using the older JSON-ish format

Except in the case of bytes, which now uses the dag-json bytes layout. Probably worth making that clear. There's also the edges with numbers—where we now have differentiation between integers and floats, v0.9.1 and prior would interpret all json input numbers as floats. I'm not sure we documented that in the changelog yet either.

The only other thing to consider is the ipfs-http-client stuff; it'll be a hard break there. I know js-ipfs-http-client hard-wires defaults for both input-enc and format, so there'll be a version break for people trying to use it and the newer go-ipfs. I'm not sure that can be easily finessed with version sniffing or not but it might be awkward.

@lidel
Copy link
Member

lidel commented Sep 14, 2021

cc @warpfork for visibility

Re: producing CIDv1 +1

I am also +1 for renaming and forcing people who would be impacted to notice the change. "
Less unknowns, better developer experience.

  • if we want to rename them to *-codec – does this mean it will be strict so we only allow codec names from this table in this API?
    • if so, we should be making it clear in --help text and inform user how to get a list of all available formats via ipfs cid codecs --implemented or similar – this also helps with (2)
    • if we plan to allow non-codec formats, then we should go with *-format
  • I am not sure how popular dag commands are over http api anyway– anecdotally, we probably should not worry about slight window of inconsistency, because js and go were already incompatible and if folks wanted something that works cross lang boundary, they used block or object apis instead (cc @achingbrain if there is more depth to this)

@BigLep BigLep added this to the go-ipfs 0.10 milestone Sep 23, 2021
@BigLep BigLep added this to In Review in Go IPFS Roadmap Sep 23, 2021
@BigLep BigLep linked a pull request Sep 23, 2021 that will close this issue
Go IPFS Roadmap automation moved this from In Review to Done Sep 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants