Implementation of IPLD in Python #24

Open
dhruvbaldawa opened this issue Sep 2, 2017 · 37 comments

@dhruvbaldawa
Member

Hey guys,

I am interested in helping to implement the IPLD spec in Python. It would be great to have some pointers on where to start and whether there is already an ongoing effort.

Thanks.

following up from ipfs-shipyard/py-ipfs#1 (comment)

@daviddias
Member

Hi @dhruvbaldawa, welcome!

The best way to start is to first implement a CID module. See the JS version here:

Then, you will want to be able to have a resolver for a format. Pick one of these two (or even both):

Then, you want to be able to resolve through a graph. You will need some kind of key-value store, which can be shimmed, but the primitives exposed should be the same as:

For inspiration, see @jbenet's talk at:

Btw, we have a weekly stand up, check info here:

Let me know if you have questions on the way and have fun!
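
As a rough illustration of the first step above (a CID module), here is a minimal Python sketch that builds a CIDv1 string. It assumes the layout `<version><codec><multihash>`, a sha2-256 multihash, and lowercase base32 multibase (prefix `b`). The names `encode_varint` and `make_cid_v1` are hypothetical helpers for this example and are not taken from any existing library.

```python
import base64
import hashlib

SHA2_256 = 0x12   # multihash code for sha2-256
RAW = 0x55        # multicodec code for raw binary
DAG_CBOR = 0x71   # multicodec code for dag-cbor


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, continuation bit clear
            return bytes(out)


def make_cid_v1(data: bytes, codec: int = RAW) -> str:
    """Return a base32 CIDv1 string for `data` (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # multihash = <hash function code><digest length><digest>
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    # CIDv1 = <version><content codec><multihash>
    cid_bytes = encode_varint(1) + encode_varint(codec) + multihash
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase base32


if __name__ == "__main__":
    print(make_cid_v1(b"hello world"))
```

A real implementation would also need decoding, CIDv0 support, and other hash functions and bases; this only shows how the pieces nest.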

@dhruvbaldawa
Member Author

@diasdavid Thanks for the pointers.

I have a working implementation (with tests) for multibase and multicodec in Python now. Will work on CID implementation next and then polish the code for multibase and multicodec before moving on to other parts of the system.

https://github.com/dhruvbaldawa/python-multibase/
https://github.com/dhruvbaldawa/python-multicodec/

I will try and make it to the weekly stand up as well.
Thanks once again!

@TKorr

TKorr commented Sep 4, 2017

you may find these implementations useful to work off, or fork.
https://github.com/tehmaze/python-multihash
https://github.com/fredthomsen/py-multicodec

look forward to seeing progress in the python arena...

@daviddias
Member

To improve discoverability, what about moving those repos to the multiformats org?

I've added:

To the Python team https://github.com/orgs/multiformats/teams/python-team/members

Also, for consistency purposes, let's keep py-* as the prefix.

@dhruvbaldawa
Member Author

@TKorr thanks for the references. I did look at those repos, but I decided to follow the existing JS and Go implementations and emulate what they do, because I don't have the historical context for those projects.

And yes, renaming the repos does make sense, I will do that tonight. Does it make sense to add the implementation repos to the multiformats org?

@daviddias
Member

And yes, renaming the repos does make sense, I will do that tonight. Does it make sense to add the implementation repos to the multiformats org?

It does. I'm just going through all the pieces implemented and created teams for every language :)

@daviddias
Member

See multiformats/multiformats#42 :)

@dhruvbaldawa
Member Author

Update: I have a CID implementation!

https://github.com/dhruvbaldawa/py-cid

For the next week, my focus will be on adding documentation for multicodec, multihash and CID, and on adding more test coverage for these projects. I will also read about Merkle DAGs and go through the existing implementations.

@daviddias
Member

Awesome work @dhruvbaldawa! I went ahead and created a Python team on IPLD and added you https://github.com/orgs/ipld/teams/python-team/members

@dhruvbaldawa
Member Author

Cool, should I just go ahead and move the project into the organization?

@daviddias
Member

@dhruvbaldawa
Member Author

Cool, can you please add me to the multiformats organization as well?

@daviddias
Member

Oh, I thought I already did. Just added you :)

@dhruvbaldawa
Member Author

@diasdavid Hi, I can't add integrations for the repos that I have moved into the multiformats org, for example readthedocs.org for building and uploading the documentation. It works fine for the repos under the IPLD org. Can you please check and add the required permissions?
Thanks!

@daviddias
Member

@dhruvbaldawa
Member Author

Thanks! Yes, that is it (for now :)). Just out of curiosity: I am adding the repos under the Python Team when I create them, so will new repos get these permissions by default?

@daviddias
Member

@dhruvbaldawa almost. Since the default permission for the whole org is read, I then have to change it to write. But to add external hooks you actually need to be an admin, so I set that up as well. You are now both part of the Python Team and an admin of the modules you maintain.

@fredthomsen

Just noticed this now. I see a new Python multicodec repo has been created, which is fine. Looking mine over, I may not have implemented it correctly anyway. Regardless, one good testing tool is hypothesis, which is great for generating data that meets certain requirements; that data can then be used for round-trip testing any function that is two-way, i.e. encode/decode.

@dhruvbaldawa
Member Author

@fredthomsen yes, I know about hypothesis; for now I am using pytest fixtures to get a similar effect. Actually, I did find a few issues when I did that: for example, bin and base2 shared the same code (which broke my tests), and my implementation was failing for varints with multiple bytes.

https://github.com/multiformats/py-multicodec/blob/5707f9d6e28e2880aa2a4a67726cf65cf9f4d2fd/tests/test_multicodec.py#L20
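
To make the multi-byte varint case concrete, here is a small round-trip sketch in plain Python. It is not the code from py-multicodec; `encode_varint` and `decode_varint` are hypothetical helpers, assuming the usual unsigned varint layout (7 payload bits per byte, high bit set on every byte except the last). The loop deliberately includes values on both sides of the one-byte boundary at 127/128.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Decode a varint from the start of `buf`; return (value, bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


if __name__ == "__main__":
    # Round trip: decode(encode(n)) must give back n and consume every byte.
    for n in (0, 1, 127, 128, 255, 300, 16383, 16384, 2**32):
        encoded = encode_varint(n)
        assert decode_varint(encoded) == (n, len(encoded))
    print("round trip ok")
```

A property-based tool such as hypothesis could generate the integers instead of the fixed tuple used here.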

@TKorr

TKorr commented Sep 18, 2017

Do people think the https://github.com/ivilata/pymultihash implementation is ready to be added to the multiformats repo too? It looks complete, and we can fork it if the creator doesn't want to maintain it anymore.

@dhruvbaldawa
Member Author

+1, I think it's a really good implementation :)

@daviddias
Member

ping @ivilata

@ivilata

ivilata commented Sep 19, 2017

@TKorr, please feel free to fork or adapt pymultihash yourself in any way you see fit, since I don't think I'd have the time to do it myself right now. Thank you very much!

@daviddias
Member

@ivilata thanks for showing up in the thread. Another option is to move the implementation to the multiformats organization and add it to the Python team; this way you and the others can contribute to the same codebase without having to maintain several forks. How does that sound?

@TKorr

TKorr commented Sep 19, 2017

Sounds good to me. Could you add me to the Python team as well?

@daviddias
Member

@ivilata

ivilata commented Sep 27, 2017

Perfect, thanks!

@aratz-lasa

Hello all,

I have started working on the Python version of IPLD. So far, I have done the dag-json codec and I will soon finish dag-cbor. However, I am not sure how to modularize these parts, as well as others such as ipld-block.

At first, I thought about separating all the codecs into different libraries, but I am not sure whether it would be better to create a single unified library that includes all the IPLD parts.

PS: I am not sure whether there are any special methodologies or procedures I should follow in order to work on Python IPLD.

@rvagg
Member

rvagg commented May 15, 2020

Hi @aratz-lasa, good timing for this question because we're currently revisiting some of these questions across the "official" implementations. There are currently two projects to learn from that are experimenting with slightly different approaches, informed by their languages and use environments.

In Go, github.com/ipld/go-ipld-prime, the core dag-cbor and dag-json implementations come bundled with the library but you have to hook them up at init time for them to be used. You get the framework, but the codecs need to be inserted. There's also dag-pb implemented in a separate library that can be wired in the same way: https://github.com/ipld/go-ipld-prime-proto - I can't recall whether there was an intention of merging this into ipld-prime but in Go, since you have to opt-in at init, it doesn't really matter where the implementations are. The important bit is that you can plug in additional implementations as you need them, and someone else could write a new one and load that in too without the permission of the authors of ipld-prime.

In JS, we have a brand new experimental approach that I think will become the new one because it seems to be working out well. Keep in mind that one of the constraints we're trying to deal with in JS is bundle size, and the definitional creep of IPLD and Multiformats means that bundles keep on getting bigger as we include more things. The current js-ipld implementation pulls in all of the standard implementations and assumes you'll need them. But users implementing new things on top of IPLD are not necessarily going to want the baggage of dag-pb, and dag-json really is just a curiosity for practical purposes. It's more likely that you're going to just want IPLD on dag-cbor and maybe some interoperability with other codecs if you're dealing with IPFS. So you should be able to choose a narrow stack like that for your use case and not have a bloated bundle full of things you're never going to touch.

https://github.com/multiformats/js-multiformats is the newest work and it's slowly working its way through the rest of our stack, including new implementations of js-dag-cbor and js-dag-json, which will all sit behind the js-block API. We're right in the middle of this work right now so it's difficult to show it off. But with the new js-multiformats approach, you get a bare-bones multiformats (multihash, multibase and CID) implementation that doesn't know how to do much on its own. You then load in whichever multibases, multihashes and multicodecs you need, and you have a full system for generating hashes and CIDs from binary data and performing encode/decode operations on that data too. This will all get wrapped in a nice Block primitive that encapsulates a CID+Binary pair (this is all in https://github.com/ipld/js-block but hasn't quite got the new multiformats treatment yet).
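
For a Python reader, the "bare-bones core plus codecs you load yourself" idea could look roughly like the sketch below. This is not the js-multiformats or go-ipld-prime API; `Codec`, `CodecRegistry` and `json_codec` are hypothetical names, and the JSON codec is a plain-JSON stand-in rather than a spec-compliant dag-json. The code 0x0200 is, to my knowledge, the multicodec table entry for plain `json`.

```python
import json
from typing import Any, Callable, Dict, NamedTuple, Union


class Codec(NamedTuple):
    """A named encode/decode pair identified by a multicodec code."""
    name: str
    code: int
    encode: Callable[[Any], bytes]
    decode: Callable[[bytes], Any]


class CodecRegistry:
    """Knows nothing by default; callers opt in to the codecs they need."""

    def __init__(self) -> None:
        self._by_code: Dict[int, Codec] = {}
        self._by_name: Dict[str, Codec] = {}

    def register(self, codec: Codec) -> None:
        self._by_code[codec.code] = codec
        self._by_name[codec.name] = codec

    def get(self, key: Union[int, str]) -> Codec:
        table = self._by_code if isinstance(key, int) else self._by_name
        try:
            return table[key]
        except KeyError:
            raise LookupError(f"codec {key!r} is not registered") from None


# Plain JSON stand-in (assumption: NOT dag-json, no link or bytes handling).
json_codec = Codec(
    name="json",
    code=0x0200,
    encode=lambda obj: json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8"),
    decode=lambda raw: json.loads(raw.decode("utf-8")),
)

if __name__ == "__main__":
    registry = CodecRegistry()
    registry.register(json_codec)  # explicit opt-in, nothing is bundled implicitly
    raw = registry.get("json").encode({"b": 1, "a": [1, 2]})
    print(raw, registry.get(0x0200).decode(raw))
```

Third parties can register their own `Codec` without touching the registry code, which is the property both the Go and the new JS approaches are after.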

But you can see that the language constraints and idiomatic conventions come into play here. Go gives you much more leeway to bundle things that don't get touched by the compiler. JavaScript punishes you.

You also have to answer the question of how you're instantiating and/or navigating these data structures. JavaScript currently takes the approach of instantiating pure JavaScript object forms of all of the decoded data. Go, by contrast, has a much more plumbing/porcelain approach to navigating the data, where you traverse nodes and fetch out the pieces you want. In doing so, it is able to obscure the Link boundary by letting you back your structures by loaders that will transparently load blocks as required by traversal. Personally I find this Go approach much more compelling for the IPLD vision, but it's certainly much easier to read and write blobs of data in JavaScript and use JS-native navigation techniques to traverse block-local data (I can use proper for loops for arrays, and Object.keys(), Object.values() etc. for objects, plus all of my typeof and other inspection techniques work, so you're not forced into an uncanny valley situation of navigating data strictly through a foreign API).
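
The loader-backed traversal idea can be sketched in Python as follows. This is only an illustration under several assumptions: blocks are plain JSON, block keys are sha256 hex digests standing in for real CIDs, and a link is written as `{"/": "<key>"}` (mirroring the way dag-json writes links). `put_block`, `is_link` and `resolve` are hypothetical helpers, not part of any IPLD library.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def put_block(store: Dict[str, bytes], node: Any) -> str:
    """Serialize `node`, store it under its hash, and return the key."""
    raw = json.dumps(node, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(raw).hexdigest()  # stand-in for a real CID
    store[key] = raw
    return key


def is_link(value: Any) -> bool:
    """True for the assumed link form {"/": "<key>"}."""
    return isinstance(value, dict) and set(value) == {"/"} and isinstance(value["/"], str)


def resolve(root: str, path: str, load: Callable[[str], bytes]) -> Any:
    """Walk `path` from block `root`, loading linked blocks as they are reached."""
    node = json.loads(load(root))
    for segment in filter(None, path.split("/")):
        if isinstance(node, list):
            node = node[int(segment)]
        elif isinstance(node, dict):
            node = node[segment]
        else:
            raise KeyError(f"cannot descend into scalar at {segment!r}")
        while is_link(node):
            # The caller never sees the block boundary; the loader fetches it.
            node = json.loads(load(node["/"]))
    return node


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    leaf = put_block(store, {"name": "leaf", "tags": ["a", "b"]})
    root = put_block(store, {"title": "root", "child": {"/": leaf}})
    print(resolve(root, "child/tags/1", store.__getitem__))
```

Here the path `child/tags/1` crosses from the root block into the leaf block without the caller doing anything special, whereas the JS-style approach would hand back the decoded root object and leave following the link to the caller.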

There's a big push in Rust at the moment too, and they're working through the same questions, so it might be worthwhile tuning into what they're doing. @vmx.

Also feel free to join us any time for our weekly team call, focused on IPLD and what we're all doing across this landscape. It's mainly attended by Protocol Labs folks who are focused on doing this full-time, but we also have a number of regular community members joining, so we get to bounce around discussions just like this with everyone. Details here: https://github.com/ipld/team-mgmt#weekly-call

@aratz-lasa

@rvagg thank you for your insightful explanation. I have been checking the new JS multiformats approach, and I think it is really elegant and modular. I will think about it, but at first glance it looks like Python should follow this approach instead of Go's. However, should multicodec be responsible for encoding/decoding? Isn't it just an "agreed-upon codec table", with no logic beyond adding/removing codec prefixes? (Maybe this is not the right place to ask, sorry.)

Also, I do agree that Go's block traversal is closer to the IPLD vision than the JS one, though I was not aware of these differences before. Thank you for explaining it, you saved me a lot of reading time.

I think I will start joining the weekly calls, in order to get a deeper understanding. So, thank you for your invitation!

Lastly, in case I want to publish a repo (codecs or future cid-block) how should I proceed?

@vmx
Member

vmx commented May 15, 2020

@aratz-lasa I'd also like to point you to https://github.com/ipld/specs/tree/acf08c3c02f8eecd85f9077f8d403bdb941d5da7/design/libraries which could be an interesting read. It has a bit of Go perspective, but is also interesting for other languages.

Lastly, in case I want to publish a repo (codecs or future cid-block) how should I proceed?

I think for now it's easiest to publish them on your GitHub account and we link to them.

@aratz-lasa

@aratz-lasa I'd also like to point you to https://github.com/ipld/specs/tree/acf08c3c02f8eecd85f9077f8d403bdb941d5da7/design/libraries which could be an interesting read. It has a bit of Go perspective, but is also interesting for other languages.

Indeed, it is very useful, thank you @vmx .

I think for now it's easiest to publish them on your GitHub account and we link to them.

Perfect.

@MLDovakin

MLDovakin commented Apr 16, 2021

@daviddias
Hi, I'm new to distributed systems. Is it possible to upload a file hashed by the Python library to IPFS via IPFS Desktop?
Is this library officially used for IPFS in Python, and can it interact with IPFS Desktop?
https://libraries.io/pypi/ipfs-api

@monperrus
Contributor

FTR, https://github.com/hashberg-io/dag-cbor implements dag-cbor in Python

@monperrus
Contributor

I have done dag-json codec

@aratz-lasa that's super useful! I cannot find it under your GitHub account. Maybe you put that code under another organization? Thanks!

@snarfed
Contributor

snarfed commented Apr 24, 2023

https://github.com/hashberg-io/multiformats/ and https://github.com/hashberg-io/dag-cbor are great! I published the first release of https://github.com/snarfed/dag-json just now, which uses their multiformats.CID.

@monperrus
Contributor

@snarfed that's great, thanks a lot! I've installed it and tested the code snippets from the README, it works!
