ask w3c/activitypub how to extend the activitypub protocol #2

Open
ppwfx opened this issue Jun 5, 2018 · 39 comments

Comments

@ppwfx
Member

ppwfx commented Jun 5, 2018

maybe it would be a good idea to ask the activitypub authors for general guidelines on how to extend the protocol, to make sure it won't end in a big mess

@EorlBruder

Cwebber chimed in on the issues of gogs and gitea offering to answer questions: gogs/gogs#4437 (comment)
That could be a starting point.

@jonasfranz

@EorlBruder I've already chatted with @cwebber and he gave me some tips on how to implement ActivityPub for gitea. For example, he recommended using ocap-ld. For more details, please check out the chat log: https://chat.indieweb.org/social/2018-06-04

@ppwfx
Member Author

ppwfx commented Jun 5, 2018

https://www.w3.org/TR/activitypub/#Overview

It's likely that ActivityStreams already includes all the vocabulary you need, but even if it doesn't, ActivityStreams can be extended via [JSON-LD].
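
A minimal sketch of what such a JSON-LD extension could look like, assuming a made-up namespace `https://example.org/ns/forge#` and two hypothetical terms, `Repository` and `cloneUri`. Only the ActivityStreams context URL comes from the spec; everything else is illustrative.

```python
import json

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"

# Hypothetical namespace for forge-specific terms; not defined by any spec.
FORGE_NS = "https://example.org/ns/forge#"


def extended_context():
    """Return an @context that layers hypothetical forge terms on top of AS2."""
    return [
        AS2_CONTEXT,
        {
            "forge": FORGE_NS,
            "Repository": "forge:Repository",
            "cloneUri": {"@id": "forge:cloneUri", "@type": "@id"},
        },
    ]


def make_repository(repo_id, clone_uri):
    """Build a document whose type is resolved through the extension context."""
    return {
        "@context": extended_context(),
        "id": repo_id,
        "type": "Repository",
        "cloneUri": clone_uri,
    }


if __name__ == "__main__":
    doc = make_repository(
        "https://host.name/repo_owner/repo_name",
        "https://host.name/repo_owner/repo_name.git",
    )
    print(json.dumps(doc, indent=2))
```

A consumer that only understands plain ActivityStreams can still read the standard properties and skip the terms it does not know.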

@yookoala

yookoala commented Jun 5, 2018

I initially thought Fork and PR can be defined as types of Activity.

If you think about how we usually work with git services, we actually subscribe to repositories and branches, not users. That is a bit different from the model described in ActivityPub. If a PR or a Fork is just an activity, one issue I can think of is the lack of a description for a git service to act upon. It also makes it hard for repository / branch updates (e.g. new push, delete) to propagate.

If a repository / branch is more like an actor, then the whole protocol can be repositories / branches subscribing to each other for updates. That protocol seems more natural.

Just my opinion at the moment.

@NoraCodes

NoraCodes commented Jun 5, 2018

I agree that it makes more sense for repositories to be actors. Then, for instance, opening an issue would be modeled as the repository opening the issue, with a separate metadata field for the username of the user that opened it.

Then, a repo would be repo_name@host.name, which also provides a convenient format by which users can set their upstreams: git remote add whatever gitpub://repo_name@host.name. Git's remote formats are already extensible, so we could trivially provide a module which would look up the preferred method of communicating with that GitPub remote (ssh, https, git-transport, gittorrent, etc).
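
As a rough illustration of the proposed remote format, a helper could split a `gitpub://repo_name@host.name` URL into its parts before looking up the preferred transport. The `gitpub` scheme and the function name `parse_gitpub_remote` are assumptions taken from this comment, not an existing Git feature.

```python
from urllib.parse import urlsplit


def parse_gitpub_remote(url):
    """Split a hypothetical gitpub:// remote into (repo_name, host)."""
    parts = urlsplit(url)
    if parts.scheme != "gitpub":
        raise ValueError(f"not a gitpub remote: {url!r}")
    # In repo_name@host.name the repo name sits in the userinfo position.
    if not parts.username or not parts.hostname:
        raise ValueError(f"expected gitpub://repo_name@host.name, got {url!r}")
    return parts.username, parts.hostname


if __name__ == "__main__":
    print(parse_gitpub_remote("gitpub://repo_name@host.name"))
```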

In this way, "users" (individual contributors) become a non-entity in the GitPub AP model. This is basically fine as they are tracked by Git, and we can tag issues and PRs separately, but it does raise questions around e.g. behavior control (reporting, blocking) and "social" features, such as providing a GitHub like "follow this user" feature.

@Unip0rn

Unip0rn commented Jun 5, 2018

How about treating Repos like "virtual" users. That way we get notifications about stuff happening at no cost.

That would also take into account what @yookoala wrote about the usual subscribe-habits on git services.

Also we could implement branches as the equivalent of threads in communications. The commit message could be the CW, and the diff would come as the "body" of the message (should I explain why I find that useful?).

Still no cost.

Then Issues.... Messages with a mention of the repo handle and without a CW? I kinda dislike that idea, since it would mean you cannot CW an issue, which would be useful e.g. when writing the issue from an account that usually interacts with non-techie people who don't like to read long issues.

Just some thought and I always like to learn why I am wrong

@ppwfx
Member Author

ppwfx commented Jun 5, 2018

@LeoTindall

what's the benefit of adding gitpub: as a new protocol to git? In my understanding gitpub will simply be a protocol that is used to standardize the exchange of metadata for repos

I think the URIs of repos should stay as familiar as possible, as that makes adoption easier and a lot of tooling relies on the current form, e.g. golang import doesn't support that format


plus I think it makes sense to have the user that owns the repo in the uri too

@jaywink
Member

jaywink commented Jun 5, 2018

I do have to disagree on not representing users as proper AP actors. There is no reason not to also represent repositories, organizations and such as actors too - that is all fine in the AP/ActivityStreams2 world. In fact, it has multiple different types of actors (and if something is missing - extensions to the rescue!)

An actor is something that does an activity. Users can do activities, like liking a repository, following a person. Repositories can do activities, but also organizations can do activities, like "create a repository".

Then, a repo would be repo_name@host.name

In AP, actor IDs are URLs, and I would encourage sticking to this for compatibility. A repo would then for example be https://host.name/repo_owner/repo_name.

The problem IMHO something like GitPub should solve is:

  • how to notify repo owner of a PR -> send an activity!
  • how to notify PR owner (thread basically) of a comment -> send an activity and the repo owner then forwards it
  • how to notify a PR creator the repo owner merged it -> send an activity!

What we're dealing with is not any more difficult or special than social networking content. We're dealing with PRs (= posts), comments (= err.. comments), stars/likes (= stars/likes). What really IMHO only requires extensions is a pull request/merge request object and activities related to them. Any additional metadata can be attached to the activities/objects as needed via extensions.
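
To make the "send an activity" idea concrete, here is a minimal sketch of a pull request wrapped in a standard `Create` activity. The `PullRequest` type and the `sourceBranch` and `targetRepository` properties are hypothetical extension terms invented for this example; `Create`, `actor`, `to`, `attributedTo` and `name` are ordinary ActivityStreams vocabulary.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def pull_request_activity(author, target_repo, source_branch, title):
    """Wrap a hypothetical PullRequest object in a standard Create activity."""
    return {
        "@context": AS2_CONTEXT,
        "type": "Create",
        "actor": author,
        # Addressing the repository actor is what notifies the repo owner.
        "to": [target_repo],
        "object": {
            "type": "PullRequest",            # hypothetical extension type
            "attributedTo": author,
            "name": title,
            "sourceBranch": source_branch,    # hypothetical property
            "targetRepository": target_repo,  # hypothetical property
        },
    }


if __name__ == "__main__":
    activity = pull_request_activity(
        "https://server-b.example/users/lennon",
        "https://host.name/repo_owner/repo_name",
        "https://server-b.example/lennon/hello/branches/awesome",
        "Make hello awesome",
    )
    print(activity["type"], "->", activity["to"][0])
```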

@yookoala

yookoala commented Jun 6, 2018

I am not opposed to the notion of "user being an actor". It would be interesting to see all Git web services join the fediverse in the long run (e.g. we could subscribe to a gitlab account's activity from pump.io or mastodon). It also comes naturally to federate comments, posts and reactions with the existing AP spec.

My comment meant to describe how PR and Fork activity should be described. I think if a protocol like GitPub should pick up momentum, it should be simple and specific to its own problem domain.

So if something has already been specified in AP (or GitPubSub) and works well, a spec like GitPub does not need to specify it at all. It can simply refer to those existing specs. That way, we might also leverage existing libraries and ease the pain of the implementation process.

@arucard21

I also think that this should be implemented by simply providing additional Object types (and possibly Actor types) that can be used with the standard ActivityPub protocol. This should be sufficient for creating forks and merge requests, as explained by @jaywink. As such, I think we should not be defining a GitPub protocol and perhaps stop calling it that to avoid confusion. This project should most likely just provide these additional Object types.

I also see some different things being considered that may not be relevant. So it might be a good idea to define what exactly needs to be possible with federation, and what doesn't. As I understand it, forks and merge requests are entirely limited to web frontends for git repositories. This has nothing to do with git (the version control system) itself.

I see these 3 user stories that would need to be made possible here:

  1. A person can fork a code repository on a web frontend hosted on one server to the web frontend on another server
  2. A person can submit a merge request from the web frontend hosted on one server, containing a forked repository, to the web frontend hosted on another server, containing the parent repository.
  3. A person can star/like a repository on one server to receive updates and notifications about them in another server.

As I understand it, 1 requires the addition of a Fork Object type, 2 requires the addition of both a Fork and a Merge Request Object type, and 3 requires the addition of a Repository Actor type. Of course, many details would still need to be ironed out, but this might help provide focus to the discussions about them.

@ppwfx
Member Author

ppwfx commented Jun 6, 2018

@arucard21

Issues would then simply be modeled as Note and comments as Note with an inReplyTo property?

@neithernut
Member

neithernut commented Jun 6, 2018

@21stio The question is whether you really want to tackle issues right now: you'd have to find a common ground for semantics, metadata representation and workflows.

Also, solutions for issue tracking inside the repo already exist, e.g. git-issues and git-dit (of which I am a co-author). There's an entire distributed-issue-tracking community: https://dist-bugs.branchable.com/

AFAIK most of those solutions are not production ready, or lack a mechanism to properly propagate issues. The latter could be provided by this project, which is one of the reasons why I'm interested in it as a maintainer of git-dit.

Btw: there's also git-appraise, which is a system for reviewing merge-requests.

@bill-auger
Member

bill-auger commented Jun 6, 2018

allow me to jump in here briefly - only because i notice that most of the participants of this thread are not represented on the other threads on this repo where i have been in discussion so far - the ideas here seem to be flying at a frantic pace, although this group is less than a day old; so i would like to offer some external context

there is no reason to presume that a pull request or any of the actions mentioned here are somehow intrinsically "webby" activities - it only happens to be so today - for example: the anatomy of a pull request is a .patch file that is associated with source repo URL and a destination repo URL, perhaps with a topic and additional comments (and o/c the git commit log can contain those intrinsically) - all of that data could be trivially transmitted in plain text via email, for example - and in fact, that is exactly how merge requests were done before git

after reading this thread, i get the impression that the main goal here is to produce "activity streams" to be consumed by "social" relay services such as mastodon and friends - there is no need to constrain the scope of applicability to web pages nor to "social" use-cases - if done properly this is just an API that can be fully implemented in a wide variety of clients such as email, command line, and native desktop applications - if you are thinking this will only be applicable to websites, you are not thinking grand enough

there is a lot of excitement here which is great, but i should point out that this is no new idea - people have been discussing this topic for a long time and there are already some design documents describing not a protocol, but a complete federated hosting solution accommodating multiple types of clients; with activity-pub being added to the discussion only recently as one possible communications format - these extensions to activity-pub are really the least of the work that needs to happen in regards to a complete system - the communication protocol is discussed very little in those existing documents as it is among the least time-consuming tasks and one that really only needs to be done after some working server and client exists - the idea of "social activity streams" came as an afterthought, as a bonus feature that activity-pub would provide; but that is far from the most important feature - the primary, over-arching goal of federating your project should be to collaborate on software without relying on third-party hosts, not merely to announce your activities - im not sure if anyone else here has looked at it that way; so i had to "throw that out there" for yas

i would hope that people here would take a look at the work that has already been done on the notabug-2.0 and vervis projects; if only for inspiration of what is possible and desirable outside the constraints of the web browser

https://notabug.org/NotABug.org/notabug-2.0

@bill-auger
Member

bill-auger commented Jun 6, 2018

to the comment above:

I think it makes sense to have the user that owns the repo in the uri too

that is a github-ism - not all forges have a concept of repo "owner" or "namespace" - sourceforge and pagure are 2 notable examples where the repo is the top-level atom and users are entirely orthogonal

@jaywink
Member

jaywink commented Jun 6, 2018

Issues would then simply be modeled as Note and comments as Note with an inReplyTo property?

IMHO maybe avoid using Note in a too generic way, even though it would be easy. Maybe another object type that extends Note would make more sense (Issue?) since otherwise someone from Mastodon would flood your code hosting platform with microblogging posts :) If a separate object type is used, it is easier to deal with the object properly since it is known what it is, not just what kind of data it contains.

Edit: but yes a comment could be a Note with inReplyTo 👍
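
A small sketch of why a dedicated type helps on the receiving side: the forge can accept issues and replies while ignoring stray microblog posts. The `Issue` type is the hypothetical extension suggested above, and the `classify` function and its return labels are made up for illustration.

```python
def classify(obj):
    """Decide how a forge might treat an incoming object, based on its type."""
    kind = obj.get("type")
    if kind == "Issue":  # hypothetical extension type
        return "issue"
    if kind == "Note" and obj.get("inReplyTo"):
        # A Note that replies to something is treated as a comment.
        return "comment"
    # A free-standing Note (e.g. a microblog post) is not forge content.
    return "ignored"


if __name__ == "__main__":
    print(classify({"type": "Issue", "name": "Crash on start"}))
    print(classify({"type": "Note", "inReplyTo": "https://host.name/issues/1"}))
    print(classify({"type": "Note", "content": "good morning fediverse"}))
```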

@bill-auger
Member

bill-auger commented Jun 6, 2018

regarding the concept of "actors" - i do not know the first thing about activity-pub but a repo is not an actor in any semantic sense, because a repo can not initiate any actions - all actions in the system are initiated by users either directly (e.g. pressing a button) or indirectly (e.g. git push) and repos are the targets of most such actions - but users are also the targets of some actions (such as "follow" and "mention") so users are certainly "actors" in that they are the only initiators of actions and are also the target of certain actions initiated by other users

perhaps i am just confusing your nomenclature and perhaps repos need to be "actors" for some technical reason; but that above is the common sense description of the real-world agents and events - object-oriented systems are supposed to model those common sense descriptions

@yookoala

yookoala commented Jun 6, 2018

@bill-auger: According to the ActivityStream's Vocabulary spec, Actors are "Object types that are capable of performing activities". They have defined these core actor types:

  • Application
  • Group
  • Organization
  • Person
  • Service

Person is only one of them. And you also got Application and Service here.

Being an actor in ActivityPub also means you have an inbox and outbox for a proper pub-sub to happen.
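
For illustration, a repository exposed as an actor could be as small as the document built below. The choice of `Service` as the type and the `/inbox` and `/outbox` paths are assumptions for this sketch; ActivityPub only requires that an actor has `inbox` and `outbox` properties.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"

# The five core actor types listed in the Activity Streams vocabulary.
CORE_ACTOR_TYPES = {"Application", "Group", "Organization", "Person", "Service"}


def make_actor(actor_id, actor_type="Service"):
    """Build a minimal actor document with the inbox/outbox pair."""
    if actor_type not in CORE_ACTOR_TYPES:
        raise ValueError(f"not a core actor type: {actor_type!r}")
    base = actor_id.rstrip("/")
    return {
        "@context": AS2_CONTEXT,
        "id": base,
        "type": actor_type,
        "inbox": f"{base}/inbox",    # assumed path layout
        "outbox": f"{base}/outbox",  # assumed path layout
    }


if __name__ == "__main__":
    print(make_actor("https://server-a.org/foobar/hello"))
```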

If you think about it, a repository is the result of a series of activities (e.g. push, force push, rebase, squash, remove). A branch is too. Although those activities are done by different users, we're more interested in subscribing to the repository or the branch than to the user who did them. It is trivial to have the same activities on the feed of those users, but the repositories and branches should be subscribable.

And in the case of PRs and Forks, what usually happens is that someone updates the downstream branch and the upstream needs to hear about it. This matters especially for PRs, where upstream would want to know before merging. So a downstream branch should be able to report its activities to the upstream repository / branch (and not report all activities of that user, nor report to the upstream owner).

That's what I meant for repository and branch being actor.

@yookoala

yookoala commented Jun 6, 2018

We might also describe commits as activities that involve 2 actors: the User and the Branch. So you may select which one to subscribe to if needed.

@bill-auger
Member

yes im quite sure i am just confusing the words - actor in the networking sense is like a "channel" with an input and an output - a repo probably does not need an output but it needs an input - a user needs both

i only remarked that because someone said a repo should be an actor but a user should not - that did not jive with me

@arucard21

@21stio

Issues would then simply be modeled as Note and comments as Note with an inReplyTo property?

I think issues should be left out-of-scope for now. While included in some web frontends, they are not a core part of the git collaboration workflow. Please note that I'm not saying that issue tracking isn't important, I'm saying that it's an entirely separate concern. While important, it would complicate matters significantly to include this in a first attempt to get federated git collaboration.

@bill-auger

i would hope that people here would take a look at the work that has already been done on the notabug-2.0 and vervis projects; if only for inspiration of what is possible and desirable outside the constraints of the web browser

I took a quick look at this project and it seems to be focused on an entirely separate implementation. I think the knowledge and experience from that project is useful, but they seem to be different approaches. But I also don't think this discussion is focused on "webby" activities. I think this is something that will impact the web frontends, but what we are discussing is how to use ActivityPub for this. This should result in a (HTTP-based) API that can be used from the web frontends but also from any other clients or platforms. This is also what you describe as "if done correctly" so perhaps we're just misunderstanding each other here.

@yookoala

According to the ActivityStream's Vocabulary spec, Actors are "Object types that are capable of performing activities". [...] Person is only one of them. And you also got Application and Service here.

I agree that since Service is considered one of the core Actor types, it seems natural to consider a Repository an Actor type too. I'm not sure if it's needed to have Branches as Actor type as well. I think federation does not need to happen on the branch or commit level. That might make the API too "chatty" which will cause problems with network traffic and scalability. If you wish to receive updates about a specific branch, you can still just subscribe to the repository and then filter locally on just accepting updates about that branch. Implementation-wise, it may make more sense to have the server (as Application Actor) or Fork (as Repository Actor) subscribe to the upstream repository (as Repository Actor). That way any updates on the upstream repo can be synced to the forked repo, and users can be notified of this through the normal notification system of their local server.
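
A minimal sketch of the "subscribe to the repository, filter locally" idea. It assumes each embedded object carries a hypothetical `branch` property naming the branch it concerns; no such property exists in ActivityStreams itself.

```python
def updates_for_branch(activities, branch):
    """Keep only activities whose embedded object names the given branch."""
    selected = []
    for activity in activities:
        obj = activity.get("object")
        # "object" may be a bare URL string; only embedded objects carry fields.
        if isinstance(obj, dict) and obj.get("branch") == branch:
            selected.append(activity)
    return selected


if __name__ == "__main__":
    feed = [
        {"type": "Update", "object": {"type": "Commit", "branch": "master"}},
        {"type": "Update", "object": {"type": "Commit", "branch": "awesome"}},
        {"type": "Like", "object": "https://server-a.org/foobar/hello"},
    ]
    print(len(updates_for_branch(feed, "master")))
```

The repository actor still sends every update once per subscriber; the branch selection costs nothing on the wire because it happens after delivery.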

As a first specification, I think it makes sense to start with the simplest user story. This allows more focused discussion and highlights many general problems early on, without having to deal with the complexity of all the other functionality you might want to implement. In this case, I think the simplest user story would be:
A person can star/like a repository on one server to receive updates and notifications about them in another server.

A big question here might be whether to do this client-to-server or server-to-server. Client to server seems to fit best, with the user following a repository on another server, but this might cause too much traffic. But this is a problem ActivityPub would have on social media as well, with many users following many other users. Perhaps there already is some way to deal with this.

@bill-auger
Member

arucard21 - im not sure why you would say "different approaches" - that could be making exactly my point for me

i am saying that this should be much more than an "activity stream" to be consumed by social websites and placed on your "activity wall" or tweeted to your fans - like "john just starred a repo!!!" - "dave just pushed a commit" - those are the "webby" concerns - there is no other use case - and frankly i find them annoying

this should be a much grander endeavour to create a federated network of complete forges - activity streams are the least interesting aspect of that - just an adornment that could totally be omitted without losing an iota of awesomeness

@yookoala

yookoala commented Jun 7, 2018

@arucard21: I thought of the 2 major cases like this:

Forking

  1. Ken has a repository "foobar/hello" on Server A.
  2. Lennon wants to fork this repository. He has an account on Server B with the "lennon" namespace and he wants the fork to be "lennon/hello" on Server B.
  3. Lennon instructs Server B to fork "foobar/hello" on Server A by providing the source repository URL.
  4. Server B discovers the repository's git / https / ssh endpoint for cloning. It clones the repository and sets up "lennon/hello" with all the branches from "foobar/hello".
  5. For the sake of politeness, Server B tells Server A, "Hey, we have a fork of your foobar/hello at my lennon/hello. You may subscribe to the changes here with the information attached."
  6. Server B might subscribe to "foobar/hello" and do something about it. Totally optional.
  7. Server A might display a link to Server B, or even subscribe to Server B's "lennon/hello". Totally optional.

Notes

  • Server B might display a back link to Server A. That does not concern the protocol.

Pull Request

  1. Lennon makes some awesome changes to his "lennon/hello" repository on Server B. He makes them available on the "awesome" branch there.
  2. Lennon says to Server B, "Hey, I want to send a PR to the upstream server."
  3. Server B: "Oh, I remember. It is foobar/hello on Server A, right? If not, please tell me."
  4. Lennon: "That is correct. Thanks for asking."
  5. Server B asks Server A, "Hey pal. What are the branches on your foobar/hello?"
  6. Server A shows them to Server B.
  7. Server B asks Lennon, "Alright, here are the branches on foobar/hello on Server A. Which base branch do you want?"
  8. Lennon: "Mmm... I'd like to base it on master."
  9. Server B might show Lennon the diff, or not. It says, "OK. I'll tell Server A. Stay tuned."
  10. Server B tells Server A, "Hello. My user Lennon would like to send you a PR. Here is the URL of the branch that we're talking about."
  11. Server A: "OK. One moment." It checks the repository and branch, or not.
  12. Server A checks Ken's settings or its own policy. It finds that it should not create the PR right away. It asks Ken, "Hey, there is an incoming PR and this is the branch's URL." It might also give Ken a preview, or not.
  13. Ken: "Seems fine. I agree for you to create a PR here. But do not accept it just yet."
  14. Server A tells Server B, "OK. The PR is created here."
  15. Server B tells Lennon, "OK, here is the PR address. Go see it on Server A. Remember to follow their rules. Enjoy."
  16. Now Lennon and Ken may have an email conversation, or talk over Server A. Lennon may also need to create an account on Server A to reply to Ken's comments. This does not concern the PR protocol itself.
  17. As requested by Ken, Lennon makes several updates to lennon/hello on Server B. Server B tells Server A about these new pushes / removals / rebases. It would also tell Server A if Lennon simply gave up and removed the branch. But Lennon persists with the PR.
  18. Server A knows about the changes on the PR's downstream branch. It might display them to Ken.
  19. Finally, Ken agrees that the PR is ready. Ken merges it and closes the PR from upstream.
  20. Server A tells Server B, "Seems the PR is closed. Will let you know otherwise. I know where to find you anyway."
  21. Server B trusts Server A and says nothing back.
  22. Server B tells Lennon, "The PR has been merged. Congrats."
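
The server-to-server part of the flow above can be read as a small state machine that both servers track for each PR. This is only one possible reading: the state and event names are invented here, and the `reject` transition is an assumption, since the flow only describes the case where Ken agrees.

```python
# (state, event) -> next state. Names are hypothetical, not from any spec.
TRANSITIONS = {
    ("proposed", "accept"): "open",       # steps 13-14: Ken allows the PR
    ("proposed", "reject"): "rejected",   # assumed counterpart, not in the flow
    ("open", "update"): "open",           # step 17: new pushes / rebases
    ("open", "withdraw"): "withdrawn",    # step 17: downstream branch removed
    ("open", "merge"): "merged",          # steps 19-20: Ken merges
}


def advance(state, event):
    """Return the next PR state, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None


def replay(events, state="proposed"):
    """Apply a sequence of events, starting from Server B's proposal (step 10)."""
    for event in events:
        state = advance(state, event)
    return state


if __name__ == "__main__":
    print(replay(["accept", "update", "update", "merge"]))
```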

@yookoala

yookoala commented Jun 7, 2018

@arucard21: In the Pull Request flow above, changes are made on a specific branch, not on the entire repository. So it seems only intuitive to also make the branch an actor.

@arucard21

arucard21 commented Jun 7, 2018

@bill-auger

i am saying that this should be much more than an "activity stream" to be consumed by social websites and placed on your "activity wall" or tweeted to your fans - like "john just starred a repo!!!" - "dave just pushed a commit" - those are the "webby" concerns - there is no other use case - and frankly i find them annoying

I don't think this is the case here. We're looking at ActivityPub which is built using ActivityStreams. It actually allows much more than just this kind of stuff that would show up in an "activity wall" type thing. ActivityPub was created for social networking but essentially supports federation of any kind of content, by using custom Object, Actor or Activity types. So a lot of the concerns about federation in general will already be thought of and handled by using ActivityPub. I think that NotABug differs in this aspect, since it tries to think of and handle all these concerns by itself. This may end up with a more suitable and streamlined protocol, but will likely take much more effort to complete. I hope this clarifies things a bit, and if I still misunderstand you, let me know.

@arucard21

arucard21 commented Jun 7, 2018

@yookoala
I actually think that Forking would be done in the exact opposite workflow. That fits the ActivityPub workflow more closely.

  1. Lennon accesses the web frontend of Ken's repository on Server A.
    • This already implies Lennon is allowed to access Ken's repo (to some extent)
  2. Lennon clicks the Fork button (or some "Fork to my machine" variant of the button) for the repo, which then asks where to fork to.
    • Lennon will also need to authenticate as a user on Server B, possibly using OAuth 2.0. This ensures that he has permission to create the fork on Server B.
  3. The "Repository" Actor on Server A sends a "Create" Activity for a "Fork" Object to Lennon on Server B.
    • The "Fork" object needs to contain all the information needed to actually create a fork.
  4. The fork of the repo is now available on Server B, including where it came from (this is part of the information needed to create the fork).

Subscribing to the repo on Server A can be done by Server B. It has all the information required to do this. But this is not necessary for forks. A fork doesn't get any updates from its parent. That is only done through git, by pulling from upstream and pushing to your own fork. So once forked, the parent and its fork do not need to be synchronized. The only time this "parent" information is used again is when you want to create a Pull/Merge Request from your fork to the parent.

For the Pull/Merge Requests, I think it makes more sense to generate the diff (or a patch) on Server B and send that to Server A. In ActivityPub, this would result in Lennon sending a "Create" Activity with a "Pull/Merge Request" Object to the parent "Repository" Object on Server A. This allows for very simple communication where you don't have to keep track of too many things.

Implementation-wise, this may require Server B to pull the latest commits of the branch it needs to compare against from Server A. So perhaps Server B can create a branch (e.g. "federated_master") on the fork, pull the latest commits to it, generate the diff, then delete the branch again. Hopefully, there are even better ways to do this. This is just a first-guess to indicate that there might be some more complexity, but it should be solvable.

@Arkanosis
Member

I like the idea of starting from the server A. But then:

The "Repository" Actor on Server A sends a "Create" Activity for a "Fork" Object to Lennon on Server B.

How does Server B know that Server A is allowed to create that particular Fork object? If Lennon has had to tell it first, then it already has everything it needs to know without Server A telling it. Also, what if Server A refuses to tell Server B anything? My opinion is that this shouldn't prevent Server B from forking.

A fork doesn't get any updates from its parent.

While not necessary, I like very much the GitHub feature showing: “This branch is x commits ahead, y commits behind upstream:master.”

@yookoala

yookoala commented Jun 7, 2018

How does Server B know that Server A is allowed to create that particular Fork object? If Lennon has had to tell it first, then it already has everything it needs to know without Server A telling it. Also, what if Server A refuses to tell Server B anything? My opinion is that this shouldn't prevent Server B from forking.

Note: I'm still describing how I imagine it. The working group will discuss this before anything is finalized. So think of everything I say at this point as "a possible solution" only.

Lennon must know the repository URL first. Server B would need Lennon to at least give it the full URL of "foobar/hello" on Server A. But having the URL does not mean it knows which protocol or endpoint it should look for. Nor does it know, at this point, whether Server A supports federation at all.

With ActivityPub specifications, Server B can simply send an HTTP request to that exact URL and inspect it:

curl -L -H 'Accept: application/activity+json' https://server-a.org/foobar/hello

The URL should respond to the "Accept" header and return valid JSON-LD data. As specified in ActivityStreams 2.0 and our GitPub extension context, the information would include:

  • What is the SCM you're using? (perhaps)
  • What is the URI for cloning? (https://server-a.org/foobar/hello.git or git://server-a.org/foobar/hello.git or even https://server-c.com/some-company/some-project/smile.git)
  • Where can I pull updates from you? (e.g. https://server-a.org/foobar/hello/outbox)
  • Where can I push updates to you? (e.g. https://server-a.org/foobar/hello/inbox)
  • What can I push to you? (e.g. ["fork", "pr"])

On receiving this, Server B can tell Lennon that forking can start. If it receives an invalid response, Server B will have to tell Lennon, "Sorry, that URL does not support GitPub and federated forking".
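
A rough sketch of that check on Server B's side. The `inbox` and `outbox` properties are standard ActivityPub; `cloneUri` and `accepts` are hypothetical names for the clone URI and the "what can I push to you" list, chosen only for this example.

```python
import json
import urllib.request

# Fields Server B needs before it can fork; cloneUri is a hypothetical name.
REQUIRED_FIELDS = ("cloneUri", "inbox", "outbox")


def fetch_repository_document(url):
    """Request the JSON-LD form of a repository URL (redirects are followed)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/activity+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def can_fork(doc):
    """Return True if the document looks like a forkable, federated repository."""
    if not isinstance(doc, dict):
        return False
    if any(not doc.get(field) for field in REQUIRED_FIELDS):
        return False
    # "accepts" is a hypothetical list such as ["fork", "pr"].
    return "fork" in doc.get("accepts", [])


if __name__ == "__main__":
    sample = {
        "cloneUri": "https://server-a.org/foobar/hello.git",
        "inbox": "https://server-a.org/foobar/hello/inbox",
        "outbox": "https://server-a.org/foobar/hello/outbox",
        "accepts": ["fork", "pr"],
    }
    print(can_fork(sample))
```

If `fetch_repository_document` raises or `can_fork` returns False, Server B falls back to the "does not support GitPub" message.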

A fork doesn't get any updates from its parent.

While not necessary, I like very much the GitHub feature showing: “This branch is x commits ahead, y commits behind upstream:master.”

I think the point is that Server B CAN choose to subscribe to upstream's update for such display. And of course it can do any other thing to achieve the same result (e.g. frequently fetching from Server A). If Server A is pushing message to Server B anyway, Server B might not like it.
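
However Server B learns about upstream commits (subscription or periodic fetch), the ahead/behind display reduces to comparing two sets of commit ids. A minimal sketch, assuming each side is given as the list of commit ids reachable from its branch tip; the function name is made up.

```python
def ahead_behind(fork_commits, upstream_commits):
    """Return (ahead, behind) for a fork branch relative to upstream."""
    fork = set(fork_commits)
    upstream = set(upstream_commits)
    ahead = len(fork - upstream)    # commits only the fork has
    behind = len(upstream - fork)   # commits only upstream has
    return ahead, behind


if __name__ == "__main__":
    fork = ["a", "b", "c", "x"]
    upstream = ["a", "b", "c", "d", "e"]
    ahead, behind = ahead_behind(fork, upstream)
    print(f"This branch is {ahead} commits ahead, {behind} commits behind upstream:master.")
```

With both histories in one local clone, `git rev-list --left-right --count` over a symmetric difference (`A...B`) produces the same two counts.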

@Arkanosis
Member

@yookoala I agree. I concur with your “Totally optional”s above.

@arucard21

@Arkanosis

How does Server B know that Server A is allowed to create that particular Fork object? If Lennon has had to tell it first, then it already has everything it needs to know without Server A telling it. Also, what if Server A refuses to tell Server B anything? My opinion is that this shouldn't prevent Server B from forking.

This is about permissions and should be handled by authentication and authorization mechanisms. ActivityPub recommends (non-normatively) using OAuth 2.0 for client-to-server authentication and authorization. That means that Lennon will likely get an OAuth 2.0 authorization request on Server B from the Repository Actor (e.g. "Do you want to allow Repository X from Server A to create a fork?"). Lennon can then, on Server B, authorize the Repository Actor to create the repository. This is just a rough indication, perhaps some details may need to be changed in order to make this work correctly.

@bill-auger
Member

@arucard21 - that was very clear thanks

i quite assumed that activity-pub was more powerful than that - the reason i said those things is that those seemed to be the main concern the first day - i was mostly trying to ensure that the scope of discussion was not limited to those simple use-cases

i understand now what you meant by different approaches - the notabug document was really no approach though - it was just the requirements overview - there were multiple implementations separate from that - the second of which has settled on using activity-pub and is implementing it now

@yookoala

yookoala commented Jun 7, 2018

@arucard21: You're talking about the Client-Server protocol of ActivityPub. But I think we're dealing with Server-Server communication in Fork and PRs. So I might not go down that path. Especially for public repositories, forking does not require any specific permission at all.

If we're talking about private repositories, then certain authentication and authorization would be needed. In which case, OAuth2 would be a nice choice for initial permission granting. In that case, a user might need to have an account on server A to be able to fork it to anywhere else.

@jaywink
Member

jaywink commented Jun 7, 2018

I don't quite follow why User B who is on her own server needs access/account to the repo of User A who is on his own server. Assuming of course the repository is public. But even if it is private, if User B has access to the repository, why should they have to go through User A's server? This would not allow users to act purely from their own server, which IMHO is the whole point of decentralization. I don't need to visit/know about another server when I write a reply or share someone's social media post. I don't want to visit/know about any other servers when I fork a repository I already follow either.

If we're talking about private repositories, then certain authentication and authorization would be needed.

IMHO, not really. First of course we need to know how User B knows about the private repository of User A. If they know about it, they already have access. This is like limited visibility social media posts, which are not public. The way ActivityPub handles private messaging is through targeting. If you don't want to say something is public, you assign it an audience. It's simple. If someone receives it, it has been sent to them, so they are allowed to see it.
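As a rough sketch of what targeting looks like on the sending side: the addressing properties (`to`, `cc`, `bto`, `bcc`, `audience`) and the special Public collection come from ActivityStreams, while the `Repository` type, the example URLs and the helper names below are assumptions made up for illustration.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
AUDIENCE_FIELDS = ("to", "cc", "bto", "bcc", "audience")


def recipients(obj):
    """Collect every actor or collection the object is addressed to."""
    found = set()
    for field in AUDIENCE_FIELDS:
        value = obj.get(field, [])
        if isinstance(value, str):  # a single recipient may be a bare string
            value = [value]
        found.update(value)
    return found


def may_see(obj, actor_id):
    """An actor may see the object if it is public or explicitly addressed to them."""
    targets = recipients(obj)
    return PUBLIC in targets or actor_id in targets


# Hypothetical private repository shared with exactly one remote user.
private_repo = {
    "type": "Repository",  # not an ActivityStreams type; assumed extension
    "id": "https://server-a.example/userA/secret-project",
    "to": ["https://server-b.example/users/userB"],
}
```

With this, `may_see(private_repo, "https://server-b.example/users/userB")` is true and any other actor gets nothing, without a separate login step on User A's server.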

Now when User B forks the private repo of User A they can do so without authentication since they already have the repository from before, because User A sent it to them. Thus, User A has no control any more whether User B can or cannot fork the repo. Naturally User A can remove visibility of User B at any time. Unfortunately (or luckily!) in a decentralized world, when you share something outside your server, you permanently lose control of that information - at least for that particular version of it.

@yookoala

yookoala commented Jun 8, 2018

@jaywink: A private post in ActivityPub might not be the right analogy. In an SNS context, you'd read a post from a feed listing. But in a fediverse of Git services, you're most likely browsing to that particular repository on some website (or you're a member of that repo). So it's more like a Mastodon account behind good old-fashioned HTTP authentication.

In the current paradigm of Git services, private repositories are forbidden to anyone without valid authentication or authorization. So if you have not logged in, or if you're not a member of the private repository https://somebucket.org/topsecret/awesome-project, you get a 403 Forbidden or a 404 Not Found page. The same should apply to a git clone unless you have the correct auth / RSA key.

So to fit with that paradigm, user Lennon on https://server-b.org who wants the above repository would need to provide 2 different authentications:

  1. Authentication to https://server-b.org to prove he / she has permission to create a repository somewhere on it.
  2. Authentication to https://somebucket.org/ to prove he / she has permission to clone that topsecret/awesome-project repository.

I think @arucard21's model solves this situation by having either server be an OAuth2 application to the other server. That way, if you have logged in to one server, you can use OAuth2 to prove you have authentication to the other one. Problem solved.

@bill-auger
Member

it does not need to be an oauth dance either - there are other ways of doing cross-site auth - one of the notabug-2.0 implementations uses "macaroons" signed by each user's home-server's SSL cert - users can present their macaroon to foreign hosts directly and the foreign host needs only to verify its authenticity against the user's home-server's SSL cert

@arucard21

Especially for public repositories, forking do not require any specific permission at all.

Even for public repositories, you still need some permissions. At the very least, Lennon needs to have permission to create forks on Server B. This is usually granted automatically as soon as you have an account on Server B. But this is why, even with public repos, you can't just fork something without any authentication. Otherwise, this would be open to abuse. Someone could just start forking every repo to Lennon's Server B, which would then get spammed or DDoS'ed.

if User B has access to the repository, why should they have to go through User A server? This would not allow users to act purely from their own server which IMHO is the whole point of decentralization.

I suggested the workflow where you go to Server A to fork the repo to Server B because that fits more closely to the way forks are currently created. Also, since the parent repo (on Server A) doesn't actually need to know about its forks (one of which would be on Server B), you actually don't need any federation API to create a fork on Server B. If you start the workflow from Server B, you just:

  • create a new fork (probably a button or menu option in the web frontend)
  • provide the git clone URL.
    • If it's a private repo, you'll need to provide credentials as well.
  • Server B can then fork the repo and set this provided URL as parent of the fork.
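The bullet points above could be sketched roughly as follows on Server B. Everything named here (`ForkPlan`, `plan_fork`, the `/srv/git` layout) is hypothetical; the only real piece is `git clone --mirror`, and the sketch only builds the command rather than running it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForkPlan:
    parent_url: str          # recorded so the fork knows its parent
    target_path: str         # where Server B would store the fork
    command: List[str]       # git invocation Server B would run
    needs_credentials: bool  # private parents require credentials from the user


def plan_fork(clone_url: str, owner: str, name: str, private: bool = False) -> ForkPlan:
    """Describe a fork of `clone_url` into the account of `owner` on Server B."""
    target = f"/srv/git/{owner}/{name}.git"  # assumed storage layout
    return ForkPlan(
        parent_url=clone_url,
        target_path=target,
        command=["git", "clone", "--mirror", clone_url, target],
        needs_credentials=private,
    )


plan = plan_fork("https://server-a.example/foobar/hello.git", "lennon", "hello")
```

Note that Server A is only ever contacted through the ordinary git clone URL, which is the point being made: no federation API is involved in this variant.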

As you can see, this is already possible right now. No new API or federation communication is needed. But even here, you can't really work entirely from Server B, your own server. You have to go to Server A to get the git clone URL. So if you already have to go there, I thought it'd be nicer to just be able to press a button and provide your own server's URL and user credentials there. Presumably, these are more familiar to you. And whenever you have to fork another git repo, you just browse to that repo and provide the same URL and credentials to fork it. Seemed more intuitive that way.

it does not need to be oauth dance either - there are other ways of doing cross-site auth - one of the notabug-2.0 implementations uses "macaroons" signed by each user's home-server's SSL cert

I'm not familiar with macaroons but I quickly looked them up to get an idea of them. As far as I understand it, a macaroon can only be verified by the machine that issued it. So a foreign server, Server A in the running example, would not be able to verify a macaroon issued by Server B. In the workflow that starts from Server A (the one I suggested), it could replace OAuth 2.0 but it seems less suitable for that.
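To illustrate why I think only the issuer can verify one, here is a simplified sketch of the usual HMAC-chained construction as I understand it. This is an assumption about macaroons in general, not a description of what the notabug-2.0 implementation does, and real libraries differ in details such as key derivation and serialization.

```python
import hashlib
import hmac


def _mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes, caveats):
    """Issue a macaroon: chain an HMAC over the identifier and each caveat."""
    signature = _mac(root_key, identifier)
    for caveat in caveats:
        signature = _mac(signature, caveat)
    return {"identifier": identifier, "caveats": list(caveats), "signature": signature}


def verify(root_key: bytes, macaroon) -> bool:
    """Recompute the chain; this is only possible for whoever holds root_key."""
    expected = mint(root_key, macaroon["identifier"], macaroon["caveats"])["signature"]
    return hmac.compare_digest(expected, macaroon["signature"])


# Hypothetical keys and caveat, for illustration only.
token = mint(b"server-b-root-key", b"lennon", [b"action = fork"])
```

`verify(b"server-b-root-key", token)` succeeds, while a server holding any other key cannot recompute the signature, so Server A would have to ask Server B (or be given a shared key) to check it.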

@ppwfx
Member Author

ppwfx commented Jun 8, 2018

@arucard21

I really like the features that are currently built around forks

  • being able to see who forked a repo
  • being able to see how many people forked a repo
  • how far ahead the forks are

Very often a repo becomes unmaintained, and by looking at the fork graph somebody can find an active fork.

@yookoala

yookoala commented Jun 8, 2018

I suggested the workflow where you go to Server A to fork the repo to Server B because that fits more closely to the way forks are currently created. Also, since the parent repo (on Server A) doesn't actually need to know about its forks (one of which would be on Server B), you actually don't need any federation API to create a fork on Server B. If you start the workflow from Server B, you just:

@arucard21: The workflow I originally imagined can simply translate to the "Fork" button in server A by some UI modification:

  1. Have an extra webfinger query for a resource endpoint:gitpub-fork on Server B. Just to see if it supports federated forking.
  2. When Lennon asks Server A to fork the repo foobar/hello, Server A can query for that endpoint and redirect him there. That endpoint can behave slightly differently if it carries a certain query string (e.g. ?token=some-quickly-expired-token&repo=foobar/hello). Basically, everything up to step 3 in my previous comment is done by this round trip.
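A minimal sketch of those two steps from Server A's side. The `/.well-known/webfinger` path, the `resource` and `rel` query parameters and the `links` array in the response are standard WebFinger; the `gitpub-fork` relation, the `token` and `repo` parameters and all hostnames are assumptions taken from or added to the proposal above, not anything specified.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

FORK_REL = "gitpub-fork"  # hypothetical link relation for federated forking


def webfinger_url(host: str, resource: str) -> str:
    """Step 1: build the WebFinger query Server A sends to Server B."""
    query = urlencode({"resource": resource, "rel": FORK_REL})
    return f"https://{host}/.well-known/webfinger?{query}"


def fork_endpoint(jrd: dict):
    """Pick the fork endpoint out of the WebFinger response, or None if unsupported."""
    for link in jrd.get("links", []):
        if link.get("rel") == FORK_REL and "href" in link:
            return link["href"]
    return None


def redirect_url(endpoint: str, token: str, repo: str) -> str:
    """Step 2: the URL Server A redirects Lennon to."""
    return f"{endpoint}?{urlencode({'token': token, 'repo': repo})}"


# Hypothetical response from Server B.
response = {"links": [{"rel": "gitpub-fork", "href": "https://server-b.example/fork"}]}
target = redirect_url(fork_endpoint(response), "some-quickly-expired-token", "foobar/hello")
```

If `fork_endpoint` returns `None`, Server A knows Server B does not support federated forking and can say so instead of redirecting.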

So I don't think your workflow is necessary for public repos.

But I do think it's simpler to have only 1 workflow that can apply to both public and private repos.

@arucard21

I really like the features that are currently build around forks

I think those features are also nice. But I consider them separate from the normal forking process. You could make this possible by having the parent repo subscribe to all forks. I think this is better supported in the workflow that starts from Server A (where the parent repo is hosted), because then Server A will always know when a fork is created. If you start the workflow from Server B, then Server B would have to notify Server A that a fork was created, so it could subscribe to it. Of course, there is no guarantee that Server B will provide this notification (either accidentally or maliciously).

The workflow I originally imagined can simply translate to the "Fork" button in server A by some UI modification

This is actually also what I had in mind. But instead of going to an endpoint on Server B, this modified button would just send the "Create Fork" ActivityPub request directly.
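Roughly, the payload could look like the sketch below. `Create`, `actor`, `to`, `object` and `attributedTo` are ActivityStreams vocabulary; the `Fork` type and the `forkedFrom` property do not exist there and are placeholders for whatever a JSON-LD extension would define. All URLs are made up.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def create_fork_activity(repo_actor: str, recipient: str) -> dict:
    """Build a 'Create Fork' activity sent by the Repository Actor on Server A."""
    return {
        "@context": AS_CONTEXT,  # a real extension would add its own context here
        "type": "Create",
        "actor": repo_actor,
        "to": [recipient],
        "object": {
            "type": "Fork",            # hypothetical extension type
            "attributedTo": recipient,  # the user who will own the fork
            "forkedFrom": repo_actor,   # hypothetical property naming the parent
        },
    }


activity = create_fork_activity(
    "https://server-a.example/foobar/hello",
    "https://server-b.example/users/lennon",
)
body = json.dumps(activity)  # what would be POSTed to the recipient's inbox
```

Authorization (the OAuth 2.0 step discussed earlier) is deliberately left out; this only shows the shape of the message.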

But I do think its simpler to have only 1 workflow that can apply to both public and private repos.

Agreed.

@bill-auger
Member

the mailing list is now fully functional - a thread has been started on the mailing list to continue the discussion in this issue - for those who are subscribed to the mailing list, check your email for the thread titled "extending the activity-pub protocol"; and reply to it to continue this discussion begun on this issue
