ask w3c/activitypub how to extend the activitypub protocol #2
Comments
Cwebber chimed in on the issues of gogs and gitea offering to answer questions: gogs/gogs#4437 (comment) |
@EorlBruder I've already chatted with @cwebber and he gave me some tips on how to implement ActivityPub for gitea. For example, he recommended using ocap-ld. For more details, please check out the chat log: https://chat.indieweb.org/social/2018-06-04 |
https://www.w3.org/TR/activitypub/#Overview
|
I initially thought Fork and PR could be defined as types of Activity. But if you think about how we usually work with git services, we actually subscribe to repositories and branches, not users, which is a bit different from the model described in ActivityPub. If a PR or a Fork is just an activity, one issue I can think of is the lack of a description for a git service to act upon. It also makes it hard for repository / branch updates (e.g. new push, delete) to propagate. If a repository / branch is more like an actor, then the whole protocol can be repositories / branches subscribing to each other for updates. That protocol seems more natural. Just my opinion at the moment. |
I agree that it makes more sense for repositories to be actors. Then, for instance, opening an issue would be modeled as the repository opening the issue, with a separate metadata field for the username of the user that opened it. Then, a repo would be In this way, "users" (individual contributors) become a non-entity in the GitPub AP model. This is basically fine as they are tracked by Git, and we can tag issues and PRs separately, but it does raise questions around e.g. behavior control (reporting, blocking) and "social" features, such as providing a GitHub like "follow this user" feature. |
How about treating repos like "virtual" users? That way we get notifications about stuff happening at no cost. That would also take into account what @yookoala wrote about the usual subscription habits on git services. We could also implement branches as the equivalent of threads in communications: the commit message could be the CW, and the diff would come as the "body" of the message (should I explain why I find that useful?). Still no cost. Then issues... Messages mentioning the repo handle without a CW? I kinda dislike that idea, since it would mean you cannot CW an issue, which would be useful e.g. when writing the issue from an account that usually interacts with non-techie people who don't like to read long issues. Just some thoughts, and I always like to learn why I am wrong |
@LeoTindall what's the benefit of adding I think the way URIs of repos look should stay as familiar as possible, as that will make adoption easier and a lot of tooling relies on it, e.g. golang import doesn't support that format. Plus, I think it makes sense to have the user that owns the repo in the URI too |
I do have to disagree on not representing users as proper AP actors. There is no reason not to also represent repositories, organizations and such as actors too - that is all fine in the AP/ActivityStreams2 world. In fact, it has multiple different types of actors (and if something is missing - extensions to the rescue!) An actor is something that does an activity. Users can do activities, like liking a repository, following a person. Repositories can do activities, but also organizations can do activities, like "create a repository".
In AP, actor IDs are URLs; I would encourage sticking to this for compatibility. A repo would then for example be The problem IMHO something like GitPub should solve is:
What we're dealing with is not any more difficult or special than social networking content. We're dealing with PRs (= posts), comments (= err.. comments), stars/likes (= stars/likes). What IMHO really requires extensions is only a pull request/merge request object and the activities related to it. Any additional metadata can be attached to the activities/objects as needed via extensions. |
I do not oppose the notion of "user being an actor". It would be interesting to see all Git web services join the fediverse in the long run (e.g. we could subscribe to a GitLab account's activity from pump.io or Mastodon). It also comes naturally to federate comments, posts and reactions with the existing AP spec. My comment was meant to describe how PR and Fork activity should be described. I think if a protocol like GitPub is to pick up momentum, it should be simple and specific to its own problem domain. So if something has already been specified in AP (or GitPubSub) and works well, a spec like GitPub does not need to specify it at all. It can simply refer to those existing specs. That way, we might also leverage existing libraries and ease the pain of implementation. |
I also think that this should be implemented by simply providing additional Object types (and possibly Actor types) that can be used with the standard ActivityPub protocol. This should be sufficient for creating forks and merge requests, as explained by @jaywink. As such, I think we should not be defining a GitPub protocol and perhaps stop calling it that to avoid confusion. This project should most likely just provide these additional Object types. I also see some different things being considered that may not be relevant. So it might be a good idea to define what exactly needs to be possible with federation, and what doesn't. As I understand it, forks and merge requests are entirely limited to web frontends for git repositories. This has nothing to do with git (the versioning control system) itself. I see these 3 user stories that would need to be made possible here:
As I understand it, 1 requires the addition of a Fork Object type, 2 requires the addition of both a Fork and a Merge Request Object type, and 3 requires the addition of a Repository Actor type. Of course, many details would still need to be ironed out, but this might help provide focus to the discussions about them. |
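To make the "additional Object types" idea a bit more concrete, here is a rough sketch of what a Repository actor and a Fork object could look like as ActivityStreams 2.0 documents. Everything forge-specific is an assumption for illustration only: the `https://example.org/ns/forge#` namespace, the type names `Repository` and `Fork`, and the `forkOf` property are placeholders, not part of any published vocabulary.

```python
import json

AS2 = "https://www.w3.org/ns/activitystreams"

# Hypothetical extension context: the namespace URL and every term below are
# placeholders for illustration, not a published vocabulary.
FORGE_CONTEXT = {
    "forge": "https://example.org/ns/forge#",
    "Repository": "forge:Repository",
    "Fork": "forge:Fork",
    "forkOf": {"@id": "forge:forkOf", "@type": "@id"},
}


def repository_actor(repo_url, name):
    """Describe a repository as an actor; ActivityPub actors carry an inbox and an outbox."""
    return {
        "@context": [AS2, FORGE_CONTEXT],
        "type": "Repository",
        "id": repo_url,
        "name": name,
        "inbox": repo_url + "/inbox",
        "outbox": repo_url + "/outbox",
    }


def fork_object(fork_url, parent_url):
    """Describe a fork and point back at its parent repository."""
    return {
        "@context": [AS2, FORGE_CONTEXT],
        "type": "Fork",
        "id": fork_url,
        "forkOf": parent_url,
    }


if __name__ == "__main__":
    parent = repository_actor("https://a.example/foobar/hello", "hello")
    fork = fork_object("https://b.example/lennon/hello", parent["id"])
    print(json.dumps(fork, indent=2))
```

The inbox and outbox URLs are derived from the repository URL purely for convenience; a real server could put them anywhere.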
Issues would then simply be modeled as |
@21stio The question is whether you really want to tackle issues right now: you'd have to find a common ground for semantics, metadata representation and workflows. Also, solutions for issue tracking inside the repo already exist, e.g. git-issues and git-dit (of which I am a co-author). There's an entire distributed-issue-tracking community: https://dist-bugs.branchable.com/ AFAIK most of those solutions are not production ready, or lack a mechanism to properly propagate issues. The latter could be provided by this project, which is one of the reasons why I'm interested in it as a maintainer of git-dit. Btw: there's also git-appraise, which is a system for reviewing merge-requests. |
allow me to jump in here briefly - only because i notice that most of the participants of this thread are not represented on the other threads on this repo where i have been in discussion so far - the ideas here seem to be flying at a frantic pace, although this group is less than a day old; so i would like to offer some external context
there is no reason to presume that a pull request or any of the actions mentioned here are somehow intrinsically "webby" activities - it only happens to be so today - for example: the anatomy of a pull request is a .patch file that is associated with a source repo URL and a destination repo URL, perhaps with a topic and additional comments (and o/c the git commit log can contain those intrinsically) - all of that data could be trivially transmitted in plain text via email, for example - and in fact, that is exactly how merge requests were done before git
after reading this thread, i get the impression that the main goal here is to produce "activity streams" to be consumed by "social" relay services such as Mastodon and friends - there is no need to constrain the scope of applicability to web pages nor to "social" use-cases - if done properly this is just an API that can be fully implemented in a wide variety of clients such as email, command line, and native desktop applications - if you are thinking this will only be applicable to websites, you are not thinking grand enough
there is a lot of excitement here which is great, but i should point out that this is no new idea - people have been discussing this topic for a long time and there are already some design documents describing not a protocol, but a complete federated hosting solution accommodating multiple types of clients; with activity-pub being added to the discussion only recently as one possible communications format - these extensions to activity-pub are really the least of the work that needs to happen in regards to a complete system - the communication protocol is discussed very little in those existing documents as it is among the least time-consuming tasks and one that really only needs to be done after some working server and client exists - the idea of "social activity streams" came as an afterthought, as a bonus feature that activity-pub would provide; but that is far from the most important feature - the primary, over-arching goal of federating your project should be to collaborate on software without relying on third-party hosts, not merely to announce your activities - im not sure if anyone else here has looked at it that way; so i had to "throw that out there" for yas
i would hope that people here would take a look at the work that has already been done on the notabug-2.0 and vervis projects; if only for inspiration of what is possible and desirable outside the constraints of the web browser |
to the comment above:
that is a github-ism - not all forges have a concept of repo "owner" or "namespace" - sourceforge and pagure are 2 notable examples where the repo is the top-level atom and users are entirely orthogonal |
IMHO maybe avoid using Edit: but yes a comment could be a |
regarding the concept of "actors" - i do not know the first thing about activity-pub but a repo is not an actor in any semantic sense, because a repo can not initiate any actions - all actions in the system are initiated by users either directly (e.g. pressing a button) or indirectly (e.g. git push) and repos are the targets of most such actions - but users are also the targets of some actions (such as "follow" and "mention") so users are certainly "actors" in that they are the only initiators of actions and are also the target of certain actions initiated by other users
perhaps i am just confusing your nomenclature and perhaps repos need to be "actors" for some technical reason; but that above is the common sense description of the real-world agents and events - object-oriented systems are supposed to model those common sense descriptions |
@bill-auger: According to the ActivityStreams Vocabulary spec,
Person is only one of them. You also have Application and Service there. Being an actor in ActivityPub also means you have an inbox and an outbox so that proper pub-sub can happen. If you think about it, a repository is the result of a series of activities (e.g. push, force push, rebase, squash, remove). So is a branch. Although those activities are done by different users, we're more interested in subscribing to the repository or the branch than to the user who did them. It is trivial to have the same activities in the feeds of those users, but the repositories and branches should be subscribable. The same goes for PRs and Forks. What usually happens is that someone updates the downstream branch and the upstream needs an update. This matters especially for PRs, where upstream would want to know about the change before merging. So the downstream branch should be able to report its activities to the upstream repository / branch (without reporting all activities of that user, and without reporting to the upstream owner). That's what I meant by repository and branch being actors. |
We might also describe a commit as an activity that involves two actors: the User and the Branch. Then you can choose which one to subscribe to as needed. |
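Here is a toy, in-memory sketch of that "two actors per commit" idea, just to show how subscribing to either the user or the branch could work. The `Hub` class, the actor IDs, and the choice of the ActivityStreams `Add` type with a `target` field to represent a pushed commit are all assumptions made for illustration; real delivery would go over HTTP to each follower's inbox.

```python
from collections import defaultdict


class Hub:
    """Toy stand-in for server-side delivery: tracks followers and inboxes in memory."""

    def __init__(self):
        self.followers = defaultdict(set)  # actor id -> ids of its followers
        self.inboxes = defaultdict(list)   # follower id -> activities received

    def follow(self, follower, actor):
        self.followers[actor].add(follower)

    def publish(self, activity):
        # A commit involves two actors: the user who pushed and the branch
        # that received it. Followers of either one get the activity once.
        recipients = set()
        for actor in (activity["actor"], activity["target"]):
            recipients |= self.followers[actor]
        for recipient in recipients:
            self.inboxes[recipient].append(activity)


if __name__ == "__main__":
    hub = Hub()
    user = "https://b.example/lennon"
    branch = "https://b.example/lennon/hello/branches/master"
    hub.follow("https://c.example/alice", branch)  # only cares about the branch
    hub.follow("https://c.example/bob", user)      # only cares about the user
    hub.publish({"type": "Add", "actor": user, "target": branch, "object": "commit abc123"})
    print(len(hub.inboxes["https://c.example/alice"]))
```

Collecting recipients in a set means someone who follows both the user and the branch still sees the commit only once.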
yes im quite sure i am just confusing the words - actor in the networking sense is like a "channel" with an input and an output - a repo probably does not need an output but it needs an input - a user needs both
i only remarked that because someone said a repo should be an actor but a user should not - that did not jibe with me |
@21stio
I think issues should be left out-of-scope for now. While included in some web frontends, they are not a core part of the git collaboration workflow. Please note that I'm not saying that issue tracking isn't important, I'm saying that it's an entirely separate concern. While important, it would complicate matters significantly to include this in a first attempt to get federated git collaboration.
I took a quick look at this project and it seems to be focused on an entirely separate implementation. I think the knowledge and experience from that project is useful, but they seem to be different approaches. But I also don't think this discussion is focused on "webby" activities. I think this is something that will impact the web frontends, but what we are discussing is how to use ActivityPub for this. This should result in a (HTTP-based) API that can be used from the web frontends but also from any other clients or platforms. This is also what you describe as "if done correctly" so perhaps we're just misunderstanding each other here.
I agree that since Service is considered one of the core Actor types, it seems natural to consider a Repository an Actor type too. I'm not sure if it's needed to have Branches as Actor type as well. I think federation does not need to happen on the branch or commit level. That might make the API too "chatty" which will cause problems with network traffic and scalability. If you wish to receive updates about a specific branch, you can still just subscribe to the repository and then filter locally on just accepting updates about that branch. Implementation-wise, it may make more sense to have the server (as Application Actor) or Fork (as Repository Actor) subscribe to the upstream repository (as Repository Actor). That way any updates on the upstream repo can be synced to the forked repo, and users can be notified of this through the normal notification system of their local server. As a first specification, I think it makes sense to start with the simplest user story. This allows more focused discussion and highlights many general problems early on, without having to deal with the complexity of all the other functionality you might want to implement. In this case, I think the simplest user story would be: A big question here might be whether to do this client-to-server or server-to-server. Client to server seems to fit best, with the user following a repository on another server, but this might cause too much traffic. But this is a problem ActivityPub would have on social media as well, with many users following many other users. Perhaps there already is some way to deal with this. |
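The "subscribe to the repository, filter locally" idea could be as small as the sketch below. It assumes each incoming activity carries a `branch` field naming the branch it concerns; that field name is a placeholder, since no such property has been agreed on anywhere in this discussion.

```python
def updates_for_branch(activities, branch):
    """Yield only the activities that concern the given branch.

    Assumes a hypothetical 'branch' field on each activity; activities
    without that field (e.g. repo-wide events) are skipped.
    """
    for activity in activities:
        if activity.get("branch") == branch:
            yield activity


if __name__ == "__main__":
    inbox = [
        {"type": "Update", "branch": "master", "summary": "new push"},
        {"type": "Update", "branch": "feature-x", "summary": "new push"},
        {"type": "Like", "summary": "someone starred the repo"},
    ]
    for activity in updates_for_branch(inbox, "master"):
        print(activity["summary"])
```

The trade-off is that the follower's server still receives every update for the repository; only what gets shown to the user is narrowed.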
arucard21 - im not sure why you would say "different approaches" - that could be making exactly my point for me
i am saying that this should be much more than an "activity stream" to be consumed by social websites and placed on your "activity wall" or tweeted to your fans - like "john just starred a repo!!!" - "dave just pushed a commit" - those are the "webby" concerns - there is no other use case - and frankly i find them annoying
this should be a much grander endeavour to create a federated network of complete forges - activity streams are the least interesting aspect of that - just an adornment that could totally be omitted without losing an iota of awesomeness |
@arucard21: I thought of the two major cases like this: Forking
Notes
Pull Request
|
@arucard21: In the Pull Request flow above, changes are made on a specific branch and not on the entire repository, so it seems only intuitive to make the branch an actor too. |
I don't think this is the case here. We're looking at ActivityPub which is built using ActivityStreams. It actually allows much more than just this kind of stuff that would show up in an "activity wall" type thing. ActivityPub was created for social networking but essentially supports federation of any kind of content, by using custom Object, Actor or Activity types. So a lot of the concerns about federation in general will already be thought of and handled by using ActivityPub. I think that NotABug differs in this aspect, since it tries to think of and handle all these concerns by itself. This may end up with a more suitable and streamlined protocol, but will likely take much more effort to complete. I hope this clarifies things a bit, and if I still misunderstand you, let me know. |
@yookoala
Subscribing to the repo on Server A can be done by Server B. It has all the information required to do this. But this is not necessary for forks. A fork doesn't get any updates from its parent. That is only done through git, by pulling from upstream and pushing to your own fork. So once forked, the parent and its fork do not need to be synchronized. The only time this "parent" information is used again is when you want to create a Pull/Merge Request from your fork to the parent. For Pull/Merge Requests, I think it makes more sense to generate the diff (or a patch) on Server B and send that to Server A. In ActivityPub, this would result in Lennon sending a "Create" Activity with a "Pull/Merge Request" Object to the parent "Repository" Object on Server A. This allows for very simple communication where you don't have to keep track of too many things. Implementation-wise, this may require Server B to pull the latest commits of the branch it needs to compare against from Server A. So perhaps Server B can create a branch (e.g. "federated_master") on the fork, pull the latest commits to it, generate the diff, then delete the branch again. Hopefully, there are even better ways to do this. This is just a first guess to indicate that there might be some more complexity, but it should be solvable. |
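As a rough illustration of the "Create" Activity carrying a Pull/Merge Request Object, Server B might assemble something like the following once it has generated the patch. `Create`, `actor`, `to`, `object`, `attributedTo`, `summary`, `mediaType` and `content` are standard ActivityStreams 2.0 terms; the `MergeRequest` type and the `targetRepository` property are hypothetical names invented for this sketch.

```python
import json

AS2 = "https://www.w3.org/ns/activitystreams"


def create_merge_request(actor, parent_repo, patch, summary):
    """Wrap a patch in a Create activity addressed to the parent repository."""
    return {
        "@context": AS2,
        "type": "Create",
        "actor": actor,
        "to": [parent_repo],
        "object": {
            "type": "MergeRequest",           # hypothetical extension type
            "attributedTo": actor,
            "targetRepository": parent_repo,  # hypothetical extension property
            "summary": summary,
            "mediaType": "text/x-diff",
            "content": patch,
        },
    }


if __name__ == "__main__":
    patch = "--- a/README\n+++ b/README\n@@ -1 +1 @@\n-helo\n+hello\n"
    activity = create_merge_request(
        actor="https://b.example/lennon",
        parent_repo="https://a.example/foobar/hello",
        patch=patch,
        summary="Fix typo in README",
    )
    print(json.dumps(activity, indent=2))
```

Sending the patch inline keeps Server A from having to fetch anything from Server B, at the cost of large payloads for big changes.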
I like the idea of starting from the server A. But then:
How does Server B know that Server A is allowed to create that particular Fork object? If Lennon has had to tell it first, then it already has everything it needs to know without Server A telling it. Also, what if Server A refuses to tell Server B anything? My opinion is that this shouldn't prevent Server B from forking.
While not necessary, I like very much the GitHub feature showing: “This branch is x commits ahead, y commits behind upstream:master.” |
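For what it's worth, the "x commits ahead, y commits behind" figure does not need any federation at all once Server B has fetched the upstream ref into the fork. A minimal sketch, assuming the fork has a remote-tracking ref called `upstream/master` (that name and the `repo_dir` argument are placeholders):

```python
import subprocess


def parse_left_right(output):
    """Parse 'git rev-list --left-right --count A...B' output into (ahead, behind).

    git prints '<left>\t<right>': commits only in A (the upstream side, so
    "behind") followed by commits only in B (the local side, so "ahead").
    """
    behind, ahead = (int(n) for n in output.split())
    return ahead, behind


def ahead_behind(repo_dir, upstream="upstream/master", local="HEAD"):
    """Ask git how far `local` has diverged from `upstream` in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--left-right", "--count",
         f"{upstream}...{local}"],
        check=True, capture_output=True, text=True,
    )
    return parse_left_right(result.stdout)
```

Calling `ahead_behind("/path/to/fork")` would then return a pair to plug into the "x ahead, y behind" message; how often Server B refreshes the upstream ref is a separate question.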
Note: I'm still talking about how I imagine it. The working group will discuss it before anything is finalized, so think of everything I've said at this point as "a possible solution" only. Lennon must know the repository URL first. Server B would need Lennon to at least give it the full URL of "foobar/hello" on Server A. But having the URL does not mean it knows which protocol or endpoint it should look for. Nor does it know, at this point, whether Server A supports federation at all. With the ActivityPub specification, Server B can simply send an HTTP request to that exact URL and inspect the response:
The URL should respond to the "Accept" header and return valid JSON-LD data. As specified in ActivityStreams 2.0 and our GitPub extension context, the information would include:
On receiving this, Server B can tell Lennon that forking can start. If it receives an invalid response, Server B will have to tell Lennon, "Sorry, that URL does not support GitPub and federated forking".
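A minimal sketch of that probe, under a few stated assumptions: the `Accept` value is the ActivityStreams media type that ActivityPub asks clients to send when retrieving objects, but the `Repository` type name and the choice of `id` and `inbox` as the fields to check are guesses, since the exact field list is not settled here. The sketch builds the request and checks a response body without touching the network.

```python
import json
import urllib.request

AS2_MEDIA_TYPE = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def build_probe(repo_url):
    """Build the GET request Server B would send to the repository URL."""
    return urllib.request.Request(repo_url, headers={"Accept": AS2_MEDIA_TYPE})


def supports_federated_fork(body):
    """Return True if the response body looks like a federated repository.

    The required type and fields are assumptions for this sketch.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        return False  # not JSON at all, e.g. a plain HTML page
    if not isinstance(doc, dict):
        return False
    has_fields = all(isinstance(doc.get(key), str) for key in ("id", "inbox"))
    return doc.get("type") == "Repository" and has_fields


if __name__ == "__main__":
    request = build_probe("https://a.example/foobar/hello")
    print(request.get_header("Accept"))
    print(supports_federated_fork(b"<html>not federated</html>"))
```

In a real server the request would be sent with `urllib.request.urlopen(request)` and the response bytes passed to `supports_federated_fork`; a `False` result maps to the "Sorry, that URL does not support GitPub" message.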
I think the point is that Server B CAN choose to subscribe to upstream's updates for such a display. And of course it can do anything else to achieve the same result (e.g. frequently fetching from Server A). If Server A pushed messages to Server B regardless, Server B might not like it. |
@yookoala I agree. I concur with your “Totally optional”s above. |
This is about permissions and should be handled by authentication and authorization mechanisms. ActivityPub recommends (non-normatively) using OAuth 2.0 for client-to-server authentication and authorization. That means that Lennon will likely get an OAuth 2.0 authorization request on Server B from the Repository Actor (e.g. "Do you want to allow Repository X from Server A to create a fork?"). Lennon can then, on Server B, authorize the Repository Actor to create the repository. This is just a rough indication, perhaps some details may need to be changed in order to make this work correctly. |
@arucard21 - that was very clear thanks
i quite assumed that activity-pub was more powerful than that - the reason i said those things is that those seemed to be the main concern the first day - i was mostly trying to ensure that the scope of discussion was not limited to those simple use-cases
i understand now what you meant by different approaches - the notabug document was really no approach though - it was just the requirements overview - there were multiple implementations separate from that - the second of which has settled on using activity-pub and is implementing it now |
@arucard21: You're talking about the Client-Server protocol of ActivityPub. But I think we're dealing with Server-Server communication in Forks and PRs, so I might not go down that path. For public repositories in particular, forking does not require any specific permission at all. If we're talking about private repositories, then some authentication and authorization would be needed. In that case, OAuth2 would be a nice choice for the initial permission grant, and a user might need to have an account on Server A to be able to fork the repository to anywhere else. |
I don't quite follow why User B who is on her own server needs access/account to the repo of User A who is on his own server. Assuming of course the repository is public. But even if it is private, if User B has access to the repository, why should they have to go through User A server? This would not allow users to act purely from their own server which IMHO is the whole point of decentralization. I don't need to visit/know about another server when I write a reply or share someones social media post. I don't want to visit/know about any other servers when I fork a repository I follow already either.
IMHO, not really. First of course we need to know how does User B know about the repository of User A that is private. If they know about it - they already have access. This is like limited visibility social media posts, which are not public. The way activitypub handles private messaging is through targeting. If you don't want to say something is public, you assign it an audience. It's simple. If someone receives it, it has been sent to them, so they are allowed to see it. Now when User B forks the private repo of User A they can do so without authentication since they already have the repository from before, because User A sent it to them. Thus, User A has no control any more whether User B can or cannot fork the repo. Naturally User A can remove visibility of User B at any time. Unfortunately (or luckily!) in a decentralized world, when you share something outside your server, you permanently lose control of that information - at least for that particular version of it. |
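The targeting model described here can be sketched as a simple visibility check. The addressing fields (`to`, `bto`, `cc`, `bcc`, `audience`) and the special Public collection ID come from ActivityStreams/ActivityPub; the function names are made up, and the sketch deliberately ignores collections such as a followers list, which a real server would have to expand.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
ADDRESSING_FIELDS = ("to", "bto", "cc", "bcc", "audience")


def addressed_to(obj):
    """Collect every ID the object is addressed to, across all addressing fields."""
    recipients = set()
    for field in ADDRESSING_FIELDS:
        value = obj.get(field, [])
        # Each field may hold a single ID or a list of IDs.
        recipients.update([value] if isinstance(value, str) else value)
    return recipients


def may_see(obj, viewer):
    """True if the object is public or explicitly addressed to the viewer."""
    recipients = addressed_to(obj)
    return PUBLIC in recipients or viewer in recipients


if __name__ == "__main__":
    private_repo = {
        "id": "https://a.example/usera/secret",
        "to": ["https://b.example/userb"],
    }
    print(may_see(private_repo, "https://b.example/userb"))
    print(may_see(private_repo, "https://c.example/stranger"))
```

This only models who is allowed to be sent the object in the first place; as noted above, once a copy has left the server nothing stops the recipient from keeping or forking it.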
@jaywink: A private post in ActivityPub might not be the correct analogy. In an SNS context, you'd read a post from a feed listing. But in a Git services fediverse, you're most likely browsing to that particular repository on some website (or you're a member of that repo). So it's more like a Mastodon account behind good old-fashioned HTTP authentication. In the current paradigm of Git services, private repositories are forbidden to anyone without valid authentication or authorization. So if you have not logged in, or if you're not a member of the private repository So to fit with that paradigm, user Lennon on
I think @arucard21's model solves this situation by having either server be an OAuth2 application to the other server. That way, if you have logged in to one server, you can use OAuth2 to prove that you are authenticated to the other one. Problem solved. |
it does not need to be oauth dance either - there are other ways of doing cross-site auth - one of the notabug-2.0 implementations uses "macaroons" signed by each user's home-server's SSL cert - users can present their macaroon to foreign hosts directly and the foreign host needs only to verify its authenticity against the user's home-server's SSL cert |
I suggested the workflow where you go to Server A to fork the repo to Server B because that fits more closely to the way forks are currently created. Also, since the parent repo (on Server A) doesn't actually need to know about its forks (one of which would be on Server B), you actually don't need any federation API to create a fork on Server B. If you start the workflow from Server B, you just:
As you can see, this is already possible right now. No new API or federation communication is needed. But even here, you can't really work entirely from Server B, your own server. You have to go to Server A to get the git clone URL. So if you already have to go there, I thought it'd be nicer to just be able to press a button and provide your own server's URL and user credentials there. Presumably, these are more familiar to you. And whenever you have to fork another git repo, you just browse to that repo and provide the same URL and credentials to fork it. Seemed more intuitive that way.
I'm not familiar with macaroons but I quickly looked them up to get an idea of them. As far as I understand it, a macaroon can only be verified by the machine that issued it. So a foreign server, Server A in the running example, would not be able to verify a macaroon issued by Server B. In the workflow that starts from Server A (the one I suggested), it could replace OAuth 2.0 but it seems less suitable for that. |
I really like the features that are currently built around forks
Very often a repo becomes unmaintained and by looking into the fork graph somebody can find an active repo |
@arucard21: The workflow I originally imagined can simply translate to the "Fork" button in server A by some UI modification:
So I didn't think your workflow is necessary for public repos. But I do think it's simpler to have only one workflow that can apply to both public and private repos. |
I think those features are also nice. But I consider them separate from the normal forking process. You could make this possible by having the parent repo subscribe to all forks. I think this is better supported in the workflow that starts from Server A (where the parent repo is hosted), because then Server A will always know when a fork is created. If you start the workflow from Server B, then Server B would have to notify Server A that a fork was created, so it could subscribe to it. Of course, there is no guarantee that Server B will provide this notification (either accidentally or maliciously).
This is actually also what I had in mind. But instead of going to an endpoint on Server B, this modified button would just send the "Create Fork" ActivityPub request directly.
Agreed. |
the mailing list is now fully functional - a thread has been started on the mailing list to continue the discussion in this issue - for those who are subscribed to the mailing list, check your email for the thread titled "extending the activity-pub protocol"; and reply to it to continue this discussion begun on this issue |
maybe it would be a good idea to ask the activitypub authors for general guidelines on how to extend the protocol, to make sure it won't end in a big mess