Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for distributed notes updates #1473

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dscho
Copy link
Member

@dscho dscho commented Nov 8, 2023

One of the biggest blockers preventing GitGitGadget from moving away from Azure Pipelines toward GitHub workflows is the complication that the refs/notes/gitgitgadget note holds global state and must therefore be updated sequentially.

In the Azure Pipelines approach, we manage that by having an agent pool of one, which naturally can only serve one request at a time. That way, every job can first fetch the GitGitGadget notes ref, do whatever it needs to do, leisurely update the notes ref to reflect the new state and push the ref. Then, the next job can do the same and there is never a situation where a job tries to push the notes ref and fails because another, concurrent job has updated said ref in the meantime.

For various reasons, adding a pool of one to GitHub Actions is impractical.

In this PR, I propose a solution that allows for jobs to run concurrently, fetching the GitGitGadget notes, doing their thing, updating the notes ref along the way, and once it is time to push the notes ref and a situation is detected where it does not fast-forward, combine the changes in a way that is inspired by Conflict-free replicated data types (CRTD), so that it can be pushed without problems after all.

The strategy implemented in this PR is to prefer the local changes in case of a hard conflict.

This means we will still have to be careful to avoid the same GitHub workflow to run concurrently: for example, we really will only ever want at most a single instance of the job to run whose responsibility it is to read new mails from the Git mailing list and mirror the relevant ones into the corresponding PRs. Should two jobs of that workflow run at the same time, they would end up adding PR comments twice, which is undesirable.

However, this PR unlocks the idea to let the workflow that mirrors the Git mailing list to the PRs run concurrently with multiple workflows that submit PRs as patches to the Git mailing list.

Note that this PR only adds that functionality, but it is in no way used yet. This will be the purpose of a follow-up PR.

To handle the situation when concurrent GitGitGadget operations want to
modify the state that is persisted in Git notes (such as
`refs/notes/gitgitgadget`), we need a way to reconcile diverging Git
notes.

Happily, the way GitGitGadget works makes this relatively
straight-forward, as concurrent operations usually do not conflict and
merely need to add/modify/remove independent attributes and/or values.
This allows us to treat these Git notes as kind of a Conflict-free
Replicated Data Type (https://en.wikipedia.org/wiki/CRDT).

The strategy implemented in this patch is relatively simple, and follows
the "local-first" idea (where state is applied locally first and only
then persisted remotely): The newly added method determines what changes
were made locally, and then reapplies them on top of the upstream note
commit. The local operations need to fall into one of the following
categories:

- a note has been appended to

- a single line of text (which may, or may not, be a stringified JSON
  object) is added as a note on an object that had not had a note yet

- a note containing a single line of text has been replaced with a new,
  single line of text

- a note that contained a JSON object (stringified as a single line of
  text) has been modified. Valid modifications include:

  - an attribute that was added

  - an attribute that has been removed

  - an attribute containing a primitive type (string, number or boolean)
    has been modified (if local and upstream diverged, local wins)

  - items from an attribute containing an array have been removed and/or
    added

  - an attribute containing a Plain Old Javascript Object has seen any
    of these modifications

Other modifications are as yet unhandled.

For historical reasons (*cough* Git commands implemented in Unix shell
scripts *cough*) it is impossible to add a Git note that does not end in
a newline character. Let's ensure that we don't do that, either.

To avoid tampering with a Git index, let's use a temporary index. This
requires the environment variable `GIT_INDEX_FILE` to be passed to the
Git processes, via the `env` attribute of the `IGitOptions` (which only
need to be declared and no other changes are needed because the
`options` are then used as dugite's `IGitExecutionOptions`, which
already has the `env` attribute).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We actually do want to work exclusively on bare repositories. A recent
Git version requires either the `safe.bareRepository` config setting to
indicate that it is safe, or the git dir to be specified explicitly via
`--git-dir`. Let's do the latter.

This will become very important when we switch to using GitHub Actions
with ephemeral (and partial) clones of `gitgitgadget/git`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Copy link
Member Author

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some grammar fix and clearer code comment.

const tmpIndex = await this.makeTemporaryNotesIndex(upstreamCommit, this.workDir);

// Need to do a 3-way merge
// not as easy as `git merge-tree` because some file contain JSON objects that need special handling
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// not as easy as `git merge-tree` because some file contain JSON objects that need special handling
// not as easy as `git merge-tree` because some files contain JSON objects that need special handling

export type POJO = { [name: string]: string | string[] | number | number[] | boolean | POJO };

/*
* Represents a temporary Git index that reflects a note tip commit, ready
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Represents a temporary Git index that reflects a note tip commit, ready
* Represents a temporary Git index that reflects the revision at the tip of the `refs/notes/gitgitgadget` branch, ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant