New subcommand that allows exports from git-annex? #7625

Open
dmcardle opened this issue Feb 10, 2024 · 4 comments

@dmcardle
Contributor

dmcardle commented Feb 10, 2024

Hi folks, is there any interest in a new rclone subcommand that acts as a git-annex remote? The term "remote" is unfortunately overloaded by git-annex and rclone; I'm describing an rclone subcommand that speaks the git-annex external special remote protocol, enabling git-annex to store and retrieve content from rclone remotes.

A similar project called git-annex-remote-rclone already exists, but my primary complaint is that it does not support git-annex-export, aka "tree exports". As a result, the content exported from git-annex to an rclone remote is opaque to humans -- just a bunch of files with names like "GPGHMACSHA512--${DIGEST}", not a browseable file tree.

I think there are also some potential performance improvements if we move the client into rclone. For instance, it could support git-annex's ASYNC protocol extension rather than transferring files one at a time. I assume there's also some overhead in exec-ing rclone once per file, which a persistent client could eliminate.
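
For readers unfamiliar with the external special remote protocol, here is a minimal sketch of the line-oriented request/reply exchange that such a subcommand would take part in. It is not the proposed implementation: the `respond` function and the in-memory `store` map are hypothetical stand-ins for real rclone transfers, and only a handful of the protocol's messages are handled.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// respond maps one request line from git-annex to one reply line.
// store is a hypothetical in-memory stand-in for an rclone remote.
func respond(line string, store map[string]bool) string {
	// File names may contain spaces, so split into at most four parts.
	parts := strings.SplitN(line, " ", 4)
	switch parts[0] {
	case "PREPARE":
		return "PREPARE-SUCCESS"
	case "INITREMOTE":
		return "INITREMOTE-SUCCESS"
	case "TRANSFER": // TRANSFER STORE|RETRIEVE Key File
		if len(parts) < 4 {
			return "UNSUPPORTED-REQUEST"
		}
		direction, key := parts[1], parts[2]
		switch direction {
		case "STORE":
			store[key] = true // a real remote would upload parts[3] here
			return "TRANSFER-SUCCESS STORE " + key
		case "RETRIEVE":
			if store[key] {
				return "TRANSFER-SUCCESS RETRIEVE " + key
			}
			return "TRANSFER-FAILURE RETRIEVE " + key + " key not present"
		}
	case "CHECKPRESENT":
		if len(parts) == 2 {
			if store[parts[1]] {
				return "CHECKPRESENT-SUCCESS " + parts[1]
			}
			return "CHECKPRESENT-FAILURE " + parts[1]
		}
	case "REMOVE":
		if len(parts) == 2 {
			delete(store, parts[1])
			return "REMOVE-SUCCESS " + parts[1]
		}
	}
	// Anything not handled above, including optional requests.
	return "UNSUPPORTED-REQUEST"
}

func main() {
	store := map[string]bool{}
	fmt.Println("VERSION 1") // the remote speaks first
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		fmt.Println(respond(scanner.Text(), store))
	}
}
```

Because the process stays alive and answers one request per line, a single long-running client can serve many transfers, which is the "more persistent client" idea mentioned above.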

This is admittedly pretty niche, but I'd be pleasantly surprised if there's any community interest!


The associated forum post URL from https://forum.rclone.org

https://forum.rclone.org/t/new-subcommand-for-git-annex-remote/44546

What is your current rclone version (output from rclone version)?

N/a.

What problem are you trying to solve?

Export content from git-annex.

How do you think rclone should be changed to solve that?

With a new subcommand.

@ncw
Member

ncw commented Feb 18, 2024

This certainly sounds interesting.

I don't know how the git-annex protocol works, but if it involves calling rclone many times, then it would be most efficient to run an rclone server.

Is this something you are interested in writing?

@dmcardle
Contributor Author

dmcardle commented Feb 18, 2024

Glad to see there's some interest!

> Is this something you are interested in writing?

Yep, I'm already working on it!

@ncw Would you accept this as a new subcommand, or would you prefer it to be a separate binary?

I find motivation/time in bursts, so it's hard to estimate when I'll have something worth sharing. Regardless, here's a rough sequence of milestones I'd like to hit:

  1. ✔️ Minimal support for the external special remote protocol. I believe this would achieve feature parity with git-annex-remote-rclone.
  2. Support user migrations from git-annex-remote-rclone and improve end-to-end testing.
    1. ✔️ Improve the end-to-end test script to not make assumptions about the HOME directory.
    2. ✔️ Support git-annex-remote-rclone's repository layout options to enable user migration, e.g. frankencase.
    3. Consider supporting aliases of configs for compatibility with git-annex-remote-rclone. This should make transitioning remotes less painful.
    4. Support PROGRESS messages.
  3. Add support for the ASYNC protocol extension. This should save us the overhead of starting up rclone N times over N transfers.
    • Maybe develop a benchmark to measure the performance improvement.
  4. Add support for git-annex's simple export interface. This will enable exporting human-browseable file trees to rclone remotes.

dmcardle added a commit to dmcardle/rclone that referenced this issue Feb 29, 2024
This PR hits milestone 1 from issue rclone#7625, namely "minimal support for
the external special remote protocol".

It adds a new go package in contrib/gitannex/ that implements the
support. It also adds the git-annex-remote-rclone-goyle command, or
"garrgoyle" for short.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Mar 4, 2024
This commit adds a new Go package in contrib/gitannex/ and, within it, a
new program named "git-annex-remote-rclone-goyle", or "garrgoyle" for short.

This accomplishes milestone 1 from issue rclone#7625, namely "minimal support
for the external special remote protocol".

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Mar 10, 2024
This commit adds a new subcommand named "gitannex". It is also called
"git-annex-remote-rclone-goyle", or "garrgoyle" for short.

This accomplishes milestone 1 from issue rclone#7625, namely "minimal support
for the external special remote protocol".

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Mar 23, 2024
This commit adds a new subcommand named "gitannex", aka
"git-annex-remote-rclone-goyle" when invoked via a symlink.

This accomplishes milestone 1 from issue rclone#7625: "minimal support for the
external special remote protocol".

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Mar 23, 2024
This commit adds a new subcommand named "gitannex", aka
"git-annex-remote-rclone-builtin" when invoked via a symlink.

This accomplishes milestone 1 from issue rclone#7625: "minimal support for the
external special remote protocol".

Issue rclone#7625
ncw pushed a commit that referenced this issue Mar 26, 2024
This commit adds a new subcommand named "gitannex", aka
"git-annex-remote-rclone-builtin" when invoked via a symlink.

This accomplishes milestone 1 from issue #7625: "minimal support for the
external special remote protocol".

Issue #7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Apr 1, 2024
This commit implements milestone 2.1 for the gitannex subcommand:
rclone#7625 (comment)

This rewrite makes a few improvements over the old shell script:

(1) It no longer uses the system's rclone.conf. Now, it writes the
    rclone.conf file in an ephemeral directory.

(2) It no longer makes any assumptions about /tmp.

I'm hoping that writing this in Go will enable more cross-platform
support in the future, but for now we're still restricted to Unixy
systems due to reliance on the HOME environment variable.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Apr 4, 2024
This commit implements milestone 2.1 for the gitannex subcommand:
rclone#7625 (comment)

This rewrite makes a few improvements over the old shell script:

(1) It no longer uses the system's rclone.conf. Now, it writes the
    rclone.conf file in an ephemeral directory.

(2) It no longer makes any assumptions about the contents of /tmp.

However, it now assumes that an rclone built from the HEAD commit is on
the PATH. It makes a best-effort attempt to verify this assumption, but
I'm not sure it's bulletproof.

I'm hoping that writing this in Go will enable more cross-platform
support in the future, but for now we're still restricted to Unixy
systems due to reliance on the HOME environment variable.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Apr 30, 2024
For each layout mode, these tests start with a git-annex-remote-rclone
remote and migrate it to a git-annex-remote-rclone-builtin remote. They
verify that a file copied pre-migration is still present and that `git
annex testremote` passes.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Apr 30, 2024
Verbose output enables us to see which tests were skipped, which is
useful for skip-happy end-to-end tests in cmd/gitannex.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue Apr 30, 2024
This enables gitannex end-to-end tests to run on CI. Otherwise, the
version would not match and tests that check the rclone version would
be skipped like so:

```
=== RUN   TestEndToEnd
    e2e_test.go:199: Skipping due to rclone version: expected version "v1.67.0-DEV", but got "v1.67.0-beta.7905.220bbe24d.merge"
--- SKIP: TestEndToEnd (0.07s)
```

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue May 13, 2024
This commit adds support for the same repo layouts supported by
git-annex-remote-rclone. This should enable git-annex users with remotes
of type "rclone" to switch to type "rclone-builtin" without needing to
retransfer content.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue May 13, 2024
TestEndToEndRepoLayoutCompat exercises git-annex-remote-rclone-builtin
and git-annex-remote-rclone on the same rclone remote to ensure they are
compatible. It repeats the same test for all known layout modes.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue May 13, 2024
I'm hopeful that running these in parallel will not impact CI runtime
very much, but that likely depends on the number of CPU cores and
whether the tmp filesystem is backed by memory vs a physical disk.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue May 13, 2024
Now that e2e tests are running in parallel, undoing the chdir to the
temp dir was causing flaky failures on cleanup. We don't need it anyway
because the worrisome subcommands have their working directory
controlled by `runInRepo()`.

Issue rclone#7625
dmcardle added a commit to dmcardle/rclone that referenced this issue May 13, 2024
ncw pushed a commit that referenced this issue May 13, 2024
This commit adds support for the same repo layouts supported by
git-annex-remote-rclone. This should enable git-annex users with remotes
of type "rclone" to switch to type "rclone-builtin" without needing to
retransfer content.

Issue #7625
ncw pushed a commit that referenced this issue May 13, 2024
TestEndToEndRepoLayoutCompat exercises git-annex-remote-rclone-builtin
and git-annex-remote-rclone on the same rclone remote to ensure they are
compatible. It repeats the same test for all known layout modes.

Issue #7625
ncw pushed a commit that referenced this issue May 13, 2024
I'm hopeful that running these in parallel will not impact CI runtime
very much, but that likely depends on the number of CPU cores and
whether the tmp filesystem is backed by memory vs a physical disk.

Issue #7625
ncw pushed a commit that referenced this issue May 13, 2024
Now that e2e tests are running in parallel, undoing the chdir to the
temp dir was causing flaky failures on cleanup. We don't need it anyway
because the worrisome subcommands have their working directory
controlled by `runInRepo()`.

Issue #7625
ncw pushed a commit that referenced this issue May 13, 2024
For each layout mode, these tests start with a git-annex-remote-rclone
remote and migrate it to a git-annex-remote-rclone-builtin remote. They
verify that a file copied pre-migration is still present and that `git
annex testremote` passes.

Issue #7625
ncw pushed a commit that referenced this issue May 13, 2024
This enables gitannex end-to-end tests to run on CI. Otherwise, the
version would not match and tests that check the rclone version would
be skipped like so:

```
=== RUN   TestEndToEnd
    e2e_test.go:199: Skipping due to rclone version: expected version "v1.67.0-DEV", but got "v1.67.0-beta.7905.220bbe24d.merge"
--- SKIP: TestEndToEnd (0.07s)
```

Issue #7625