Make ORA remote more aware of the remote OS when using SSH #7549

mslw · 2024-01-23T22:17:26Z

The ORA remote (more specifically, its SSHRemoteIO.ensure_writeable context manager) runs the stat command on the remote end to check file permissions. The command format and its output differ between macOS and Linux ("It doesn't even think about a windows server").

Previously, the format was chosen based on the local OS, which makes no sense for a command that is executed on the remote end. This led to situations where it was impossible to push data from Mac to Linux because the stat command errored out, and permissions were left incorrect, when moving a file from transfer to its desired location - see #7536

With this change, the ORA remote will run uname -s on the remote end (of an SSH connection) to determine where it is operating, if it needs to. To the best of my knowledge (https://en.wikipedia.org/wiki/Uname), checking for "Darwin" should be sufficient to detect macOS.
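To illustrate the idea (a sketch, not the code in this PR): the helper name `stat_mode_query` is hypothetical, and the format options are assumptions based on GNU `stat` (`--format=%f`, raw mode in hexadecimal) and BSD/macOS `stat` (`-f%Dp`, mode in decimal). The point is only that the choice depends on what `uname -s` reports on the remote end, not on the local OS.

```python
import functools
from typing import Callable, Tuple


def stat_mode_query(remote_uname: str) -> Tuple[str, Callable[[str], int]]:
    """Return (stat format option, parser for its output) for a remote OS.

    `remote_uname` is the output of `uname -s` as run on the remote end.
    Hypothetical helper; the option strings are assumptions, see above.
    """
    if remote_uname.strip() == "Darwin":
        # BSD stat on macOS: file mode printed in decimal
        return "-f%Dp", int
    # GNU stat (Linux): raw file mode printed in hexadecimal
    return "--format=%f", functools.partial(int, base=16)


# The same regular file with mode 0644, as each platform would report it
linux_opt, linux_parse = stat_mode_query("Linux\n")
linux_mode = linux_parse("81a4")

mac_opt, mac_parse = stat_mode_query("Darwin\n")
mac_mode = mac_parse("33188")
```

Both calls yield the same integer mode, so the code that later adjusts permissions does not need to know which platform produced it.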

This PR adds remote_uname as a lazy property of ORA remote's SSHRemoteIO class, which is accessed before running said stat commands (replacing a locally operating check of on_osx).

Having access to the property avoids re-running uname for every file that needs to be touched, and having the property lazily resolved avoids running uname for operations which don't need it.
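A minimal sketch of the lazy-property pattern described above, under stated assumptions: `RemoteIOSketch`, `run_remote`, and `fake_remote` are hypothetical stand-ins so the example runs without an SSH connection, and the actual implementation in `SSHRemoteIO` may differ.

```python
from typing import Callable, List, Optional


class RemoteIOSketch:
    """Hypothetical stand-in for an SSH-backed IO class."""

    def __init__(self, run_remote: Callable[[str], str]):
        # `run_remote` executes a command on the remote end and returns stdout
        self._run_remote = run_remote
        self._remote_uname: Optional[str] = None

    @property
    def remote_uname(self) -> str:
        # Resolved on first access only, then served from the cached value
        if self._remote_uname is None:
            self._remote_uname = self._run_remote("uname -s").strip()
        return self._remote_uname

    @property
    def remote_is_macos(self) -> bool:
        return self.remote_uname == "Darwin"


calls: List[str] = []


def fake_remote(cmd: str) -> str:
    """Pretend to be a macOS server and record every command issued."""
    calls.append(cmd)
    return "Darwin\n"


io = RemoteIOSketch(fake_remote)
n_before = len(calls)  # creating the object runs nothing remotely

for _ in range(3):  # e.g. three files that need a permission check
    io.remote_is_macos

n_after = len(calls)  # uname was issued once, not once per file
```

Operations that never touch `remote_uname` leave `calls` empty, which is the second half of the argument: no extra round trip unless the OS actually matters.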

This should fix #7536

mslw · 2024-01-24T18:16:16Z

I was surprised by the CI failures. These were the failed tests:

1. Test on macOS / test (snapshot) (pull_request): ../local/tests/test_diff.py::test_path_diff
2. Test on macOS / test (brew) (pull_request): ../tests/test_annexrepo.py::test_annex_copy_to
3. Appveyor Ubu20a1: ../datalad/downloaders/tests/test_http.py::test_download_ftp
4. Appveyor Ubu20P37b: ../datalad/downloaders/tests/test_http.py::test_download_ftp

The Appveyor macOS tests pass. I also noticed that 1, 2, and 4 are using Python 3.7, despite its EOL.

yarikoptic · 2024-02-02T21:58:25Z

makes sense to me, rerunning failed jobs, but they might be related given the specifics of the failures (both OSX) ;-)

codecov · 2024-02-02T22:45:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.24%. Comparing base (8639b90) to head (df864e0).
Report is 3 commits behind head on maint.

Additional details and impacted files

@@            Coverage Diff             @@
##            maint    #7549      +/-   ##
==========================================
- Coverage   91.25%   91.24%   -0.02%     
==========================================
  Files         325      325              
  Lines       43428    43430       +2     
  Branches     5778     5779       +1     
==========================================
- Hits        39630    39626       -4     
- Misses       3783     3789       +6     
  Partials       15       15


mslw · 2024-02-05T11:16:45Z

Looking at the results after the rerun, the only failing test is now:

../distributed/tests/test_push.py::test_force_checkdatapresent (that's core/distributed/...)

which did not fail before.

Looking at the recent macOS test runs for different PRs, I see that the same error cropped up at least once ¹ (run 7645277233) but passed in most of them (e.g. all green run 7782008144). All these runs use the same git-annex version.

I don't see the connection to the code changes, and will therefore re-run the macOS tests once again 🤞

One observation in case the failure persists or happens elsewhere: why would the test_force_checkdatapresent produce spurious results but only sometimes? In the test's code I see comments addressing changes in how git-annex handles timestamps (nb. the comment includes a link, but the link target says "HEAD" instead of the full ID of this commit), and the test asserts either "ok" or "notneeded" result depending on the git-annex version. I did not fully understand the nature of the changes, but maybe the behavior can be affected by when things happen (within a second or not), and the test does not sufficiently account for that?

¹ For the methodical log greppers: FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result

yarikoptic · 2024-02-05T16:51:29Z

indeed we had it fail elsewhere as well, but there it seems to fail consistently and only on OSX, and this PR is concerned only with OSX... didn't look deeper but it might be just that this PR uncovers a previously unobserved issue

edit: this particular FAILED is quite recent

(git)smaug:/mnt/datasets/datalad/ci/logs[master]git
$> datalad foreach-dataset --o-s relpath -r -J10 git grep 'FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result'
2024/01/24/pr/7550/4abd2b2/github-Test on macOS-4857-failed/0_test (brew).txt:2024-01-24T21:10:29.4942580Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result                                                  
2024/01/24/pr/7550/4abd2b2/github-Test on macOS-4857-failed/test (brew)/8_Run tests.txt:2024-01-24T21:10:29.4942580Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/01/31/pr/7553/ac778d5/github-CrippledFS-4249-failed/0_test.txt:2024-01-31T21:36:28.8425784Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/01/31/pr/7553/ac778d5/github-CrippledFS-4249-failed/test/8_Run tests.txt:2024-01-31T21:36:28.8425781Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/02/05/pr/7556/6411506/github-Test on macOS-4873-failed/1_test (snapshot).txt:2024-02-05T02:50:12.4644640Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result                                              
2024/02/05/pr/7556/6411506/github-Test on macOS-4873-failed/test (snapshot)/8_Run tests.txt:2024-02-05T02:50:12.4644640Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
datalad foreach-dataset --o-s relpath -r -J10 git grep   67.02s user 79.67s system 153% cpu 1:35.31 total

mslw · 2024-02-05T17:45:30Z

Thanks for checking!

Alas, re-running test on macOS for this PR after my previous comment I got test_force_checkdatapresent failing with both snapshot and brew. I still don't think it's related (the test pushes between two local repos) but I don't have good arguments and I understand that you first want to investigate further for underlying issues, related or not.

I looked a little deeper but I'm on Debian. So far I can't reproduce locally (using latest autobuild git-annex), and still don't have a good clue of what might be going on.

@mslw

Analysis and patch are taken more-or-less exactly from datalad/datalad#7549. Thanks @mslw! The patch is slightly modified to be more compact, and work by replacing `SSHRemoteIO.ensure_writeable()` alone. Fixes datalad/datalad#7536

The ORA remote (more specifically, its `SSHRemoteIO.ensure_writeable` context manager) runs the `stat` command on the remote end to check file permissions. The command format and its output differ between macOS and Linux. Previously, the format was chosen based on the *local* OS, which makes no sense for a command that is executed on the *remote* end. This led to situations where it was impossible to push data from Mac to Linux because the stat command errored out, and permissions were left incorrect, when moving a file from `transfer` to its desired location. With this change, the ORA remote will run `uname -s` on the remote end (of an SSH connection) to determine where it is operating. To the best of my knowledge (https://en.wikipedia.org/wiki/Uname), checking for "Darwin" should be good.

This adds a property `remote_uname`, with lazy resolution (by running `uname -s` on the remote end), to the ORA remote's `SSHRemoteIO` class. Thus, the SSH commands can operate with an awareness of the remote OS, if needed. One situation when this is needed is the class's `ensure_writeable`, which needs to run `stat` with an OS-specific command (and output) format. Having access to the property avoids re-running `uname` for every file that needs to be touched, and having the property lazily resolved avoids running `uname` for operations which don't need it. Note that although `uname` is UNIX-specific, the class docstring already says that "It doesn't even think about a windows server".

mslw · 2024-04-24T13:53:13Z

Hey, I rebased the pull request onto the 1.0 maint, and all tests passed, no doubt thanks to #7581 (see checklist for HEAD~1). The only thing that was red is codecov/project, and I don't think I can do anything about it.

The mac tests now fail at setting up python 3.7, but that is a completely new storyline...

One big caveat is that the datalad test setup cannot test operation between OSs, so we have to rely on manual testing. I don't have a Mac, but @christian-monch does and I heard from him (please confirm) that the patch indeed works as intended.

@yarikoptic would you consider merging on that basis?

mslw added the semver-patch Increment the patch version when merged label Jan 24, 2024

mih mentioned this pull request Apr 17, 2024

Fix RIA store operations from Mac clients datalad/datalad-next#653

Merged

mslw added 2 commits April 22, 2024 22:39

mslw force-pushed the ria-darwin branch from a207acf to db9403b Compare April 22, 2024 20:40

Add changelog for remote OS detection in RIA (ORA)

df864e0

mslw force-pushed the ria-darwin branch from b049fdd to df864e0 Compare April 24, 2024 13:58
