-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make ORA remote more aware of the remote OS when using SSH #7549
base: maint
Are you sure you want to change the base?
Conversation
I was surprised by the CI failures. These were the failed tests:
The Appveyor macOS tests pass. I also noticed that 1, 2, and 4 are using Python 3.7, despite its EOL. |
makes sense to me, rerunning failed jobs but might be related given the specific of failure (both OSX) ;-) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## maint #7549 +/- ##
==========================================
- Coverage 91.25% 91.24% -0.02%
==========================================
Files 325 325
Lines 43428 43430 +2
Branches 5778 5779 +1
==========================================
- Hits 39630 39626 -4
- Misses 3783 3789 +6
Partials 15 15 ☔ View full report in Codecov by Sentry. |
Looking at the results after the rerun, the only failing test is now:
which did not fail before. Looking at the recent macOS test runs for different PRs, I see that the same error cropped up at least once 1 (run 7645277233) but passed in most of them (e.g. all green run 7782008144). All these runs use the same git-annex version. I don't see the connection to the code changes, and will therefore re-run the macOS tests once again 🤞 One observation in case the failure persists or happens elsewhere: why would the Footnotes
|
indeed we had it failed elsewhere as well, but there it seems to fail consistently and only on OSX and this PR does concern only with OSX... didn't look deeper but it might be just that this PR unravels otherwise not previously observed issue edit: this particular FAILED is quite recent (git)smaug:/mnt/datasets/datalad/ci/logs[master]git
$> datalad foreach-dataset --o-s relpath -r -J10 git grep 'FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result'
2024/01/24/pr/7550/4abd2b2/github-Test on macOS-4857-failed/0_test (brew).txt:2024-01-24T21:10:29.4942580Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/01/24/pr/7550/4abd2b2/github-Test on macOS-4857-failed/test (brew)/8_Run tests.txt:2024-01-24T21:10:29.4942580Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/01/31/pr/7553/ac778d5/github-CrippledFS-4249-failed/0_test.txt:2024-01-31T21:36:28.8425784Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/01/31/pr/7553/ac778d5/github-CrippledFS-4249-failed/test/8_Run tests.txt:2024-01-31T21:36:28.8425781Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/02/05/pr/7556/6411506/github-Test on macOS-4873-failed/1_test (snapshot).txt:2024-02-05T02:50:12.4644640Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
2024/02/05/pr/7556/6411506/github-Test on macOS-4873-failed/test (snapshot)/8_Run tests.txt:2024-02-05T02:50:12.4644640Z FAILED ../distributed/tests/test_push.py::test_force_checkdatapresent - AssertionError: Desired result
datalad foreach-dataset --o-s relpath -r -J10 git grep 67.02s user 79.67s system 153% cpu 1:35.31 total |
Thanks for checking! Alas, re-running test on macOS for this PR after my previous comment I got I looked a little deeper but I'm on Debian. So far I can't reproduce locally (using latest autobuild git-annex), and still don't have a good clue of what might be going on. |
Analysis and patch are taken more-or-less exactly from datalad/datalad#7549. Thanks @mslw! The patch is slightly modified to be more compact, and work by replacing `SSHRemoteIO.ensure_writeable()` alone. Fixes datalad/datalad#7536
The ORA remote (more specifically, its `SSHRemoteIO.ensure_writeable` context manager) runs the `stat` command on the remote end to check file permissions. The command format and its output differ between macOS and Linux. Previously, the format was chosen based on the *local* OS, which makes no sense for a command that is executed on the *remote* end. This led to situations where it was impossible to push data from Mac to Linux because the stat command errored out, and permissions were left incorrect, when moving a file from `transfer` to its desired location. With this change, the ORA remote will run `uname -S` on the remote end (of an SSH connection) to determine where it is operating. To the best of my knowledge (https://en.wikipedia.org/wiki/Uname), checking for "Darwin" should be good.
This adds a property `remote_uname`, with lazy resolution (by running `uname -s` on the remote end), to the ORA remote's `SHRemoteIO` class. Thus, the SSH commands can operate with an awareness of the remote OS, if needed. One situation when this is needed is the class's `ensure_writable`, which needs to run `stat` with an OS-specific command (and output) format. Having access to the property avoids re-running `uname` for every file that needs to be touched, and having the property lazily resolved avoids running `uname` for operations which don't need it. Note that although `uname` is UNIX-specific, the class docstring already says that "It doesn't even think about a windows server".
Hey, I rebased the pull request onto the 1.0 maint, and all tests passed, no doubt thanks to #7581 (see checklist for The mac tests now fail at setting up python 3.7, but that is a completely new storyline... One big caveat is that the datalad test setup can not test operation between OSs, so we have to rely on manual testing. I don't have a Mac, but @christian-monch does and I heard from him (please confirm) that the patch indeed works as intended. @yarikoptic would you consider merging on that basis? |
The ORA remote (more specifically, its
SSHRemoteIO.ensure_writeable
context manager) runs thestat
command on the remote end to check file permissions. The command format and its output differ between macOS and Linux ("It doesn't even think about a windows server").Previously, the format was chosen based on the local OS, which makes no sense for a command that is executed on the remote end. This led to situations where it was impossible to push data from Mac to Linux because the stat command errored out, and permissions were left incorrect, when moving a file from
transfer
to its desired location - see #7536With this change, the ORA remote will run
uname -S
on the remote end (of an SSH connection) to determine where it is operating, if it needs to. To the best of my knowledge (https://en.wikipedia.org/wiki/Uname), checking for "Darwin" should be sufficient to detect macOS.This PR adds
remote_uname
as a lazy property of ORA remote'sSSHRemoteIO
class, which is accessed before running saidstat
commands (replacing a locally operating check ofon_osx
).Having access to the property avoids re-running
uname
for every file that needs to be touched, and having the property lazilyresolved avoids running
uname
for operations which don't need it.This should fix #7536