Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Recursive mkdir (-p/--parents) support #475

Open
sirosen opened this issue Sep 30, 2019 · 5 comments
Open

Feature request: Recursive mkdir (-p/--parents) support #475

sirosen opened this issue Sep 30, 2019 · 5 comments
Labels
blocked There's some reason we can't do this enhancement New feature or improvement needs discussion Some decision is needed

Comments

@sirosen
Copy link
Member

sirosen commented Sep 30, 2019

We've had this come in as a feature request via support, and it is desirable, but not as "obvious" or "trivial" as it might at first appear.

Without diving into why Transfer itself doesn't support recursive mkdir (or, by extension, why FTP and GridFTP don't), here are my basic thoughts on doing this purely client side:

  1. In the interactive case with text output, it's near-trivial to implement. "Just recursively call mkdir."

This is the easy case and hardly worth discussing.

  1. This would be similar to recursive ls.

Proper JSON output format handling requires proxifying the transfer interaction to wrap multiple responses. Recursive ls does this and works fine (we've never had a complaint about it), but it also makes globus ls probably our third most complicated command, after login and activate.

  1. Unifying recursive ls and recursive mkdir seems like a good goal.

If it's not too hard, the thing we probably would want to do is treat recursive ls and recursive mkdir as a family of commands which iterate over API calls and return wrapped responses. This will lead to less code and a better overall user experience as command behaviors stay consistent.

  1. This smells like pagination. Should wrapping of multiple calls and wrapping of paginated calls be unified in some way?

Could this even make its way into the SDK? (Is that something we want? I don't think so. Helping people do recursive ls and watching them blow up their memory space seems really unwise.)

@sirosen sirosen added enhancement New feature or improvement blocked There's some reason we can't do this needs discussion Some decision is needed labels Sep 30, 2019
@jaswilli
Copy link
Contributor

I'd probably punt on "-p" support for mkdir until the API itself supported it. Doing an artificial recursive ls is one thing but needing to handle errors in the middle of what's supposed to be an atomic operation doesn't sound great.

For example, if you create one of n directories and get an error, what do you do? Try to issue more API requests to roll it back? What if those also fail? Not really something I want to own as a client library.

@sirosen
Copy link
Member Author

sirosen commented Sep 30, 2019

For example, if you create one of n directories and get an error, what do you do? Try to issue more API requests to roll it back? What if those also fail? Not really something I want to own as a client library.

This is a great point, and one that I hadn't even considered since I don't plan on hopping to it today.

If we were dedicated to doing this client-side, I would say we should just error-out in this case, leaving the endpoint in whatever state the series of mkdir calls had left it. It's unreliable and "bad" behavior, but current users are likely to just

# use a for-loop to create "${epid}:/some/path/to/file/name"
cur="${epid}:/some"
for x in path to file name; do
  cur="${cur}/${x}"
  globus mkdir "$cur"
done

which is the same exact thing (slightly worse if you don't set -e).

If it's documented that globus mkdir -p may create some directories and fail, I don't think it's a huge deal -- anyone using it probably doesn't care, and if you do care you can implement your own cleanup -- but it's the sort of thing that I'd rather not wade into if Transfer adding recursive: true or parents: true is something we're going to add. (I just don't know that it is, and ultimately the CLI is a convenience tool.)

@jimPruyne
Copy link

My two cents FWIW: I'd agree with @jaswilli that the various error conditions make this scary and punt worthy unless there's someone requesting it with good reason.

I'd imagine in most cases if it is going to fail, it would at the first-level, but those intermediate states seem unhappy. Is actual mkdir -p atomic? What is its behavior on failure (and can we even create a situation where we can test it except maybe filling a filesystem to just the right amount)?

If per @sirosen 's argument that this is a convenience tool then the key is documentation especially when reporting the error at run-time (rather than relying on /a priori/ documented behavior that explains what could happen). Ideally it would report what we believe to be the final state after the error-based abort.

@lgorenstein
Copy link

From the utilitarian user prospective, I'd say that it would be totally OK if a failed mkdir -p would just throw a failed exit code but leave behind whatever breadcrumbs it was able to do by then, and do not try to undo.

My C is rusty and almost non-existent, but a cursory read of mkdir(1) sources
(https://github.com/coreutils/coreutils/blob/master/src/mkdir.c and https://github.com/coreutils/gnulib/blob/master/lib/mkdir-p.c) does not look like they are atomic either. I.e. if
mkdir -p this/is/a/nested/dir runs into an error condition while creating nested, the preceding this/is/a would still be made and not rolled back. So the "If you die trying, still leave the legacy of what you managed to accomplish" seems like a reasonable behavior.

@sirosen
Copy link
Member Author

sirosen commented Oct 2, 2019

Either done in a client library or done in the Transfer backend, this will have to be done in stages and run the risk of failing partway through. GridFTP MKDIR doesn't support recursive mkdir, so it's just a matter of who does the work of supporting it.

However, I'd like to know whether or not we're possibly going to get backend support for this before we seriously consider a pure client-side solution. Otherwise, we'll have a client-side implementation of the --parents option which doesn't match the backend behavior and things become a mess.

I have several reasons for preferring API support. Primarily it being available to all API consumers and the possibility of defining a distinct response payload for mkdir -p with partial success.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked There's some reason we can't do this enhancement New feature or improvement needs discussion Some decision is needed
Projects
None yet
Development

No branches or pull requests

4 participants