Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support renaming #202

Closed
wants to merge 2 commits into from
Closed

Conversation

davidbrochart
Copy link
Collaborator

Closes #201.

@davidbrochart davidbrochart marked this pull request as draft November 4, 2022 11:19
@codecov
Copy link

codecov bot commented Nov 4, 2022

Codecov Report

Merging #202 (ae3ed0d) into main (ae3ed0d) will not change coverage.
The diff coverage is n/a.

❗ Current head ae3ed0d differs from pull request most recent head d1dcba1. Consider uploading reports for the commit d1dcba1 to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #202   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          397       397           
  Branches        82        82           
=========================================
  Hits           397       397           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae3ed0d...d1dcba1. Read the comment docs.

Copy link
Owner

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise looks like a good start.

@@ -37,7 +40,7 @@ enum WatcherEnum {

#[pyclass]
struct RustNotify {
changes: Arc<Mutex<HashSet<(u8, String)>>>,
changes: Arc<Mutex<HashSet<(u8, String, String)>>>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
changes: Arc<Mutex<HashSet<(u8, String, String)>>>,
changes: Arc<Mutex<HashSet<(u8, Option<String>, String)>>>,

maybe? Then you can use None for the src for added, moved and deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I tried that before, and run into issues that I didn't really understand.

Comment on lines 29 to +32
const CHANGE_ADDED: u8 = 1;
const CHANGE_MODIFIED: u8 = 2;
const CHANGE_DELETED: u8 = 3;
const CHANGE_MOVED: u8 = 4;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably use an enum for these.

Either an enum with values, or:

enum Change {
  Added(String),
  Modified(String),
  Deleted(String),
  Moved(String, String),
}

Then implement ToPyObject on Change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will do that later. This is the first time I write Rust, so it's all new for me 😄

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then I'm impressed.

Let me know if you have more questions, or if you want some help with this.

Lots of examples of ToPyObject in pydantic-core.

Also github copolit is very useful when you're new to rust (or indeed not new to it), it makes some very helpful suggestions.

}
EventKind::Modify(ModifyKind::Name(RenameMode::Both)) => {
let mut changes = changes_clone.lock().unwrap();
changes.remove(&(CHANGE_DELETED, path.clone(), EMPTY_STRING));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually happen? If so, I guess we should remove added here too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens when a file is moved out of the directory. In this case we want to see it as deleted, which is handled in the RenameMode::From event. But if the file is moved in the same directory, we want to see it as moved, so we need to remove the CHANGE_DELETED. I didn't find a better way to do it.
When a file is moved in the directory from outside, we want to see it as added, this works fine already.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.

@davidbrochart
Copy link
Collaborator Author

davidbrochart commented Nov 4, 2022

It's getting difficult to handle the macOS case, where renaming a file doesn't provide the src/dst paths. I'm starting to think that the other option you mentioned would be better:

process the simplified stream of events that we produce now and try to identify renaming.

@davidbrochart
Copy link
Collaborator Author

Yes, I'll open another PR where we post-process events in Python, but we keep track in Rust of "rename from X" and "rename to Y" events, and I'll get rid of the "rename from X to Y" event. I think it will make it easier to handle these special cases like the one on macOS.

@samuelcolvin
Copy link
Owner

Are you really sure macos events don't include src and dst? Where did you find that?

I know events aren't very good in macos (compared to linux), but I'm surprised by that. I'll check on my mac when I get home.

Sorry you didn't have more luck with this.

@samuelcolvin
Copy link
Owner

One option could be to collect Rename::To events and Rename:From events, then only try to fit them together to form rename changes, rather than look at all changes. The problem with this is we loose rename events for the polling watcher which would be sad given I think quite a few people are using the poll watcher.

See RenameMode

@davidbrochart
Copy link
Collaborator Author

Are you really sure macos events don't include src and dst? Where did you find that?

From these comments, and the tests confirm that.

Sorry you didn't have more luck with this.

No worries, maybe this is how my journey to Rust begins 😄

One option could be to collect Rename::To events and Rename:From events, then only try to fit them together to form rename changes, rather than look at all changes.

Yes, that's what I meant in my comment.

@davidbrochart
Copy link
Collaborator Author

Thinking about it, it is impossible to only use Rename:From and Rename:To to infer that a file was renamed from one path to another, because there could be multiple files renamed in a single change set, right?

@samuelcolvin
Copy link
Owner

Thinking about it, it is impossible to only use Rename:From and Rename:To to infer that a file was renamed from one path to another, because there could be multiple files renamed in a single change set, right?

Well that's true, there can be hundreds of files changed in a single set (e.g. when you change branch in git), and we have no file hash to de-duplicate with.

The only options are:

  • match "move froms" to "move tos" via file name for all files (e.g. added, deleted, Rename:From, Rename:To) and accept we'll get lots of false positives - this would have the added advantage of working for the polling watcher as well as other watchers
  • match "move froms" to "move tos" via file name for Rename:From & Rename:To events only and therefore - with this we could get some false matches, but we wouldn't get lots of false positives, but I guess it wouldn't work for the polling watcher

I think we cal also get date modified and maybe even size on a file from the OS without reading the file - if we had them, we could do a much better job of matching files. WDYT?

@davidbrochart
Copy link
Collaborator Author

I think we cal also get date modified and maybe even size on a file from the OS without reading the file - if we had them, we could do a much better job of matching files. WDYT?

Yes, that's what is done in jupyter_server_fileid, but I thought we wouldn't have to do that if we used Notify. In the end, it seems that move events only work on Linux?

@samuelcolvin
Copy link
Owner

It sounds like we have to go the whole hot:

  • detect move events with from and to on linux and mark them as a "move"
  • collect "move from" and "move to" events and put them in a set to be matched at the end of a step
  • optionally do the same for all events if we have the polling watcher

I think to avoid this being a breaking change, and to avoid extra processing when this isn't required, we should put this logic behind a flag/argument parameter.

Do you agree with that?

I haven't reviewed this PR recently, do you want me to review it or should we work on this separeately?

Also, are you willing to have a crack at this, or do you want me to work on it? Would be great if you could, but I understand it's quite a complex feature to start your rust career. I'm happy to work on it, but won't be for a few weeks at least.

@davidbrochart
Copy link
Collaborator Author

Do you agree with that?

This seems like a good plan.

I haven't reviewed this PR recently, do you want me to review it or should we work on this separeately?

I reverted my last attempts and got back to the start, where I just assumed that Rename:Both was working on all platforms. But I think it's better to open a new PR.

Also, are you willing to have a crack at this, or do you want me to work on it?

I'm not sure, it seems that it will take me a lot of time, and you will probably do a much better work.

@samuelcolvin
Copy link
Owner

okay, makes sense. I'll try and work on it when I get time.

Thanks so much for investigating this, it's made my work much easier.

I'll leave this open to remind me about it.

@samuelcolvin
Copy link
Owner

closing this as I haven't needed it so haven't work on it, happy to review a new PR if someone wants to work on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking file rename
2 participants