Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix removeSync() to eliminate spurious ENOTEMPTY errors on Windows #646

Merged
merged 4 commits into from Nov 7, 2018

Conversation

octogonz
Copy link
Contributor

@octogonz octogonz commented Nov 2, 2018

Lately I've been seeing a lot of errors like this on Windows:

ERROR: ENOTEMPTY: directory not empty, rmdir 'C:\some\folder'

The Windows OS prevents a folder from being removed while file handles are still open. Certain processes such as virus scanners and compiler environments watch folders for changes. After a folder is removed, they take a little while to realize this and unregister their handlers. In my empirical tests, it can occasionally take around 100ms - 200ms.

fs-extra already provides some protection against this:

rimraf.js

  // We only end up here once we got ENOTEMPTY at least once, and
  // at this point, we are guaranteed to have removed all the kids.
  // So, we know that it won't be ENOENT or ENOTDIR or anything else.
  // try really hard to delete stuff on windows, because it has a
  // PROFOUNDLY annoying habit of not closing handles promptly when
  // files are deleted, resulting in spurious ENOTEMPTY errors.
  const retries = isWindows ? 100 : 1
  let i = 0
  do {
    let threw = true
    try {
      const ret = options.rmdirSync(p, options)
      threw = false
      return ret
    } finally {
      if (++i < retries && threw) continue // eslint-disable-line
    }
  } while (true)

However, this algorithm relies on a retry count to decide when to give up. If the PC is idle or has very fast hardware, then 100 retries is way too little. We could increase the limit, but the units are arbitrary.

This PR changes the loop to measure actual time, since that's more representative of the underlying problem.

@octogonz octogonz changed the title Fix removeSync() to measure its retry interval in milliseconds instead of CPU cycles Fix removeSync() to eliminate spurious ENOTEMPTY errors on Windows Nov 2, 2018
standard: Use JavaScript Standard Style (https://standardjs.com)
standard: Run `standard --fix` to automatically fix some problems.
  /home/travis/build/jprichardson/node-fs-extra/lib/remove/rimraf.js:300:33: Extra semicolon.
  /home/travis/build/jprichardson/node-fs-extra/lib/remove/rimraf.js:302:42: Extra semicolon.
  /home/travis/build/jprichardson/node-fs-extra/lib/remove/rimraf.js:309:5: Block must not be padded by blank lines
```
@octogonz
Copy link
Contributor Author

octogonz commented Nov 2, 2018

@RyanZim @manidlou One of the CI jobs is failing with this error?

> fs-extra@7.0.0 lint /home/travis/build/jprichardson/node-fs-extra
> standard && standard-markdown
path.js:28
    throw new TypeError('Path must be a string. Received ' + inspect(path));
    ^
TypeError: Path must be a string. Received undefined
    at assertPath (path.js:28:11)
    at Object.resolve (path.js:1171:7)
    at Command.<anonymous> (/home/travis/build/jprichardson/node-fs-extra/node_modules/standard-markdown/cli.js:32:28)
    at Command.listener (/home/travis/build/jprichardson/node-fs-extra/node_modules/commander/index.js:315:8)

This seems unrelated to my change. Any idea what it means?

@RyanZim
Copy link
Collaborator

RyanZim commented Nov 3, 2018

Yeah, linter isn't working right.

@octogonz
Copy link
Contributor Author

octogonz commented Nov 6, 2018

@RyanZim what's the expected turnaround time for merging PRs? This issue is currently a nuisance for many people. If it will take a while, we can try to find another solution for deleting the folders in the meantime.

@octogonz
Copy link
Contributor Author

octogonz commented Nov 7, 2018

@jprichardson

@RyanZim
Copy link
Collaborator

RyanZim commented Nov 7, 2018

Does remove() have this same problem? If so, we should fix both at the same time.

@octogonz
Copy link
Contributor Author

octogonz commented Nov 7, 2018

@RyanZim hmmm...

Its heuristic does measure clock time, but in a strange way that also counts the time spent iterating the loop:

function rimraf (p, options, cb) {
  let busyTries = 0

  if (typeof options === 'function') {
    cb = options
    options = {}
  }

  assert(p, 'rimraf: missing path')
  assert.strictEqual(typeof p, 'string', 'rimraf: path should be a string')
  assert.strictEqual(typeof cb, 'function', 'rimraf: callback function required')
  assert(options, 'rimraf: invalid options argument provided')
  assert.strictEqual(typeof options, 'object', 'rimraf: options should be object')

  defaults(options)

  rimraf_(p, options, function CB (er) {
    if (er) {
      if ((er.code === 'EBUSY' || er.code === 'ENOTEMPTY' || er.code === 'EPERM') &&
          busyTries < options.maxBusyTries) {
        busyTries++
        const time = busyTries * 100
        // try again, with the same exact callback as this one.
        return setTimeout(() => rimraf_(p, options, CB), time)
      }

      // already gone
      if (er.code === 'ENOENT') er = null
    }

    cb(er)
  })
}

Ideally it should be improved to use the same measurement strategy and time limit constant as removeSync(). However, my projects generally avoid NodeJS's async APIs for performance reasons. So if I fix that one, I don't have a real world project handy to test the PR.

That said, I'd be willing to make that fix. But it may take me a few days to get some time. Since the original issue is causing a real problem, and people are pestering me for a fix, would you be okay with merging this PR, and handling that issue as a separate PR?

@RyanZim
Copy link
Collaborator

RyanZim commented Nov 7, 2018

I'd prefer to keep sync and async methods in lockstep together.

@iclanton
Copy link

iclanton commented Nov 7, 2018

@pgonzal mentioned that file handles usually take 100ms to 200ms to close. In its default configuration, the async rimraf function tries, waits 100ms and retries, and then waits 200ms and retries again comfortably covering the time it should take for the file handle to be released.

For a symmetrical change to happen to the async function, I think we'd have to make an actual API change to make the retry logic based on clock time instead of number of retries, which seems pretty out-of-scope for this change and unnecessary.

@RyanZim
Copy link
Collaborator

RyanZim commented Nov 7, 2018

OK, merging by itself.

@RyanZim RyanZim merged commit ddc1a2f into jprichardson:master Nov 7, 2018
@RyanZim
Copy link
Collaborator

RyanZim commented Nov 7, 2018

Published in 7.0.1

@octogonz
Copy link
Contributor Author

octogonz commented Nov 7, 2018

@RyanZim thanks! How do you feel about changing the API for the async remove() to ask for number of milliseconds instead of number of retries? Is it worth the trouble? (What happens to the old api?) If so I'm still willing to help out.

@RyanZim
Copy link
Collaborator

RyanZim commented Nov 7, 2018

Since there's an existing option for this, I'd say we leave it as it is for now.

octogonz pushed a commit to microsoft/rushstack that referenced this pull request Nov 7, 2018
…EMPTY: directory not empty, rmdir" for FileSystem.deleteFolder()

See jprichardson/node-fs-extra#646
facebook-github-bot pushed a commit to facebook/flipper that referenced this pull request Nov 14, 2018
Summary:
Changes are mostly bug fixes, that shouldn't affect us. From the change log:

7.0.1 / 2018-11-07
------------------

- Fix `removeSync()` on Windows, in some cases, it would error out with `ENOTEMPTY` ([#646](jprichardson/node-fs-extra#646))
- Document `mode` option for `ensureDir*()` ([#587](jprichardson/node-fs-extra#587))
- Don't include documentation files in npm package tarball ([#642](jprichardson/node-fs-extra#642), [#643](jprichardson/node-fs-extra#643))

7.0.0 / 2018-07-16
------------------

- **BREAKING:** Refine `copy*()` handling of symlinks to properly detect symlinks that point to the same file. ([#582](jprichardson/node-fs-extra#582))
- Fix bug with copying write-protected directories ([#600](jprichardson/node-fs-extra#600))
- Universalify `fs.lchmod()` ([#596](jprichardson/node-fs-extra#596))
- Add `engines` field to `package.json` ([#580](jprichardson/node-fs-extra#580))

6.0.1 / 2018-05-09
------------------

- Fix `fs.promises` `ExperimentalWarning` on Node v10.1.0 ([#578](jprichardson/node-fs-extra#578))

6.0.0 / 2018-05-01
------------------

- Drop support for Node.js versions 4, 5, & 7 ([#564](jprichardson/node-fs-extra#564))
- Rewrite `move` to use `fs.rename` where possible ([#549](jprichardson/node-fs-extra#549))
- Don't convert relative paths to absolute paths for `filter` ([#554](jprichardson/node-fs-extra#554))
- `copy*`'s behavior when `preserveTimestamps` is `false` has been OS-dependent since 5.0.0, but that's now explicitly noted in the docs ([#563](jprichardson/node-fs-extra#563))
- Fix subdirectory detection for `copy*` & `move*` ([#541](jprichardson/node-fs-extra#541))
- Handle case-insensitive paths correctly in `copy*` ([#568](jprichardson/node-fs-extra#568))

Reviewed By: jknoxville

Differential Revision: D13023753

fbshipit-source-id: 1ecc6f40be4c8146f92dd29ede846b5ab56765ea
@robross0606
Copy link

robross0606 commented Apr 9, 2020

Running into this issue with unit test cleanup on fs-extra v9.0.0. Is there any workaround for this problem? I don't think it is just a timing issue because I am able to set a breakpoint just prior to calling removeSync. I can go look in the folder and manually delete files, proving there are no locks or file handles. This breakpoint also provides more than enough delay for handles to be released. If I manually delete all the files in the folder, the call to removeSync deletes the folder. If I simply progress forward from my breakpoint without deleting anything in the folder, removeSync cleans all the files out of the folder but leaves the folder itself in place. Also, calling removeSync twice in a row with a delay between them deletes the files out of the folder with the first call and deletes the folder itself on the second call.

@octogonz
Copy link
Contributor Author

octogonz commented Apr 9, 2020

I don't think it is just a timing issue because I am able to set a breakpoint just prior to calling removeSync. I can go look in the folder and manually delete files, proving there are no locks or file handles. This breakpoint also provides more than enough delay for handles to be released. If I manually delete all the files in the folder, the call to removeSync deletes the folder.

Is it possible that the unit test code has a race condition? For example, an async function that is suspended while you are stopped in the debugger? The behavior you described sounds like it might be a problem with the calling code, not with removeSync.

To eliminate that concern, you could try to make a small isolated script that reproduces the bug without relying on any test frameworks. I can attest that removeSync has been working pretty reliably since we made this fix.

@RyanZim
Copy link
Collaborator

RyanZim commented Apr 9, 2020

If you get a reduced test case, please open a new issue about it, instead of commenting on this old thread. Thanks!

Repository owner locked as resolved and limited conversation to collaborators Apr 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants