Improve Workers Sites asset sync reliability #3098

Merged
merged 5 commits into from
Apr 28, 2023

Conversation

mrbbot
Contributor

@mrbbot mrbbot commented Apr 27, 2023

Fixes #2223
Fixes #2245

What this PR solves / how to test:

This PR aims to improve the reliability of syncing assets to a Workers Sites KV namespace when publishing a Worker. In particular...

  • When splitting the upload into buckets, store only each file's path, not its full content. This means we have to read the contents twice, but it avoids buffering all content before uploading, which was causing OOMs.
  • Limit in-flight bulk upload requests to 5, avoiding the "Too many bulk operations already in progress." error.
  • Fix logging of upload progress. Previously, progress was logged per upload bucket, which doesn't really make sense to end users. Now, upload progress is reported across all files in all buckets.
  • Only log the first 100 changed assets by default. The rest can be shown by setting WRANGLER_LOG=debug. This avoids console spam when uploading sites with 1000s of files. A little bit of colour has also been added to the diff. :)
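
To illustrate the first point, here is a minimal sketch of bucketing by path only, assuming each asset's size is already known (for example from a stat call). The names AssetEntry and splitIntoBuckets and both limits are hypothetical and are not taken from wrangler's source; the 100MB and 5000-file figures simply echo numbers mentioned later in this review.

```typescript
// Hypothetical limits for illustration; wrangler's real values may differ.
const MAX_BUCKET_BYTES = 100 * 1024 * 1024;
const MAX_BUCKET_FILES = 5000;

interface AssetEntry {
  path: string; // path on disk; the content is read again at upload time
  size: number; // size in bytes, used only to decide bucket boundaries
}

// Group assets into buckets of paths without holding any file content.
function splitIntoBuckets(
  entries: AssetEntry[],
  maxBytes: number = MAX_BUCKET_BYTES,
  maxFiles: number = MAX_BUCKET_FILES
): string[][] {
  const buckets: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;

  for (const entry of entries) {
    // Start a new bucket if adding this entry would exceed either limit.
    const wouldOverflow =
      current.length > 0 &&
      (currentBytes + entry.size > maxBytes || current.length >= maxFiles);
    if (wouldOverflow) {
      buckets.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(entry.path);
    currentBytes += entry.size;
  }

  if (current.length > 0) buckets.push(current);
  return buckets;
}
```

Because a bucket holds only strings, memory use before uploading scales with the number of files rather than their total size. A single file larger than maxBytes still gets a bucket of its own in this sketch.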

To test, follow the reproduction instructions here: https://github.com/jdddog/wrangler-kv-upload-failure. You should be able to upload the site. Massive thank you to @jdddog for the super detailed reproduction, and reporting these issues in the first place. 😃

Associated docs issue(s)/PR(s):

None yet. Do people think we need one for the first 100 assets thing? If users hit this, they'll see a message telling them to set WRANGLER_LOG=debug to see the rest.
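
For readers wondering what the "first 100 assets" behaviour might look like, here is a rough sketch. The function name summariseChanges and the wording of the hint line are assumptions made for illustration; they are not wrangler's actual function or log message.

```typescript
// "First 100" comes from the PR description; everything else is illustrative.
const MAX_DIFF_LINES = 100;

// Return the diff lines to print: all of them in debug mode, otherwise
// the first `limit` lines followed by a hint about the hidden remainder.
function summariseChanges(
  changes: string[],
  debug: boolean,
  limit: number = MAX_DIFF_LINES
): string[] {
  if (debug || changes.length <= limit) {
    return [...changes];
  }
  const hidden = changes.length - limit;
  return [
    ...changes.slice(0, limit),
    // Hypothetical wording for the hint shown to the user.
    `... and ${hidden} more changed assets (set WRANGLER_LOG=debug to see them all)`,
  ];
}
```

A caller would presumably derive the debug flag from the environment, for example by comparing the WRANGLER_LOG variable against "debug".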

Author has included the following, where applicable:

Reviewer has performed the following, where applicable:

  • Checked for inclusion of relevant tests
  • Checked for inclusion of a relevant changeset
  • Checked for creation of associated docs updates
  • Manually pulled down the changes and spot-tested

@mrbbot mrbbot requested a review from a team as a code owner April 27, 2023 11:22
@changeset-bot

changeset-bot bot commented Apr 27, 2023

🦋 Changeset detected

Latest commit: 1881f74

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
wrangler Minor

@github-actions
Contributor

github-actions bot commented Apr 27, 2023

A wrangler prerelease is available for testing. You can install this latest build in your project with:

npm install --save-dev https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/4829801399/npm-package-wrangler-3098

You can reference the automatically updated head of this PR with:

npm install --save-dev https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/prs/3098/npm-package-wrangler-3098

Or you can use npx with this latest build directly:

npx https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/4829801399/npm-package-wrangler-3098 dev path/to/script.js
Additional artifacts:
npm install https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/4829801399/npm-package-cloudflare-pages-shared-3098

Note that these links will no longer work once the GitHub Actions artifact expires.

@codecov

codecov bot commented Apr 27, 2023

Codecov Report

Merging #3098 (1881f74) into main (6f5259f) will increase coverage by 0.16%.
The diff coverage is 98.79%.

@@            Coverage Diff             @@
##             main    #3098      +/-   ##
==========================================
+ Coverage   74.26%   74.42%   +0.16%     
==========================================
  Files         168      168              
  Lines       10530    10585      +55     
  Branches     2815     2831      +16     
==========================================
+ Hits         7820     7878      +58     
+ Misses       2710     2707       -3     
Impacted Files Coverage Δ
packages/wrangler/src/sites.ts 95.33% <98.75%> (+1.13%) ⬆️
packages/wrangler/src/kv/helpers.ts 93.38% <100.00%> (ø)

... and 3 files with indirect coverage changes

Contributor

@petebacondarwin petebacondarwin left a comment

A couple of minor concerns, but nothing blocking.
Any chance of a test where one of the buckets fails and the others are aborted?

const title = `__${scriptName}-workers_sites_assets${
  preview ? "_preview" : ""
}`;

logger.info("Fetching list of already uploaded assets...");
Contributor

Is this log in the correct place? Shouldn't it go after the call to createKVNamespaceIfNotAlreadyExisting()?

Contributor Author

// doesn't matter *too* much: we know buckets will be about 100MB, so
// with 5 uploaders, we could load about 500MB into memory (+ extra
// object keys/tags/copies/etc).
const bucket = await Promise.all(
Contributor

Here we try to read all the files in a bucket in parallel at the same time, right?
If the files were all very small then you could end up with 1000s in a single bucket.
Is there a concern that the OS/node.js will blow up due to running out of file handles?

Contributor

Oh I see the bucket will never have more than 5000 files. But still...?

Contributor Author

@mrbbot mrbbot Apr 28, 2023

Hmmm, this seems to work...

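import fs from "fs/promises";

// Create the target directory first; writeFile() rejects if it is missing
await fs.mkdir("dir", { recursive: true });

// Write 10,000 small files one at a time...
for (let i = 0; i < 10_000; i++) {
    await fs.writeFile(`dir/${i.toString().padStart(5, "0")}.txt`, "value");
}

// ...then read them all back in parallel
const bucket = await Promise.all(
    Array.from({length: 10_000}).map(async (_, i) => (
        fs.readFile(`dir/${i.toString().padStart(5, "0")}.txt`, "base64")
    ))
);

console.log(bucket);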

It's possible Node.js is queuing these internally, or the files are too small so they finish reading really quickly and we never hit the limit. In any case, happy to make this serial, I don't think it will slow things down too much.
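
For reference, the combination discussed in this thread (at most 5 uploaders in flight, each reading its bucket's files serially) could be sketched roughly as below. This is an illustration under assumptions, not the code from this PR: runWithConcurrency and readAndUploadBucket are hypothetical names, and the readFile and upload callbacks stand in for the real file system and KV bulk upload calls.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runWithConcurrency<T>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<void>
): Promise<void> {
  let next = 0;

  async function lane(): Promise<void> {
    // Each lane claims the next unprocessed index until none remain.
    while (next < items.length) {
      const index = next++;
      await worker(items[index], index);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () =>
    lane()
  );
  await Promise.all(lanes);
}

// Hypothetical uploader: reads one file at a time, then uploads the bucket.
async function readAndUploadBucket(
  paths: string[],
  readFile: (path: string) => Promise<string>,
  upload: (entries: { key: string; value: string }[]) => Promise<void>
): Promise<void> {
  const entries: { key: string; value: string }[] = [];
  for (const path of paths) {
    // Serial reads mean each uploader has one read in progress at a time.
    entries.push({ key: path, value: await readFile(path) });
  }
  await upload(entries);
}
```

With a limit of 5, this keeps at most 5 file reads outstanding at any moment, which sidesteps the file handle question at the cost of some read parallelism.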

31902ec

Ensure publish tests not dependent on bucket upload order
@mrbbot
Contributor Author

mrbbot commented Apr 28, 2023

Added a test for upload failure in 7ec9937. I'm a little concerned that all uploads might get through and the expect(requestCount).toBeLessThan(3); assertion might fail, but that seems unlikely. Can you think of an alternative way of checking for aborted uploads? I've added a log for aborting that we look for in the snapshot.
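
One possible shape for "abort the remaining uploads when one fails" is sketched below, using the standard AbortController. This is only an illustration of the idea being tested, not the implementation in this PR; uploadAll and its uploadBucket callback are hypothetical names.

```typescript
// Upload buckets with bounded concurrency. After the first failure, no new
// uploads are started and in-flight ones are signalled to stop.
async function uploadAll(
  buckets: string[][],
  limit: number,
  uploadBucket: (bucket: string[], signal: AbortSignal) => Promise<void>
): Promise<void> {
  const controller = new AbortController();
  let next = 0;
  let failed = false;
  let firstError: unknown;

  async function lane(): Promise<void> {
    // Stop claiming buckets as soon as any lane has aborted.
    while (next < buckets.length && !controller.signal.aborted) {
      const bucket = buckets[next++];
      try {
        await uploadBucket(bucket, controller.signal);
      } catch (error) {
        if (!failed) {
          failed = true;
          firstError = error; // remember only the first failure
        }
        controller.abort();
      }
    }
  }

  const lanes = Array.from({ length: Math.min(limit, buckets.length) }, () =>
    lane()
  );
  await Promise.all(lanes);
  if (failed) throw firstError;
}
```

A test along the lines described above can count how many times uploadBucket is called and assert that the count stays below the total number of buckets. The exact count depends on scheduling, which is the source of the concern about the toBeLessThan assertion.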

Also fixed a flaky test in 1881f74 that required a specific bucket upload order.

@mrbbot mrbbot merged commit 8818f55 into main Apr 28, 2023
11 checks passed
@mrbbot mrbbot deleted the bcoll/fix-sites-assets-sync branch April 28, 2023 12:20
@github-actions github-actions bot mentioned this pull request Apr 28, 2023
@jdddog

jdddog commented Jun 6, 2023

You're welcome @mrbbot, thanks so much for fixing the issues!
