Try fixing e2e network flakiness #4082

Merged
jtoar merged 3 commits into main from ds-try-fixing-network-flakiness on Jan 9, 2022

@jtoar jtoar commented Jan 8, 2022

This is an attempt to fix network flakiness using the strategy in vercel/next.js#28264. I tried this out on my fork of redwood and it seemed to work (PR here: https://github.com/jtoar/redwood/pull/155), but I don't fully understand what it does. (The original comment says that it "disables TCP/UDP offloading", but I don't really know what that entails.)

@jtoar jtoar added release:chore This PR is a chore (means nothing for users) topic/ops-&-contributing-dx v1/priority and removed triage/processing labels Jan 8, 2022
@jtoar jtoar self-assigned this Jan 8, 2022
@thedavidprice (Contributor) commented:

@jtoar This is definitely interesting, especially given that 1) we've had external networking issues in the past (npm package installation) and 2) we don't have any visibility/control over how the services/containers are networked within the workflow, which could definitely be causing issues given that we spin up processes outside of other services.

Let's give it a shot!

Research link:

tx off rx off
Related to network receive/transmit: with ethtool -K, these flags turn off RX and TX checksum offloading on the interface. Seems reasonable to try.

Looks like we'll need to run some tests regarding eth0. See:

Not sure we can assume the interface name is the same on every run without further research. We'll need to run sudo ip link show within a workflow to verify, and we might need to pass its output to sudo ethtool -K <NIC_name> tx off rx off
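One way to avoid hard-coding the name is to ask the kernel which interface carries the default route. The following is only a sketch, not something tested in this PR: the function names default_nic and tune_default_nic are made up for illustration, and it assumes the runner has exactly one relevant default route.

```shell
# Print the interface named after "dev" on the first default-route line.
# Input is expected in the format of `ip route show default`, e.g.
#   default via 10.1.0.1 dev eth0 proto dhcp src 10.1.0.4 metric 100
default_nic() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: disable RX/TX checksum offload on that interface.
tune_default_nic() {
  local nic
  nic="$(ip route show default | default_nic)"
  if [ -z "$nic" ]; then
    echo "No default-route interface found; skipping" >&2
    return 1
  fi
  sudo ethtool -K "$nic" tx off rx off
}
```

In a workflow step this would be followed by a call to tune_default_nic. If the name does turn out to be stable, the hard-coded eth0 is simpler.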

Comment on lines +27 to +28
- name: Tune linux network
  run: sudo ethtool -K eth0 tx off rx off

If this works, we should add it to all workflows, with a condition so that it runs only on Linux.
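At the workflow level, the usual way to express this is a step condition such as if: runner.os == 'Linux'. Where the step already runs under bash, the same check can live in the script itself. A minimal sketch, with hypothetical function names and eth0 assumed as the default interface:

```shell
# Succeed only when the kernel name is "Linux".
# The optional argument overrides `uname -s`, which makes the check easy to test.
is_linux() {
  [ "${1:-$(uname -s)}" = "Linux" ]
}

# Hypothetical helper: tune the given interface (eth0 by default) on Linux only.
tune_if_linux() {
  local nic="${1:-eth0}"
  if ! is_linux; then
    echo "Skipping network tuning: not running on Linux"
    return 0
  fi
  sudo ethtool -K "$nic" tx off rx off
}
```

The step-level condition is probably the better fit for workflows that also run on Windows, since the default shell there is not bash.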

jtoar commented Jan 9, 2022

Ok! I'll merge and rebase some of the existing PRs that are failing to see if this needs any more of the tweaks you mentioned.

@jtoar jtoar merged commit 79864e0 into main Jan 9, 2022
@jtoar jtoar deleted the ds-try-fixing-network-flakiness branch January 9, 2022 04:24
@redwoodjs-bot redwoodjs-bot bot added this to the next-release milestone Jan 9, 2022
jtoar commented Jan 10, 2022

@thedavidprice Here's the output of sudo ip link show:

[image: screenshot of the sudo ip link show output]

Maybe we need to tune docker0 too?

       # For network flakiness. See https://github.com/vercel/next.js/pull/28264
       - name: Tune linux networks
         run: |
           sudo ethtool -K eth0 tx off rx off
+          sudo ethtool -K docker0 tx off rx off
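If more than one interface ends up being tuned, it may be safer to touch only the ones that exist on the runner, so that the step does not fail when docker0 is absent. This is a sketch under assumptions: existing_nics and tune_nics are illustrative names, and whether ethtool accepts these flags on a bridge device like docker0 has not been confirmed in this thread.

```shell
# Print, one per line, the candidate interfaces that exist under the given
# sysfs directory. On Linux each interface appears under /sys/class/net.
existing_nics() {
  local sysfs_root="$1"
  shift
  local nic
  for nic in "$@"; do
    if [ -e "$sysfs_root/$nic" ]; then
      echo "$nic"
    fi
  done
}

# Hypothetical helper: disable RX/TX checksum offload on each existing candidate.
tune_nics() {
  local nic
  for nic in $(existing_nics /sys/class/net "$@"); do
    sudo ethtool -K "$nic" tx off rx off
  done
}
```

A workflow step could then call tune_nics eth0 docker0.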

@jtoar jtoar mentioned this pull request Jan 10, 2022
@thedavidprice thedavidprice modified the milestones: next-release, v0.42.0 Jan 20, 2022