Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix socket cpu migration #2076

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

EvgeniiMekhanik
Copy link
Contributor

@EvgeniiMekhanik EvgeniiMekhanik commented Mar 11, 2024

Socket cpu migration can lead to two problems:
performance degradation and response reordering,
which leads to broken HTTP1.
Previously we use RSS and RPS to prevent it, but
there were several problems in our scripts:

  • we exclude loopback interfaces from setup, because
    we don't take into account response reordering
    problem.
  • we don't take into account that some interfaces
    have some suffix lile @if14, and we should remove
    it from device name in our scripts.
  • we don't try to setup combined RSS queues, only
    RX queues, but there are a lot of cases when network
    interface has only combined queues.
  • we don't take into account overflow when we calculate
    1 << x, when x is greater or equal then 64.
  • When we setup RPS we use cpu_mask, which is calculated
    as $(perl -le 'printf("%x", (1 << '$CPUS_N') - 1)').
    But in this case we use all cpus, not only one.
  • we don't setup RPS for network interface if, RSS setup
    fails.
    This patch fix all this problems.

Closes #2075

@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as draft March 11, 2024 12:47
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch 4 times, most recently from edef1b5 to 8c5c9d1 Compare March 12, 2024 16:57
@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as ready for review March 12, 2024 16:58
@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as draft March 15, 2024 16:49
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch from 1d90152 to 678759f Compare March 18, 2024 11:55
@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as ready for review March 18, 2024 11:55
@EvgeniiMekhanik
Copy link
Contributor Author

When we use all existing CPUs, when we set RPS it is ok. CPU will be changed only one time and only for loopback, becase for loopback we set skb_hash in tcp_make_synack and this skb will return to us! It is not a problem, because later CPU will be calulated correctly in netif_rx_skb_internal for all other skbs.

@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch 2 times, most recently from 56e90a7 to 8bd8ac8 Compare March 18, 2024 13:33
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch 3 times, most recently from 964be21 to 1899417 Compare April 3, 2024 11:10
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch from 1899417 to 24bf0ff Compare April 8, 2024 12:27
Copy link
Contributor

@const-t const-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functionfor typo

@@ -92,19 +91,83 @@ irqbalance_ban_irqs()
systemctl restart irqbalance.service >/dev/null
}

# This function prepares cpu mask for RSS and RPS.
# It takes into account that we can't calculate
# value, which is greater when (1 << 63) and cam't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cam't typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@EvgeniiMekhanik
Copy link
Contributor Author

Do not review the second commit. I move it in separate PR.

@const-t const-t self-requested a review April 19, 2024 09:00
Copy link
Contributor

@const-t const-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please don't forget to remove second commit.

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

# and broken HTTP1. Some network interfaces have some strange suffix
# like @if14, and we should remove it from device name.
declare devs=$(ip addr show up | grep -P '^[0-9]+' | awk '{ sub(/:/, "", $2); print $2}' \
| awk '{ split($0,a,"@"); print a[1] }')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awk is a nice and powerful language, so this can be done in one awk program

scripts/tfw_lib.sh Outdated Show resolved Hide resolved
@@ -46,12 +46,12 @@ tls_mod=tempesta_tls
tdb_mod=tempesta_db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the copyright year

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

scripts/tfw_lib.sh Outdated Show resolved Hide resolved
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch 4 times, most recently from 570878e to 68bf486 Compare May 6, 2024 08:14
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch 2 times, most recently from 24f76b2 to ea303e5 Compare May 7, 2024 07:55
Socket cpu migration can lead to two problems:
performance degradation and response reordering,
which leads to broken HTTP1.
Previously we use RSS and RPS to prevent it, but
there were several problems in our scripts:
- we exclude loopback interfaces from setup, because
  we don't take into account response reordering
  problem.
- we don't take into account that some interfaces
  have some suffix lile @if14, and we should remove
  it from device name in our scripts.
- we don't try to setup combined RSS queues, only
  RX queues, but there are a lot of cases when network
  interface has only combined queues.
- we don't take into account overflow when we calculate
  1 << x, when x is greater or equal then 64.
- we don't take into account overflow when we write
  value, which is greater then (1 << 32) - 1 in
  rps_cpus, when we setup RPS.
- we don't setup RPS for network interface if, RSS setup
  fails.
- we don't ban irqs for irqbalance for each network device
  immediately. But if there are a lot of devices there is a
  big race between setting RSS for first device and ban irqs
  for it. This race is anought for irqbalance daemon to change
  our settings.
This patch fix all this problems.

Closes #2075
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-socket-cpu-migration branch from ea303e5 to 8edd7e9 Compare May 24, 2024 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Socket cpu migration
3 participants