

Socket cpu migration #2075

Open
EvgeniiMekhanik opened this issue Mar 11, 2024 · 0 comments · May be fixed by #2076


EvgeniiMekhanik commented Mar 11, 2024

Currently there are many cases where we can observe socket CPU migration. This can lead to two problems: performance degradation when acquiring the socket lock, and response reordering (which violates the HTTP/1 RFC).
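To make the mitigation concrete: the usual way to keep a socket's packets (and therefore its responses) on one core is to pin an interface's RX processing to a single CPU via RPS. A minimal sketch, assuming standard sysfs paths; the device name and CPU number are placeholders:

```shell
#!/bin/sh
# Steer all RX packets of one queue to a single CPU so sockets stay
# on one core. DEV and CPU are illustrative values.
DEV=eth0
CPU=2
# rps_cpus takes a hex CPU bitmask; bit N selects CPU N.
mask=$(printf '%x' $((1 << CPU)))
rps=/sys/class/net/$DEV/queues/rx-0/rps_cpus
# Write only if the sysfs entry exists and is writable.
[ -w "$rps" ] && echo "$mask" > "$rps"
echo "$mask"   # prints "4" for CPU=2
```

With the mask applied, the kernel hashes each flow and steers it to the CPU selected by the bitmask, so a given connection is always processed on the same core.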

@EvgeniiMekhanik EvgeniiMekhanik self-assigned this Mar 11, 2024
EvgeniiMekhanik added a commit that referenced this issue Mar 11, 2024
Socket CPU migration can lead to two problems:
performance degradation and response reordering.
We use RPS and RSS to fix this problem.

Closes #2075
@EvgeniiMekhanik EvgeniiMekhanik linked a pull request Mar 11, 2024 that will close this issue
EvgeniiMekhanik added a commit that referenced this issue Mar 11, 2024
@krizhanovsky krizhanovsky added question Questions and support tasks performance labels Mar 11, 2024
@krizhanovsky krizhanovsky added this to the 0.9 - LA milestone Mar 11, 2024
EvgeniiMekhanik added a commit that referenced this issue Mar 12, 2024
EvgeniiMekhanik added a commit that referenced this issue Mar 12, 2024
EvgeniiMekhanik added a commit that referenced this issue Mar 12, 2024
Socket CPU migration can lead to two problems:
performance degradation and response reordering,
which breaks HTTP/1.
Previously we used RSS and RPS to prevent it, but
there were several problems in our scripts:
- We excluded loopback interfaces from the setup,
  because we did not take the response reordering
  problem into account.
- We did not account for interfaces having a suffix
  like @if14, which should be removed from the device
  name in our scripts.
- We did not try to set up combined RSS queues, only
  RX queues, but in many cases a network interface
  has only combined queues.
- We did not account for overflow when calculating
  1 << x with x greater than or equal to 64.
- When setting up RPS we used cpu_mask, calculated
  as $(perl -le 'printf("%x", (1 << '$CPUS_N') - 1)').
  But that mask selects all CPUs, not just one.
- We did not set up RPS for a network interface if the
  RSS setup failed.
This patch fixes all these problems.

Closes #2075
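The cpu_mask bullet above can be illustrated without perl: the same arithmetic in plain shell shows why `(1 << CPUS_N) - 1` spreads load across every CPU instead of pinning a flow to one (CPUS_N and the chosen CPU are illustrative):

```shell
#!/bin/sh
CPUS_N=4
# Buggy mask: sets a bit for every CPU (0..CPUS_N-1), so RPS may steer
# a flow to any of them - exactly the migration we want to avoid.
all_cpus=$(printf '%x' $(( (1 << CPUS_N) - 1 )))
# Intended mask: a single bit for one CPU, here CPU 3.
one_cpu=$(printf '%x' $((1 << 3)))
echo "all=$all_cpus one=$one_cpu"   # prints "all=f one=8"
```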
EvgeniiMekhanik added a commit that referenced this issue Mar 18, 2024
Socket CPU migration can lead to two problems:
performance degradation and response reordering,
which breaks HTTP/1.
Previously we used RSS and RPS to prevent it, but
there were several problems in our scripts:
- We excluded loopback interfaces from the setup,
  because we did not take the response reordering
  problem into account.
- We did not account for interfaces having a suffix
  like @if14, which should be removed from the device
  name in our scripts.
- We did not try to set up combined RSS queues, only
  RX queues, but in many cases a network interface
  has only combined queues.
- We did not account for overflow when calculating
  1 << x with x greater than or equal to 64.
- We did not account for overflow when writing a value
  greater than (1 << 32) - 1 to rps_cpus during RPS
  setup.
- We did not set up RPS for a network interface if the
  RSS setup failed.
This patch fixes all these problems.

Closes #2075
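The two overflow bullets can be sketched together. Shell arithmetic is 64-bit signed, so `1 << x` is unreliable once x reaches 64; and rps_cpus is parsed as comma-separated 32-bit hex words, so on a machine with more than 32 CPUs a single hex number cannot address the higher CPUs. A sketch of building a single-CPU mask as two words (enough for up to 64 CPUs; the CPU number is illustrative):

```shell
#!/bin/sh
# Build a single-CPU rps_cpus mask as two comma-separated 32-bit hex
# words (sufficient for up to 64 CPUs; cpu=40 is an illustrative value).
cpu=40
if [ "$cpu" -ge 32 ]; then
    hi=$(printf '%x' $((1 << (cpu - 32))))   # bit lands in the high word
    lo=00000000
else
    hi=0
    lo=$(printf '%08x' $((1 << cpu)))        # bit lands in the low word
fi
mask="$hi,$lo"
echo "$mask"   # prints "100,00000000" for cpu=40
```

For more than 64 CPUs the same idea extends to a loop over 32-bit words, which also sidesteps the `1 << x` limit since each shift stays below 32.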
EvgeniiMekhanik added a commit that referenced this issue Mar 18, 2024
EvgeniiMekhanik added a commit that referenced this issue Mar 18, 2024
Socket CPU migration can lead to two problems:
performance degradation and response reordering,
which breaks HTTP/1.
Previously we used RSS and RPS to prevent it, but
there were several problems in our scripts:
- We excluded loopback interfaces from the setup,
  because we did not take the response reordering
  problem into account.
- We did not account for interfaces having a suffix
  like @if14, which should be removed from the device
  name in our scripts.
- We did not try to set up combined RSS queues, only
  RX queues, but in many cases a network interface
  has only combined queues.
- We did not account for overflow when calculating
  1 << x with x greater than or equal to 64.
- We did not account for overflow when writing a value
  greater than (1 << 32) - 1 to rps_cpus during RPS
  setup.
- We did not set up RPS for a network interface if the
  RSS setup failed.
- We did not ban IRQs in irqbalance for each network
  device immediately. With many devices there is a
  large window between setting RSS for the first device
  and banning its IRQs, which is enough for the
  irqbalance daemon to change our settings.
This patch fixes all these problems.

Closes #2075
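Two of the script fixes in the list above are easy to sketch: stripping the `@ifN` suffix that `ip link` appends to veth-style interface names, and falling back to combined queues when the NIC exposes no separate RX queues. The device name and queue count are illustrative, and the ethtool invocation is shown as a sketch rather than executed:

```shell
#!/bin/sh
# 1) Strip the "@ifN" suffix shown by `ip link` for veth-style devices;
#    sysfs and ethtool want the bare name ("eth0@if14" is illustrative).
dev="eth0@if14"
dev=${dev%%@*}
echo "$dev"   # prints "eth0"

# 2) Many NICs expose only "Combined" queues, so an RX-only setup fails;
#    try RX queues first and fall back to combined (sketch only):
#    ethtool -L "$dev" rx "$NQ" 2>/dev/null \
#        || ethtool -L "$dev" combined "$NQ"
```

The `|| fallback` pattern is also the shape of the "set up RPS even if RSS setup fails" fix: a failed `ethtool -L` must not abort the per-device loop before the rps_cpus write.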
EvgeniiMekhanik added a commit that referenced this issue Mar 28, 2024
EvgeniiMekhanik added a commit that referenced this issue Apr 3, 2024
EvgeniiMekhanik added a commit that referenced this issue Apr 3, 2024
EvgeniiMekhanik added a commit that referenced this issue Apr 8, 2024
EvgeniiMekhanik added a commit that referenced this issue Apr 29, 2024
EvgeniiMekhanik added a commit that referenced this issue Apr 29, 2024
EvgeniiMekhanik added a commit that referenced this issue May 1, 2024
EvgeniiMekhanik added a commit that referenced this issue May 6, 2024
EvgeniiMekhanik added a commit that referenced this issue May 6, 2024
EvgeniiMekhanik added a commit that referenced this issue May 7, 2024
EvgeniiMekhanik added a commit that referenced this issue May 24, 2024