Fix crash related to dynamic updates to UDP proxy via CDS/EDS #33824

Merged
merged 6 commits into from May 3, 2024

Conversation

adams-shaun

@adams-shaun adams-shaun commented Apr 26, 2024

Ref: #26206

Commit Message:

We've been experiencing a crash similar to the issue reported above. Reproducing it involved triggering updates to the cluster HostSet via xDS.

When the cluster manager (CM) posts the updates to workers, a new ThreadLocalCluster (TLC) object is created and the previous one is destroyed soon thereafter. We expect a new ClusterInfo that uses this new TLC to be constructed and stored in the infos map; however, that does not seem to be the case, as the documentation for emplace() explains:

  // flat_hash_map::emplace()
  //
  // Inserts an element of the specified value by constructing it in-place
  // within the `flat_hash_map`, provided that no element with the given key
  // already exists.
  //
  // The element may be constructed even if there already is an element with the
  // key in the container, in which case the newly constructed element will be
  // destroyed immediately. Prefer `try_emplace()` unless your key is not
  // copyable or moveable.

Instead, I think this is a better option:

  // flat_hash_map::insert_or_assign()
  //
  // Inserts an element of the specified value into the `flat_hash_map` provided
  // that a value with the given key does not already exist, or replaces it with
  // the element value if a key for that value already exists, returning an
  // iterator pointing to the newly inserted element.  If rehashing occurs due
  // to the insertion, all existing iterators are invalidated. Overloads are
  // listed below.
  //

Because emplace() with a pre-existing key does not replace the stored value, the map keeps the old object, and later calls that reference the stale ThreadLocalCluster (e.g. the call to info() down the onData() path) will cause a segfault.
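
To illustrate the difference, here is a minimal sketch rather than the actual Envoy code. It assumes std::unordered_map (C++17) as a stand-in for absl::flat_hash_map, since both leave an existing value untouched on emplace() and overwrite it on insert_or_assign(). FakeClusterInfo, InfoMap, and the two helper functions are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-cluster object kept in the infos map.
// "generation" marks which update created it (1 = original, 2 = after update).
struct FakeClusterInfo {
  explicit FakeClusterInfo(int g) : generation(g) {}
  int generation;
};

using InfoMap = std::unordered_map<std::string, std::shared_ptr<FakeClusterInfo>>;

// Simulates a cluster update applied with emplace(): the key already exists,
// so the map keeps the original (stale) value.
int generationAfterEmplace() {
  const std::string key = "cluster_a";
  InfoMap infos;
  infos.emplace(key, std::make_shared<FakeClusterInfo>(1));
  auto result = infos.emplace(key, std::make_shared<FakeClusterInfo>(2));
  assert(!result.second); // false: nothing was inserted
  return infos.at(key)->generation;
}

// Simulates the same update applied with insert_or_assign(): the stored value
// is replaced by the new one.
int generationAfterInsertOrAssign() {
  const std::string key = "cluster_a";
  InfoMap infos;
  infos.emplace(key, std::make_shared<FakeClusterInfo>(1));
  auto result = infos.insert_or_assign(key, std::make_shared<FakeClusterInfo>(2));
  assert(!result.second); // false: an existing entry was assigned, not inserted
  return infos.at(key)->generation;
}
```

In this sketch the first helper still sees generation 1 after the update, which corresponds to the stale entry described above, while the second sees generation 2.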

Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Ref: envoyproxy#26206

Signed-off-by: s.adams@f5.com <s.adams@f5.com>

Hi @adams-shaun, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.



@adisuissa adisuissa left a comment


LGTM, thanks!
Please also add a release note.
Assigning @danzh2010 as code owner.
/assign @danzh2010

@@ -826,6 +826,51 @@ stat_prefix: foo
EXPECT_EQ(0, config_->stats().downstream_sess_active_.value());
}

// Test updates to existing cluster (e.g. priority set changes, etc)

style-nit: comments (here and below) end with '.'

Signed-off-by: s.adams@f5.com <s.adams@f5.com>
Signed-off-by: s.adams@f5.com <s.adams@f5.com>
adisuissa
adisuissa previously approved these changes Apr 29, 2024

@adisuissa adisuissa left a comment


LGTM, thanks!

@adisuissa

/retest


@adisuissa adisuissa left a comment


LGTM, thanks for fixing this!
Can you please add a release note?
/assign-from @envoyproxy/senior-maintainers


@envoyproxy/senior-maintainers assignee is @lizan


Signed-off-by: s.adams@f5.com <s.adams@f5.com>
Review thread on changelogs/current.yaml (outdated, resolved)
Co-authored-by: phlax <phlax@users.noreply.github.com>
Signed-off-by: Shaun Adams <shaun.adams@volunteers.acasi.info>
lizan
lizan previously approved these changes May 1, 2024
Signed-off-by: Shaun Adams <shaun.adams@volunteers.acasi.info>

@adisuissa adisuissa left a comment


LGTM, thanks!

@adisuissa

This was approved by a senior-maintainer, merging.

@adisuissa adisuissa merged commit 8d1ab63 into envoyproxy:main May 3, 2024
52 checks passed
5 participants