internal/appsec/dyngo: atomic instrumentation swapping #1873
I think there's one concerning change that potentially breaks remote activation.
log.Debug("remoteconfig: gracefully stopping the client")
c.stop <- struct{}{}
select {
case <-c.stop:
	log.Debug("remoteconfig: client stopped successfully")
case <-time.After(time.Second):
	log.Debug("remoteconfig: client stopping timeout")
}
Mind explaining the rationale behind this change? What's the problem with close(c.stop)?
IIUC this is done to make sure that the goroutine has stopped by the time this function returns. Since RC updates allow callbacks to modify anything at any point, making sure that updates are no longer being applied after Stop() returns allows a clean shutdown.
Yes, the point is to have Stop() stop "gracefully" by blocking the caller until it is done. This basically adds concurrency guarantees to avoid getting remote config updates while stopping. For instance, without this change, a tick of the RC background goroutine could become available at the same time RC is being stopped, leading to an actual RC update. So without blocking in Stop(), the subsequent code could release resources still needed by the RC goroutine while it is alive.
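The blocking handshake discussed above can be sketched as follows. This is a minimal, self-contained illustration and not the actual remoteconfig client: the `client` type, its fields, `newClient`, and `start` are hypothetical, and the goroutine side (closing `c.stop` as the acknowledgment) is an assumption, since the diff only shows the `Stop` side.

```go
package main

import (
	"fmt"
	"time"
)

// client is a hypothetical stand-in for a polling client.
type client struct {
	stop     chan struct{} // unbuffered; carries the stop request, closed as the ack
	interval time.Duration
	updates  int // only written by the polling goroutine
}

func newClient(interval time.Duration) *client {
	return &client{stop: make(chan struct{}), interval: interval}
}

// start launches the background polling goroutine.
func (c *client) start() {
	go func() {
		ticker := time.NewTicker(c.interval)
		defer ticker.Stop()
		for {
			select {
			case <-c.stop:
				// Assumed acknowledgment: closing never blocks, so the
				// goroutine exits even if Stop already timed out.
				close(c.stop)
				return
			case <-ticker.C:
				c.updates++ // stands in for applying an update
			}
		}
	}()
}

// Stop requests termination and blocks until the goroutine acknowledges it
// or the timeout elapses. It reports whether the acknowledgment was received.
func (c *client) Stop(timeout time.Duration) bool {
	c.stop <- struct{}{} // blocks until the goroutine picks up the request
	select {
	case <-c.stop: // returns once the goroutine has closed the channel
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	c := newClient(10 * time.Millisecond)
	c.start()
	time.Sleep(50 * time.Millisecond)
	stopped := c.Stop(time.Second)
	// After an acknowledged Stop, no update can still be in flight, so
	// reading c.updates here does not race with the goroutine.
	fmt.Println("stopped:", stopped, "updates applied:", c.updates)
}
```

Compared with a bare `close(c.stop)` from the caller, the caller here learns that the goroutine has left its loop before it goes on to release shared resources. In this sketch `Stop` must be called at most once and only after `start`; a second call would send on a closed channel.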
What does this PR do?
Makes the swapping of AppSec's instrumentations (currently the HTTP WAF and gRPC WAF) atomic by revisiting how dyngo event listeners are managed. Instead of allowing event listeners to be registered and unregistered, we simplified the design so that the root operation itself is swapped atomically. This is a cleaner approach: a root operation can be created asynchronously, along with its new set of event listeners, and later live-swapped, with no more need to deal with concurrent event listener swapping. The current implementation is simply an atomic pointer swap.
Overall, this allows a cleaner "pure programming" approach, where an operation is no longer modified once running (assuming all its event listeners are registered on the start events, which is not enforced today). This gives new guarantees on the state of our instrumentation: new security rules can no longer be partially applied. Instead, this PR ensures that N security rules updates can safely live side by side concurrently.
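To illustrate the idea, here is a minimal sketch of swapping an immutable root operation through an atomic pointer. It is not dyngo's actual API: `rootOperation`, `listener`, `swapRoot`, `dispatch`, and `rules` are hypothetical names, and `atomic.Pointer` assumes Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// listener is a hypothetical, simplified event listener.
type listener func(event string) string

// rootOperation is a hypothetical stand-in for a root operation. Its
// listeners are fixed at construction and never mutated afterwards.
type rootOperation struct {
	listeners []listener
}

// newRootOperation builds a fully configured root operation. The listener
// slice is copied so that later changes by the caller cannot affect it.
func newRootOperation(ls ...listener) *rootOperation {
	return &rootOperation{listeners: append([]listener(nil), ls...)}
}

// emit calls every listener of this root operation in order.
func (r *rootOperation) emit(event string) []string {
	out := make([]string, 0, len(r.listeners))
	for _, l := range r.listeners {
		out = append(out, l(event))
	}
	return out
}

// root holds the currently active root operation.
var root atomic.Pointer[rootOperation]

// swapRoot atomically installs next and returns the previous root, if any.
func swapRoot(next *rootOperation) *rootOperation {
	return root.Swap(next)
}

// dispatch loads the root once, so a single event only ever sees one
// complete set of listeners, never a mix of old and new ones.
func dispatch(event string) []string {
	r := root.Load()
	if r == nil {
		return nil
	}
	return r.emit(event)
}

// rules returns a listener tagged with a rules version, for demonstration.
func rules(version string) listener {
	return func(event string) string { return version + ":" + event }
}

func main() {
	swapRoot(newRootOperation(rules("v1-http"), rules("v1-grpc")))
	fmt.Println(dispatch("request")) // [v1-http:request v1-grpc:request]

	// The next root is built off to the side, then swapped in one step.
	swapRoot(newRootOperation(rules("v2-http"), rules("v2-grpc")))
	fmt.Println(dispatch("request")) // [v2-http:request v2-grpc:request]
}
```

Because a root is never modified after construction, an event dispatched through the old root finishes with the old listeners while new events pick up the new root, which is the property that rules out partially applied rule sets in this sketch.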
Motivation
By making the appsec instrumentation modification atomic, we avoid:
Describe how to test/QA your changes
Reviewer's Checklist