internal/appsec/dyngo: atomic instrumentation swapping #1873

Merged
merged 13 commits into from
Apr 17, 2023

@Julio-Guerra Julio-Guerra commented Apr 6, 2023

What does this PR do?

Makes the swapping of AppSec's instrumentations (currently the HTTP WAF and gRPC WAF) atomic by rethinking how dyngo event listeners are managed. Instead of allowing event listeners to be registered and unregistered, we simplified the model so that the root operation is atomically swapped. This allows a cleaner approach: a new root operation can be created asynchronously, along with its new set of event listeners, and later live-swapped, with no need to handle concurrent event listener swapping. The current implementation is simply an atomic pointer swap.

Overall, this allows a cleaner "pure programming" approach, where an operation is no longer modified once it is running (assuming all its event listeners are registered on the start events, which is not enforced today). This gives new guarantees on the state of our instrumentation: new security rules can no longer be partially applied. Instead, this PR ensures that N security rules updates can safely coexist concurrently.
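As a rough illustration of the idea described above, here is a minimal Go sketch of swapping a root operation through an atomic pointer. It is not the dyngo implementation: the names `rootOperation`, `listener`, `swapRoot`, `emit`, and `newRuleSet` are hypothetical, and the sketch assumes Go 1.19 or later for `atomic.Pointer`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// listener is a hypothetical event listener used only for this sketch.
type listener func(event string) string

// rootOperation is a hypothetical root operation. It is fully built before
// being published and is never modified afterwards.
type rootOperation struct {
	listeners []listener
}

// root points to the currently active root operation.
var root atomic.Pointer[rootOperation]

// swapRoot atomically publishes next and returns the previous root operation.
func swapRoot(next *rootOperation) *rootOperation {
	return root.Swap(next)
}

// emit loads the root pointer once and dispatches to that snapshot only, so a
// single event sees either the old listener set or the new one, never a mix.
func emit(event string) []string {
	op := root.Load()
	if op == nil {
		return nil
	}
	out := make([]string, 0, len(op.listeners))
	for _, l := range op.listeners {
		out = append(out, l(event))
	}
	return out
}

// newRuleSet builds a root operation with n listeners tagged with version.
// It can run in any goroutine because nothing is shared until swapRoot.
func newRuleSet(version string, n int) *rootOperation {
	op := &rootOperation{}
	for i := 0; i < n; i++ {
		i := i // capture the loop variable for Go versions before 1.22
		op.listeners = append(op.listeners, func(event string) string {
			return fmt.Sprintf("%s/rule%d:%s", version, i, event)
		})
	}
	return op
}

func main() {
	swapRoot(newRuleSet("v1", 2))
	fmt.Println(emit("http.request"))

	// A rules update builds a complete replacement, then swaps it in.
	swapRoot(newRuleSet("v2", 3))
	fmt.Println(emit("http.request"))
}
```

Because readers take a single snapshot of the pointer, an update in progress is invisible until the swap, which is the property the description refers to as avoiding partially applied rules.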

Motivation

By making the appsec instrumentation modification atomic, we avoid:

  • Partial instrumentation states while new rules get applied
  • Partial security rules updates while new rules get applied

Describe how to test/QA your changes

Reviewer's Checklist

  • Changed code has unit tests for its functionality.
  • If this interacts with the agent in a new way, a system test has been added.


@Hellzy Hellzy force-pushed the francois.mazeau/suspicious-request-blocking branch 4 times, most recently from 8cfd53b to e731317 Compare April 11, 2023 16:26

pr-commenter bot commented Apr 11, 2023

Benchmarks

Comparing candidate commit f9ee4bc in PR branch julio.guerra/dyngo-atomic-callback-swapping with baseline commit b438db9 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 18 metrics, 0 unstable metrics.

@Julio-Guerra Julio-Guerra force-pushed the julio.guerra/dyngo-atomic-callback-swapping branch from df9813b to 1cb84b8 Compare April 11, 2023 20:35
@Hellzy Hellzy force-pushed the francois.mazeau/suspicious-request-blocking branch 6 times, most recently from f4c9ec8 to 6017257 Compare April 13, 2023 12:05
Base automatically changed from francois.mazeau/suspicious-request-blocking to main April 13, 2023 19:34

Verified

This commit was signed with the committer’s verified signature.
rarguelloF Rodrigo Argüello

@Julio-Guerra Julio-Guerra force-pushed the julio.guerra/dyngo-atomic-callback-swapping branch from 1cb84b8 to 01e6cd9 Compare April 13, 2023 19:42

@Julio-Guerra Julio-Guerra marked this pull request as ready for review April 13, 2023 20:47
@Julio-Guerra Julio-Guerra requested review from a team as code owners April 13, 2023 20:47

@Hellzy Hellzy left a comment

I think there's one concerning change that potentially breaks remote activation.

Comment on lines +114 to +121
log.Debug("remoteconfig: gracefully stopping the client")
c.stop <- struct{}{}
select {
case <-c.stop:
	log.Debug("remoteconfig: client stopped successfully")
case <-time.After(time.Second):
	log.Debug("remoteconfig: client stopping timeout")
}
Contributor

Mind explaining the rationale behind this change? What's the problem with close(c.stop) ?

@Hellzy Hellzy Apr 17, 2023

IIUC this is done to make sure that the goroutine has stopped after exiting this function. Since RC updates allow callbacks to modify anything at any point, making sure that updates aren't still getting applied after exiting Stop() allows a clean shutdown.

Contributor Author

Yes, the point is to have Stop() stop "gracefully", by blocking the caller until it is done. This adds concurrency guarantees that avoid receiving remote config updates while stopping. For instance, without this change, the RC background goroutine's tick could fire at the same time as the stop, leading to an actual RC update. Without blocking in Stop(), the subsequent code could release resources still needed by the RC goroutine while it is still alive.
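To make the handshake discussed in this thread concrete, here is a minimal self-contained Go sketch of a blocking stop. It is not the remoteconfig client code: the `client` type, its `updates` counter, and `runDemo` are hypothetical, and the sketch assumes the polling goroutine acknowledges the stop request by closing the same channel.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// client is a hypothetical stand-in for a polling client.
type client struct {
	stop    chan struct{} // unbuffered: carries the stop request, closed as the ack
	updates atomic.Int64  // number of updates applied by the polling loop
}

func newClient() *client { return &client{stop: make(chan struct{})} }

// start launches the polling loop, which applies one update per tick.
func (c *client) start(interval time.Duration) {
	ticker := time.NewTicker(interval)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-c.stop:
				// No update can be applied past this point.
				close(c.stop) // acknowledge the stop request
				return
			case <-ticker.C:
				c.updates.Add(1)
			}
		}
	}()
}

// Stop blocks until the loop acknowledges the request or the timeout expires.
// It reports whether the acknowledgement was received. It must be called at
// most once, and only after start, since the channel is closed by the loop.
func (c *client) Stop(timeout time.Duration) bool {
	c.stop <- struct{}{} // blocks until the loop receives the request
	select {
	case <-c.stop: // a closed channel unblocks the receive
		return true
	case <-time.After(timeout):
		return false
	}
}

// runDemo stops a running client and samples the update counter right after
// Stop returns and again later, to show that no update lands after Stop.
func runDemo() (acked bool, atStop, later int64) {
	c := newClient()
	c.start(time.Millisecond)
	time.Sleep(20 * time.Millisecond)
	acked = c.Stop(time.Second)
	atStop = c.updates.Load()
	time.Sleep(20 * time.Millisecond)
	later = c.updates.Load()
	return acked, atStop, later
}

func main() {
	acked, atStop, later := runDemo()
	fmt.Println("acknowledged:", acked, "no update after Stop:", atStop == later)
}
```

With a bare close(c.stop), Stop would return immediately while the loop might still be applying an update; sending the request and waiting for the acknowledgement is what lets the caller release resources safely afterwards.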

@Julio-Guerra Julio-Guerra enabled auto-merge (squash) April 17, 2023 13:59
@Julio-Guerra Julio-Guerra merged commit 8d92f3e into main Apr 17, 2023
@Julio-Guerra Julio-Guerra deleted the julio.guerra/dyngo-atomic-callback-swapping branch April 17, 2023 14:14
zARODz11z pushed a commit that referenced this pull request May 8, 2023
…ch as queuename tags

contrib: upgrade labstack/echo/v4 from v4.2.0 to v4.9.0 (#1891)

ci: fix flaky lint job (#1892)

contrib/elasticsearch: use naming schema (#1897)

ci: introduce golangci (#1898)

appsec: suspicious request blocking (#1797)

Co-authored-by: Julio Guerra <julio@datadog.com>

ci/golangci-lint: skip google.golang.org/grpc.v12 (#1899)

.github/workflows: run ASM and RC system-tests scenarios (#1900)

contrib/hashicorp/vault: use naming schema (#1868)

contrib/database/sql: add WithIgnoreQueryTypes option (#1823)

Co-authored-by: Zarir Hamza <zarir.hamza@datadoghq.com>
Co-authored-by: Rodrigo Argüello <rodrigo.arguello@datadoghq.com>

contrib/database/sql: use naming schema (#1895)

internal/appsec: add server.request.method address (#1893)

Signed-off-by: Eliott Bouhana <eliott.bouhana@datadoghq.com>
Co-authored-by: François Mazeau <francois.mazeau@datadoghq.com>

internal/appsec/dyngo: atomic instrumentation swapping (#1873)

Co-authored-by: François Mazeau <francois.mazeau@datadoghq.com>

go.mod: datadog-agent/pkg/remoteconfig/state 7.45.0-rc.1 (#1902)

internal/version: bump to v1.51.0 (#1912)

ddtrace/tracer: don't set empty tracestate propagation tag (#1910)

go.mod: github.com/DataDog/datadog-agent/pkg/obfuscate 7.45.0-rc.1 (#1916)

appsec: add blocking SDK body operation (#1901)

* Modifying the appsec API: adding an error return value to appsec.MonitorParsedHTTPBody
* Adding a synchronous call to the WAF, made from appsec.MonitorParsedHTTPBody on the body passed as parameter, to check for security events
* Removing the call to the WAF done on the body at the end of a request because we moved it.
* Refactoring the waf addresses storage and access

Signed-off-by: Eliott Bouhana <eliott.bouhana@datadoghq.com>

ddtrace/{opentelemetry,opentracer}: add telemetry (#1909)

internal/appsec: fix user ID event detection (#1918)

internal/telemetry: track tracer init time metric (#1896)

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

internal/appsec/remoteconfig: fix rules overrides (#1921)

contrib/mongodb: use naming schema (#1908)

contrib/syndtr/goleveldb/leveldb: use naming schema (#1914)

contrib/tidwall/buntdb: use naming schema (#1913)

internal/appsec: do not ignore the appsec events rate limiter (#1927)

remoteconfig: remove empty products and don't override appsec rules data (#1925)

contrib/kafka: refactor tests (#1907)

contrib/google.golang.org/grpc: use naming schema (#1919)

contrib/twitchtv/twirp: use naming schema (#1920)

contrib/http: use naming schema (#1929)

ddtrace/tracer: reset decision maker during fallback behavior of w3c header extraction (#1933)

contrib/cassandra: use naming schema (#1911)

Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com>

contrib/redis: use naming schema (#1906)

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

ci/system-tests: more scenarios with parallel jobs (#1938)

ci: update linter job and add bodyclose (#1942)

contrib/redis/go-redis.v9: support v9 (#1730)

Add support for new go-redis version v9.

It does 2 things:
Copy existing version 8 files to a new path, /redis/go-redis.v9.
Make changes to support version 9.

Fixes #1710

format and rerun go tidy

get rid of prints

add topLevelRegion assertions

remove confusing named return values and todo comment

ddtrace/tracer: ensure access to trace tags is concurrency-safe (#1948)

Spancontext marshaling was accessing tracer internal structures without a
lock, resulting in a data race and panic.

This commit adds a few methods to trace to allow safe access to the tags
and propagatingTags members of trace to the marshaling code.

Fixes #1944

ddtrace/tracer: mark context updated when SetUser is called (#1949)

Fixes a minor logic mistake when setting a user on a span

lint and add default switch case

refactor resourceNameKey and value assignments

restructure functions to be left aligned

use internal logger, be less verbose with function names

go back to normal switch type and format

Set keyTraceID128 on first span in the chunk only (#1946)

go.mod: upgrade go-libddwaf to v1.2.0 (#1953)

Co-authored-by: Julio Guerra <julio@datadog.com>

contrib/database/sql: fix bug where options were always overwritten by register options (#1904)

Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com>

ci/smoke-tests: update the go.sum file after go get -u (#1957)

contrib/net/http: don't set empty string values as span tags (#1956)

Do not set span fields when they are not configured so the tracer can put the defaults in.

use normal string then dereference

revert go.mod and go.sum changes

contrib/internal/httptrace: remove naming schema from init (#1960)

contrib/graphql: use naming schema (#1926)

internal/telemetry: trim the dependencies version prefix v (#1963)

contrib/aws: use naming schema (#1931)

contrib/cloud.google.com/go/pubsub.v1: use naming schema (#1937)

go mod tidy

lint and fix test

3 participants