New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
USM: ssl: Ignore ELF files for other architectures #25505
Conversation
Test changes on VMUse this command from test-infra-definitions to manually test this PR changes on a VM: inv create-vm --pipeline-id=34436718 --os-family=ubuntu |
Ignore ELF files which are not for the architecture we're running on. Without this, we could end up installing a uprobe on, for example, an arm64 binary on an amd64 machine, thus corrupting the arm64 instruction and leading to segmentation faults.
27798c0
to
fcb01b1
Compare
Regression DetectorRegression Detector ResultsRun ID: 9dccc378-9789-4f01-aa7d-05dd21098025 Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
perf | experiment | goal | Δ mean % | Δ mean % CI |
---|---|---|---|---|
✅ | file_to_blackhole | % cpu utilization | -9.74 | [-14.78, -4.69] |
Fine details of change detection per experiment
perf | experiment | goal | Δ mean % | Δ mean % CI |
---|---|---|---|---|
➖ | idle | memory utilization | +0.49 | [+0.45, +0.52] |
➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | +0.02 | [-2.85, +2.90] |
➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.20, +0.21] |
➖ | trace_agent_json | ingress throughput | +0.00 | [-0.01, +0.01] |
➖ | trace_agent_msgpack | ingress throughput | -0.00 | [-0.00, +0.00] |
➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.02 | [-0.06, +0.02] |
➖ | pycheck_1000_100byte_tags | % cpu utilization | -0.05 | [-4.72, +4.63] |
➖ | otel_to_otel_logs | ingress throughput | -0.12 | [-0.48, +0.25] |
➖ | file_tree | memory utilization | -0.18 | [-0.26, -0.11] |
➖ | basic_py_check | % cpu utilization | -0.23 | [-2.79, +2.34] |
➖ | tcp_syslog_to_blackhole | ingress throughput | -4.48 | [-25.15, +16.19] |
✅ | file_to_blackhole | % cpu utilization | -9.74 | [-14.78, -4.69] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
|
||
for { | ||
// To allow time to attach | ||
time.Sleep(1000 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: time.Second
advanced nit: make that configurable (via command line), the CI can be slow sometimes (and USM won't be fast enough to hook)
advanced advanced nit: The binary can block on getting a character in stdin, to allow the test to signal when to start
advanced nit: how does this file differ from fmapper
(pkg/network/usm/sharedlibraries/testutil/fmapper/fmapper.go) or prefetch_file
(pkg/network/usm/testutil/prefetch_file/prefetch_file.go)
on a side note, why do we need both prefetch_file
and fmapper
? 😶🌫️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I got rid of the new program and just used fmapper
which already has a waiting mechanism.
pkg/network/usm/ebpf_ssl_test.go
Outdated
func waitForProgramNotToBeTraced(t *testing.T, cmd *exec.Cmd) { | ||
programType := "shared_libraries" | ||
pid := cmd.Process.Pid | ||
|
||
time.Sleep(3000 * time.Millisecond) | ||
|
||
traced := utils.GetTracedPrograms(programType) | ||
for _, prog := range traced { | ||
require.False(t, slices.Contains[[]uint32](prog.PIDs, uint32(pid))) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wonder if we can remove the sleep in the code, and check if the path or the pid appear in the block-list (blocklistByID
) we have in the different FileRegistries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
So that kmt will copy it to the VMs.
/merge |
🚂 MergeQueue Pull request added to the queue. This build is next! (estimated merge in less than 53m) Use |
* USM: ssl: Ignore ELF files for other architectures Ignore ELF files which are not for the architecture we're running on. Without this, we could end up installing a uprobe on, for example, an arm64 binary on an amd64 machine, thus corrupting the arm64 instruction and leading to segmentation faults. * Use blockList in test * Use OpenFromAnotherProcess instead of libmmap * Remove unused libmmap * Simplify testArch * Skip test on unsupported platforms * Format fakessl.c with clang-format * Add shebang link to script * Move libs to testdata So that kmt will copy it to the VMs. (cherry picked from commit a6c060e) Conflicts: pkg/network/usm/utils/debugger.go
What does this PR do?
Ignore ELF files which are not for the architecture we're running on. Without this, we could end up installing a uprobe on, for example, an arm64 binary on an amd64 machine, thus corrupting the arm64 instruction and leading to segmentation faults.
Motivation
https://datadoghq.atlassian.net/browse/USMON-981
Additional Notes
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
Test cases are included in the PR.