USM: ssl: Ignore ELF files for other architectures #25505

vitkyrka · 2024-05-10T13:16:42Z

What does this PR do?

Ignore ELF files which are not for the architecture we're running on. Without this, we could end up installing a uprobe on, for example, an arm64 binary on an amd64 machine, thus corrupting the arm64 instruction and leading to segmentation faults.

Motivation

https://datadoghq.atlassian.net/browse/USMON-981

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Test cases are included in the PR.

pr-commenter · 2024-05-10T14:07:42Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=34436718 --os-family=ubuntu

pkg/network/usm/ebpf_ssl.go

Ignore ELF files which are not for the architecture we're running on. Without this, we could end up installing a uprobe on, for example, an arm64 binary on an amd64 machine, thus corrupting the arm64 instruction and leading to segmentation faults.

pr-commenter · 2024-05-14T16:07:38Z

Regression Detector

Regression Detector Results

Run ID: 9dccc378-9789-4f01-aa7d-05dd21098025
Baseline: 2cacfc1
Comparison: f2e1b88

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI
✅	file_to_blackhole	% cpu utilization	-9.74	[-14.78, -4.69]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	idle	memory utilization	+0.49	[+0.45, +0.52]
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	+0.02	[-2.85, +2.90]
➖	uds_dogstatsd_to_api	ingress throughput	+0.01	[-0.20, +0.21]
➖	trace_agent_json	ingress throughput	+0.00	[-0.01, +0.01]
➖	trace_agent_msgpack	ingress throughput	-0.00	[-0.00, +0.00]
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.02	[-0.06, +0.02]
➖	pycheck_1000_100byte_tags	% cpu utilization	-0.05	[-4.72, +4.63]
➖	otel_to_otel_logs	ingress throughput	-0.12	[-0.48, +0.25]
➖	file_tree	memory utilization	-0.18	[-0.26, -0.11]
➖	basic_py_check	% cpu utilization	-0.23	[-2.79, +2.34]
➖	tcp_syslog_to_blackhole	ingress throughput	-4.48	[-25.15, +16.19]
✅	file_to_blackhole	% cpu utilization	-9.74	[-14.78, -4.69]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
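
The three criteria above can be sketched as a small predicate. This is an illustrative reading of the explanation, not the Regression Detector's actual code; the result type and isRegression name are hypothetical, and the sample rows are taken from the tables in this report.

```go
package main

import (
	"fmt"
	"math"
)

// result holds one row of the report. Hypothetical type for this sketch.
type result struct {
	name          string
	deltaMeanPct  float64 // estimated Δ mean %
	ciLow, ciHigh float64 // bounds of the 90% confidence interval
	erratic       bool    // experiment configured with erratic: true
}

// effectSizeTolerance is the |Δ mean %| threshold quoted in the report.
const effectSizeTolerance = 5.00

// isRegression applies the three criteria: the change is big enough,
// the confidence interval excludes zero, and the experiment is not erratic.
func isRegression(r result) bool {
	bigEnough := math.Abs(r.deltaMeanPct) >= effectSizeTolerance
	ciExcludesZero := r.ciLow > 0 || r.ciHigh < 0
	return bigEnough && ciExcludesZero && !r.erratic
}

func main() {
	rows := []result{
		{"idle", 0.49, 0.45, 0.52, false},
		{"tcp_syslog_to_blackhole", -4.48, -25.15, 16.19, false},
		{"file_to_blackhole", -9.74, -14.78, -4.69, true},
	}
	for _, r := range rows {
		fmt.Printf("%s: worth investigating = %v\n", r.name, isRegression(r))
	}
}
```

Note that file_to_blackhole satisfies the first two criteria but is listed under the ignored experiments, which is why the erratic flag is set to true for it here.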

guyarb · 2024-05-14T16:39:53Z

pkg/network/usm/testutil/libmmap/libmmap.go

+
+	for {
+		// To allow time to attach
+		time.Sleep(1000 * time.Millisecond)


nit: time.Second
advanced nit: make that configurable (via command line), the CI can be slow sometimes (and USM won't be fast enough to hook)
advanced advanced nit: The binary can block on getting a character in stdin, to allow the test to signal when to start
advanced nit: how does this file differ from fmapper (pkg/network/usm/sharedlibraries/testutil/fmapper/fmapper.go) or prefetch_file (pkg/network/usm/testutil/prefetch_file/prefetch_file.go)?

on a side note, why do we need both prefetch_file and fmapper? 😶‍🌫️

Thanks, I got rid of the new program and just used fmapper which already has a waiting mechanism.

guyarb · 2024-05-14T16:43:19Z

pkg/network/usm/ebpf_ssl_test.go

+func waitForProgramNotToBeTraced(t *testing.T, cmd *exec.Cmd) {
+	programType := "shared_libraries"
+	pid := cmd.Process.Pid
+
+	time.Sleep(3000 * time.Millisecond)
+
+	traced := utils.GetTracedPrograms(programType)
+	for _, prog := range traced {
+		require.False(t, slices.Contains[[]uint32](prog.PIDs, uint32(pid)))
+	}
+}


nit: I wonder if we can remove the sleep in the code, and check if the path or the pid appear in the block-list (blocklistByID) we have in the different FileRegistries


vitkyrka · 2024-05-16T12:36:02Z

/merge

dd-devflow · 2024-05-16T12:36:09Z

🚂 MergeQueue

Pull request added to the queue.

This build is next! (estimated merge in less than 53m)

Use /merge -c to cancel this operation!

* USM: ssl: Ignore ELF files for other architectures

  Ignore ELF files which are not for the architecture we're running on. Without this, we could end up installing a uprobe on, for example, an arm64 binary on an amd64 machine, thus corrupting the arm64 instruction and leading to segmentation faults.
* Use blockList in test
* Use OpenFromAnotherProcess instead of libmmap
* Remove unused libmmap
* Simplify testArch
* Skip test on unsupported platforms
* Format fakessl.c with clang-format
* Add shebang link to script
* Move libs to testdata

  So that kmt will copy it to the VMs.

(cherry picked from commit a6c060e)

Conflicts:
	pkg/network/usm/utils/debugger.go

vitkyrka added the changelog/no-changelog, team/usm (The USM team), and qa/done (Skip QA week as QA was done before merge and regressions are covered by tests) labels May 10, 2024

github-actions bot added the component/system-probe label May 10, 2024

guyarb reviewed May 10, 2024

pkg/network/usm/ebpf_ssl.go (resolved)

vitkyrka force-pushed the vincent.whitchurch/fix-qemu-segfault branch from 27798c0 to fcb01b1 Compare May 14, 2024 14:58

vitkyrka changed the title ~~USM: ssl: Fix segfault with QEMU~~ USM: ssl: Ignore ELF files for other architectures May 14, 2024

vitkyrka marked this pull request as ready for review May 14, 2024 15:03

vitkyrka requested review from a team as code owners May 14, 2024 15:04

guyarb requested changes May 14, 2024


vitkyrka added 5 commits May 16, 2024 09:44

Use blockList in test

e52edd4

Use OpenFromAnotherProcess instead of libmmap

bcb247b

Remove unused libmmap

bbd7516

Simplify testArch

f1444c3

Skip test on unsupported platforms

e3f9f0f

guyarb approved these changes May 16, 2024


vitkyrka added 3 commits May 16, 2024 13:06

Format fakessl.c with clang-format

4612cc1

Add shebang link to script

bc3636c

Move libs to testdata

f2e1b88

So that kmt will copy it to the VMs.

dd-mergequeue bot merged commit a6c060e into main May 16, 2024
272 checks passed

dd-mergequeue bot deleted the vincent.whitchurch/fix-qemu-segfault branch May 16, 2024 13:00

github-actions bot added this to the 7.55.0 milestone May 16, 2024
