[receiver/hostmetrics] Fix panic in load_scraper_windows when stopping #28678

pjanotti
Contributor

Description:
Fix a panic that occurs when the Windows load scraper is stopped before it has been started. This can happen when the collector fails at startup: in that case, components are shut down even if they were never started. This was encountered in real-world usage.

2023-10-23T13:13:23.137-0500    info    service@v0.86.0/service.go:170  Starting shutdown...
2023-10-23T13:13:23.138-0500    info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "unavailable"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0x30c4028]

goroutine 1 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/loadscraper.stopSampling({0x0?, 0x6000103?})
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver@v0.86.0/internal/scraper/loadscraper/load_scraper_windows.go:145 +0xc8
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/loadscraper.(*scraper).shutdown(...)
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver@v0.86.0/internal/scraper/loadscraper/load_scraper.go:78
go.opentelemetry.io/collector/component.ShutdownFunc.Shutdown(...)
        go.opentelemetry.io/collector/component@v0.86.0/component.go:84
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).Shutdown(0xc0000c3a40, {0x71b0d50, 0xc00006c0e0})
        go.opentelemetry.io/collector/receiver@v0.86.0/scraperhelper/scrapercontroller.go:149 +0x97
go.opentelemetry.io/collector/service/internal/graph.(*Graph).ShutdownAll(0x0?, {0x71b0d50, 0xc00006c0e0})
        go.opentelemetry.io/collector/service@v0.86.0/internal/graph/graph.go:358 +0xc9
go.opentelemetry.io/collector/service.(*Service).Shutdown(0xc0008373b0, {0x71b0d50, 0xc00006c0e0})
        go.opentelemetry.io/collector/service@v0.86.0/service.go:176 +0xd4
go.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents(0xc000dc6180, {0x71b0d50, 0xc00006c0e0})
        go.opentelemetry.io/collector/otelcol@v0.86.0/collector.go:187 +0x708
go.opentelemetry.io/collector/otelcol.(*Collector).Run(0xc000dc6180, {0x71b0d50, 0xc00006c0e0})
        go.opentelemetry.io/collector/otelcol@v0.86.0/collector.go:221 +0x65
go.opentelemetry.io/collector/otelcol.NewCommand.func1(0xc00229cf00, {0x6636c91?, 0x0?, 0x3?})
        go.opentelemetry.io/collector/otelcol@v0.86.0/command.go:27 +0x96
github.com/spf13/cobra.(*Command).execute(0xc00229cf00, {0xc0000ac050, 0x0, 0x3})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x862
github.com/spf13/cobra.(*Command).ExecuteC(0xc00229cf00)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3bd
github.com/spf13/cobra.(*Command).Execute(0xc0022dd860?)
        github.com/spf13/cobra@v1.7.0/command.go:992 +0x19
main.runInteractive({{0xc0022dd860, 0xc0022ddad0, 0xc0022dda10, 0xc0022dd5f0, 0xc0022ddb00}, {{0x663630d, 0x7}, {0x0, 0x0}, {0x713fdf8, ...}}, ...})
        github.com/signalfx/splunk-otel-collector/cmd/otelcol/main.go:100 +0x5d
main.run({{0xc0022dd860, 0xc0022ddad0, 0xc0022dda10, 0xc0022dd5f0, 0xc0022ddb00}, {{0x663630d, 0x7}, {0x0, 0x0}, {0x713fdf8, ...}}, ...})
        github.com/signalfx/splunk-otel-collector/cmd/otelcol/main_windows.go:33 +0x58
main.main()
        github.com/signalfx/splunk-otel-collector/cmd/otelcol/main.go:93 +0xcba
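
The failure mode in the trace above (shutdown reaching a sampler that was never started) can be illustrated with a minimal, self-contained sketch. This is not the code from `load_scraper_windows.go` and not necessarily how this PR fixes it: the names `sampler`, `instance`, `startSampler`, and `stopSampler` are hypothetical, and the sketch only assumes that the stop path touches state that is created on start. The guard makes stopping a no-op when nothing was started.

```go
package main

import (
	"fmt"
	"sync"
)

// sampler is a hypothetical background worker, not the collector's type.
type sampler struct {
	done chan struct{}
	wg   sync.WaitGroup
}

var (
	mu       sync.Mutex
	instance *sampler // stays nil until startSampler is called
)

// startSampler creates the shared sampler and its goroutine once.
func startSampler() {
	mu.Lock()
	defer mu.Unlock()
	if instance != nil {
		return // already running
	}
	s := &sampler{done: make(chan struct{})}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		<-s.done // a real sampler would also collect samples on a ticker
	}()
	instance = s
}

// stopSampler reports whether a running sampler was actually stopped.
// Without the nil check, calling it before startSampler would
// dereference a nil pointer, which is the class of panic shown above.
func stopSampler() bool {
	mu.Lock()
	s := instance
	instance = nil
	mu.Unlock()

	if s == nil {
		return false // never started (or already stopped): nothing to do
	}
	close(s.done)
	s.wg.Wait()
	return true
}

func main() {
	fmt.Println(stopSampler()) // false: shutdown before start is a no-op
	startSampler()
	fmt.Println(stopSampler()) // true: the running sampler was stopped
	fmt.Println(stopSampler()) // false: a second shutdown is also safe
}
```

Clearing `instance` under the lock before closing the channel keeps a repeated stop from closing `done` twice, so the same guard covers both "never started" and "already stopped".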

Link to tracking Issue:
N/A

Testing:
Local test runs.

Documentation:
N/A

@crobert-1 crobert-1 added the bug Something isn't working label Oct 30, 2023
@dmitryax dmitryax merged commit 0e08a1c into open-telemetry:main Oct 31, 2023
88 checks passed
@github-actions github-actions bot added this to the next release milestone Oct 31, 2023
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this pull request Nov 12, 2023
open-telemetry#28678)

@pjanotti pjanotti deleted the fix-panic-on-load_scraper_windows-shutdown branch November 19, 2023 20:39
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this pull request Nov 24, 2023
open-telemetry#28678)
