Some alerts going to a webhook crash alertmanager - panic: runtime error: invalid memory address or nil pointer dereference #3798

Open
aned opened this issue Apr 3, 2024 · 4 comments


aned commented Apr 3, 2024

What did you see instead? Under which circumstances?
Some alerts sent to a webhook receiver crash Alertmanager with a nil pointer dereference panic.

  • Alertmanager version:
alertmanager, version 0.27.0 (branch: HEAD, revision: 0aa3c2aad14cff039931923ab16b26b7481783b5)
  build user:       root@22cd11f671e9
  build date:       20240228-11:51:20
  go version:       go1.21.7
  platform:         linux/amd64
  tags:             netgo
  • Prometheus version:
Version	2.50.1
Revision	8c9b0285360a0b6288d76214a75ce3025bce4050
Branch	HEAD
BuildUser	root@6213bb3ee580
BuildDate	20240226-11:36:26
GoVersion	go1.21.7
  • Logs:
sudo journalctl -u alertmanager -r | more
Apr 03 22:04:10 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:04:10.827Z caller=cluster.go:700 level=info component=cluster msg="gossip settled; proceeding" elapsed=10.00361905s
Apr 03 22:04:02 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:04:02.825Z caller=cluster.go:708 level=info component=cluster msg="gossip not settled" polls=0 before=0 now=3 elapsed=2.001042721s
Apr 03 22:04:00 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:04:00.871Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9999
Apr 03 22:04:00 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:04:00.871Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9999
Apr 03 22:04:00 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:04:00.864Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/export/apps/prometheus/etc/Conf_Sync/sdi-infra-prometheus/configs_and_rules/alertmanager/alertmanager-config.yml
Apr 03 22:04:00 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:04:00.858Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/export/apps/prometheus/etc/Conf_Sync/sdi-infra-prometheus/configs_and_rules/alertmanager/alertmanager-config.yml
Apr 03 22:04:00 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:04:00.824Z caller=cluster.go:683 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
Apr 03 22:03:59 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:03:59.382Z caller=cluster.go:186 level=info component=cluster msg="setting advertise address explicitly" addr=10.187.4.247 port=9094
Apr 03 22:03:59 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:03:59.377Z caller=featurecontrol.go:94 level=warn msg="Experimental receiver name in metrics enabled"
Apr 03 22:03:59 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:03:59.377Z caller=main.go:182 level=info build_context="(go=go1.21.7, platform=linux/amd64, user=root@22cd11f671e9, date=20240228-11:51:20, tags=netgo)"
Apr 03 22:03:59 abc1-wer2323.prod.host.blah alertmanager[3823174]: ts=2024-04-03T22:03:59.377Z caller=main.go:181 level=info msg="Starting Alertmanager" version="(version=0.27.0, branch=HEAD, revision=0aa3c2aad14cff039931923ab16b26b7481783b5)"
Apr 03 22:03:59 abc1-wer2323.prod.host.blah systemd[1]: Started Alertmanager Server.
Apr 03 22:03:59 abc1-wer2323.prod.host.blah systemd[1]: Stopped Alertmanager Server.
Apr 03 22:03:59 abc1-wer2323.prod.host.blah systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 102.
Apr 03 22:03:54 abc1-wer2323.prod.host.blah systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Apr 03 22:03:54 abc1-wer2323.prod.host.blah systemd[1]: alertmanager.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/notify.go:482 +0x9d
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: created by github.com/prometheus/alertmanager/notify.FanoutStage.Exec in goroutine 411
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/notify.go:483 +0x53
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify.FanoutStage.Exec.func1({0x1658320?, 0xc0009f92d8?})
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/notify.go:461 +0xd5
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify.MultiStage.Exec({0xc00086fb80?, 0x4, 0x14?}, {0x1661110?, 0xc001942480?}, {0x1657c00, 0xc000875680}, {0xc0008f2000, 0x81, 0x100})
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/notify.go:760 +0x110
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify.RetryStage.Exec({{{0x1657ea0, 0xc000a4b440}, {0x1657e80, 0xc0007ceff0}, {0x10e0c18, 0x7}, 0x0, {0xc0009e60c0, 0xb}}, {0xc0009e60c0, ...}, ...}, ...)
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/notify.go:836 +0x666
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify.RetryStage.exec({{{0x1657ea0, 0xc000a4b440}, {0x1657e80, 0xc0007ceff0}, {0x10e0c18, 0x7}, 0x0, {0xc0009e60c0, 0xb}}, {0xc0009e60c0, ...}, ...}, ...)
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/notify.go:85
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify.(*Integration).Notify(...)
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/webhook/webhook.go:123 +0x527
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify/webhook.(*Notifier).Notify(0xc000a4b440, {0x1661110, 0xc001c52090}, {0xc0009a2000?, 0x2?, 0xc000ce8001?})
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/util.go:239 +0x116
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify.(*Retrier).Check(0x1661110?, 0xc001c52090?, {0x1657220, 0xc000b9d600})
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]:         /app/notify/webhook/webhook.go:60 +0x20
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: github.com/prometheus/alertmanager/notify/webhook.New.func1(0x10f7d7e?, {0x1657220, 0xc000b9d600})
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: goroutine 1399 [running]:
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xe73160]
Apr 03 22:03:54 abc1-wer2323.prod.host.blah alertmanager[3822203]: panic: runtime error: invalid memory address or nil pointer dereference
Apr 03 22:00:41 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:41.097Z caller=cluster.go:700 level=info component=cluster msg="gossip settled; proceeding" elapsed=10.002992536s
Apr 03 22:00:33 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:33.094Z caller=cluster.go:708 level=info component=cluster msg="gossip not settled" polls=0 before=0 now=3 elapsed=2.000169786s
Apr 03 22:00:31 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:31.144Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9999
Apr 03 22:00:31 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:31.144Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9999
Apr 03 22:00:31 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:31.139Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/export/apps/prometheus/etc/Conf_Sync/sdi-infra-prometheus/configs_and_rules/alertmanager/alertmanager-config.yml
Apr 03 22:00:31 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:31.133Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/export/apps/prometheus/etc/Conf_Sync/sdi-infra-prometheus/configs_and_rules/alertmanager/alertmanager-config.yml
Apr 03 22:00:31 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:31.094Z caller=cluster.go:683 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
Apr 03 22:00:30 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:30.126Z caller=cluster.go:186 level=info component=cluster msg="setting advertise address explicitly" addr=10.187.4.247 port=9094
Apr 03 22:00:30 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:30.122Z caller=featurecontrol.go:94 level=warn msg="Experimental receiver name in metrics enabled"
Apr 03 22:00:30 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:30.122Z caller=main.go:182 level=info build_context="(go=go1.21.7, platform=linux/amd64, user=root@22cd11f671e9, date=20240228-11:51:20, tags=netgo)"
Apr 03 22:00:30 abc1-wer2323.prod.host.blah alertmanager[3822203]: ts=2024-04-03T22:00:30.122Z caller=main.go:181 level=info msg="Starting Alertmanager" version="(version=0.27.0, branch=HEAD, revision=0aa3c2aad14cff039931923ab16b26b7481783b5)"
Apr 03 22:00:30 abc1-wer2323.prod.host.blah systemd[1]: Started Alertmanager Server.
Apr 03 22:00:30 abc1-wer2323.prod.host.blah systemd[1]: Stopped Alertmanager Server.
Apr 03 22:00:30 abc1-wer2323.prod.host.blah systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 101.
Apr 03 22:00:24 abc1-wer2323.prod.host.blah systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Apr 03 22:00:24 abc1-wer2323.prod.host.blah systemd[1]: alertmanager.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 03 22:00:24 abc1-wer2323.prod.host.blah alertmanager[3820656]:         /app/notify/notify.go:482 +0x9d
Apr 03 22:00:24 abc1-wer2323.prod.host.blah alertmanager[3820656]: created by github.com/prometheus/alertmanager/notify.FanoutStage.Exec in goroutine 346
Apr 03 22:00:24 abc1-wer2323.prod.host.blah alertmanager[3820656]:         /app/notify/notify.go:483 +0x53
Apr 03 22:00:24 abc1-wer2323.prod.host.blah alertmanager[3820656]: github.com/prometheus/alertmanager/notify.FanoutStage.Exec.func1({0x1658320?, 0xc000b0c6f0?})
Apr 03 22:00:24 abc1-wer2323.prod.host.blah alertmanager[3820656]:         /app/notify/notify.go:461 +0xd5
Contributor

zecke commented Apr 4, 2024

Any chance you have the configuration around?

Author

aned commented Apr 4, 2024

All webhook-related config follows this pattern:

- name: 'name_1'
  webhook_configs:
  - url: 'https://url.com:8028/api/v1/alert'
    http_config:
      tls_config:
        insecure_skip_verify: true
    send_resolved: false

- name: 'name_2'
  webhook_configs:
  - url_file: '/export/apps/alertmanager/path'
    send_resolved: false
    max_alerts: 20

zecke added a commit to zecke/alertmanager that referenced this issue Apr 6, 2024
…#3798)

When using url_file the conf.URL will be nil and when an error occurs
we will panic. Given that the URL is considered a secret, let's just
remove the custom details func.

Signed-off-by: Holger Hans Peter Freyther <holger@freyther.de>
@grobinson-grafana
Contributor

A fix for this has just been merged, thanks @zecke! 👍

@grobinson-grafana
Contributor

The fix is in #3800. Please close the issue 🙂
