echo 2.40.0 SIGSEGV error when enabling pubsub on Spinnaker 1.33.0 #6924

Open
caongocthai opened this issue Jan 25, 2024 · 6 comments

caongocthai commented Jan 25, 2024

Issue Summary:

I tried upgrading Spinnaker to version 1.33.0 using spinnaker-operator. However, the spin-echo pod fails to start. I have narrowed the issue down to pubsub. Whenever I disable the pubsub configuration in the spinnakerservice.yaml file, echo-scheduler starts successfully. But when I enable pubsub, it fails with the error below:

   ('-.                ('-. .-.              
 _(  OO)              ( OO )  /              
(,------.    .-----.  ,--. ,--.  .-'),-----. 
 |  .---'   '  .--./  |  | |  | ( OO'  .-.  '
 |  |       |  |('-.  |   .|  | /   |  | |  |
(|  '--.   /_) |OO  ) |       | \_) |  |\|  |
 |  .--'   ||  |`-'|  |  .-.  |   \ |  | |  |
 |  `---. (_'  '--'\  |  | |  |    `'  '-'  '
 `------'    `-----'  `--' `--'      `-----' 
2024-01-25 08:27:18.225  WARN 1 --- [           main] o.s.b.c.config.ConfigDataEnvironment     : Property 'spring.profiles' imported from location 'class path resource [echo.yml]' is invalid and should be replaced with 'spring.config.activate.on-profile' [origin: class path resource [echo.yml] - 74:13]
2024-01-25 08:27:18.229  WARN 1 --- [           main] o.s.b.c.config.ConfigDataEnvironment     : Property 'spring.profiles' imported from location 'class path resource [echo.yml]' is invalid and should be replaced with 'spring.config.activate.on-profile' [origin: class path resource [echo.yml] - 65:13]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000003fd6, pid=1, tid=19
#
# JRE version: OpenJDK Runtime Environment (17.0.9+8) (build 17.0.9+8-alpine-r0)
# Java VM: OpenJDK 64-Bit Server VM (17.0.9+8-alpine-r0, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# C  [libio_grpc_netty_shaded_netty_transport_native_epoll_x86_641758931476565580154.so+0xbc1a]  parsePackagePrefix+0xba
#
# Core dump will be written. Default location: /core.%e.1.%t
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   https://gitlab.alpinelinux.org/alpine/aports/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

This is my current pubsub config:

      pubsub:
        enabled: true # If this and google.enabled below are both set to false, no issue
        google:
          enabled: true # If this and pubsub.enabled above are both set to false, no issue
          pubsubType: GOOGLE
          subscriptions:
          - name: my_value
            project: my_value
            subscriptionName: my_value
            jsonPath: my_value.json
            templatePath: my_value.jinja
            ackDeadlineSeconds: 60
            messageFormat: CUSTOM
          publishers:
          - name: my_value
            project: my_value
            topicName: my_value
            jsonPath: my_value.json
            content: NOTIFICATIONS

Note:

  • On the previous Spinnaker version, with echo-scheduler 2.37.1, the pubsub config above doesn’t throw any error
  • On Spinnaker version 1.33.0, with echo-scheduler 2.40.0, it fails with the error above
  • echo-scheduler and echo-worker both hit the same error

Cloud Provider(s):

GCP/GKE

Environment:

I'm running Spinnaker in a GKE cluster. Deployment is using spinnaker-operator.
Storage is GCS and redis.

Feature Area:

  • echo-scheduler
  • pubsub

Description:

I want to upgrade Spinnaker to version 1.33.0, but echo-scheduler 2.40.0 fails to start.

Steps to Reproduce:

Additional Details:

@caongocthai changed the title from "echo-scheduler SIGSEGV error when enabling pubsub on Spinnaker 1.33.0" to "echo-scheduler 2.40.0 SIGSEGV error when enabling pubsub on Spinnaker 1.33.0" on Jan 25, 2024
@caongocthai changed the title from "echo-scheduler 2.40.0 SIGSEGV error when enabling pubsub on Spinnaker 1.33.0" to "echo 2.40.0 SIGSEGV error when enabling pubsub on Spinnaker 1.33.0" on Jan 25, 2024
@caongocthai
Author

When I manually change the image to version 2.39.0 (us-docker.pkg.dev/spinnaker-community/docker/echo:2.39.0), the container starts successfully. So something introduced in 2.40.0 must be causing the crash.
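
For anyone who wants to pin the older image without editing the Deployment by hand, an `artifactId` override in the service settings might work. This is an untested sketch: it assumes the operator passes `service-settings` through the same way Halyard does, and that this install uses the `echo-scheduler` and `echo-worker` service names. Only the image reference comes from the comment above.

```yaml
# Fragment of spec.spinnakerConfig in the SpinnakerService manifest
# (placement is an assumption, not confirmed in this issue)
service-settings:
  echo-scheduler:
    # Pin echo to the last image reported to start cleanly
    artifactId: us-docker.pkg.dev/spinnaker-community/docker/echo:2.39.0
  echo-worker:
    artifactId: us-docker.pkg.dev/spinnaker-community/docker/echo:2.39.0
```

Running echo 2.39.0 alongside the other Spinnaker 1.33.0 services has not been verified here, so treat this as a temporary measure at most.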

@caongocthai
Author

Slack chat here: https://spinnakerteam.slack.com/archives/C091CCWRJ/p1706171672331629

The issue is with the alpine image: https://github.com/spinnaker/echo/blob/master/Dockerfile.java11.slim#L1

# Not using the alpine image because it lacks a package gRPC needed to establish pub/sub listeners.

There are 2 solutions:

  • Using ubuntu image:
      deploymentEnvironment:
        imageVariant: UBUNTU
  • Adding JAVA_OPTS for echo-scheduler and echo-worker:
env:
  JAVA_OPTS: "-Dio.grpc.netty.shaded.io.netty.transport.noNative=true"
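
For operator-based installs, the two workarounds above could be placed roughly as in the sketch below. It is not a tested manifest: the `apiVersion`, the resource name `spinnaker`, and the `echo-scheduler` / `echo-worker` keys under `service-settings` are assumptions about the SpinnakerService layout. Only `imageVariant: UBUNTU` and the `JAVA_OPTS` value come from this thread. Either workaround should be enough on its own.

```yaml
apiVersion: spinnaker.io/v1alpha2   # assumption: operator API version in use
kind: SpinnakerService
metadata:
  name: spinnaker                   # hypothetical resource name
spec:
  spinnakerConfig:
    config:
      version: 1.33.0
      # Workaround 1: use the Ubuntu-based images instead of the alpine ones
      deploymentEnvironment:
        imageVariant: UBUNTU
    service-settings:
      # Workaround 2: stop gRPC's shaded Netty from loading the native epoll library
      # (assumption: echo is split into echo-scheduler and echo-worker)
      echo-scheduler:
        env:
          JAVA_OPTS: "-Dio.grpc.netty.shaded.io.netty.transport.noNative=true"
      echo-worker:
        env:
          JAVA_OPTS: "-Dio.grpc.netty.shaded.io.netty.transport.noNative=true"
```

Setting `JAVA_OPTS` this way may replace any JVM flags the service would otherwise get by default, so existing flags might need to be carried over into the same string.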

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale

@spinnakerbot

This issue is tagged as 'stale' and hasn't been updated in 45 days, so we are tagging it as 'to-be-closed'. It will be closed in 45 days unless updates are made. If you want to remove this label, comment:

@spinnakerbot remove-label to-be-closed

jcurlier commented Apr 25, 2024

+1, same issue when upgrading to 1.34.2; reverted back to 1.32.3

2024-04-25 21:13:23.041  INFO 1 --- [           main] c.n.s.e.p.google.GooglePubsubMonitor     : Starting async connections for Google Pubsub subscribers
2024-04-25 21:13:23.045  INFO 1 --- [           main] c.n.s.e.p.google.GooglePubsubMonitor     : Opening async connection to spinnaker-gcr-subscription
2024-04-25 21:13:23.126  INFO 1 --- [   scheduling-1] s.k.p.u.r.r.RemotePluginInfoReleaseCache : Cached 0 remote plugin configurations.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000003fd6, pid=1, tid=19
#
# JRE version: OpenJDK Runtime Environment (17.0.9+8) (build 17.0.9+8-alpine-r0)
# Java VM: OpenJDK 64-Bit Server VM (17.0.9+8-alpine-r0, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libio_grpc_netty_shaded_netty_transport_native_epoll_x86_6414632178936215212429.so+0xbc1a]  parsePackagePrefix+0xba
#
# Core dump will be written. Default location: /core.%e.1.%t

@jcurlier

@spinnakerbot remove-label stale
