gRPC 1.62.2 broke pubsub publishing from python multiprocessing process with ssl transport errors #36451

Closed
teije01 opened this issue Apr 25, 2024 · 4 comments

@teije01

teije01 commented Apr 25, 2024

What version of gRPC and what language are you using?

grpcio==1.62.2

What operating system (Linux, Windows,...) and version?

Debian GNU/Linux 11 (bullseye)

What runtime / compiler are you using (e.g. python version or version of gcc)

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
gcc version 12.3.0 (conda-forge gcc 12.3.0-6)
google-cloud-pubsub==2.21.1 (pip 24.0)

What did you do?

Our deployments publish Pub/Sub messages using the google-cloud-pubsub package. Both the publisher client and the publish call run inside a multiprocessing process.

Minimal code example (not tested explicitly, but reflects what happens in our process)

import multiprocessing
import time
from google.cloud.pubsub import PublisherClient

def send_message():
    # Topic paths take the form projects/<project-id>/topics/<topic-id>.
    topic_path = "projects/my-project/topics/my-topic"
    publisher_client = PublisherClient()
    # publish() returns a future; wait on it so the message is sent
    # before the child process exits.
    publisher_client.publish(topic_path, b"some_message").result()

if __name__ == "__main__":
    process = multiprocessing.Process(target=send_message)
    process.start()
    while process.is_alive():
        # SSL errors
        time.sleep(0.1)
    if process.exitcode != 0:
        print("process failed")

The code we are running in production was working fine with grpcio==1.62.1 and broke on grpcio==1.62.2. We verified that downgrading to 1.62.1 was the only action required to resolve the issue.

What did you expect to see?

No SSL errors

What did you see instead?

This resulted in the following errors:

Corruption detected.
error:0A000119:SSL routines::decryption failed or bad record mac
error:0A000139:SSL routines::record layer failure
Decryption error: TSI_DATA_CORRUPTED

Anything else we should know about your project / environment?

The code is deployed in a GKE cluster.

@XuanWang-Amos
Contributor

Hi, thanks for reporting this issue. We did make changes to the Google default credential flow in 1.62.2, but that doesn't seem to be the root cause.

Can you share logs with the GRPC_VERBOSITY=debug GRPC_TRACE=all,-timer,-timer_check flags set? It would be better if you had logs for both 1.62.1 and 1.62.2.
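One way to capture such logs from Python is sketched below. It launches the reproducing script in a child interpreter with the two environment variables set and redirects stderr, where gRPC writes its log output by default, to a file. The helper names `grpc_debug_env` and `run_with_grpc_logs`, and the script and log paths in the usage note, are hypothetical and not part of gRPC.

```python
import os
import subprocess
import sys


def grpc_debug_env(base_env=None):
    """Return a copy of the environment with gRPC debug tracing enabled.

    The variable names and values are the ones requested in this thread;
    the helper itself is only an illustration.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GRPC_VERBOSITY"] = "debug"
    env["GRPC_TRACE"] = "all,-timer,-timer_check"
    return env


def run_with_grpc_logs(script_path, log_path):
    """Run script_path in a child interpreter and write its stderr to log_path.

    Returns the child's exit code.
    """
    with open(log_path, "w") as log_file:
        completed = subprocess.run(
            [sys.executable, script_path],
            env=grpc_debug_env(),
            stderr=log_file,
        )
    return completed.returncode
```

Running `run_with_grpc_logs("repro.py", "grpc-1.62.2.log")` once per installed grpcio version would give two log files to compare. The output is verbose and may contain sensitive data, so review it before sharing.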

@gnossen
Contributor

gnossen commented Apr 25, 2024

CC @parthea

@teije01
Author

teije01 commented May 7, 2024

I'm sorry, due to time constraints on our end we have not been able to re-create the logs for you. We tried briefly, but there was so much logging that it was hard to weed the sensitive information out of it, and we have not been able to pin down the relevant sections either.

I did notice that 1.63.0 was just released. I'll report back if this version causes any trouble for us.

@teije01
Author

teije01 commented May 15, 2024

A subsequent attempt to reproduce the error in the same environment failed. We have been able to use 1.62.2 and later without any problems, so it seems this issue is no longer relevant. Thanks for your assistance so far; I'll reach out again if we see a similar issue in the future.

@teije01 teije01 closed this as completed May 15, 2024