GCP PubSub resulting in excessive "Administrator operations per minute" on publish #2539

Closed
CafeLungo opened this issue Feb 16, 2023 · 5 comments · Fixed by #3241
Labels: good first issue, help wanted, kind/bug

Comments

CafeLungo commented Feb 16, 2023

Our team is running into a problem when we load test Dapr publishing messages to GCP PubSub. An administrator operation, which we believe comes from the ensureTopic call, is made on every single publish when we use disableEntityManagement: False on our pubsub.gcp.pubsub component.

Expected Behavior

  • When publishing messages to GCP PubSub, the extra getTopic administrator operation should not be performed excessively.
  • The default GCP quota for administrator operations is only 6,000 per minute per project.


Actual Behavior

  • getTopic is called multiple times in the process of publishing a message: https://github.com/dapr/components-contrib/blob/master/pubsub/gcp/pubsub/pubsub.go#L233-L261
    • It is called via ensureTopic().
    • It is called again afterwards, to publish to the topic.
  • Our team believes this is excessive:
    • Having ensureTopic check that the topic exists on every single publish may be too aggressive.
    • The result could likely be cached.
    • Alternatively, when publishing via the Topic reference, the PublishResult should already return an error if the topic doesn't exist.

Steps to Reproduce the Problem

  • Run a load test that publishes messages through Dapr to GCP PubSub, enough to exceed 6,000 publishes per minute. Reaching this rate may require many parallel containers/pods.
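from dapr.clients import DaprClient

# _events, topic and serialize_to_json_str are defined elsewhere in the load test (not shown)
with DaprClient() as client:
    # Publish events using Dapr PubSub
    for e in _events:
        client.publish_event(
            pubsub_name="gcp-pubsub",
            topic_name=topic,
            data=serialize_to_json_str(e),
            data_content_type="application/json",
        )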
  • Eventually, GCP may reject requests for exceeding the Administrator operations per minute quota:
ResourceExhausted desc = Quota exceeded for quota metric 'Administrator operations' and limit 'Administrator operations per minute' of service 'pubsub.googleapis.com' for consumer 'project_number:XXXXXXXXXXXXX'
  • This results in an error like the following when publishing:
details = "error when publish to topic <your-topic> in pubsub gcp-pubsub: gcp pubsub error: could not get valid topic <your-topic>, rpc error: code = ResourceExhausted desc = Quota exceeded for quota metric 'Administrator operations' and limit 'Administrator operations per minute' of service 'pubsub.googleapis.com' for consumer 'project_number:XXXXXXXXXXXXXX'.
error details: name = ErrorInfo reason = RATE_LIMIT_EXCEEDED domain = googleapis.com metadata = map[consumer:projects/XXXXXXXXXXXXX quota_limit:administratorPerMinutePerProject quota_limit_value:6000 quota_location:global quota_metric:pubsub.googleapis.com/administrator service:pubsub.googleapis.com]"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:50001 {created_time:"2023-02-15T19:45:42.406510859+00:00", grpc_status:13, grpc_message:"error when publish to topic <your-topic> in pubsub gc...
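To see why parallel load trips this limit, here is a rough back-of-the-envelope sketch. The quota in the error metadata above is per project (administratorPerMinutePerProject, 6000), so administrator calls from all pods add up. The sketch assumes one administrator operation per publish; admin_ops_per_publish is an assumption rather than a measured value, and the pod count and rate are illustrative numbers, not figures from our test.

```python
# Quota value taken from the error metadata in this issue:
# quota_limit: administratorPerMinutePerProject, quota_limit_value: 6000.
ADMIN_OPS_QUOTA_PER_MINUTE = 6000


def admin_ops_per_minute(publishes_per_minute: int, admin_ops_per_publish: int = 1) -> int:
    """Administrator operations per minute when every publish runs a topic check.

    admin_ops_per_publish is an assumption (1 = one existence check per publish).
    """
    return publishes_per_minute * admin_ops_per_publish


def exceeds_quota(publishes_per_minute: int, admin_ops_per_publish: int = 1) -> bool:
    # The quota is per project, so publishes_per_minute is the total across all pods.
    used = admin_ops_per_minute(publishes_per_minute, admin_ops_per_publish)
    return used > ADMIN_OPS_QUOTA_PER_MINUTE


# Illustrative numbers: 20 pods, each publishing 400 messages per minute.
total_publishes = 20 * 400
print(admin_ops_per_minute(total_publishes))  # 8000
print(exceeds_quota(total_publishes))         # True
```

No single pod comes close to the limit here; it is the project-wide total that exceeds it.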

Note: Our team was attempting to simulate traffic that we will experience in production.

Release Note

RELEASE NOTE: FIX Reduced number of administrator calls on publish to a GCP PubSub Topic.

CafeLungo added the kind/bug label Feb 16, 2023
CafeLungo (Author)

Known workaround: set up a separate pubsub.gcp.pubsub component with disableEntityManagement: True and use it for publishing. This eliminates the call to ensureTopic(), and publishing still fails if the topic doesn't exist:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "error when publish to topic test-publish-missing-topic in pubsub stage-gcp-pubsub-publisher: rpc error: code = NotFound desc = Resource not found (resource=test-publish-missing-topic)."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:50001 {created_time:"2023-02-16T16:02:25.683510586+00:00", grpc_status:13, grpc_message:"error when publish to topic test-publish-missing-topic in pubsub stage-gcp-pubsub-publisher: rpc error: code = NotFound desc = Resource not found (resource=test-publish-missing-topic)."}"
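The difference between the two component configurations can be modeled roughly as follows. This is a hypothetical in-memory stand-in, not the Dapr or Google Cloud client API: FakePubSubBackend, publish and NotFound are names made up for illustration, and the flow only mirrors the behavior described in this issue (check, and create if missing, the topic before every publish unless disableEntityManagement is set).

```python
class NotFound(Exception):
    """Stand-in for the NotFound error returned when the topic does not exist."""


class FakePubSubBackend:
    """Hypothetical in-memory stand-in for GCP PubSub that counts admin operations."""

    def __init__(self, topics):
        self.topics = set(topics)
        self.admin_ops = 0
        self.published = []

    def topic_exists(self, name):  # administrator operation (counts against the quota)
        self.admin_ops += 1
        return name in self.topics

    def create_topic(self, name):  # administrator operation
        self.admin_ops += 1
        self.topics.add(name)

    def publish(self, name, data):  # data-plane publish, not an administrator operation
        if name not in self.topics:
            raise NotFound(f"Resource not found (resource={name}).")
        self.published.append((name, data))


def publish(backend, topic, data, disable_entity_management):
    # With entity management enabled, the topic is checked (and created if
    # missing) before every single publish.
    if not disable_entity_management:
        if not backend.topic_exists(topic):
            backend.create_topic(topic)
    backend.publish(topic, data)


managed = FakePubSubBackend({"orders"})
for _ in range(1000):
    publish(managed, "orders", b"{}", disable_entity_management=False)
print(managed.admin_ops)  # 1000

publisher = FakePubSubBackend({"orders"})
for _ in range(1000):
    publish(publisher, "orders", b"{}", disable_entity_management=True)
print(publisher.admin_ops)  # 0

try:
    publish(publisher, "test-publish-missing-topic", b"{}", disable_entity_management=True)
except NotFound as err:
    print(err)  # Resource not found (resource=test-publish-missing-topic).
```

In this model the publisher-only configuration makes no administrator calls at all, while a missing topic still surfaces as a NotFound publish error, matching the output above.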

yaron2 (Member) commented Feb 16, 2023

You are correct, this needs improving. Triaging for 1.11.

yaron2 added this to the v1.11 milestone Feb 16, 2023
yaron2 added the good first issue label Feb 16, 2023
berndverst added the help wanted label Feb 23, 2023
berndverst modified the milestones: v1.11, v1.12 May 30, 2023
berndverst (Member)

No activity here. Moving to 1.12.

artursouza (Member)

I propose fixing this with a local cache of all topics that have already gone through the ensureTopic() call, and skipping the check for those going forward. This will reduce the number of calls considerably, since each instance will call ensureTopic() only once per topic it uses.
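A minimal sketch of this proposal, written in Python for illustration only (the component itself is Go): EnsuredTopicCache and its ensure_fn callback are hypothetical names, with ensure_fn standing in for ensureTopic(). Successful checks are remembered per instance; a failed check is not cached, so it is retried on the next publish.

```python
import threading
from typing import Callable, Set


class EnsuredTopicCache:
    """Remembers, per instance, which topics already passed the ensure-topic check.

    ensure_fn is a hypothetical stand-in for ensureTopic(): it should verify that
    the topic exists (creating it if needed) and raise an exception on failure.
    """

    def __init__(self, ensure_fn: Callable[[str], None]) -> None:
        self._ensure_fn = ensure_fn
        self._ensured: Set[str] = set()
        self._lock = threading.Lock()

    def ensure(self, topic: str) -> None:
        with self._lock:
            if topic in self._ensured:
                return  # already ensured by this instance: no admin call
            # Holding the lock means concurrent publishers trigger at most one check per topic.
            self._ensure_fn(topic)  # raises on failure, so failures are never cached
            self._ensured.add(topic)


# Usage: 8 concurrent publishers, 1000 publishes each, all to the same topic.
admin_calls = []
cache = EnsuredTopicCache(admin_calls.append)  # records each simulated admin call


def publisher_worker() -> None:
    for _ in range(1000):
        cache.ensure("orders")  # the actual publish would follow here


threads = [threading.Thread(target=publisher_worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

cache.ensure("payments")
print(admin_calls)  # ['orders', 'payments']
```

With such a cache, each instance's administrator operations are bounded by the number of distinct topics it publishes to rather than by publish volume. One caveat: if a topic is deleted after being cached, the instance would skip re-creating it, and the publish itself would then presumably fail with NotFound.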

ItalyPaleAle modified the milestones: v1.12, v1.13 Sep 12, 2023
sadath-12 (Contributor)

/assign

6 participants