
Huge amount of upload errors on Grafana Cloud: resource_exhausted push rate limit #63

Open · f0o opened this issue Oct 29, 2023 · 3 comments

f0o commented Oct 29, 2023

Every few seconds, the pyroscope client (using the reference config in the README) errors with:

upload profile: failed to upload. server responded with statusCode: '422' and body: '{"code":"unknown","message":"pushing IngestInput-pprof failed resource_exhausted: push rate limit (0 B) exceeded while adding 70 KiB"}'

We're not talking about hundreds of apps here; it's only 19-20.

What limit am I hitting and what config should I use to prevent/mitigate it?

//Edit:
[screenshot: Grafana Cloud usage]

Turns out 20 apps running for 4 days == ~50 GB of data. How can I limit the sampling/reporting rate? This is insane.

kolesnikovae (Contributor) commented Oct 30, 2023

Thank you for reporting the issue @f0o. Indeed, Go profiles can be quite large depending on the workload.

The upload rate can be changed via the UploadRate configuration option. By default, profiles are collected and uploaded every 15 seconds. If the application behaviour and load are stable (profiles do not change significantly between intervals), you could try increasing it to, e.g., 30 seconds.
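For illustration, a minimal sketch of that with the grafana/pyroscope-go client (the application name and server address are placeholders, not values from this issue):

    package main

    import (
        "log"
        "time"

        "github.com/grafana/pyroscope-go"
    )

    func main() {
        // Collect and upload profiles every 30s instead of the default 15s.
        _, err := pyroscope.Start(pyroscope.Config{
            ApplicationName: "example.service",                   // placeholder
            ServerAddress:   "https://<your-pyroscope-endpoint>", // placeholder
            UploadRate:      30 * time.Second,
        })
        if err != nil {
            log.Fatalln("failed to start profiler:", err)
        }
    }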

I'm wondering which profile types are enabled. Napkin math shows that each of the apps generates ~100KB of profiling data (uncompressed) every 15 seconds – this is an unexpectedly high data rate. Could you please tell us more about the workload? I'd also like to clarify how many individual processes you're profiling, and what you mean by apps – do you mean 20 instances (processes/hosts/pods) of the same service, or 20 logical services, represented by some fleet?
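For reference, a rough version of that calculation, assuming the ~50 GB from the screenshot accumulated evenly over 4 days across 20 apps:

    50 GB / 4 days / 20 apps ≈ 625 MB per app per day
    625 MB / 86,400 s × 15 s ≈ 108 KB per app per 15-second upload interval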

f0o (Author) commented Oct 30, 2023

Hi @kolesnikovae

I'll look into the UploadRate parameter and tweak it once the retention expires those old profiles.

I'm using:

			ProfileTypes: []pyroscope.ProfileType{
				// these profile types are enabled by default:
				pyroscope.ProfileCPU,
				pyroscope.ProfileAllocObjects,
				pyroscope.ProfileAllocSpace,
				pyroscope.ProfileInuseObjects,
				pyroscope.ProfileInuseSpace,

				// these profile types are optional:
				pyroscope.ProfileGoroutines,
				pyroscope.ProfileMutexCount,
				pyroscope.ProfileMutexDuration,
				pyroscope.ProfileBlockCount,
				pyroscope.ProfileBlockDuration,
			},

With:

		// Report roughly 1 in 5 mutex contention events.
		runtime.SetMutexProfileFraction(5)
		// Sample ~one blocking event per 5 ns spent blocked (a very aggressive rate).
		runtime.SetBlockProfileRate(5)

And for clarification, it's 3 services amounting to 19-20 pods, each very small in resource consumption (we're talking 0.05 CPU and 32-64 MB memory). The workload is best described as signal/data forwarding without processing. I was about to write the processing service when I noticed these errors and started disabling profiling everywhere instead.

kolesnikovae (Contributor) commented Oct 30, 2023

Hi @f0o, thank you for the feedback. I'll double-check everything and report back soon. In the meantime, please consider disabling goroutine, mutex, and block profiles.
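A sketch of that reduced setup, continuing the pyroscope-go example from the earlier comment (application name and endpoint remain placeholders):

    // Keep only the default profile types; goroutine, mutex, and block
    // profiles are simply omitted from the list.
    _, err := pyroscope.Start(pyroscope.Config{
        ApplicationName: "example.service",                   // placeholder
        ServerAddress:   "https://<your-pyroscope-endpoint>", // placeholder
        UploadRate:      30 * time.Second,
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
        },
    })
    if err != nil {
        log.Fatalln("failed to start profiler:", err)
    }

    // With mutex and block profiles disabled, runtime-level collection
    // can be switched off as well (rate 0 disables it entirely).
    runtime.SetMutexProfileFraction(0)
    runtime.SetBlockProfileRate(0)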
