Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add initial profiling support #626

Merged
merged 57 commits into from Jun 12, 2023
Merged

Conversation

vaind
Copy link
Collaborator

@vaind vaind commented Apr 14, 2023

This brings profiling support based on calling runtime.Stack() periodically to capture stack traces for all goroutines. This is also what the "goroutine" profiler does in the pprof package but pprof only allows you to write the profile in full after it has been captured and does so to a string which we would then need to parse, see WriteTo().
Implements #630

How this implementation works:

  • starting or not starting the profiler is based on an option ProfilesSampleRate which defines a sampling rate
  • when the profiler starts, it launches a goroutine, captures the current stack trace and sets up a timer to capture profiles periodically (at 101Hz)
  • when the timer ticks, all stacktraces are captured with runtime.Stack() unfortunately, it only outputs strings. There's an alternative - runtime.GoroutineProfile() (which it uses under the hood). But we cannot use it because the returned value from that function doesn't give us any identification of individual goroutines. I've created an issue in the runtime - if that's ever implemented, we should make the switch to reduce profiling overhead (avoid serialization to string & parsing back)
  • the captured profile is parsed as-needed and produces directly the sample-profile format expected by Sentry

As can be seen from the benchmarks results, the profiling itself is very fast, with each collection taking fractions of a millisecond (depends on the number of goroutines). I've prioritized performance while trying to sacrifice readability as little as possible.

Followups:

  • profiler UI on Sentry.io doesn't show line numbers at the moment which is confusing because there will appear adjacent samples with seemingly the same stack traces (example event). Only after exporting to JSON, you can identify that the top frame has a different line number inside the same function. We could get around this limitation by dropping-out line numbers and merging these stacks on the client but that seems wasteful - we'd be losing important (to the user) information on what line of code is actually causing the slowdown.
  • profiler UI would benefit from showing the module, e.g. on this profile the worker shown is part of Sentry.
  • create a standalone example app to measure the overhead of running with & without profiling.
  • consider single-goroutine profiling - should be fairly straightforward to do - it really is more of a conceptual question rather than a technological limitation. Additionally, this would affect the following questions:
  • what goroutine to mark as an active one (currently we just pass 0 thread ID). In case we stay with global profiling, we can use the ID of the goroutine that starts the transaction
  • resolving the previous two followups would also lead to a clear path of skipping the profiler goroutine from processing - currently, we send it as well but it is useless because it always has the same function
  • the integration of the profile into event collection is a bit hacked together and adds a custom field to Event. This is because there's currently no captureEnvelope(), only captureEvent() - ideally we should make the refactoring and internally use captureEnvelope() which would allow us to compose any number of items to a single envelope before handing it over to transport

@bruno-garcia bruno-garcia mentioned this pull request Apr 17, 2023
@vaind vaind changed the title wip: profiling Profiling May 17, 2023
@vaind vaind marked this pull request as ready for review May 17, 2023 18:20
@tonyo tonyo requested review from tonyo and cleptric May 22, 2023 07:18
transport.go Fixed Show resolved Hide resolved
@vaind
Copy link
Collaborator Author

vaind commented Jun 7, 2023

The "biggest" issue I know of at the moment is the one with line numbers not showing up, resulting in broken-up stack traces (see above). Are you taking this into consideration with the core profiling team @phacops?

Are there any changes needed before this gets merged @tonyo?

@tonyo
Copy link
Member

tonyo commented Jun 7, 2023

Let us get back to you later today @vaind

@phacops
Copy link
Contributor

phacops commented Jun 7, 2023

The "biggest" issue I know of at the moment is the one with line numbers not showing up, resulting in broken-up stack traces (see above). Are you taking this into consideration with the core profiling team @phacops?

Line numbers are not displayed next to the function name. They are shown in the tooltip. Right now, they are not shown in the tooltip because we don't have a filename in the appropriate field of a frame object.

But that's not the reason it appears in several groups. findPrimeNumber appears several times because those are several different frame objects displayed. I counted 4 frames for findPrimeNumber, each having different line numbers associated to them and when looking at the stacks, when stacks contain a reference to the same frame, the frames are merged together as expected.

You should look at where those line numbers are coming from and if they are correct. The UI is behaving has expected.

@vaind
Copy link
Collaborator Author

vaind commented Jun 7, 2023

The "biggest" issue I know of at the moment is the one with line numbers not showing up, resulting in broken-up stack traces (see above). Are you taking this into consideration with the core profiling team @phacops?

Line numbers are not displayed next to the function name. They are shown in the tooltip. Right now, they are not shown in the tooltip because we don't have a filename in the appropriate field of a frame object.

All frames have a (relative) file name or abs_path. Not sure why this distinction was implemented in sentry-go but I didn't want to touch that. Is this a problem for the profiling product?

But that's not the reason it appears in several groups. findPrimeNumber appears several times because those are several different frame objects displayed. I counted 4 frames for findPrimeNumber, each having different line numbers associated to them and when looking at the stacks, when stacks contain a reference to the same frame, the frames are merged together as expected.

You should look at where those line numbers are coming from and if they are correct. The UI is behaving has expected.

The line numbers are correct. You see multiple frames for a single function because these represent different lines in the same function - they all relate to actual code being executed. For example for findPrimeNumber(), there are, as you've mentioned, four frames:

{"data":{},"function":"findPrimeNumber","in_app":true,"lineno":67,"module":"main","abs_path":"C:/dev/sentry-go/_examples/profiling/main.go"}
{"data":{},"function":"findPrimeNumber","in_app":true,"lineno":68,"module":"main","abs_path":"C:/dev/sentry-go/_examples/profiling/main.go"}
{"data":{},"function":"findPrimeNumber","in_app":true,"lineno":75,"module":"main","abs_path":"C:/dev/sentry-go/_examples/profiling/main.go"}
{"data":{},"function":"findPrimeNumber","in_app":true,"lineno":77,"module":"main","abs_path":"C:/dev/sentry-go/_examples/profiling/main.go"}

You can see that these frames refer to lines 67, 68, 75 and 77, and if you look at the function code you can match that exactly to the code being executed:

image

Most languages don't go into this much detail in their stack traces and only give you the leaf function once. This is not true for Go as it gives you actual lines of code being executed at the time the stack trace dump has been captured.

@phacops
Copy link
Contributor

phacops commented Jun 7, 2023

All frames have a (relative) file name or abs_path. Not sure why this distinction was implemented in sentry-go but I didn't want to touch that. Is this a problem for the profiling product?

I don't see frames having a relative filename and yes, it's a "problem" as we'll not show the filename + line number unless we have the filename and the line number. Not sure why the Go SDK doesn't send that. If you want to show filename + line number, the SDK will have to provide us with a relative filename (on top of the abs_path field).

Most languages don't go into this much detail in their stack traces and only give you the leaf function once. This is not true for Go as it gives you actual lines of code being executed at the time the stack trace dump has been captured.

So, that's an issue. We're expecting the line number where the function is called, not the line number executed at the time of capture. So, in this case, we'd expect 49 for findPrimeNumber (https://github.com/getsentry/sentry-go/pull/626/files#diff-37832c5c0750cd4ba798cab30223bae36e22a4e7ddfb2195eaef137f1902b3b0R49).

We can flag that as a known issue and move on though.

@phacops
Copy link
Contributor

phacops commented Jun 7, 2023

After checking with the team, it seems the situation is a bit more complicated. I think the idea was to get the data how we describe it but platforms reporting line numbers report them the way Go is.

We're discussing what we want to do about this, it seems it's something we haven't paid a lot of attention to. We wouldn't want to merge based on just name though.

@vaind
Copy link
Collaborator Author

vaind commented Jun 7, 2023

I don't see frames having a relative filename and yes, it's a "problem" as we'll not show the filename + line number unless we have the filename and the line number. Not sure why the Go SDK doesn't send that. If you want to show filename + line number, the SDK will have to provide us with a relative filename (on top of the abs_path field).

There is filename and line number, it is an absolute path because there's no way resolve it to a relative one in Go. It just just not the filename attribute of the frame because that one is supposed to relative, according to the specification, see docs. So AFAICT, what Go does is correct in this case.
Is there any problem with showing the filename from the absolute path together with the line number in this case? Do you see any issues with such a fallback when there's no relative path but there's the absolute one? Also, see the below example from a python profile stack trace that actually shows an absolute path with line numbers.

image
image

After checking with the team, it seems the situation is a bit more complicated. I think the idea was to get the data how we describe it but platforms reporting line numbers report them the way Go is.

We're discussing what we want to do about this, it seems it's something we haven't paid a lot of attention to. We wouldn't want to merge based on just name though.

OK, let me know if there's something that should be done here in the Go SDK

@phacops
Copy link
Contributor

phacops commented Jun 7, 2023

OK, let me know if there's something that should be done here in the Go SDK

I don't think so right now, I would proceed with this, we know the issue and it's common to the rest of the platforms so let's not consider it as part of this effort.

@phacops
Copy link
Contributor

phacops commented Jun 7, 2023

Is there any problem with showing the filename from the absolute path together with the line number in this case? Do you see any issues with such a fallback when there's no relative path but there's the absolute one? Also, see the below example from a python profile stack trace that actually shows an absolute path with line numbers.

There is filename and line number, it is an absolute path because there's no way resolve it to a relative one in Go. It just just not the filename attribute of the frame because that one is supposed to relative, according to the specification, see docs. So AFAICT, what Go does is correct in this case.

Yes, Go is correct to report things that way.

Is there any problem with showing the filename from the absolute path together with the line number in this case? Do you see any issues with such a fallback when there's no relative path but there's the absolute one? Also, see the below example from a python profile stack trace that actually shows an absolute path with line numbers.

In that case, the Python SDK is not being a "good citizen" since it's reporting the absolute path in the filename (which is supposed to not be an absolute path https://develop.sentry.dev/sdk/event-payloads/stacktrace/#frame-attributes). The reason we wanted to avoid showing the absolute path is because it can contain PII (usernames), it's useless to link to GitHub or else and it can be long.

Obviously, reality is different, but I don't think we want to show the absolute path in case the filename is missing. I think for now we should keep the status quo (not showing filename and line numbers) as we're having a lot of discussions around all that. Perhaps we'll start showing the line number directly after the frame name regardless if we have a filename or not.

I would proceed with marking that as known issues.

@vaind
Copy link
Collaborator Author

vaind commented Jun 7, 2023

thanks for the update, just wanted to remark on one bit:

The reason we wanted to avoid showing the absolute path is because it can contain PII (usernames), it's useless to link to GitHub or else and it can be long.

Just to note, this has been discussed many times in the past, also for other SDKs and while I don't know what's the general consensus at the moment, the concern of absolute paths being PII and thus not showing them is void because they're already part of the profile reported to Sentry.io and is also part of the downloadable JSON.
The same concern would apply to error reporting (and it's not hidden there)

@phacops
Copy link
Contributor

phacops commented Jun 7, 2023

Just to note, this has been discussed many times in the past, also for other SDKs and while I don't know what's the general consensus at the moment, the concern of absolute paths being PII and thus not showing them is void because they're already part of the profile reported to Sentry.io and is also part of the downloadable JSON.

Yes, I agree, they are not exactly private anyway, I was explaining how we came to not showing the absolute path when we built the UI when we joined Sentry. Now, I think there's still concerns around how long they are but the PII reason tends to fade away.

@vaind
Copy link
Collaborator Author

vaind commented Jun 8, 2023

We're discussing what we want to do about this, it seems it's something we haven't paid a lot of attention to. We wouldn't want to merge based on just name though.

As for merging frames, after thinking I wouldn't do that in, at least not in this case - if you think about it, the UI already gets the information for file and line numbers so if it's passed on to the user, they can figure out exactly which line of the function is the slow one, instead of just guessing based on the function name. But there are arguments in the other direction too si it's not a clear win either way.

Copy link
Member

@cleptric cleptric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create follow-up issues for the TODO comments?
Also, we will require changes to docs and the product itself.
Will you also work on those?

traces_profiler.go Outdated Show resolved Hide resolved
traces_profiler.go Show resolved Hide resolved
interfaces.go Outdated Show resolved Hide resolved
transport_test.go Outdated Show resolved Hide resolved
client.go Outdated Show resolved Hide resolved
vaind and others added 2 commits June 12, 2023 08:48
Co-authored-by: Michi Hoffmann <cleptric@users.noreply.github.com>
@vaind
Copy link
Collaborator Author

vaind commented Jun 12, 2023

Can we create follow-up issues for the TODO comments?

I think I've done so for the bigger issue of envelope build refactoring. For the smaller internal TODOs, I'll do a follow-up PR directly.

Also, we will require changes to docs and the product itself. Will you also work on those?

Sure, I can prepare docs. Not sure what you mean about the followups on the product itself.

@cleptric
Copy link
Member

We'll need to update the onboarding wizard and profiling landing page for Go.

@phacops Would you be ok driving this together with @vaind?
We can have a quick Discord discussion to explain the work required.

@tonyo tonyo changed the title Profiling feat: Add initial profiling support Jun 12, 2023
Copy link
Member

@tonyo tonyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏄

@vaind
Copy link
Collaborator Author

vaind commented Jun 12, 2023

We'll need to update the onboarding wizard and profiling landing page for Go.

@phacops Would you be ok driving this together with @vaind? We can have a quick Discord discussion to explain the work required.

docs PR: getsentry/sentry-docs#7127

@tonyo tonyo merged commit 4a965bc into getsentry:master Jun 12, 2023
14 checks passed
@tonyo
Copy link
Member

tonyo commented Jun 12, 2023

@vaind great work 💪
Thanks @phacops @cleptric for reviews 👍

@vaind vaind mentioned this pull request Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants