Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: filter profiling data using binary name #4554

Merged
merged 40 commits into from
May 23, 2024
Merged

*: filter profiling data using binary name #4554

merged 40 commits into from
May 23, 2024

Conversation

yomete
Copy link
Contributor

@yomete yomete commented Apr 25, 2024

This PR adds functionality to filter the icicle graph data by a binary. This was achieved in two parts:

  1. by adding a frameFilter parameter to the query request which is an array of strings so we can filter by one or many binaries.

  2. by implementing a new metadata call to fetch all mapping files (binary) attached to a query as well as other pprof labels. The new metadata call is implemented using a new REPORT_TYPE, REPORT_TYPE_PROFILE_METADATA on the query request. This was needed because if we filter the icicle graph data using the binary, after filtering we won't know the "total" set of binaries anymore, only those that we filtered by. But with this new call, we can always display all the binaries, irrespective of what it is being currently filtered by.

Additional things included:

  • We no longer add runtime functions as a binary.
  • The Runtime Filter dropdown (which was essentially a binary filter) has been removed in favor of this new filtering by binary implementation.

image

Copy link

alwaysmeticulous bot commented Apr 25, 2024

🤖 Meticulous spotted visual differences in 259 of 435 screens tested: view and approve differences detected.

Last updated for commit 95e799a. This comment will update as new commits are pushed.

@yomete yomete changed the title *: filter profiling data using binary name (wip) *: filter profiling data using binary name May 2, 2024
@@ -257,6 +257,9 @@ message QueryRequest {

// which runtime frames to filter out, often interpreter frames like python or ruby are not super useful by default
optional RuntimeFilter runtime_filter = 10;

// which binary to filter by
optional string binary_filter = 11;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a repeated field, and I think we can future proof here a bit, and make this a repeated "filter" and a filter can then be of varying kinds, and the first one would be the binary filter, that way we can then define it to be multiple binaries (which in the Go code we would turn into a set and in the actual filter don't do a byte-match but instead check if the binary is in the set)

// keep track of how many values we need to keep to reserve the space upfront
newLen++
}
keepValues[indices.(*array.Int32).Value(i)]++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're missing this line for the Uint32 case


var indices interface{}

switch arr.Indices().(type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do something much simpler here:

switch arr := arr.Indices().(type) {

then arr is already typed to the case and you no longer need to type assert anything

@@ -111,15 +131,26 @@ func CompactDictionary(mem memory.Allocator, arr *array.Dictionary) (*array.Dict
} else {
indexBuilder = array.NewUint64Builder(mem)
}
indexBuilder.Reserve(indices.Len())

switch arr.Indices().(type) {
Copy link
Member

@brancz brancz May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the first type switch

nevermind, .Len exists on array.Array, this type switch shouldn't be necessary at all

@@ -41,15 +40,36 @@ func CompactDictionary(mem memory.Allocator, arr *array.Dictionary) (*array.Dict

newLen := 0
keepValues := make([]int, arr.Dictionary().Len())
for i := 0; i < indices.Len(); i++ {

var indices interface{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this at all, it complicated the code further down as well

b.Append(uint64(newValueIndex))
}
}
case *array.Uint32:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case is duplicate

// keep track of how many values we need to keep to reserve the space upfront
newLen++
}
keepValues[indices.(*array.Int32).Value(i)]++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very important and is missing in both switch cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea why but the linter/precommit keeps removing this line


newValues := compactedDict.Dictionary().(*array.Binary)

for i := 0; i < valueDict.Len(); i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to now use newValues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ughhh thanks for spotting that

}

val := stringCol.Value(i)
seen[strings.TrimPrefix(val, "labels.")] = struct{}{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be pprof_labels

@@ -905,3 +917,32 @@ func sliceRecord(r arrow.Record, indices []int64) []arrow.Record {
slices = append(slices, r.NewSlice(cur.Start, cur.End))
return slices
}

func (q *ColumnQueryAPI) GetMappingFiles(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to export these


// invert_call_stack inverts the call stacks in the flamegraph
optional bool invert_call_stack = 11;

// a varying set of filters to apply to the query request
Filters filters = 12;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh apologies for missing this earlier, this should be repeated, it should be possible to set multiple filters.

// FunctionNameStackFilter is a filter for filtering by function names
message FunctionNameStackFilter {
// filter is the list of function names to filter by
repeated string functions_to_filter = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be repeated, the top level filter should be repeated (what I commented in the other comment)

}

// Filters is a set of filters to apply to the query request
message Filters {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be singular

@yomete yomete merged commit cc16876 into main May 23, 2024
38 checks passed
@yomete yomete deleted the binary-legend branch May 23, 2024 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants