Support to add a new suboption --follow-calls to trace subcommand #3594

Open
wants to merge 37 commits into master
Conversation

@archanaravindar (Contributor) commented Dec 7, 2023

This adds support for a new option to the trace subcommand, --follow-calls, which traverses the children of the functions being traced up to the depth specified by the user:

n=0 is the same as the plain trace subcommand
n=1 traces functions at level 1, which is again equivalent to the plain trace subcommand
n=2 traces functions at level 2, printing the function being traced along with its immediate children
n=3 traces functions at level 3, printing the function being traced along with two levels of children down the call tree
and so on.

This option also indents the output according to the depth of each call. Calls such as runtime.morestack_noctxt are treated specially and do not follow the same definition of depth as the other functions, though morestack_noctxt calls can also be traced if desired.

For example, suppose we have a simple Go program such as the following:

package main

import "fmt"

func D(i int) int {
	return i * i * i
}

func C(i int) int {
	return i + 20
}

func B(i int) int {
	d := C(i) + 40
	return d + D(i)
}

func A(i int) int {
	return 10 + B(i)
}

func main() {
	j := 0
	j += A(2)
	fmt.Println(j)
}

Building leaf4 and running Delve as follows gives the following trace output:

go build -gcflags '-N -l' ./leaf4.go 


../dlv trace main.A --follow-calls 3 -e=./leaf4
> goroutine(1): runtime.morestack_noctxt()
> goroutine(1): main.A(2)
 > goroutine(1): main.B(2)
  > goroutine(1): main.C(2)
  >> goroutine(1): => (22)
  > goroutine(1): main.D(2)
  >> goroutine(1): => (8)
 >> goroutine(1): => (70)
>> goroutine(1): => (80)
> goroutine(1): runtime.morestack_noctxt()
80

cmd/dlv/cmds/commands.go (review comment, resolved)
@@ -941,6 +941,175 @@ func TestTrace2(t *testing.T) {
assertNoError(cmd.Wait(), t, "cmd.Wait()")
}

func TestTraceDepth(t *testing.T) {
Member:

I think it would be better to just have one of these end-to-end tests and then test the rest of the details in service/test/integration2_test.go by just calling ListFunctions directly.
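For illustration, a minimal sketch of that kind of integration test, assuming the PR extends the RPC client's ListFunctions with a follow-calls depth argument and that a fixture equivalent to the leaf4 example exists (the signature, fixture name, and test name below are assumptions, not confirmed API):

func TestTraceFollowCallsListFunctions(t *testing.T) {
	withTestClient2("leaf4", t, func(c service.Client) {
		// Hypothetical second argument: the --follow-calls depth.
		funcs, err := c.ListFunctions("main.A", 3)
		assertNoError(err, t, "ListFunctions()")
		for _, want := range []string{"main.A", "main.B", "main.C", "main.D"} {
			found := false
			for _, fn := range funcs {
				if fn == want {
					found = true
					break
				}
			}
			if !found {
				t.Errorf("expected %s in the returned functions, got %v", want, funcs)
			}
		}
	})
}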

pkg/terminal/command.go (outdated review comment, resolved)
@@ -1467,6 +1467,65 @@ func (d *Debugger) Functions(filter string) ([]string, error) {
return funcs, nil
}

func traverse(t proc.ValidTargets, f *proc.Function, depth int, FollowCalls int) ([]string, error) {
Member:

You should keep track of which functions have been visited already and avoid revisiting them.

Contributor Author:

Hi Alessandro, for functions that have a call graph pattern like the following example, we need to be able to print D twice, once from B and once from C, hence we kept only the check for recursion.

package main

import "fmt"

func D(i int) int {
	return i * i * i
}

func C(i int) int {
	return D(i+10) + 20
}

func B(i int) int {
	return i * D(i)
}

func A(i int) int {
	d := 10 + B(i)
	return d + C(i)
}

func main() {
	j := 0
	j += A(2)
	fmt.Println(j)
}

../dlv trace main.A --follow-calls 3 -e=./leafcommon
> goroutine(1): runtime.morestack_noctxt()
> goroutine(1): runtime.morestack_noctxt()
> goroutine(1): main.A(2)
 > goroutine(1): main.B(2)
  > goroutine(1): main.D(2)
  >> goroutine(1): => (8)
 >> goroutine(1): => (16)
> goroutine(1): runtime.morestack_noctxt()
 > goroutine(1): main.C(2)
  > goroutine(1): main.D(12)
  >> goroutine(1): => (1728)
 >> goroutine(1): => (1748)
>> goroutine(1): => (1774)
> goroutine(1): runtime.morestack_noctxt()
1774

Member:

Sure, but you can't set more than one breakpoint on D.

Member:

I agree here. We could have a map like visited map[string]struct{}{} so that if we've already traversed a function, we don't waste time doing it again. Eventually those results will be removed by uniq later, but it's best to not even waste time going through that call graph again.
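For illustration, a minimal sketch of that visited-set idea applied to a depth-limited traversal; followCalls, rootFuncName and funcsCalledBy are hypothetical placeholders for the depth limit, the root function, and the callees found by disassembling a function, not existing Delve identifiers:

visited := map[string]struct{}{}
var walk func(fname string, depth int)
walk = func(fname string, depth int) {
	if depth > followCalls {
		return
	}
	if _, seen := visited[fname]; seen {
		// already traversed: skip re-disassembling this function
		return
	}
	visited[fname] = struct{}{}
	for _, callee := range funcsCalledBy(fname) { // hypothetical helper
		walk(callee, depth+1)
	}
}
walk(rootFuncName, 1)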

Contributor Author:

After some discussion with @derekparker we realized that we may end up going further down the call graph of a function depending on how we got to it and how far down in depth we are, so there can be a situation where we miss a child if we short-circuit the traversal.
Looking at it another way, we can come to a function from two different parents, and we would then need to traverse such a function twice anyway.

Member:

After some discussion with @derekparker we realized that we may end up going further down the call graph of the function depending on how we got to it and how far down in depth we are

True, let's make this a breadth first visit instead.

Member:

Just to stress why this is important: tracing a function that calls fmt.Printf with --follow-calls=10 on my system takes 10 seconds, just to set up the breakpoints. The number of breakpoints it ends up setting is relatively small (222), but to do that it ends up disassembling the same functions thousands of times. For example fmt.(*buffer).writeString is visited by traverse 4727 times (and re-disassembled each time), runtime.panicIndex is visited 7900 times.

This implementation is extremely inefficient.

service/debugger/debugger.go (4 outdated review comments, resolved)
@@ -1467,6 +1467,65 @@ func (d *Debugger) Functions(filter string) ([]string, error) {
return funcs, nil
}

func traverse(t proc.ValidTargets, f *proc.Function, depth int, FollowCalls int) ([]string, error) {
Member:

I agree here. We could have a map like visited map[string]struct{}{} so that if we've already traversed a function, we don't waste time doing it again. Eventually those results will be removed by uniq later, but it's best to not even waste time going through that call graph again.

service/debugger/debugger.go (outdated review comment, resolved)
@@ -2845,12 +2845,25 @@ func printBreakpointInfo(t *Term, th *api.Thread, tracepointOnNewline bool) {
}
}

var (
Member:

Does this need to be global?

pkg/terminal/command.go (3 outdated review comments, resolved)
cmd/dlv/cmds/commands.go (review comments, resolved)
//sdepth:=mainindex-wantindex
sdepth := rootindex - wantindex + 1
//fmt.Printf("follow calls depth %d\n",th.Breakpoint.TraceFollowCalls)
//fmt.Printf("Root func name %s\n",th.Breakpoint.RootFuncName)
Member:

I don't understand what this is doing. Shouldn't you be counting all appearances of traced calls in the stack?

@@ -1467,6 +1467,65 @@ func (d *Debugger) Functions(filter string) ([]string, error) {
return funcs, nil
}

func traverse(t proc.ValidTargets, f *proc.Function, depth int, FollowCalls int) ([]string, error) {
Member:

After some discussion with @derekparker we realized that we may end up going further down the call graph of the function depending on how we got to it and how far down in depth we are

True, let's make this a breadth first visit instead.

@@ -716,12 +730,20 @@ func traceCmd(cmd *cobra.Command, args []string, conf *config.Config) int {
continue
}
for i := range addrs {
var stackdepth int
if traceFollowCalls > 0 {
stackdepth = 20
Member:

What if traceStackDepth is greater than 20?
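A possible way to address that concern, sketched under the assumption that traceStackDepth holds the user-supplied --stack-depth value: treat 20 as a floor rather than a fixed value.

stackdepth := traceStackDepth
if traceFollowCalls > 0 && stackdepth < 20 {
	// follow-calls needs enough frames to locate the root function,
	// but a larger user-provided stack depth should not be clamped down to 20
	stackdepth = 20
}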

}
}

func printTracepoint(t *Term, th *api.Thread, bpname string, fn *api.Function, args string, hasReturnValue bool) {
if t.conf.TraceShowTimestamp {
fmt.Fprintf(t.stdout, "%s ", time.Now().Format(time.RFC3339Nano))
}
// Simple Trace; no stack required to calculate depth of functions
Member:

Instead of duplicating all this code, calculate a prefix string and then change the Fprintf like this:

fmt.Fprintf(t.stdout, "> goroutine(%d): %s%s(%s)\n", th.GoroutineID, bpname, fn.Name(), args)

becomes:

fmt.Fprintf(t.stdout, "%s> goroutine(%d): %s%s(%s)\n", depthPrefix, th.GoroutineID, bpname, fn.Name(), args)

return
}

stack := th.BreakpointInfo.Stacktrace
Member:

Can you explain what the logic behind this calculation is? Why is wantindex even needed? Why are you calculating curi as len(stack) - 1 - i when you later use it in a way that will always cancel the len(stack) term?

Contributor Author:

There were many reasons for this. We needed rootindex to mark the depth of the starting function name because, without it, depth would be reported relative to main rather than the function given as a parameter. Calculating the index as len(stack)-1-i lets us traverse the stack from the bottom up, so that rootindex is set at the first match in case the same function is recursively invoked later (to avoid overwriting it with a higher depth). wantindex was required to mark the depth of the function corresponding to the tracepoint.

Member:

Right, but why is wantindex the deepest call of the current tracepoint function and rootindex the least deep call? What if the current tracepoint function is recursively calling itself? Should that not increment the depth? What if the deepest call of the current tracepoint function is deeper than the least deep call of the root function?

Contributor Author:

Hi Alessandro, because the stack is traversed bottom up, rootindex appears to be the least deep call, since we consider depth by subtracting off len(stack); hence, when computing depth, we subtract the index of the traced function from the root index. If a function calls itself, the depth is incremented automatically because curi is reassigned on every matching iteration.
I have tested it for both direct and indirect recursion; for instance, if A calls itself we get an output trace like this:

./dlv trace main.A --follow-calls 3 -e=./leafrec
1> goroutine(1): main.A(5, 5)
 2> goroutine(1): main.A(4, 4)
  3> goroutine(1): main.A(3, 3)
  3>> goroutine(1):(main.A) => (6)
 2>> goroutine(1):(main.A) => (24)
1>> goroutine(1):(main.A) => (120)

This is the question I did not understand: what if the deepest call of the current tracepoint function is deeper than the least deep call of the root function? Can you give me an example?

Member:

To be honest I understand what this code is trying to do so little that I can't really tell what's intended behavior and what's a bug.

cmd/dlv/cmds/commands.go (outdated review comment, resolved)
pkg/terminal/command.go (review comment, resolved)
service/rpc2/server.go (outdated review comment, resolved)
service/test/integration2_test.go (2 outdated review comments, resolved)
service/debugger/debugger.go (outdated review comment, resolved)
}
}
}
for _, fbinary := range t.BinInfo().Functions {
Member:

Can we optimize this by building an index as we go instead of looping through all program functions again? This doesn't necessarily have to happen in this PR, but perhaps it would be better to represent all program functions using a tree data structure as opposed to a slice. Let's do this later actually, but something to keep in mind.

service/debugger/debugger.go (3 outdated review comments, resolved)
if th.Breakpoint.Tracepoint {
fmt.Fprintf(t.stdout, "> goroutine(%d): %s%s(%s)\n", th.GoroutineID, bpname, fn.Name(), args)
if th.Breakpoint.TraceFollowCalls > 0 {
if sdepth <= th.Breakpoint.TraceFollowCalls && wantindex <= rootindex && rootindex != -1 {
Member:

This check (sdepth <= th.Breakpoint.TraceFollowCalls) is inconsistent with the description of --follow-calls


stack := th.BreakpointInfo.Stacktrace
for i := range stack {
curi := len(stack) - 1 - i
Member:

Either use a three clause for loop to iterate in reverse order, or change it to work in the normal order. This is really confusing to follow.

Member:

I agree this is a bit confusing to read through.

So, rootindex is the frame closest to the top of the stack for the root function (e.g. foo in the case of dlv trace foo --follow-calls) from the current function, and wantindex is the frame of a child call of that function, correct?

E.g. given:

func f1() {
        f2()
}

func f2() {
        f3()
}

func f3() {}

Running dlv trace f1 --follow-calls=3, once we hit f3 the stack would look something like:

[3] main.main
[2] f1 <-- rootindex
[1] f2
[0] f3 <-- wantindex

I think wantindex would always equal len(stack) right? Because we want to calculate depth from where we are at currently. In that case I don't think we even need to track it with a variable.

In which case couldn't we refactor the current algorithm like so:

// We do not need to visit stack[0] because that will always be `wantindex`, the child function
// which triggered the breakpoint.
for i := len(stack)-1; i > 0; i-- {
        if stack[i].Function.Name() == th.Breakpoint.RootFuncName {
		rootindex = i
                break // no need to continue looping at this point
	}
}
/*
depth we want == 2
len(stack) == 4
rootindex == 2
wantindex == len(stack)
len(stack) - rootindex == 2
*/
sdepth = len(stack) - rootindex

I think it may be worth extracting this code into its own function and unit testing it to ensure correctness in different scenarios.
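A sketch of what that extraction might look like, reusing the names from the snippet above (the helper name stackDepth is made up for illustration; a table-driven unit test could then feed it synthetic stacks covering recursion and a missing root function):

// stackDepth returns the call depth of the current frame (stack[0]) relative to
// the closest-to-the-top occurrence of the root function, or -1 if the root
// function is not on the stack.
func stackDepth(stack []api.Stackframe, rootFuncName string) int {
	for i := len(stack) - 1; i > 0; i-- {
		if stack[i].Function.Name() == rootFuncName {
			return i
		}
	}
	return -1
}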

Member:

I've been thinking about this a bit more and there are two possible interpretation to this feature:

  1. the "static" interpretation: tracepoints are set on all functions reachable from the root function up to the specified depth. These tracepoints then behave like normal tracepoints, so, for example, if we set a tracepoint on f with --follow-calls=1 and f recursively calls itself to a depth of 10 we print all ten recursive calls
  2. the "dynamic" interpretation: all calls made from the root function up to the specified depth are traced. If we set a tracepoint on f with --follow-calls=1 and f recursively calls itself to a depth of 10 we only print the first call.

I thought we were going for the "static" interpretation (that's what it did originally and the "dynamic" interpretation is pretty hard to do), but the current implementation is a mix of both. It shouldn't be.

Member:

I updated my comment above because I had a few things reversed which may have made my thought a bit more difficult to understand. Also, the issue with recursion isn't an issue at all as I thought about it more.

The more I think about this, I believe this would always be correct:

/*
Given the following stack:

[5] runtime.goexit
[4] runtime.main
[3] main.main
[2] f1 <-- rootindex
[1] f2
[0] f3 <-- wantindex
*/

// We do not need to visit stack[0] (current frame) because that will always be `wantindex`, the child function
// which triggered the breakpoint.
for i := len(stack)-1; i > 0; i-- {
        if stack[i].Function.Name() == th.Breakpoint.RootFuncName {
		sdepth = i
                break // no need to continue looping at this point
	}
}

In other words, we don't need to do any arithmetic at all. Since we're going from the top of the stack down towards the bottom (current frame), the iterator value of the rootindex will always be equal to the depth.

We could rework the algorithm to go from the bottom of the stack (current frame) up, and use the iterator as the depth:

for i := range stack {
    if stack[i].Function.Name() == th.Breakpoint.RootFuncName {
        sdepth = i // always overwrite sdepth whenever we see the root function
    }
}

However it is less efficient because we would have to traverse the entire stack every time, whereas with the current approach of going top down we can short circuit and break as soon as we see the root function appear.

@@ -1467,6 +1467,65 @@ func (d *Debugger) Functions(filter string) ([]string, error) {
return funcs, nil
}

func traverse(t proc.ValidTargets, f *proc.Function, depth int, FollowCalls int) ([]string, error) {
Member:

Just to stress why this is important: tracing a function that calls fmt.Printf with --follow-calls=10 on my system takes 10 seconds, just to set up the breakpoints. The number of breakpoints it ends up setting is relatively small (222), but to do that it ends up disassembling the same functions thousands of times. For example fmt.(*buffer).writeString is visited by traverse 4727 times (and re-disassembled each time), runtime.panicIndex is visited 7900 times.

This implementation is extremely inefficient.

@archanaravindar (Contributor Author):

Hi @aarzilli, thanks for your suggestion to improve the performance of the traversal algorithm; we are working on this now. Can you please give the same example as mentioned in #3594 (comment), the one you had found very inefficient, so that we can try it with the new implementation and see whether we get an improvement? Thanks!

@aarzilli (Member):

can you please give the same example as mentioned in #3594 (comment)

Just:

package main

import "fmt"

func f() {
    fmt.Printf("hello world\n")
}

func main() {
    f()
}

and trace main.f with --follow-calls=10; if that's fast for you, try --follow-calls=10000.

@archanaravindar (Contributor Author):

Optimized the traversal and tested with the example provided in #3594 (comment).
Testing with a depth of 10000, the execution time remains well below 1 second.

@aarzilli (Member) commented Apr 29, 2024

Optimized traversal

I think that making it a BFS is still preferable to optimizing a DFS with caching:

d := make(map[string]int)
fringe := []string{}
add := func(fname string, dist int) {
	if curdist, ok := d[fname]; ok {
		if curdist > dist {
			d[fname] = dist
		}
		return
	}
	d[fname] = dist
	fringe = append(fringe, fname)
}

add(f.Name, 0)

for len(fringe) > 0 {
	cur := fringe[0]
	fringe = fringe[1:] // leaking slightly but it should be fine
	if d[cur] >= depth {
		continue
	}
	dcur := d[cur]
	for _, next := range functionsCalledFromDisassembly(cur) {
		add(next, dcur+1)
	}
}

r := []string{}
for fname, dist := range d {
	if dist < depth {
		r = append(r, fname)
	}
}
return r

(not tested, didn't even compile it)

@archanaravindar (Contributor Author) commented Apr 30, 2024

Optimized traversal

I think that making it a BFS is still preferable to optimizing a DFS with caching:

Hi @aarzilli, if you don't see any major performance issues in the DFS code, can we go ahead and merge this code and do a performance enhancement later? @derekparker

@aarzilli (Member):

The problem is not the performance; it is that all these optimizations are unreadable, and using the correct algorithm would render them unnecessary. Also, in addition to traverse using the wrong algorithm, there are other comments on this code that have not been addressed, and they should be before we merge this.

…cing branches by using depth prefix, wrap using %w and other comments
…ify trace output layout

and adjust tests accordingly
@derekparker (Member) left a comment:

Looks good, I like the BFS approach, but I think due to using BFS we can simplify and clean this up even more.

@@ -63,6 +63,11 @@ type Breakpoint struct {
// ReturnInfo describes how to collect return variables when this
// breakpoint is hit as a return breakpoint.
returnInfo *returnBreakpointInfo

// Name of root function from where tracing needs to be done
Member:

Please start this with the name of the struct member, e.g. RootFuncName is the name of ...


// Name of root function from where tracing needs to be done
RootFuncName string
// depth of tracing
Member:

Same comment as above here, and please expand on this a bit.
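A possible rewording along those lines; RootFuncName appears in the diff above, while the second field's name follows the th.Breakpoint.TraceFollowCalls references elsewhere in this thread and is an assumption here:

// RootFuncName is the name of the root function from which tracing begins
// when the trace --follow-calls option is used.
RootFuncName string
// TraceFollowCalls is the depth to which child calls of RootFuncName are
// traced; 0 and 1 behave like a plain tracepoint.
TraceFollowCalls int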

stack := th.BreakpointInfo.Stacktrace
for i := len(stack) - 1; i >= 0; i-- {
if stack[i].Function.Name() == th.Breakpoint.RootFuncName {
if rootindex == -1 {
Member:

Since we break after the first time we set this I don't think we need this branch at all, just break after setting rootindex within the if stack[i].Function.Name() == th.Breakpoint.RootFuncName branch.

fmt.Fprintf(t.stdout, "> goroutine(%d): %s%s(%s)\n", th.GoroutineID, bpname, fn.Name(), args)
if th.Breakpoint.TraceFollowCalls > 0 {
if rootindex >= 0 {
for i := 1; i < sdepth; i++ {
Member:

Instead of this loop you can just write strings.Repeat(" ", sdepth-1).

fmt.Fprintf(t.stdout, ">> goroutine(%d): %s => (%s)\n", th.GoroutineID, fn.Name(), strings.Join(retVals, ","))
if th.Breakpoint.TraceFollowCalls > 0 {
if rootindex >= 0 {
for i := 1; i < sdepth; i++ {
Member:

Same comment regarding use of strings.Repeat.

childnode := TraceMap[cf.Name]
if childnode == nil {
// childnodes should have already been computed in the above loop
fmt.Printf("unexpected error child nil \n")
Member:

Since this would be a programmer error I think it's ok to panic here instead.
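That could look roughly like this, keeping the names from the snippet above (a sketch, not the final wording):

childnode := TraceMap[cf.Name]
if childnode == nil {
	// child nodes should already have been created by the loop above,
	// so reaching this point indicates a programmer error
	panic(fmt.Sprintf("traverse: no trace node for %s", cf.Name))
}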

// This path has already been explored no need to revisit
continue
}
if childnode.Depth <= FollowCalls {
Member:

Won't this always be true due to the above line:

if parent.Depth+1 > FollowCalls {
	// Skip disassembling and computing children if we are going to cross FollowCalls
	continue
}

rootnode.Func = f

TraceMap[f.Name] = rootnode
queue.PushBack(TraceMap[f.Name])
Member:

If we go with a slice instead, you could do:

queue := make([]traceFuncPtr, 0, 20)
queue = append(queue, rootnode)

childnode.Func = cf
TraceMap[cf.Name] = childnode
parent.children = append(parent.children, cf)
} else {
Member:

I think we can drop this else clause right? Since we're doing a BFS now, if we've seen the function before then we won't find anything new if we see it again at a larger depth. I think you can drop the loop below too (for _, cf := range parent.children {), we can just directly enqueue any new function right here, right?

}
parent.dis = true
}
for _, cf := range parent.children {
Member:

(as noted in an earlier comment) I think this loop is unnecessary and we can directly enqueue in the above loop.
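A rough sketch of that simplification, assuming the slice-based queue suggested earlier; calleesFoundByDisassembly and the TraceFunc node shape are placeholders for the surrounding code, not names from the PR:

for _, cf := range calleesFoundByDisassembly(parent) { // hypothetical: callees discovered while disassembling parent
	if _, seen := TraceMap[cf.Name]; seen {
		// BFS guarantees the first visit happens at the smallest depth,
		// so seeing the function again cannot add new reachable children
		continue
	}
	childnode := &TraceFunc{Func: cf, Depth: parent.Depth + 1}
	TraceMap[cf.Name] = childnode
	queue = append(queue, childnode)
}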
