Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repository.ResolveRevision is very slow #824

Open
pgeorgi opened this issue Aug 3, 2023 · 1 comment
Open

Repository.ResolveRevision is very slow #824

pgeorgi opened this issue Aug 3, 2023 · 1 comment

Comments

@pgeorgi
Copy link

pgeorgi commented Aug 3, 2023

I'm using go-git in a tool that parses two git repos, one containing test results referencing the other. Those references use commit hashes, sometimes in the short format, which finally works in go-git 5.8 (thanks!)

Before 5.8, I worked around some of the problems by calling out into the git executable, and I was thrilled to remove that environmental dependency. See https://review.coreboot.org/c/coreboot/+/68959/3 for the changes I made.

I expected speed to remain approximately the same, or maybe even improve a little because it's one process managing the git repo, not thousands of processes, each starting from scratch to parse the data.

Sadly, ResolveRevision is very slow in my scenario, increasing runtime from 4 seconds to 92 seconds.

I have a reproducer that is somewhat large (pulling two repos) but it's 100% what I was doing, so I'll start by presenting it here. I can look into creating a smaller test case if necessary.

$ git clone https://review.coreboot.org/board-status.git board-status.git
$ git clone https://review.coreboot.org/coreboot.git coreboot.git
$ cp -a coreboot.git coreboot-patched.git
$ (cd coreboot-patched.git && git fetch https://review.coreboot.org/coreboot refs/changes/59/68959/3 && git checkout FETCH_HEAD)
$ (cd coreboot.git/util/docker/coreboot.org-status/board-status.html && go build)
$ (cd coreboot-patched.git/util/docker/coreboot.org-status/board-status.html && go build)
$ (cd coreboot.git && time util/docker/coreboot.org-status/board-status.html/board-status.html > ../current.html 2>/dev/null)

real    0m4,312s
user    0m3,342s
sys     0m1,705s
$ (cd coreboot-patched.git && time util/docker/coreboot.org-status/board-status.html/board-status.html > ../patched.html 2>/dev/null)

real    1m32,850s
user    1m45,974s
sys     0m2,467s

I also added some cpu profiling code, closely following the cpuprofile example in https://go.dev/blog/pprof and ran that to see what is going on. That change can be found at https://review.coreboot.org/c/coreboot/+/76924

Running pprof on cpuprofile output shows:

$ go tool pprof coreboot-patched.git/util/docker/coreboot.org-status/board-status.html/board-status.html tool.prof
(pprof) tree 10 -cum
Showing nodes accounting for 33.88s, 31.67% of 106.98s total
Dropped 492 nodes (cum <= 0.53s)
Showing top 10 nodes out of 75
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                            93.41s   100% |   runtime.main
         0     0%     0%     93.41s 87.32%                | main.main
                                            92.14s 98.64% |   main.collectLogs
----------------------------------------------------------+-------------
         0     0%     0%     93.41s 87.32%                | runtime.main
                                            93.41s   100% |   main.main
----------------------------------------------------------+-------------
                                            92.14s   100% |   main.main
         0     0%     0%     92.14s 86.13%                | main.collectLogs
                                            91.76s 99.59% |   github.com/go-git/go-git/v5.(*Repository).ResolveRevision
----------------------------------------------------------+-------------
                                            91.76s   100% |   main.collectLogs
         0     0%     0%     91.76s 85.77%                | github.com/go-git/go-git/v5.(*Repository).ResolveRevision
                                            90.84s 99.00% |   github.com/go-git/go-git/v5.(*Repository).resolveHashPrefix
                                             0.19s  0.21% |   runtime.mallocgc
----------------------------------------------------------+-------------
                                            90.84s   100% |   github.com/go-git/go-git/v5.(*Repository).ResolveRevision
         0     0%     0%     90.84s 84.91%                | github.com/go-git/go-git/v5.(*Repository).resolveHashPrefix
                                            90.84s   100% |   github.com/go-git/go-git/v5.expandPartialHash
----------------------------------------------------------+-------------
                                            90.84s   100% |   github.com/go-git/go-git/v5.(*Repository).resolveHashPrefix
         0     0%     0%     90.84s 84.91%                | github.com/go-git/go-git/v5.expandPartialHash
                                            90.84s   100% |   github.com/go-git/go-git/v5/storage/filesystem.(*ObjectStorage).HashesWithPrefix
----------------------------------------------------------+-------------
                                            90.84s   100% |   github.com/go-git/go-git/v5.expandPartialHash
     1.96s  1.83%  1.83%     90.84s 84.91%                | github.com/go-git/go-git/v5/storage/filesystem.(*ObjectStorage).HashesWithPrefix
                                            63.49s 69.89% |   github.com/go-git/go-git/v5/plumbing/format/idxfile.(*idxfileEntryIter).Next
----------------------------------------------------------+-------------
                                            63.49s   100% |   github.com/go-git/go-git/v5/storage/filesystem.(*ObjectStorage).HashesWithPrefix
    13.94s 13.03% 14.86%     63.49s 59.35%                | github.com/go-git/go-git/v5/plumbing/format/idxfile.(*idxfileEntryIter).Next
                                            36.55s 57.57% |   runtime.newobject
----------------------------------------------------------+-------------
                                            36.55s 99.73% |   github.com/go-git/go-git/v5/plumbing/format/idxfile.(*idxfileEntryIter).Next
     2.18s  2.04% 16.90%     36.65s 34.26%                | runtime.newobject
                                            34.47s 94.05% |   runtime.mallocgc
----------------------------------------------------------+-------------
                                            34.47s 98.94% |   runtime.newobject
                                             0.19s  0.55% |   github.com/go-git/go-git/v5.(*Repository).ResolveRevision
    15.80s 14.77% 31.67%     34.84s 32.57%                | runtime.mallocgc
----------------------------------------------------------+-------------

If I understand that data correctly, the approach of creating new objects for every object in a pack file in

entry := new(Entry)
leads to excessive costs in runtime.newobject and runtime.mallocgc.

Anything I could improve in my code to avoid this behavior? Anything that needs to be done in go-git? Or does the go compiler need smarter handling for that pattern? For reference, I'm using go version go1.20.7 linux/amd64

@pjbgf
Copy link
Member

pjbgf commented Aug 5, 2023

@pgeorgi The opening of the repository may be improved by the changes included in #799. Would you be able to test that please?

Now for ref/revision resolution, go-git has a 10x cost when referring to a partial commit when comparing to a full hash:

get commit from full hash: in 50.015µs (via r.CommitObject)
get commit from partial hash: in 519.549µs (via storer.HashesWithPrefix)
get commit from revision: in 543.997µs (via r.ResolveRevision)

Depending on how many operations you are doing, just by resolving to a full hash first may provide some gains (as per above).

Similar tests using git CLI had only a 30% additional cost, so this is something that we could try to optimise in go-git. Feel free to propose a PR for that - and tag me for a review.

Code used for the checks above:

package main

import (
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"time"

	"github.com/go-git/go-billy/v5/osfs"
	"github.com/go-git/go-git/v5"
	. "github.com/go-git/go-git/v5/_examples"
	"github.com/go-git/go-git/v5/plumbing"
	"github.com/go-git/go-git/v5/plumbing/cache"
	"github.com/go-git/go-git/v5/storage/filesystem"
)

func main() {
	url := "https://github.com/go-git/go-git.git"
	directory, _ := os.MkdirTemp("", "gogit-test")
	defer os.RemoveAll(directory)

	Info("git clone %s %s", url, directory)
	fs := osfs.New(directory)
	dot := osfs.New(filepath.Join(directory, git.GitDirName))
	storer := filesystem.NewStorage(dot, cache.NewObjectLRUDefault())
	r, err := git.Clone(storer, fs, &git.CloneOptions{
		URL: url,
	})

	CheckIfError(err)

	fullHash := "56c4bf4ca9789505db7a6eefb910a7259c1fcb79"
	ref := "56c4bf"

	t("get commit from full hash", func() string {
		h := plumbing.NewHash(fullHash)
		commit, err := r.CommitObject(h)
		CheckIfError(err)

		return commit.Hash.String()
	})

	t("get commit from partial hash", func() string {
		prefix, err := hex.DecodeString(ref)
		CheckIfError(err)
		hs, err := storer.HashesWithPrefix(prefix)
		CheckIfError(err)
		commit, err := r.CommitObject(hs[0])
		CheckIfError(err)

		return commit.Hash.String()
	})

	t("get commit from revision", func() string {
		rev, err := r.ResolveRevision(plumbing.Revision(ref))
		CheckIfError(err)

		return rev.String()
	})
}

func t(msg string, f func() string) {
	t1 := time.Now()
	h := f()
	t2 := time.Now()

	Info(fmt.Sprintf("%s: in %s", msg, t2.Sub(t1)))
	fmt.Println(h)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants