Implement an Elisp binding for libgit2 #2959

Open
tarsius opened this issue Jan 12, 2017 · 134 comments
Labels
area: abstraction enhancement New feature or request

Comments

@tarsius
Member

tarsius commented Jan 12, 2017

This description was taken from #2956. I intend to replace it with a more in-depth description at a later time.

Magit is slow and part of fixing that involves the use of libgit2, "a portable, pure C implementation of the Git core methods provided as a re-entrant linkable library with a solid API, allowing you to write native speed custom Git applications in any language which supports C bindings." Unfortunately nobody has written that for Elisp yet, and since improving performance is a top priority now, I'll do it.

This will be named just libgit.el (or libgit2.el) and be pretty basic, i.e. just expose the functions provided by libgit2 to Elisp.


Older discussions: #2539, #2442 (comment), #1327 (comment). (Yes, this goes back a while, but note that doing this is only even possible since Emacs v25.1, which was released in September 2016.)


Some resources:

And of course...

@jwiegley
Contributor

@tarsius If you have any libgit2 related questions, feel free to ask me. I've used it extensively, both for personal projects and in production, and I maintain the Haskell bindings to libgit2.

@tarsius
Member Author

tarsius commented May 30, 2017

That's awesome to hear! I suspect you also have some experience with the new module support. I intend to get started with this soon, but could definitely use some help.

@jwiegley
Contributor

I haven't yet looked into the module support, but this would be a great way to learn it. Count me in!

@vermiculus
Contributor

vermiculus commented Sep 1, 2017

@jwiegley I'm not current on emacs-devel, but have we thought at all about how modules will be distributed? Is package.el planning support for them?

I ask since they impact the structure of the implementation and I've been thinking about starting that project up again personally (now that magithub is stable-ish).

@mgalgs
Contributor

mgalgs commented Sep 2, 2017

part of fixing that involves the use of libgit2

I would love to see some data to back up that claim. It sounds right to me, but it would be a shame for you to spend precious time on an optimization that might not bear fruit as anticipated. Perhaps we could start with just a minimal implementation of the libgit2 bindings and do some benchmarks to prove the concept.

I'm guessing you've already thought this through... But it would be nice to see the data. I can help out if there's any dividing and conquering that can be done.

@jwiegley
Contributor

jwiegley commented Sep 3, 2017

This is a good question, and maybe we'll be the first ones to answer it for future package authors too. I just don't know yet. :)

@jwiegley
Contributor

jwiegley commented Sep 3, 2017

@mgalgs If someone can show me a set of commands that are being used by magit, and which are presumed to be slow, I can tell you how libgit2 might affect the performance there and why. It's possible too that we could use more caching, and more low-level Git commands (for example, direct tree manipulation) to defer going the libgit2 route.

@vermiculus
Contributor

@mgalgs @tarsius From my memory of prior conversations, rev-parse is probably the biggest single hitter. See also ksjogo/emacs-libgit2#4.

@tarsius
Member Author

tarsius commented Sep 3, 2017

From my memory of prior conversations, rev-parse is probably the biggest single hitter.

... because we call it a lot. So it would be a good idea to implement support for that first.

If someone can show me a set of commands that are being used by magit, and which are presumed to be slow,

The problem isn't that certain git commands are slow on Windows, but that starting subprocesses is slow per se and Magit starts many.
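A rough way to see this per-process cost is to time a trivial child process against an equivalent in-process call. The sketch below is only illustrative: it spawns the Python interpreter rather than git (an assumption, so that it runs without a repository), and both helper names are hypothetical.

```python
import os
import subprocess
import sys
import time


def time_subprocess_calls(n: int) -> float:
    """Return the seconds spent starting n trivial child processes."""
    start = time.perf_counter()
    for _ in range(n):
        # The child does almost nothing, so the elapsed time is
        # dominated by process creation and teardown.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def time_in_process_calls(n: int) -> float:
    """Return the seconds spent on n cheap calls made in this process."""
    start = time.perf_counter()
    for _ in range(n):
        os.getcwd()
    return time.perf_counter() - start


if __name__ == "__main__":
    calls = 20
    print(f"subprocess: {time_subprocess_calls(calls):.4f}s for {calls} calls")
    print(f"in-process: {time_in_process_calls(calls):.4f}s for {calls} calls")
```

The absolute numbers depend heavily on the platform, which is the point being made about Windows here.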

It's possible too that we could use more caching,

We already do a lot of caching. Identical calls (same arguments and directory) during a single refresh (i.e. after every Magit command) get the value from a cache. I don't think there is much room for improvement here. Well there is--see #2982--but that goes much further than just a stupid cache.
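The kind of per-refresh cache described here can be sketched as follows. This is not Magit's implementation (which is written in Elisp); the class and function names are hypothetical, and the only property taken from the description is that identical calls, meaning the same directory and the same arguments, are answered from the cache for the duration of one refresh.

```python
import subprocess
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, Tuple[str, ...]]
Runner = Callable[[str, Tuple[str, ...]], str]


def run_git(directory: str, args: Tuple[str, ...]) -> str:
    """Run git with args in directory and return its standard output."""
    completed = subprocess.run(
        ["git", *args], cwd=directory, capture_output=True, text=True
    )
    return completed.stdout


class RefreshCache:
    """Remember command output for the lifetime of a single refresh."""

    def __init__(self, runner: Runner = run_git) -> None:
        self._runner = runner
        self._results: Dict[CacheKey, str] = {}

    def call(self, directory: str, *args: str) -> str:
        # Directory and arguments together form the key, so the same
        # command in a different repository is never confused.
        key = (directory, args)
        if key not in self._results:
            self._results[key] = self._runner(directory, args)
        return self._results[key]
```

Creating a fresh `RefreshCache` at the start of each refresh and discarding it afterwards means no stale value survives a command that changed the repository.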

and more low-level Git commands

There have been some reports that e.g. rebasing can be slow on Windows, I think. But we cannot do much about that--I certainly don't want to reimplement every Git command that is still implemented as a shell script.

However a few months ago a similar (but much less severe) instance of "starting a subprocess is slow" was fixed, but only on macOS/Darwin. I am hoping that something similar can be done on Windows.

Unfortunately I never got around to asking the right people for help. We should dig up the old discussions and then bring those to their attention. The issue on macOS was that "the wrong fork" was being used, and since that had been done for a very long time on macOS, the same thing might very well be true on Windows too.

But even if that gives us (not just Magit, but any package that uses many subprocesses) an amazing performance boost, I would still like to be able to use libgit from elisp.

@vermiculus
Contributor

For anyone who's curious, I've implemented a type of benchmark in this gist. What I've done is I've redirected magit-git-executable to a shell script that logs the input and times it. I've got some elisp also that processes that; once I'm done running errands, I'll be doing more processing of that log output so we can say with certainty how long we spend doing git commands.

@vermiculus
Contributor

More useful metrics would have to correlate these data with the timestamps of actual magit commands, but here are my own numbers (using the approach and code above) after using magit to review some history. First column is number of calls, second column is total time taken by that command.

231  1.6963  "rev-parse --show-toplevel"
196  1.4190  "rev-parse --show-cdup"
 19  0.2855  "show -p --cc --format=%n --no-prefix --numstat --stat --no-ext-diff <short-hash>^{commit} --"
 19  0.2591  "branch --merged <short-hash>"
 19  0.2483  "branch --contains <short-hash>"
 19  0.2398  "show --no-patch --format=%d --decorate=full <short-hash>^{commit} --"
 19  0.1996  "describe --contains <short-hash>"
 19  0.1855  "describe --long --tags <short-hash>"
 19  0.1669  "show --no-patch --format=Author:     %aN <%aE>\nAuthorDate: %ad\nCommit:     %cN <%cE>\nCommitDate: %cd\n <short-hash>^{commit} --"
 19  0.1601  "show --no-patch --format=%h %s <long-hash>^{commit} --"
 19  0.1584  "rev-parse --verify <short-hash>^{commit}"
 19  0.1535  "show --no-patch --format=%B <short-hash>^{commit} --"
 19  0.1506  "rev-parse <short-hash>^{commit}"
 19  0.1495  "rev-list -1 --parents <short-hash>"
 19  0.1485  "cat-file -t <short-hash>"
 19  0.1472  "notes show <short-hash>"
...
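For readers who want to produce a similar summary from their own timing log, here is a minimal sketch of the aggregation step. The log format is an assumption (one `<seconds><TAB><command>` record per line); the gist mentioned above may use a different format, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[int, float, str]


def summarize(log_lines: Iterable[str]) -> List[Row]:
    """Aggregate records into (call count, total seconds, command) rows."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for line in log_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines in the log
        seconds, command = line.split("\t", 1)
        counts[command] += 1
        totals[command] += float(seconds)
    rows = [(counts[command], totals[command], command) for command in counts]
    # Largest total time first, matching the ordering of the table above.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def format_rows(rows: Iterable[Row]) -> List[str]:
    """Render rows in the same three-column layout as the table above."""
    return [f'{count:>3}  {total:.4f}  "{command}"' for count, total, command in rows]
```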

@chriscool

Be careful with libgit2 as I don't think it implements file locks in a compatible way with Git itself. For example if git gc is run in the background and libgit2 is doing things at the same time, there could be problems.

@jwiegley
Contributor

Based on issues like libgit2/libgit2#2902, I think the developers both think about these sorts of issues, and would be open to bug reports about them.

@chriscool

Maybe but in general I think libgit2 development is lagging behind Git development. See:
https://github.com/git/git/graphs/contributors
https://github.com/libgit2/libgit2/graphs/contributors
(Disclosure: I am a Git developer)

@jwiegley
Contributor

I can certainly believe that.

@ubolonton
Contributor

ubolonton commented Dec 23, 2017

I started an experimental module for this https://github.com/ubolonton/magit-libgit2

It currently advises magit-rev-parse to use libgit2 where possible.

Some notes:

  • A quick benchmark on my laptop showed a 40x speedup for that function. I'm going to check whether the difference is perceptible in daily use.
  • We should probably add some automated benchmarks, ideally integrated with CI, to identify slow parts.
  • Writing the module in Rust is quite nice. The tooling is good, and I got a live reloading setup going.
  • We can start with implementing only functionalities needed by magit. A generic libgit2.el can be extracted much later on.

@tarsius
Member Author

tarsius commented Dec 25, 2017

@ubolonton I am taking a two week break, but am excited to look at this when I get back.

@tarsius tarsius added this to the 2.17.0 milestone Mar 29, 2018
@TheBB

TheBB commented Apr 20, 2018

Hi everyone,

I was interested in working on this a bit. From what I see there are two cases of "prior art":

I played around with my own module here: https://github.com/TheBB/libegit2

Now since this is a relatively ambitious project, I'm wary to invite further fracturing, but in my defense, (a) I'm not comfortable with rust, (b) @ksjogo's repo seems abandoned, and (c) I had fun anyway.

I'm aiming for a thin wrapper. If you're familiar with PyQt, it's possible to read Qt's C++ documentation and translate directly to Python. That's the level I'd like to aim for: you can read libgit2's C documentation and use it directly from Emacs with no go-between.

I haven't yet tried to get magit to play with this module.

What's the current status on your side? Should I continue working on this?

@tarsius
Member Author

tarsius commented Apr 20, 2018

I'm aiming for a thin wrapper. [...] That's the level I'd like to aim for: you can read libgit2's C documentation and use it directly from Emacs with no go-between.

That's exactly what I was hoping for and would have done eventually if you didn't beat me to it. But it would probably have taken me much longer than someone more familiar with C.

So far this is pretty incomplete but it is very promising that you have already outlined your plans on what you do or don't intend to implement and that you have added documentation that allows others to contribute.

I think I am going to add this to Emacs.g very soon - not just to the magit-directors-cut branch, but master.

I haven't yet tried to get magit to play with this module.

I have only played with it a tiny bit. But that already confirms that this is easily installable (when using borg 😀 ). Also I already ran into the first problem: you probably want to expand-file-name all paths before handing them to libgit2 so that not every caller has to do it (git and libgit2 don't understand ~/).
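The normalization suggested here, doing the expansion once at the binding boundary instead of in every caller, looks roughly like this. It is a Python analogue of what `expand-file-name` does in Elisp, not code from libegit2, and `normalize_repo_path` is a hypothetical name.

```python
import os


def normalize_repo_path(path: str) -> str:
    """Return an absolute path with a leading "~" expanded."""
    # expanduser handles "~" and "~user"; abspath then resolves a
    # relative path against the current working directory.
    return os.path.abspath(os.path.expanduser(path))
```

A wrapper would call this on every path argument before passing it on, so that a caller can supply `~/project` without knowing that the underlying library does not understand it.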

I am already quite sure that this is what I am going to use in Magit (*). If you would like to do that too, then I would like to welcome this project into the magit "organization". Combined with your instructions, that might help encourage contributions.

Beside the need for greater coverage, I think the most important tasks ahead are:

  1. Reconsider the symbol prefix. While this is the package that most deserves the git- prefix (especially now that the git.el that used to be part of Git itself has been removed), this is going to lead to a lot of conflicts with many existing packages and that could lead to a lot of unnecessary work. What about libgit-, lid-, lgit- or egit? (lgit was the name of my own pre-magit git library, and egit is a very old abandoned magit competitor of sorts.)

  2. Make it easily installable from Melpa. Again, it's important to get this right or else the price has to be paid later in the form of having to help lots of lost users.


(*) I don't want to discourage other efforts though. (By the way, @ubolonton sorry for not getting back to you.) But I do favor this implementation not least because @TheBB has maintained other important Emacs projects before and because, as I said, his approach is pretty much what I had hoped for. It also has less of a proof-of-concept feel to it.

@tarsius
Member Author

tarsius commented Apr 20, 2018

Pinging some people who might be interested in contributing to this effort - @jwiegley @vermiculus @mgalgs @chriscool.

@tarsius
Member Author

tarsius commented Apr 20, 2018

(I've added some useful resources to my initial post above.)

@TheBB

TheBB commented Apr 21, 2018

Great!

you probably want to expand-file-name all paths before handing them to libgit2 so that not every caller has to do it

That's fair enough.

I would like to welcome this project into the magit "organization".

I'd be happy to.

Reconsider the symbol prefix.

If I can't use git I'd rather just go straight to libgit I think.

Make it easily installable from Melpa.

If the accepted route for packages with compiled components is still the way pdf-tools does it, there's going to be some trying and failing to get that to work. :-s

dandavison added a commit to dandavison/magit that referenced this issue Apr 27, 2020
@tarsius tarsius removed the progress label Aug 9, 2021
@tarsius tarsius removed this from the 2.94.0 milestone Aug 9, 2021
@deifactor

deifactor commented Apr 5, 2022

Would there still be interest in the 'external git RPC server' model? I had the idea the other day to write an RPC server with an interface where you just send it, for example, challenge-email --exhale-configure-hash and it responds with the same text that git challenge-email --exhale-configure-hash would (with some JSON encapsulation for stdout vs. stderr, status codes, etc). The intent being that switching between process and RPC models can be done at a low level so you can support both with fairly minimal overhead.
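To make the proposed message shape concrete, here is a minimal sketch of a request handler. Everything in it is an assumption rather than an existing protocol: the field names (`args`, `cwd`, `stdout`, `stderr`, `status`) are hypothetical, and the handler simply delegates to a child process as a placeholder, whereas a real server would presumably answer without spawning git for each request.

```python
import json
import subprocess
from typing import List, Optional


def handle_request(request_json: str, executable: Optional[List[str]] = None) -> str:
    """Run one command described by a JSON request and return a JSON reply.

    The request is assumed to look like {"args": [...], "cwd": "..."}.
    The reply keeps stdout, stderr and the exit status separate.
    """
    request = json.loads(request_json)
    # "git" is the default program; tests can substitute another one.
    command = list(executable or ["git"]) + list(request["args"])
    completed = subprocess.run(
        command, cwd=request.get("cwd"), capture_output=True, text=True
    )
    return json.dumps(
        {
            "stdout": completed.stdout,
            "stderr": completed.stderr,
            "status": completed.returncode,
        }
    )
```

Because the reply carries the same text the command line would have produced, a client could switch between the process and RPC models at the lowest level of its code, which is the stated intent.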

@luismbo
Contributor

luismbo commented Apr 5, 2022

@deifactor you'd be implementing (part of) the git CLI on top of libgit2, which is probably valuable regardless of magit. The libgit2 folks are probably interested in something like that. In the context of magit, however, calling libgit2 via FFI is probably (slightly?) more efficient than RPC. Plus there's some amount of text parsing that could probably be skipped, since I assume libgit2's data structures are more convenient to work with than git's text output. (But this is just that, an assumption; I've never actually used libgit2. Also, being a C API, much of the convenience may be offset by C's clunkiness.)

In any case, if you're motivated to work on the RPC approach rather than the libegit2 (FFI) approach... Sure, why not? :)

@deifactor

Oh obviously it'd be better from magit's POV to use a libgit2 approach (directly or via emacs-ffi). But Rust has a better FFI story than Emacs... and I also like it more. :D Plus, like I said, I assume it'd be easier to swap since you'd only modify the commands that actually call git; the tradeoff being, as you said, that you can't do nicer parsing things.

In any case, I'll update if I ever make enough progress to be worth talking about.

@luismbo
Contributor

luismbo commented Apr 6, 2022

I'm not a magit maintainer so I can't comment whether an RPC-based backend would be welcome, but at the very least your work could become a demonstration that the libgit2 approach (via FFI or RPC) is worthwhile. Or you might discover that this is not the (only) bottleneck. (For instance, GitExtensions on Windows also invokes lots of git.exe subprocesses, yet it's much faster than magit.)

@tarsius
Member Author

tarsius commented Apr 6, 2022

(After following the link I thought this was a belated April Fools' prank, not just an awful example, and decided to ignore it.)

Would there still be interest in the 'external git RPC server' model?

If that's something you would like to work on, then you should. I cannot guarantee that Magit will use it if you write it, but if you don't write it then it certainly won't. ;P If something like this already existed I would certainly experiment with it, but since it doesn't exist I haven't really thought about it since it last came up. Coming to the conclusion that using it would be the right thing to do, but being unable to do so because it doesn't actually exist, would have been frustrating, and there is so much more to do still. But again, this sounds interesting and if you write it I would enjoy experimenting with it. Where that would lead, we won't know until we are there.

@deifactor

Yeah I definitely wouldn't expect a hard 'yes we will definitely use this' without any actual code in hand, I just wanted to make sure you hadn't already decided 'magit will not use this even if it exists' for whatever reason.

@mateialexandru

mateialexandru commented Apr 28, 2022

What's the status of this work? Magit on Windows is still unbearable :( by default taking 2s to refresh the status buffer (Surface Laptop 4, on a basic git repo).
So far, I was able to make load improvements by removing some status hooks => <1s right now.

Still, I am hoping that with libgit I will get that for free, without having to sacrifice my magit buffer functionality.

@tarsius
Member Author

tarsius commented Apr 28, 2022

I am also still hoping that I will eventually start using libgit. However for me this is not free at all, but lots of work. That's the reason why it has not happened yet.

That and the fact that new, small or medium-sized feature requests for both Magit and my numerous other packages keep coming in, and that there are also many other much more interesting features to work on that I have also been putting off for years.

In other words, I still plan to do it eventually but when, I do not know.

@brotzeit
Contributor

brotzeit commented Apr 28, 2022

I've experimented with libgit and magit and I think for most git commands it's not necessary to switch to libgit. However the performance of commands like magit-status, magit-blame and magit-log could be improved significantly.

EDIT: git blame is very buggy in libgit and would have to be fixed

@mateialexandru

@tarsius Maybe I can give a hand? Do you have some good pointers to start with / take a look at?

@mateialexandru

I've experimented with libgit and magit and I think for most git commands it's not necessary to switch to libgit. However the performance of commands like magit-status, magit-blame and magit-log could be improved significantly.

EDIT: git blame is very buggy in libgit and would have to be fixed

Thanks @brotzeit ! Funny that the official website says libgit is actively used in production by so many companies : I expected it to be stable

@ethomson

ethomson commented May 3, 2022

Thanks @brotzeit ! Funny that the official website says libgit is actively used in production by so many companies : I expected it to be stable

It is, generally. libgit2's blame implementation is an exception. It was hurriedly ported from git and is not up to the quality bar of the rest of the library.

@tarsius
Member Author

tarsius commented May 3, 2022

I've experimented with libgit and magit and I think for most git commands it's not necessary to switch to libgit. However the performance of commands like magit-status, magit-blame and magit-log could be improved significantly.

I would take magit-blame off that list too. Here the issue is mostly the overlays, and I have recently removed the blame variant that uses them the most; as a result, at least for me, magit-blame has gone from unusable to "that's quite good enough" again.

The problem with magit-log is that it should work asynchronously. Using libgit probably wouldn't make much of a difference. I would like better logs (e.g. better graphs) and that would be easier with libgit, but that is a different topic.

Where libgit would definitely make a difference is when refreshing the status buffer. All those numerous calls to git rev-parse ... and such. (So @mateialexandru, you might want to work on those.) But this would only make a significant difference on Windows. On Linux there would actually be a (probably insignificant) slowdown if we had two implementations, due to the cost of generic functions.

What I believe is going to make a huge difference is implementing support for not refreshing everything all the time, and support for inserting (e.g. log and diff) sections asynchronously.

The reason I haven't done that yet is that so many requests for other things keep coming in all the time. Once more I am almost done working through the backlog of "not so important things, that should still be done eventually, and since each one of them doesn't take that long by itself, I might as well do it now" issues, and can soon focus on more interesting and impactful problems. But there are many of those too, and since I have doubts about the impact of using libgit compared to the other mentioned changes that would improve performance, working on libgit is not a priority.

@netjune

netjune commented Sep 2, 2023

Would there still be interest in the 'external git RPC server' model? I had the idea the other day to write an RPC server with an interface where you just send it, for example, challenge-email --exhale-configure-hash and it responds with the same text that git challenge-email --exhale-configure-hash would (with some JSON encapsulation for stdout vs. stderr, status codes, etc). The intent being that switching between process and RPC models can be done at a low level so you can support both with fairly minimal overhead.

It is very much like the Language Server Protocol. Interesting.

@chriscool

Funny that the official website says libgit is actively used in production by so many companies : I expected it to be stable

More and more companies are trying to actually use only Git, instead of relying on both Git and libgit2. For example for GitHub in:

https://github.blog/2023-07-27-scaling-merge-ort-across-github/

they say:

"Previously, we used libgit2 to tick these boxes: it was faster than Git’s default merge strategy and it didn’t require a working directory."

"Two years ago, Git learned a new merge strategy, merge-ort. As the author details on the mailing list, merge-ort is fast, correct, and addresses many shortcomings of the older default strategy. Even better, unlike merge-recursive, it doesn’t need a working directory."

"It was clear that GitHub needed to upgrade to merge-ort. We split this effort into two parts: first deploy merge-ort for merges, then deploy it for rebases."

By the way, I am working on upstreaming git replay which is mentioned on the GitHub post. I work for GitLab and our goal is also to get rid of libgit2 and to use only Git.

@ethomson

ethomson commented Sep 4, 2023

More and more companies are trying to actually use only Git, instead of relying on both Git and libgit2.

This feels rather offtopic, but regardless, that's a solid maybe. GitHub may be moving over to merge-ort, but libgit2 spent nearly a decade handling your pull requests. 🤷

It's an interesting data point that GitLab wants to get rid of libgit2, but there are still plenty of people using libgit2 - whether by itself or combined with git - and more new applications are using it daily, for myriad reasons.

@kohnish

kohnish commented Nov 21, 2023

A JSON-RPC server process makes the most sense to me if it's so hard to achieve async behavior with git processes. It could also save round trips in the case of TRAMP.

This de-abstraction endeavor with a language binding won't solve the synchronous nature of git, as objects in libgit are not thread-safe. We could use libgit in such an RPC server process, though.

I don't know if such an experimental merge request would be considered by the maintainers, but I'm happy to attempt a demo commit.

I don't know which part is too slow for users, but I think it's also worth considering contributing to git upstream to get a nice interface that magit and others can use more efficiently.

@tarsius
Member Author

tarsius commented Dec 4, 2023

I don't know if such an experimental merge request would be considered by the maintainers, but I'm happy to attempt a demo commit.

Having a POC would be nice, I would certainly try it out. Merging would be unlikely until it goes way beyond just a POC, but that's where we would have to start.

@mateialexandru

@kohnish any updates on the JSON-RPC server?

@mateialexandru

@kohnish what language are you planning to use? It seems C++ has good support: https://github.com/jsonrpcx/json-rpc-cxx

@mateialexandru

@tarsius are there any recommendations on what languages / library licenses should be used for executables?
Would a C++ program using a library with MIT license, that can be built with a makefile be good enough?

What is the standard for the programs that Emacs packages depend on? Are they supposed to be easily compiled on any platform, on demand, from Emacs?
Or is it okay to just download the pre-compiled binaries?

@tarsius
Member Author

tarsius commented Mar 20, 2024

Rust seems popular nowadays. C and Python might be good options too. C++ probably too.

Are they supposed to be easily compiled on any platform, on demand, from Emacs?

Yes.

Or is it okay to just download the pre-compiled binaries?

Offering that is okay, but having such a complex build process, that most users would end up having to use the provided binaries, is not.

@kohnish

kohnish commented Mar 20, 2024

@kohnish what language are you planning to use? It seems C++ has good support: https://github.com/jsonrpcx/json-rpc-cxx

I wasn't at the stage of implementing yet. I prefer an emacs-libvterm-like distribution where, in the best-case scenario, it manages to compile from source without internet access.

The challenge for me is to read the magit source and come up with a single message to emacs that draws all git-blame results in the same way as it does now. During development, it could be a stubbed implementation in any language. I'm also not too familiar with either the emacs IPC or UI APIs. Magit-blame draws the buffer so wonderfully.

I'm thinking of looking into git blame, because that's the only thing that is very slow or never ends over tramp.

@mateialexandru

mateialexandru commented Mar 21, 2024

Do you have any language preference for the server JSON-RPC implementation? @kohnish

@kohnish

kohnish commented Mar 22, 2024

Do you have any language preference for the server JSON-RPC implementation? @kohnish

Not really. As long as it compiles fast and runs relatively fast, like C and Go.
But for a POC, maybe it doesn't have to be JSON(-RPC); just a separate server process, which could even be written in bash, that spits out git command output over a pipe.
What needs to be proven by this POC is whether having a dedicated process makes things any better.

@vermiculus
Contributor

Throwing my 2c in here: in my experience, Rust has been the only non-.NET, compiled language that really hasn't been a bear to configure for building on Windows (provided a working MSVC, which has a well-greased installation process with Visual Studio). If one of the goals is to have a straightforward process for building from source on-demand, Rust would be a good choice for cross-platform compatibility. After all, the performance issues being addressed are most evident on Windows.

But of course the best choice will be something in which it actually gets written and maintained ;-)
