General

Get rid of unsafe pointer magic WithSidebands (cost: high)

What needs to be done is to transform the &mut StreamingPeekableIter into a child future, and when exhausted, it must be transformed back into the &mut _ that created it. That way, only a single mutable reference to said Iter is present at any time. Unfortunately the generated futures (using async) don't support that as we would have to keep both the future and the parent that created it inside of our own struct. Instead of hiding this using pointers, one could implement the magical part by hand, a custom future, which happily dissolves into its mutable parent iter ref. That would be quite some work though.

Potential for improving performance

gix-odb

Finding an object in a loose object store costs an extra disk IO operation to pacify the borrow checker. This wouldn't be an issue with polonius.

gix-object

tree-parsing performance
- when diffing trees parsing can take substantial time. Maybe optimizations are possible here.

NLL/Borrowcheck limitation gix-odb::(compound|linked)::Db cause additional code complexity

Once polonius is available with production-ready performance, we should simplify the locate(…) code in (compound|linked)::Db() respectively. Currently these first have to obtain an index, and when found, access the data to avoid having the borrowchecker fail to understand our buffer usage within a loop correctly. Performance itself it probably not reasonably affected.

Pack Decoding

Pack decoding takes 5x more memory than git on the android-base repository.
On ARM64 on MacOS the SHA1 implementation of the sha-1 crate is capped at about 550MB/s, half the speed of what I saw on Intel and about 50% slower than what's implemented in libcorecrypto.dylib. Get that fast and the decoding stage will be able to beat git on fewer cores. See this comment for more. Right now we only do when scaling beyond what git can do due to lock contention.
- This should work once the asm feature can be enabled in the sha-1 crate, which currently fails but is tracked in this issue.
  - If it's not fast enough, one might hope that ARM8 instructions can improve performance, but right now they aren't available.
  - Maybe the path forward for that crate is to use system or openssl dylibs.
~~pack::cache::lru::Memory all copy input data in individual allocations. Could a pre-allocated arena or slab be faster?~~
- Probably not, as allocation performance might not be the issue here. Even though there definitely is a lot of effectively useless copying of data and deallocation happening if caches are not used after all.
Add more control over the amount of memory used for the less-memory algorithm of pack-verify to increase cache hit rate at the cost of memory. Note that depending on this setting, it might not be needed anymore to iterated over sorted offsets, freeing 150MB of memory in the process that could be used for the improved cache. With the current cache and no sorted offsets, the time nearly triples.
progress measuring costs when using 96 cores (see this comment)
- potential savings: low
Add '--chunk|batch-size' flag to pack-verify and pack-index-from-data to allow tuning sizes for large amounts of cores
- @joshtriplett write: "I did find that algorithm when I was looking for the chunk size, though I didn't dig into the details. As a quick hack, I tried dropping the upper number from 1000 to 250, which made no apparent difference in performance."
- potential savings: ~~medium~~ unclear
On 96 core machines, it takes visible time until all threads are started and have work. Is it because starting 100 threads takes so long? Or is it contention to get work?
Improve cache hit rate of lookup pack traversal by using partial DAGs build with help of the index
- @joshtriplett writes: "Would it be possible, with some care, to use the index to figure out in advance which objects will be needed again and which ones won't? Could you compute a small DAG of objects you need for deltas (without storing the objects themselves), and use that to decide the order you process objects in?"
- Note that there is tension between adding more latency to build such tree and the algorithms ability to (otherwise) start instantly.
- potential savings: unknown

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

general-tasks.md

general-tasks.md

General

Get rid of unsafe pointer magic WithSidebands (cost: high)

Potential for improving performance

gix-odb

gix-object

NLL/Borrowcheck limitation gix-odb::(compound|linked)::Db cause additional code complexity

Pack Decoding

Files

general-tasks.md

Latest commit

History

general-tasks.md

File metadata and controls

General

Get rid of unsafe pointer magic WithSidebands (cost: high)

Potential for improving performance

gix-odb

gix-object

NLL/Borrowcheck limitation gix-odb::(compound|linked)::Db cause additional code complexity

Pack Decoding