internal: sort more tuples to potentially improve cache hit rates. #20767

kaos · 2024-04-08T10:05:07Z

I looked through most tuple uses in the Python backend, and added sorted(..) where I think the order wasn't already stable and the result ends up as a rule output.
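To illustrate the motivation, here is a minimal sketch of why tuple order can matter for caching. `ExampleOutput` is a hypothetical stand-in for a rule output type, not an actual Pants class; the assumption is that cache keys are derived from value equality and hashing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExampleOutput:
    """Hypothetical stand-in for a rule output that stores a tuple."""

    names: Tuple[str, ...]


# Same elements, different order: the two values compare unequal,
# so anything keyed on them would treat them as different results.
unsorted_1 = ExampleOutput(("b.py", "a.py", "c.py"))
unsorted_2 = ExampleOutput(("a.py", "c.py", "b.py"))

# Sorting before constructing the tuple makes the values identical.
sorted_1 = ExampleOutput(tuple(sorted(unsorted_1.names)))
sorted_2 = ExampleOutput(tuple(sorted(unsorted_2.names)))

print(unsorted_1 == unsorted_2)  # False
print(sorted_1 == sorted_2)  # True
```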

src/python/pants/backend/python/dependency_inference/module_mapper.py

kaos · 2024-05-20T07:51:19Z

~~... oh forgot I had CI test issues.. looking into them now~~ done.

thejcannon

What do you think about using a helper for this, so you can comment why we're doing this?

Otherwise it isn't obvious why these things should be sorted.
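A helper along these lines could look like the sketch below. The name `stable_tuple` is hypothetical and does not exist in the Pants codebase; the point is only that the docstring gives the reason for sorting a single home.

```python
from typing import Iterable, Tuple, TypeVar

T = TypeVar("T")


def stable_tuple(items: Iterable[T]) -> Tuple[T, ...]:
    """Return `items` as a sorted tuple.

    Hypothetical helper: sorting gives the tuple a deterministic order, so
    two runs that discover the same elements in a different order produce
    equal values, which may improve cache hit rates for rule outputs.
    The elements must be mutually comparable with `<`.
    """
    return tuple(sorted(items))


print(stable_tuple({"b", "a", "c"}))  # ('a', 'b', 'c')
```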

huonw

Thanks for this, increasing cache hit rates is good!

huonw · 2024-05-20T11:41:14Z

src/python/pants/engine/target.py

+    def __lt__(self, other: Any) -> bool:
+        if not isinstance(other, Target):
+            return NotImplemented
+        return self.address < other.address


I note that a pair of targets a and b can, I think, theoretically have all of these be false: a < b, a == b, a > b, if the addresses are equal but at least one of the other values used in __eq__ (__class__, residence_dir or field_values) is not. That is, targets are not "totally ordered".

Does that matter?

I considered that, but no, I don't think it matters: in our case, if the addresses are equal, it should be the same target, i.e. have the same field values too. So it's an optimization not to also compare the fields in the lt/gt methods used for sorting.
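The theoretical case discussed above can be reproduced with a small sketch. `FakeTarget` is a hypothetical, simplified class (not the real `Target`): equality looks at the address and the field values, while ordering looks at the address only.

```python
class FakeTarget:
    """Hypothetical, simplified stand-in for a target."""

    def __init__(self, address: str, field_values: tuple) -> None:
        self.address = address
        self.field_values = field_values

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, FakeTarget):
            return NotImplemented
        # Equality considers more than the address.
        return (self.address, self.field_values) == (other.address, other.field_values)

    def __hash__(self) -> int:
        return hash((self.address, self.field_values))

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, FakeTarget):
            return NotImplemented
        # Ordering considers the address only.
        return self.address < other.address


# Same address, different field values.
a = FakeTarget("src/app:lib", ("x",))
b = FakeTarget("src/app:lib", ("y",))

print(a < b, a == b, a > b)  # False False False
```

Because `sorted` is stable, two such targets keep their input order relative to each other, so the sorted result would only be deterministic if equal addresses always imply equal targets, which is the assumption stated in the reply above.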

huonw · 2024-05-20T11:44:38Z

src/python/pants/backend/python/util_rules/local_dists.py

@@ -98,7 +98,9 @@ async def isolate_local_dist_wheels(
    )
    provided_files = set(wheels_listing_result.stdout.decode().splitlines())

-    return LocalDistWheels(tuple(wheels), wheels_snapshot.digest, frozenset(provided_files))
+    return LocalDistWheels(
+        tuple(sorted(wheels)), wheels_snapshot.digest, frozenset(sorted(provided_files))


Minor (and no need to change), but... I think a frozenset is already order-independent, so there's no need to sort the input.

This suggests another potential solution: use a frozenset (or FrozenOrderedSet) instead of a tuple if order truly doesn't matter for all uses. This may be more resilient to future refactorings, because a caller can never forget to sort. What do you think?

Thanks, I failed to observe that. It could certainly be valuable to go over and see where order is of no consequence and save a few cycles by not sorting.

Saving that as an exercise for the reader/or future me. ;)

Let's keep an eye on performance: if it seems to have degraded for largish repos, it could be from the additional sorting introduced here, which would suggest that using some version of a frozen set may be the better route.
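The frozenset point raised in this thread can be checked with plain built-ins. The file names below are made up for illustration.

```python
# Two listings of the same (made-up) files in a different order.
listing_1 = ["b.whl", "a.whl", "c.whl"]
listing_2 = ["c.whl", "a.whl", "b.whl"]

# Tuples are order-sensitive, so these differ unless sorted first.
tuples_equal = tuple(listing_1) == tuple(listing_2)
sorted_tuples_equal = tuple(sorted(listing_1)) == tuple(sorted(listing_2))

# Frozensets compare and hash by membership only, so sorting the
# input before building one changes nothing.
sets_equal = frozenset(listing_1) == frozenset(listing_2)
sort_is_redundant = frozenset(sorted(listing_1)) == frozenset(listing_1)

print(tuples_equal, sorted_tuples_equal, sets_equal, sort_is_redundant)
# False True True True
```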

sureshjoshi · 2024-05-20T15:13:19Z

> What do you think about using a helper

Is this deterministic enough a problem that we could add a ruff/flake linter for it?

kaos · 2024-05-20T15:14:28Z

> > What do you think about using a helper
>
> Is this deterministic enough a problem that we could add a ruff/flake linter for it?

I think I'd prefer to land this first, to see if it has any noticeable impact at all and take it from there (with helpers/linters etc).

sureshjoshi · 2024-05-20T15:16:17Z

👍🏽 If it is meaningful, let me know - I don't think I've written a linter before, and I'm currently on a multi-language AST spree, so wouldn't mind adding another one under my belt :)
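As a rough idea of what such a check might involve, here is a sketch using only the standard-library `ast` module. The function name and the rule it applies are assumptions made for illustration, not an existing ruff or flake8 plugin. It flags every `tuple(x)` call whose single argument is not a direct `sorted(...)` call.

```python
import ast
from typing import List


def find_unsorted_tuple_calls(source: str) -> List[int]:
    """Return line numbers of `tuple(arg)` calls where `arg` is not `sorted(...)`.

    Hypothetical check for illustration only.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "tuple"
            and len(node.args) == 1
        ):
            continue
        arg = node.args[0]
        wrapped_in_sorted = (
            isinstance(arg, ast.Call)
            and isinstance(arg.func, ast.Name)
            and arg.func.id == "sorted"
        )
        if not wrapped_in_sorted:
            hits.append(node.lineno)
    return sorted(hits)


example = "a = tuple(xs)\nb = tuple(sorted(xs))\nc = tuple(x for x in ys)\n"
print(find_unsorted_tuple_calls(example))  # [1, 3]
```

A purely syntactic check like this would also flag tuples whose order is already stable or is meaningful, so a usable linter would likely need some way to restrict itself to values that end up as rule outputs.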

internal: sort more tuples to potentially improve cache hit rates.

610803c

kaos added the category:internal CI, fixes for not-yet-released features, etc. label Apr 8, 2024

fix sort issue.

62c2cd1

cognifloyd reviewed Apr 12, 2024

src/python/pants/backend/python/dependency_inference/module_mapper.py (outdated)

Merge branch 'main' into kaos/sort-tuples

6a74da1

kaos requested a review from a team May 20, 2024 07:48

make Target sortable.

4c15068

thejcannon approved these changes May 20, 2024

huonw reviewed May 20, 2024

kaos merged commit 86d8729 into main May 20, 2024
25 checks passed

kaos deleted the kaos/sort-tuples branch May 20, 2024 15:17

kaos mentioned this pull request May 21, 2024

Add support for fine grained diff with line numbers #20531

Open
