Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal: sort more tuples to potentially improve cache hit rates. #20767

Merged
merged 4 commits into from May 20, 2024

Conversation

kaos
Copy link
Member

@kaos kaos commented Apr 8, 2024

I looked through most tuple uses in the Python backend, and added sorted(..) where I think the order wasn't already stable and the result ends up as a rule output.

@kaos kaos added the category:internal CI, fixes for not-yet-released features, etc. label Apr 8, 2024
@kaos kaos requested a review from a team May 20, 2024 07:48
@kaos
Copy link
Member Author

kaos commented May 20, 2024

... oh forgot I had CI test issues.. looking into them now done.

Copy link
Member

@thejcannon thejcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using a helper for this, so you can comment why were doing this?

Otherwise it isn't obvious why these things should be sorted.

Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, increasing cache hit rates is good!

Comment on lines +441 to +444
def __lt__(self, other: Any) -> bool:
if not isinstance(other, Target):
return NotImplemented
return self.address < other.address
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that I think pair of targets a and b can theoretically have all of these false: a < b, a == b, a > b, if the addresses are equal but at least one of the other values used in __eq__ (__class__, residence_dir or field_values) are not. That is, targets are not "totally ordered".

Does that matter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but no I don't think that matters, as in our case, if the address is equal, it should be the same target, i.e. have the same field values too. So in this case, it's an optimization to not also compare the fields for the lt/gt methods used for sorting.

@@ -98,7 +98,9 @@ async def isolate_local_dist_wheels(
)
provided_files = set(wheels_listing_result.stdout.decode().splitlines())

return LocalDistWheels(tuple(wheels), wheels_snapshot.digest, frozenset(provided_files))
return LocalDistWheels(
tuple(sorted(wheels)), wheels_snapshot.digest, frozenset(sorted(provided_files))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor (and no ned to change), but... I think a frozenset is already order-independent, so there's no need to sort the input.

This suggests another potential solution: use a frozenset (or FrozenOrderedSet) instead of tuple if order truly doesn't matter for all uses. This may be more resilient for future refactorings, because caller can never forget to sort. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I failed to observe that. It could certainly be valuable to go over and see where order is of no consequence and save a few cycles by not sorting.

Saving that as an exercise for the reader/or future me. ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep an eye on performance, in case it seems to have degraded for largish repos, it could be from the additional sorting going into this.. which suggests using a version of frozen set may be the better route.

@sureshjoshi
Copy link
Member

What do you think about using a helper

Is this deterministic enough a problem that we could add a ruff/flake linter for it?

@kaos
Copy link
Member Author

kaos commented May 20, 2024

What do you think about using a helper

Is this deterministic enough a problem that we could add a ruff/flake linter for it?

I think I'd prefer to land this first, to see if it has any noticeable impact at all and take it from there (with helpers/linters etc).

@sureshjoshi
Copy link
Member

👍🏽 If it is meaningful, let me know - I don't think I've written a linter before, and I'm currently on a multi-language AST spree, so wouldn't mind adding another one under my belt :)

@kaos kaos merged commit 86d8729 into main May 20, 2024
25 checks passed
@kaos kaos deleted the kaos/sort-tuples branch May 20, 2024 15:17
TonySherman added a commit to TonySherman/pants that referenced this pull request May 21, 2024
add unit tests

internal: sort more tuples to potentially improve cache hit rates. (pantsbuild#20767)

I looked through most `tuple` uses in the Python backend, and added
`sorted(..)` where I think the order wasn't already stable and the
result ends up as a rule output.

update docs and test fixture
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants