Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make pants help-all more deterministic #20764

Merged
merged 13 commits into from May 22, 2024

Conversation

lilatomic
Copy link
Contributor

@lilatomic lilatomic commented Apr 8, 2024

pants help-all is nondeterministic (see #20621 ) . This fixes all sources of nondeterminism I could find:

  • unsorted name_to_api_type_info
  • __name__ used instead of __qualname__, resulting in collisions for classes generated by distinct_union_per_subclass
  • generated OptionField.__qualname__ not mangled
  • generated __qualname__.GenerateWrapSourceSourcesRequest not mangled
  • EnvironmentAware classes not uniquified
  • classes provided by multiple backends now list all providers
    I've also checked:
  • changing to __qualname__ hasn't broken docs
  • using multiple values in name_to_api_type_info.[].provider hasn't broken docs
  • name_to_api_type_info uses __qualname__ consistently

for api_type in sorted(all_types, key=attrgetter("__name__"))
}
)
def merge_type_info(this: PluginAPITypeInfo, that: PluginAPITypeInfo) -> PluginAPITypeInfo:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better place we could put this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to be with PluginAPITypeInfo

@lilatomic
Copy link
Contributor Author

use of __qualname__ isn't strictly consistent. I think it's a bit of an aesthetic choice, though. For example, should the name of "pants.backend.docker.subsystems.docker_options.DockerOptions.EnvironmentAware" be "EnvironmentAware" or "DockerOptions.EnvironmentAware" ?

@lilatomic
Copy link
Contributor Author

also converting the lazy dict to a dict (although mostly necessary) does mean that the whole structure needs to be computed for any help. I think it should be possible to restore the laziness, either by making the whole element lazy or by making the items be lazy aggregations.

@lilatomic lilatomic force-pushed the fix/deterministic-help branch 2 times, most recently from b0cf01d to b469bad Compare April 28, 2024 20:58
@lilatomic
Copy link
Contributor Author

also converting the lazy dict to a dict (although mostly necessary) does mean that the whole structure needs to be computed for any help. I think it should be possible to restore the laziness, either by making the whole element lazy or by making the items be lazy aggregations.

it's lazy again! not being lazy caused a measureable performance degradation and fixing it wasn't that hard.

@lilatomic lilatomic marked this pull request as ready for review April 28, 2024 20:59
@lilatomic lilatomic force-pushed the fix/deterministic-help branch 2 times, most recently from 9234c89 to 74d20cb Compare April 29, 2024 01:45
Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

I've got some general concerns about how we track/register backend types, but that's not for this PR to fix..

Comment on lines 867 to 872
names_to_types: dict[str, list[type]] = defaultdict(list)
for api_type in sorted(all_types, key=attrgetter("__qualname__")):
names_to_types[PluginAPITypeInfo.fully_qualified_name_from_type(api_type)].append(
api_type
)

infos: dict[str, Callable[[], PluginAPITypeInfo]] = {
k: get_api_type_info(v) for k, v in names_to_types.items()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be written using a groupby:

Suggested change
names_to_types: dict[str, list[type]] = defaultdict(list)
for api_type in sorted(all_types, key=attrgetter("__qualname__")):
names_to_types[PluginAPITypeInfo.fully_qualified_name_from_type(api_type)].append(
api_type
)
infos: dict[str, Callable[[], PluginAPITypeInfo]] = {
k: get_api_type_info(v) for k, v in names_to_types.items()
}
api_type_name=PluginAPITypeInfo.fully_qualified_name_from_type
all_types_sorted=sorted(all_types, key=api_type_name)
infos: dict[str, Callable[[], PluginAPITypeInfo]] = {
k: get_api_type_info(tuple(v))
for k, v in itertools.groupby(all_types_sorted, key=api_type_name)
}

@@ -322,7 +323,7 @@ class PluginAPITypeInfo:
name: str
module: str
documentation: str | None
provider: str
provider: tuple[str, ...]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not backwards compatible, which is a nuance. Probably not used extensively by end users, but we consume it for docs generation, so need to adapt there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed you have a checked box for this affecting docs.. so 🤷🏽 guess I'm off base here 😬

@lilatomic lilatomic added the category:internal CI, fixes for not-yet-released features, etc. label May 18, 2024
minimises most diff changes
prevents some generated classes from conflicting on name
not being lazy caused a measureable slowdown (1.022 s ±  0.054 s -> 3.184 s ±  0.071 s) in unrelated help goals (`pants help backends`)
turning it lazy basically negates that (1.037 s ±  0.090 s)

more performance gains could be realised by converting from attributes which are lazy but eagerly materialised
to properties which are lazily materialised.
That would require more rework
Creating a lambda in the dictcomp means that the lambda captured a reference to the iterator,
not the value it contained at that time.
Every call to the distinct lambdas would reference the last value processed.
@tdyas tdyas merged commit 770e825 into pantsbuild:main May 22, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants