Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add "workspace" process execution strategy #20772

Merged
merged 16 commits into from May 24, 2024

Conversation

tdyas
Copy link
Contributor

@tdyas tdyas commented Apr 9, 2024

Add a "workspace" process execution strategy and a process_execution::workspace::ComandRunner to execute processes in the workspace instead of the ordinary "local" execution sandbox.

This implements the Rust parts of Design B from the "In-Workspace Process Execution" design document which will add support to Pants for "workspace environments."

This PR supersedes #20740 since the approach of using the existing Process capture workflow fits better with the existing system design.

@tdyas tdyas requested review from stuhood and benjyw April 9, 2024 22:07
@tdyas tdyas changed the title add "workspace" process execution straegy add "workspace" process execution strategy Apr 9, 2024
@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from c6fd8d3 to beb5516 Compare April 9, 2024 22:09
@@ -219,6 +220,22 @@ impl Core {
exec_strategy_opts.local_keep_sandboxes,
);

// TODO: Wrap this in a BoundedCommandRunner so only run process can execute in the workspace at once.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on using BoundedCommandRunner to enforce this?

Copy link
Sponsor Member

@stuhood stuhood May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the sandboxes uniquely named such that they wouldn't collide?

For cases where we are invoking an existing build tool, we'd expect them to do the right thing with regard to locking so that they could safely run concurrently. But I wonder whether it would make it challenging to write correct end users scripts to have concurrency.

It sounds like mutability of declared outputs is hidden, because those go into the sandbox? So only undeclared outputs that didn't go into temporary directories would be a problem. And that is easy to fix... but definitely need to be documented.

I think it's fine to ship without limiting to 1... in future could add a concurrency limit as a workspace option perhaps?

@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from 51c55a4 to b911f9d Compare April 13, 2024 20:55
@tdyas tdyas marked this pull request as ready for review April 16, 2024 00:05
@tdyas tdyas requested a review from tgolsson April 16, 2024 00:05
@tdyas
Copy link
Contributor Author

tdyas commented Apr 16, 2024

Did some refactoring and added tests. This should be ready for review now.

@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from d093168 to 54aceee Compare April 16, 2024 00:09
@benjyw
Copy link
Sponsor Contributor

benjyw commented May 2, 2024

Sorry @tdyas, this fell below the fold. I think @stuhood is best positioned to review, if he has time?

@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch 2 times, most recently from 797ee54 to c9ca3fa Compare May 9, 2024 23:07
@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from c9ca3fa to b35e034 Compare May 10, 2024 15:37
Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks @tdyas : sorry for the long delayed review.

src/python/pants/engine/environment.py Outdated Show resolved Hide resolved
src/python/pants/engine/environment.py Outdated Show resolved Hide resolved
src/python/pants/engine/process_test.py Outdated Show resolved Hide resolved
src/rust/engine/process_execution/src/lib.rs Outdated Show resolved Hide resolved
@@ -219,6 +220,22 @@ impl Core {
exec_strategy_opts.local_keep_sandboxes,
);

// TODO: Wrap this in a BoundedCommandRunner so only run process can execute in the workspace at once.
Copy link
Sponsor Member

@stuhood stuhood May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the sandboxes uniquely named such that they wouldn't collide?

For cases where we are invoking an existing build tool, we'd expect them to do the right thing with regard to locking so that they could safely run concurrently. But I wonder whether it would make it challenging to write correct end users scripts to have concurrency.

It sounds like mutability of declared outputs is hidden, because those go into the sandbox? So only undeclared outputs that didn't go into temporary directories would be a problem. And that is easy to fix... but definitely need to be documented.

I think it's fine to ship without limiting to 1... in future could add a concurrency limit as a workspace option perhaps?

@@ -29,17 +29,22 @@ impl PyProcessExecutionEnvironment {
platform: String,
remote_execution: bool,
remote_execution_extra_platform_properties: Vec<(String, String)>,
execute_in_workspace: bool,
environment_name: Option<String>,
docker_image: Option<String>,
) -> PyResult<Self> {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oy. Need support for ADTs at this boundary. Oh well!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oy. Need support for ADTs at this boundary. Oh well!

What's the preferred way to do that? A Python class per enum variant with them all in a Union?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I will explore such a refactor after this PR has landed.

@tdyas
Copy link
Contributor Author

tdyas commented May 20, 2024

Are the sandboxes uniquely named such that they wouldn't collide?

Yes, the code makes use of the same temporary directory support has the ordinary local executor and so makes uniquely-named temporary directories for the sandbox directories.

@tdyas
Copy link
Contributor Author

tdyas commented May 20, 2024

I think it's fine to ship without limiting to 1... in future could add a concurrency limit as a workspace option perhaps?

Agreed. Seems too early to fix without affirmative proof it is a problem. And it can be a config option in the future. I'll update the comment accordingly.

@tdyas
Copy link
Contributor Author

tdyas commented May 20, 2024

It sounds like mutability of declared outputs is hidden, because those go into the sandbox? So only undeclared outputs that didn't go into temporary directories would be a problem. And that is easy to fix... but definitely need to be documented.

@stuhood: What do you mean by "mutability of declared outputs is hidden"?

@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from 86a6371 to 5a58a79 Compare May 20, 2024 17:15
src/rust/engine/src/context.rs Outdated Show resolved Hide resolved
tdyas added a commit that referenced this pull request May 21, 2024
Move the exclusive spawn logic out of
`process_execution::local::CommandRunner` into a helper module in
advance of using that logic in the forthcoming "workspace" command
runner to be introduced by
#20772. (This PR was extracted
from #20772.)
@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from 85db027 to 6c72fc9 Compare May 22, 2024 02:49
@tdyas
Copy link
Contributor Author

tdyas commented May 22, 2024

Rebased on top of main including the refactor in c8cdca0.

@benjyw: This version should be smaller diff size with the refactor.

Comment on lines 12 to 13
# Reserved sentinel value representing execution within the workspace and not local sandbox.
LOCAL_WORKSPACE_ENV_NAME = "__local_workspace__"
Copy link
Contributor Author

@tdyas tdyas May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentinel value was added so tests in this PR could make use of the workspace process execution strategy even without the next PR in the series (which adds the workspace_environment type). That next PR in the series will remove this particular use and rename it.

Maybe it should be renamed so the support in this PR is not in the public API (in case the next PR is delayed in landing)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @benjyw

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefixed the symbol with underscore to make it clear that the symbol is not public API, and added a comment to that effect.

@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from 6c72fc9 to 212167f Compare May 23, 2024 16:13
@tdyas tdyas force-pushed the workspace_process_intrinsic_as_strategy branch from 212167f to 58fc224 Compare May 23, 2024 16:25
Copy link
Sponsor Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, thanks for pairing on this!

@tdyas tdyas merged commit a79beec into pantsbuild:main May 24, 2024
25 checks passed
@tdyas tdyas deleted the workspace_process_intrinsic_as_strategy branch May 24, 2024 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants