support C++20 Modules #19940

Open
wants to merge 6 commits into base: master

Conversation

PikachuHyA

This PR implements support for C++20 Modules in Bazel.

the design doc: bazelbuild/proposals#354

the discussion: #19939

the demo: https://github.com/PikachuHyA/async_simple

the extra tests: https://github.com/PikachuHyA/bazel_cxx20_module_test

see #4005

@PikachuHyA PikachuHyA requested review from gregestren and removed request for a team October 25, 2023 11:08
@google-cla

google-cla bot commented Oct 25, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@github-actions github-actions bot added the awaiting-review (PR is awaiting review from an assigned reviewer), team-Configurability (Issues for Configurability team), and team-Rules-CPP (Issues for C++ rules) labels Oct 25, 2023
@lberki lberki requested review from comius and removed request for oquenchil, ahumesky, ted-xie and gregestren October 25, 2023 13:32
@comius
Contributor

comius commented Oct 25, 2023

This PR already got a lot of attention at Google in the group of C++ toolchain maintainers / experts. There’s a desire to have it, but no concrete/incompatible plans yet. The design would need some changes so that it’s compatible and supports Google well. (Think of easier maintenance in the future)

I’m not an expert in C++, but I will start the discussion internally and come back with possible requirements/changes when we figure out what they are.

@comius comius self-assigned this Oct 27, 2023
@comius
Contributor

comius commented Oct 27, 2023

Some people are out of office. The main discussion will start second week of November. I’ll post next update after that.

@sgowroji sgowroji added the awaiting-user-response (Awaiting a response from the author) label and removed the awaiting-review (PR is awaiting review from an assigned reviewer) label Nov 8, 2023
@PikachuHyA PikachuHyA force-pushed the cxx20-modules-support branch 2 times, most recently from 826867b to cf2c9ad Compare November 16, 2023 08:32
@PikachuHyA
Author

I rebased the PR onto the latest master branch due to a MODULE.bazel.lock conflict.

@PikachuHyA
Author

Some people are out of office. The main discussion will start second week of November. I’ll post next update after that.

gentle ping :-)

@mathstuf

CMake developer here; just tracking how modules are being implemented in various places :) .

I read through the design doc and had a few comments. Since it was already merged, I figured that here may be better; can move wherever is best though.

  • Two-phase compilation is only supported by Clang. With the work ongoing to make smaller BMIs, a .pcm → .o rule may not be so feasible in the future. There may be a way to do .full.pcm → .importable.pcm / .full.pcm → .o though? In any case, this is something the build system can hide away from the user interface pretty easily.
  • I notice that references are not tracked in .CXXModules.json files. This was found to be necessary in CMake for MSVC, where BMIs contain no transitive references to the BMIs they need. GCC still embeds them; Clang is deprecating it. This helps the reproducible case but means that the build system needs to track transitive imports to specify in the .modmap files when using modules. As an example, if the module import chain looks like leaf → intermediate → impl → detail, the P1689 output is only going to report one level at each scan (i.e., intermediate's .ddi file won't specify detail unless it is directly imported). The .CXXModules.json must somehow store "I see an import of intermediate; impl and detail need to be specified as well".
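
The transitive-import point above can be sketched as follows. This is a minimal illustration under stated assumptions, not Bazel's or CMake's implementation: the class `TransitiveImports`, the method `closure`, and the in-memory map standing in for the per-TU scan results are all hypothetical names introduced here. Each scan only reports direct imports, so the build system has to walk the graph itself to learn which BMIs a compile needs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; not part of Bazel or this PR. */
final class TransitiveImports {
  /**
   * Returns every module reachable from {@code root} through direct imports,
   * excluding {@code root} itself, in breadth-first discovery order.
   *
   * @param directImports module name to the modules it imports directly,
   *     i.e. the single level that one scan of that module reports
   */
  static Set<String> closure(String root, Map<String, List<String>> directImports) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      String current = queue.remove();
      for (String dep : directImports.getOrDefault(current, List.of())) {
        // Visit each module once; skip the root so it never lists itself.
        if (!dep.equals(root) && seen.add(dep)) {
          queue.add(dep);
        }
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    // The chain from the comment: leaf -> intermediate -> impl -> detail.
    Map<String, List<String>> direct = Map.of(
        "leaf", List.of("intermediate"),
        "intermediate", List.of("impl"),
        "impl", List.of("detail"),
        "detail", List.of());
    System.out.println(closure("leaf", direct)); // [intermediate, impl, detail]
  }
}
```

In this sketch, compiling `leaf` would need module-map entries for all three returned modules, even though its own scan only mentions `intermediate`.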

@ChuanqiXu9

ChuanqiXu9 commented Nov 18, 2023

a .pcm → .o rule may not be so feasible in the future.

No, Clang doesn't have such plans (deprecating the 2-phase compilation model), at least for now.

Two-phase compilation is only supported by Clang.

Yes but the story of the 2 phase compilation model seems really appealing. So a build system that supports the 2-phase compilation model may have an advantage. And in the future, the build systems may be able to support both (or even more) compilation models and the users can make the choice.

@mathstuf

No, Clang doesn't have such plans (deprecating the 2-phase compilation model), at least for now.

To be more precise, there may be multiple kinds of BMIs in the future, and Clang may end up with a 3-phase model in which the trimmed BMI is the "interesting" bit for importers while the full BMI is still used for codegen. Clang is also getting a proper 1-phase compilation like GCC and MSVC (rather than today's "frontend does the 2-phase internally").

Yes but the story of the 2 phase compilation model seems really appealing.

I agree. However, I prioritized 1-phase over 2-phase for CMake due to compiler support.

And in the future, the build systems may be able to support both (or even more) compilation models and the users can make the choice.

Agreed. However, given the simplicity of the 1-phase, I find it better for the initial implementation. There are a number of performance things that can be looked at in the future:

  • only-if-changed on more minimal BMI files
  • target-wide batch scanning
  • grouped target-wide batch scanning
  • grouped target-wide batch collation

Basically my main interest is in getting things working across the ecosystem as a baseline before we start up our ricer cars. Of course, Bazel can do as they please; I can only offer my view on things here.

@mathstuf

This issue was filed against CMake. Unconditional redirection of clang-scan-deps may be unwise in the case of a failed scan. Maybe Bazel doesn't care given its execution strategies, but it is something to consider at least.

@taekahn

taekahn commented Apr 19, 2024

The difficulty in accepting this PR is that the current design might differ from the final one we will eventually land on. We’re not even clear what new attributes will be added to cc_binary, what their semantics is and changing those later, would cause problems to the Bazel community.

The great thing about code, is that it can be changed.
Bazel didn't pop out fully formed. Things have been changed, deprecated, reimplemented in different ways. toolchains comes to mind.
How is this any different?

@zaucy
Contributor

zaucy commented Apr 19, 2024

Merging it when our design concerns are met might be an option.

I like this option. Allow the community to develop this sooner than Google's timeline, but let us address your design concerns.

@PikachuHyA
Author

Google would like to implement the support for C++20 modules in Bazel and deploy them internally, however our timeline for this is in about 2 years.
... but at the same time not something we can prioritize now, because we decided to not use C++20 Modules ourselves in the next few years.

I am deeply concerned with the two-year wait you’ve specified. This delay reflects an internal decision-making process that drastically impedes the progress of the Bazel user community. The entire Bazel community is being forced to align with your internal timeline, with their current needs being disregarded. This inevitably ties the hands of the community, curtailing experimentation and the adoption of new technologies in a timely manner. The motivation of the patch is that the missing C++20 Modules support in Bazel blocks our internal uses.

Scalability and code style are indeed very important factors. That's also the reason we need code reviews. As mathstuf mentioned, it's not only Google that could achieve a battle-tested implementation. The open-source world thrives on responsiveness and iterative improvements. In the community, we should also be able to obtain an implementation that is scalable and well-designed through rigorous review processes. Tools and features improve through use, feedback, and refinements—not through prolonged periods of inaction. The community is more than capable of handling incremental changes and can be entrusted to do so.

The scanning will eventually need to happen completely in parallel, which we believe is possible, however not trivial. Using clang-scan-deps might be problematic in this context. Having an implementation in Java would be easier to work with.

Having an implementation in Java will require implementing a preprocessor. It may not be trivial. It shouldn't be bad to use compiler native scanners. Also it won't be a blocking issue if we want to implement a Java scanner some day. The scanner should be changeable by its nature. For example, there exist two implementations for collecting header file dependencies: one using Java native (see #13871) and the other using compiler native.

@PikachuHyA
Author

The difficulty in accepting this PR is that the current design might differ from the final one we will eventually land on. We’re not even clear what new attributes will be added to cc_binary, what their semantics is and changing those later, would cause problems to the Bazel community.

The great thing about code, is that it can be changed. Bazel didn't pop out fully formed. Things have been changed, deprecated, reimplemented in different ways. toolchains comes to mind. How is this any different?

+1

@mathstuf

Having an implementation in Java will require implementing a preprocessor. It may not be trivial.

Indeed, as import statements guarded by #if __has_feature() must be accurately evaluated. I am not sure how anything other than the compiler itself can be expected to answer such questions reliably without whitelisting versions (and configurations!) of toolchains that are supported by an external scanner.

It shouldn't be bad to use compiler native scanners.

Agreed. Batch scanning would be an improvement for one-shot builds, but incremental/development builds may be better off with per-TU scanning commands (subject to process launch costs…things we need measurements for to actually know).

@comius
Contributor

comius commented Apr 29, 2024

Hey @PikachuHyA,

thanks for your patience. The replies sparked some more internal discussions, and there seems to be more benefit from your implementation to Google than what was previously thought.

I don't have a green light yet, but we're considering accepting the change under an experimental flag that guards the additional attributes on C++ rules and the implementation. The flag would mean that both the implementation and the public interface are subject to change in the future.

There's not so much objection to clang-scan-deps, so I guess you can keep it in the initial implementation.

For the sake of improving the quality of the review, do you think you could break this XXL PR into several digestible pieces? I'll take care that each piece is reviewed in a couple of business days.

From Bazel's perspective, the tricky part might be the additional fields on C++ providers and C++ actions. Those might cause a regression; we need to benchmark it and figure out whether it's small enough to be justifiable or whether we should do something extra to remove it.

cc @sam-mccall, @jyknight, @ilya-biryukov, @pzembrod

@mathstuf mathstuf left a comment

You might want to consider a way to tell Bazel that some sources do not use modules and can therefore completely skip scanning (and, if nothing in the target needs to be scanned, the target's collation step as well).

Comment on lines +360 to +361
// if cpp20_module enabled, only c++20-deps-scanning will produce .d file
// other actions will reuse the .d file from c++20-deps-scanning

I don't think this is accurate as the "real" compile may mention other files in its .d output. Including, but not limited to:

  • the BMI files that are read (or only those that are used)
  • modmap files
  • header units which are translated into imports may stop reading the header and read the BMI directly

The last one should be covered by the header change triggering a rescan. Not listing it here lets the build graph skip the compile when the change is inconsequential to it: the compile waits for the scan to say so rather than being queued automatically.

Preconditions.checkState(module.isFileType(CppFileTypes.CPP_MODULE), "Non-module? %s", module);
var skyValue = actionExecutionValues.get(module.getGeneratingActionKey());
if (skyValue == null) {
return null;

This seems like a problematic error case; no messages or context about what happened?

public CppCompileActionBuilder setPcmFiles(NestedSet<Artifact.DerivedArtifact> pcmFiles) {
this.pcmFiles = pcmFiles;
return this;
}

Indentation seems weird here. Also looks like a missing newline after this brace.

Comment on lines +96 to +98
<li>Clang use cppm </li>
<li>GCC can use any source file extension </li>
<li>MSVC use ixx </li>

All three can use any extension with the right flags (e.g., -x c++-module or -interface/-interfacePartition). These are the preferred extensions.

var scanDepsBuilder = initializeCompileAction(sourceArtifact);
scanDepsBuilder.setActionName(CppActionNames.CPP20_DEPS_SCANNING);
scanDepsBuilder.setOutputs(ddiFile, dotdFile, null);
// only c++20-deps-scanning add .d file

As noted elsewhere, this seems unwise.

content.append("module-file=");
content.append(moduleName);
content.append("=");
content.append(modulePath);

Does any escaping mechanism need to be considered (e.g., for spaces in the path)?

actionExecutionContext,
out -> {
OutputStreamWriter content =
new OutputStreamWriter(out, StandardCharsets.ISO_8859_1);

Woah; not UTF-8? Or are paths in Bazel essentially limited to ASCII? Charset.defaultCharset() seems better?

Collaborator

This particular charset makes Strings behave just like byte arrays, which is useful when supporting both Unix (where paths are essentially just byte arrays) and Windows (which at least with the system APIs Bazel is using requires UTF-16). Unless the tools involved have any particular encoding requirements, this charset should be the most compatible choice.
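
The byte-preserving property described above can be checked with a small sketch. The class and method names (`Latin1RoundTrip`, `allByteValues`, `survivesRoundTrip`) are hypothetical and only illustrate standard JDK behaviour; they are not Bazel code. ISO-8859-1 maps each of the 256 byte values to exactly one char and back, whereas UTF-8 decoding replaces byte sequences that are not valid UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo class; not part of Bazel or this PR. */
final class Latin1RoundTrip {
  /** Returns all 256 possible byte values, 0x00 through 0xFF. */
  static byte[] allByteValues() {
    byte[] bytes = new byte[256];
    for (int i = 0; i < bytes.length; i++) {
      bytes[i] = (byte) i;
    }
    return bytes;
  }

  /** Decodes the bytes into a String and encodes them back; true if nothing changed. */
  static boolean survivesRoundTrip(byte[] raw, Charset charset) {
    String decoded = new String(raw, charset);
    return Arrays.equals(raw, decoded.getBytes(charset));
  }

  public static void main(String[] args) {
    byte[] raw = allByteValues();
    // Every byte maps to one char and back, so the String acts like a byte array.
    System.out.println(survivesRoundTrip(raw, StandardCharsets.ISO_8859_1)); // true
    // Lone bytes such as 0x80 are malformed UTF-8 and get replaced on decode.
    System.out.println(survivesRoundTrip(raw, StandardCharsets.UTF_8)); // false
  }
}
```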

Maybe it's worth a comment, but I'm not the intended audience here; maybe it's just implicit knowledge for Bazel developers.

Comment on lines +48 to +49
@SerializedName("source-path")
private String sourcePath;

Future header unit support would require reading use-source-path (bool) and lookup-method (enum). It might be prudent to read these now and fail gracefully with a message about header units not being supported.
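
A rough sketch of the "fail gracefully" idea, under several assumptions: the class `HeaderUnitGuard` and method `unsupportedReason` are hypothetical, the map stands in for one already-parsed entry of the scan output, and the sketch assumes that an absent lookup-method means a by-name import while any other value indicates a header unit. Only the lookup-method field is handled here, and the actual field names and values should be checked against the P1689 format before relying on this.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; not part of Bazel or this PR. */
final class HeaderUnitGuard {
  /**
   * Returns a human-readable reason if the parsed entry needs header unit
   * support, or an empty Optional if it is a plain named-module import.
   *
   * @param require one parsed dependency entry, keyed by JSON field name
   */
  static Optional<String> unsupportedReason(Map<String, Object> require) {
    // Assumption: a missing "lookup-method" means an ordinary by-name import.
    Object lookup = require.getOrDefault("lookup-method", "by-name");
    if ("by-name".equals(lookup)) {
      return Optional.empty();
    }
    Object name = require.getOrDefault("logical-name", "<unknown>");
    return Optional.of(
        "header units are not supported yet: '" + name
            + "' uses lookup-method '" + lookup + "'");
  }

  public static void main(String[] args) {
    Map<String, Object> named = Map.of("logical-name", "foo");
    Map<String, Object> header =
        Map.of("logical-name", "vector", "lookup-method", "include-angle");
    System.out.println(unsupportedReason(named).isPresent()); // false
    System.out.println(unsupportedReason(header).orElse("ok"));
  }
}
```

Returning a message instead of silently ignoring the field lets the caller surface a clear error to the user rather than failing later in the compile.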

@comius
Contributor

comius commented May 22, 2024

TL;DR The Bazel team has decided to accept this PR, I'll be doing the reviews and I'll get some help from internal C++ experts, namely @trybka.

We identified the following risks:

  • increase in maintenance cost for the Bazel team
  • divergent implementations in Bazel and at Google, or no implementation at Google
  • newly introduced complexity in CppCompileAction

We'd like to keep the maintenance costs to a minimum: the Bazel team will only review PRs after the initial community review. We won't address any issues that are reported. We don't mind if the community addresses them.

We'd like to keep the change behind an experimental flag, to mitigate the risk of divergent implementations. While the change is under the experimental flag, there is no guarantee about incompatible changes. If Google does an internal implementation, we'd like it to match, to reduce maintenance costs.

We'd also like to make the change as "modular" as possible, in order to make it easier to remove in the future. That might happen in the unlikely scenario that Google doesn't implement support for C++20 modules and this remains the only complexity in CppCompileAction that can't be rewritten in Starlark. If this scenario plays out, C++20 modules support will probably need to be implemented in a different way.

That said, we do see the benefits of this change for both the community and Google. Thank you for your contribution.

@PikachuHyA
Author

For the sake of improving the quality of the review, do you think you could break this XXL PR into several digestible pieces? I'll take care that each piece is reviewed in a couple of business days.

Hi @comius, I have split this XXL PR into 6 smaller commits. Initially, I hoped to divide it into independent small patches (see #22425, #22427), but that proved infeasible due to dependencies between the patches (#22429). Later, I plan to use stacked PRs to facilitate code review. However, stacked PRs require creating branches in the target repository first, and I'm not sure whether I could be granted the necessary permissions. I've also created a demo of stacked PRs in my repository (https://github.com/PikachuHyA/bazel/pulls) as a backup.

Do you have any suggestions on the code review process?

BTW, the Windows CI is broken; I will fix it later.

@PikachuHyA
Author

@mathstuf Thanks for your comments.

I will make the related code changes as soon as possible.

@mathstuf

Later, I plan to use stacked PRs to facilitate code review. However, stacked PRs require creating branches in the target repository first,

Nothing should require that; tools doing so should…work on that. It's kind of crazy to make tools not available for external contributors to projects. I believe https://stacked-git.github.io/ does most of its work locally so that at least you're not tied to any Github limitations.

Labels
awaiting-review (PR is awaiting review from an assigned reviewer), awaiting-user-response (Awaiting a response from the author), team-Configurability (Issues for Configurability team), team-Rules-CPP (Issues for C++ rules)
8 participants