Do not perform domain substitutions within comments #1026

csagan5 · 2020-05-05T07:01:22Z

Covers C/C++, Java, Javascript and Python file extensions

This change will prevent domain substitutions to happen within comments or comments blocks, thus reducing the amount of unnecessary file changes when applying domain substitution.

This is what I had in mind for issue #849.

Covers C/C++, Java, Javascript and Python file extensions

Eloston

I'm okay with the overall idea here (due to the practical problems caused by trying to keep this script simple and non-discretionary in the past, such as the Android manifest schema URL problem a while back), but IMO the implementation is not quite ideal.

One thing that comes to mind is this sort of pre- and post- processing before actual substitution shouldn't be inlined with the actual substitution logic. I think this should be abstracted, since this is the first time that we're introducing pre- and post- substitution logic, and I think there's potential for additional processing in the future. I'll need to think more about how I'd like to organize the code.

My second concern is regarding the implementation. For starters, the regex for the non-(Python and GN) files don't apply to all the selected file types. Secondly, I'm not 100% convinced the regex will work in all scenarios, especially if character escapes get involved. Since domain substitution is an essential component, I'd rather play it safe and implement AST parsing. This should allow us to properly support docstrings as well. I can take a look at how this can be implemented for the filetypes listed here.

csagan5 · 2020-05-31T17:52:37Z

I think there's potential for additional processing in the future.

As a general rule I avoid adding code for the "just in case" situation, but if you have some use-cases to leverage already then sure - why not.

For starters, the regex for the non-(Python and GN) files don't apply to all the selected file types.

What do you mean here? I am genuinely curious about some bug I might not have spotted.

Since domain substitution is an essential component, I'd rather play it safe and implement AST parsing.

That would indeed be a very advanced and complete solution.

Eloston · 2020-05-31T22:53:40Z

As a general rule I avoid adding code for the "just in case" situation, but if you have some use-cases to leverage already then sure - why not.

I think the AndroidManifest schema problem is another such example. Also, anything in DOMAIN_EXCLUDE_PREFIXES could have a smarter exclusion rule associated with them.

What do you mean here? I am genuinely curious about some bug I might not have spotted.

Actually, I was mistaken about this. I didn't realize all those languages supported both of those commenting types.

However, I also realized your code will break if the comment delimiter appears in strings. I don't know if any substitutions could be affected by this, though.

csagan5 · 2020-05-31T23:06:39Z

However, I also realized your code will break if the comment delimiter appears in strings. I don't know if any substitutions could be affected by this, though.

Yes, I am aware, I decided to go with this risk. I have inspected the produced diff (now much smaller thanks to the code in this PR) a few times (even if not thoroughly) and it seemed correct.

Luka-Filipovic · 2020-09-19T22:33:15Z

What issue is tried to be solved by not performing domain substitution on comments? If it's the timestamps, wasn't that taken care of in 1a0e163?

csagan5 · 2020-09-19T22:36:51Z

@Luka-Filipovic I create git commits after performing the domain substitutions and there are several MBs of unwanted changes; I also think that changing the timestamp alone won't prevent the file from being rebuilt if one if its comments changed.

Luka-Filipovic · 2020-09-19T22:59:27Z

@Luka-Filipovic I create git commits after performing the domain substitutions and there are several MBs of unwanted changes;

What percentage of files from lists have links only in comments? Not doing domsub in test files could also help with your problem, and it can be made only by parsing the file name.

When comments and test files are cut out, what could be the next step in improving this process?

I also think that changing the timestamp alone won't prevent the file from being rebuilt if one if its comments changed.

If that's the case (I didn't test it myself yet), would #1185 help the problem?

csagan5 · 2020-09-20T09:00:27Z

Not doing domsub in test files could also help with your problem, and it can be made only by parsing the file name.

I already covered this by excluding all test files/directories I could identify.

What percentage of files from lists have links only in comments?

Can't say, but if you make a git diff you will not be able to easily identify the actual code changes because the comment changes drown them out.

If that's the case (I didn't test it myself yet), would #1185 help the problem?

ninja uses timestamps like make, there's an open ninja issue about using file characteristics.

The problem is when you build Chromium and then run domain substitution: setting all files to a precedent timestamp to trick ninja into not rebuilding those files is wrong, as you cannot tell if only comments or code were changed. Thus what will happen is that there will be no advantage in building Chromium and then your modified version with patches and domain substitution, as most files will be rebuilt because of the changes. ccache mitigates this to some extent, but it still leaves room for improvement; with the patch I proposed here I made this process of re-using unchanged object files more efficient.

As for #1185 I actually think it's better to apply domain substitution after the other patches, so that they can be used individually without dependencies on the big domain substitution patch (which I treat as a patch, not different from any other).

Eloston · 2020-09-27T22:32:46Z

The problem is when you build Chromium and then run domain substitution: setting all files to a precedent timestamp to trick ninja into not rebuilding those files is wrong, as you cannot tell if only comments or code were changed. Thus what will happen is that there will be no advantage in building Chromium and then your modified version with patches and domain substitution, as most files will be rebuilt because of the changes. ccache mitigates this to some extent, but it still leaves room for improvement; with the patch I proposed here I made this process of re-using unchanged object files more efficient.

That's interesting. What use-case do you have where you'd want to build Chromium once first, apply domain substitution, then re-run the build? I envisioned only these two workflows for domain substitution:

Full build: Domain substitution, then build.
Incremental build:
1. Apply domain substitution
2. Build
3. Revert domain substitution
4. Make code changes
5. Repeat steps 1-4 until the the desired build is created.

Of these two workflows, the timestamp manipulation logic is useful only for the incremental build workflow.

csagan5 · 2020-09-28T17:59:15Z

That's interesting. What use-case do you have where you'd want to build Chromium once first, apply domain substitution, then re-run the build?

The Bromite build process is a bit like that:

build the vanilla Chromium (only some small patches applied, and the GN args of course): https://www.bromite.org/chromium
then build Bromite with all the other patches, including the domain substitution

The Chromium builds are important because before a bug is submitted for Bromite it needs to be tested against vanilla Chromium first (yes, we often incur in upstream bugs which become visible downstream because of specific feature/GN flags combinations).

In general you want to make the best out of ninja, C++, Java own invalidation systems, and ccache's one; the changes in the comments paddle against all of them as at least the parser must run to find out that nothing changed.

csagan5 force-pushed the do-not-replace-in-comments branch 3 times, most recently from 5931d0a to 84d04d2 Compare May 5, 2020 11:16

csagan5 changed the title ~~Do not perform replacements within comments~~ Do not perform domain substitutions within comments May 5, 2020

csagan5 force-pushed the do-not-replace-in-comments branch 8 times, most recently from 2384438 to b070bab Compare May 5, 2020 22:42

Do not perform domain substitutions within comments

41d6306

Covers C/C++, Java, Javascript and Python file extensions

csagan5 force-pushed the do-not-replace-in-comments branch from b070bab to 41d6306 Compare May 5, 2020 22:54

csagan5 mentioned this pull request May 16, 2020

[Bromite Builder] [Prepare] Use ungoogled-chromium for domain substitutions nikolowry/bromite-builder#9

Merged

Eloston reviewed May 23, 2020

View reviewed changes

Eloston mentioned this pull request Sep 19, 2020

Perform domain substitution before patching #1185

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not perform domain substitutions within comments #1026

Do not perform domain substitutions within comments #1026

csagan5 commented May 5, 2020 •

edited

Eloston left a comment

csagan5 commented May 31, 2020

Eloston commented May 31, 2020

csagan5 commented May 31, 2020

Luka-Filipovic commented Sep 19, 2020

csagan5 commented Sep 19, 2020

Luka-Filipovic commented Sep 19, 2020

csagan5 commented Sep 20, 2020

Eloston commented Sep 27, 2020

csagan5 commented Sep 28, 2020

Do not perform domain substitutions within comments #1026

Are you sure you want to change the base?

Do not perform domain substitutions within comments #1026

Conversation

csagan5 commented May 5, 2020 • edited

Eloston left a comment

Choose a reason for hiding this comment

csagan5 commented May 31, 2020

Eloston commented May 31, 2020

csagan5 commented May 31, 2020

Luka-Filipovic commented Sep 19, 2020

csagan5 commented Sep 19, 2020

Luka-Filipovic commented Sep 19, 2020

csagan5 commented Sep 20, 2020

Eloston commented Sep 27, 2020

csagan5 commented Sep 28, 2020

csagan5 commented May 5, 2020 •

edited