Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CacheDir duplicates cache entry for identical target outputs #4335

Open
mcspr opened this issue Apr 13, 2023 · 4 comments
Open

CacheDir duplicates cache entry for identical target outputs #4335

mcspr opened this issue Apr 13, 2023 · 4 comments
Labels

Comments

@mcspr
Copy link

mcspr commented Apr 13, 2023

Describe the Feature

Originally discussed at platformio/platformio-core#4574
PlatformIO 'environments' are separated as VariantDirs, and often include similar target outputs SDK and library files. Every time environment is built, cache is populated exclusively for that environment alone; it is not possible to share cache entries between environments, thus increasing total compilation time of otherwise exact same project files.

With the following test construct taken from the issue

CacheDir('build_cache')
VariantDir('build1', 'src', duplicate=False)
VariantDir('build2', 'src', duplicate=False)
env1 = Environment()
env1.Program('build1/hello.c')
env2 = Environment()
env2.Program('build2/hello.c')

Empty build_cache and clean build directory, build1 variant is compiled from scratch as expected.
Immediately trying to build build2 generates files from scratch as well, but since it is identical to the build1 one would expect the output files would be taken from build1 cache.

Currently, this does not work like that because of hard-coded path included for the CacheDir entry key, in the FS Node code method

scons/SCons/Node/FS.py

Lines 3705 to 3717 in c80cbb0

# Collect signatures for all children
children = self.children()
sigs = [n.get_cachedir_csig() for n in children]
# Append this node's signature...
sigs.append(self.get_contents_sig())
# ...and it's path
sigs.append(self.get_internal_path())
# Merge this all into a single signature
result = self.cachesig = hash_collect(sigs)
return result

Main question is - what is the best approach to handle such case?
Naive solution would be to simply allow to modify this path, as mentioned in the original issue
master...mcspr:scons:pio4574v2
FS node now has set_cachedir_bsig_path() method, which can be used to substitute that path; either set to some common prefix + filename, or simply removed.

But, I am not sure how that would apply to the env.Program() case, since it does not allow target factory (or, for anything in-between like .o files). Or, whether modifying FS Node code is the best place for this change to happen.

Required information

@acmorrow
Copy link
Contributor

I encountered this behavior at one point as well, and considered whether it was worth fixing. There are, however, some critical wrinkles. For this to work the build environment must guarantee that no path information ends up in the object files. If you have, for instance, debug info that records something like .dwo paths or split dwarf info or any other information like that, then what would otherwise be equivalent object files will end up with different content. If you allow those files to be considered equivalent via the cache by stripping the variant directory from consideration then you can end up retrieving the "wrong" file. Is it a problem in practice? Maybe or maybe not. But think if this behavior is desired it needs to be opt-in.

@LordMike
Copy link

As I understood it (and I might not have) .. wouldn't preprocessor stuff like "Current file" or "Line number" be rendered before hashing and checking the cache?

If so, that would make two different sources, which yields two different outputs.

@bdbaddog
Copy link
Contributor

As I understood it (and I might not have) .. wouldn't preprocessor stuff like "Current file" or "Line number" be rendered before hashing and checking the cache?

If so, that would make two different sources, which yields two different outputs.

If duplicate=False, then the sources and preprocessor stuff would all be the same..so no.
But some tools embed the target path/name in the output files.

@mwichmann
Copy link
Collaborator

Setting aside embedding, I wonder if fiddling the relevant for_signature could help - if a variantdir is active, don't include the variantdir part in the calculation. The interpretation of what to include is claimed to be fairly arbitrary:

        ...    The purpose         
        of this method is to generate a value to be used in signature          
        calculation for the command line used to build a target, and           
        we use this method instead of str() to avoid unnecessary               
        rebuilds.  This method does not need to return something that          
        would actually work in a command line; it can return any kind of       
        nonsense, so long as it does not change.                 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants