
Bug when combining CacheDir and MSVC_BATCH #4361

Open
CiderMan opened this issue May 25, 2023 · 4 comments

@CiderMan

Describe the bug
We have recently switched on CacheDir in our CI machine and, while this has largely been a positive experience, we have seen the odd link failure that we could not explain. I was eventually able to reproduce the issue and produce a minimal SConstruct that illustrates the problem. It is an interaction between the MSVS batching and the cache.

Required information

  • Link to SCons Users thread discussing your issue.
  • Version of SCons: 4.2.0 in production. Reproduced on 4.5.2.
  • Version of Python: 3.8.10
  • Which python distribution: python.org
  • How you installed SCons: PIP
  • What Platform are you on? Windows 10
  • How to reproduce your issue? How you invoke scons (The command line you're using "scons --flags some_arguments")

To reproduce the problem, I have created two C files and a SConstruct:

src/main.c

void Foo(void);

int main()
{
   Foo();
   return 0;
}

src/foo.c

void Foo(void)
{
   //
}

SConstruct

CacheDir(r"C:\tmp\cache")

env1 = Environment(MSVC_BATCH=True)
env1.VariantDir("build1", "src", duplicate=False)
env1.Program("build1/a.exe", ["build1/main.c", "build1/foo.c"])

env2 = Environment(MSVC_BATCH=True)
env2.VariantDir("build2", "src", duplicate=False)
env2.Program("build2/a.exe", ["build2/main.c", "build2/foo.c"])

The steps to reproduce the issue are:

  1. Perform a build (with just scons) to populate the cache
  2. Delete build1/main.obj
  3. Edit src/foo.c such that it needs to be rebuilt and won’t be in the cache
  4. Perform another build

Running scons, I would expect build1/main.obj to be retrieved from the cache and foo.c to be compiled to produce build1/foo.obj and build2/foo.obj, before the two Programs are linked. What I actually get is a build failure:

$ scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
Retrieved `build1\main.obj' from cache
cl /Fobuild1\\ /c src\main.c src\foo.c /nologo
main.c
foo.c
Generating Code...
link /nologo /OUT:build1\a.exe build1\main.obj build1\foo.obj
Retrieved `build2\main.obj' from cache
cl /Fobuild2\\ /c src\foo.c /nologo
foo.c
link /nologo /OUT:build2\a.exe build2\main.obj build2\foo.obj
LINK : fatal error LNK1181: cannot open input file 'build2\main.obj'
scons: *** [build2\a.exe] Error 1181
scons: building terminated because of errors.

At this point build2/main.obj has been removed (hence the link failure) but running scons again succeeds:

$ scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
Retrieved `build2\main.obj' from cache
Retrieved `build2\foo.obj' from cache
link /nologo /OUT:build2\a.exe build2\main.obj build2\foo.obj
scons: done building targets.
@mwichmann
Collaborator

To add to what I said in the referenced email thread: I can reproduce this, and I believe, though I don't have definitive proof, that the presence of two variant dirs is also a necessary part of the trigger; however, I've not added the variantdir label to this issue (yet).

@CiderMan
Author

OK, I've monkey-patched os.unlink and can see where the file is getting deleted:

Retrieved `build2\main.obj' from cache
Unlinking build2\main.obj
  File "C:\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\tmp\repro\.venv\Scripts\scons.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Script\Main.py", line 1412, in main
    _exec_main(parser, values)
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Script\Main.py", line 1374, in _exec_main
    _main(parser)
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Script\Main.py", line 1154, in _main
    nodes = _build_targets(fs, options, targets, target_top)
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Script\Main.py", line 1349, in _build_targets
    jobs.run(postfunc = jobs_postfunc)
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Taskmaster\Job.py", line 123, in run
    self.job.start()
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Taskmaster\Job.py", line 221, in start
    task.execute()
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Script\Main.py", line 195, in execute
    SCons.Taskmaster.OutOfDateTask.execute(self)
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Taskmaster\__init__.py", line 242, in execute
    t.fs.unlink(t.get_internal_path())
  File "c:\tmp\repro\.venv\lib\site-packages\SCons\Node\FS.py", line 1184, in unlink
    return os.unlink(path)
  File "C:\tmp\repro\SConstruct", line 8, in _unlink
    traceback.print_stack()

In other words, it is the t.fs.unlink in the following piece of code:

        try:
            cached_targets = []
            for t in self.targets:
                if not t.retrieve_from_cache():
                    break
                cached_targets.append(t)
            if len(cached_targets) < len(self.targets):
                # Remove targets before building. It's possible that we
                # partially retrieved targets from the cache, leaving
                # them in read-only mode. That might cause the command
                # to fail.
                #
                for t in cached_targets:
                    try:
                        t.fs.unlink(t.get_internal_path())
                    except (IOError, OSError) as e:
                        SCons.Warnings.warn(SCons.Warnings.CacheCleanupErrorWarning,
                            "Failed copying all target files from cache, Error while attempting to remove file %s retrieved from cache: %s" % (t.get_internal_path(), e))
                self.targets[0].build()
            else:
                for t in cached_targets:
                    t.cached = 1
        except SystemExit:

Dumping out the values of cached_targets and self.targets at the time of the erroneous deletion:

cached_targets = ['build2\\main.obj']
self.targets = ['build2\\main.obj', 'build2\\foo.obj']

I'm guessing that this code intends main.obj to be rebuilt, but, because main.obj is up-to-date, it is not part of the batch, so the batched compile never regenerates it?
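If it helps, the failure mode can be reduced to a toy simulation (all names below are illustrative stand-ins, not SCons internals): the Taskmaster unlinks every partially-retrieved cached target before falling back to building, but a batched compile only regenerates the objects SCons marked out-of-date, so the unlinked up-to-date object is never recreated.

```python
# Miniature simulation of the bug: `files` stands in for the build
# directory, `batch_build` for the batched cl invocation.

def batch_build(files, stale_objs):
    # A batched compile regenerates only the objects SCons decided
    # were out of date -- not objects it considers up to date.
    for obj in stale_objs:
        files.add(obj)

files = set()
stale = {"foo.obj"}            # foo.c changed; main.obj is up to date

# Step 1: main.obj is retrieved from the cache.
files.add("main.obj")
cached_targets = ["main.obj"]

# Step 2: foo.obj was not in the cache, so SCons falls back to building
# and first unlinks everything it just retrieved.
for t in cached_targets:
    files.discard(t)

# Step 3: the batched command runs, but only recompiles foo.c.
batch_build(files, stale)

# main.obj is now missing, so the link step fails with LNK1181.
print("main.obj" in files)     # prints False
```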

I hope that this is of some help...

@bdbaddog
Contributor

bdbaddog commented May 30, 2023

I'm not sure batching can be compatible with cachedir.. </before coffee thoughts>
How much speedup do you see from using MSVC_BATCH?

@CiderMan
Author

CiderMan commented May 31, 2023

Hi @bdbaddog,

Doing a clean build of one of our build products as an example:

  • With MSVC_BATCH: 10m30.079s
  • Without MSVC_BATCH: 16m20.716s

We could certainly disable MSVC_BATCH when building with a cache but we would need to analyse the results to be sure that the cache provides more speed up than the removal of batching causes slow down.

Would it be possible to implement the following logic for each batch:

  • if all objects files are up-to-date or in the cache, fetch from cache where necessary and move on
  • otherwise, rebuild all object files that are not present or out-of-date, ignoring anything in the cache

While it is sub-optimal, I suspect that it would give a lot of the performance benefit of each mechanism while simplifying the work that needs to be done...
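That all-or-nothing policy could be sketched roughly as follows (Target and process_batch are hypothetical stand-ins for illustration, not SCons API):

```python
# Sketch of the proposed per-batch policy: either the whole batch is
# satisfiable from up-to-date files plus the cache, or the cache is
# ignored and every stale member is rebuilt together.

from dataclasses import dataclass
from typing import List

@dataclass
class Target:
    name: str
    up_to_date: bool = False
    in_cache: bool = False
    retrieved: bool = False
    built: bool = False

def process_batch(targets: List[Target]) -> str:
    # If every member of the batch is either up to date or cached,
    # fetch the cached ones and skip the build entirely.
    if all(t.up_to_date or t.in_cache for t in targets):
        for t in targets:
            if not t.up_to_date:
                t.retrieved = True
        return "cache"
    # Otherwise rebuild every stale member, ignoring the cache, so the
    # single batched compile regenerates everything the link step needs.
    for t in targets:
        if not t.up_to_date:
            t.built = True
    return "build"
```

For the scenario in this issue (main.obj cached, foo.obj neither cached nor up to date), this policy would take the "build" branch and recompile both objects in one batch, avoiding the dangling unlink.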

P.S. For reference, I used the non-MSVC_BATCH build to --cache-populate and then performed a clean build with the cache enabled (so everything was in the cache); it took 3m18.218s. Note that all the builds are parallel builds on an 8-core (logical) i7-6700.
