Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't fail if rds files are corrupted #471

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

richfitz
Copy link

This is a somewhat unsatisfactory PR as I can't replicate this reliably.

As seen in #451 and #315 if the R process is killed when writing the Rds file it will create a corrupted rds file that will break the covr::package_coverage(). I've seen this rarely on gha builds (e.g. https://github.com/mrc-ide/rrq/pull/40/checks?check_run_id=2363889569) when using callr::r_bg fairly extensively, but have been unable to trigger in a MWE.

The downside is that the coverage report may be somewhat partial (and a message could be added during reading to indicate that) but continuing despite failure might be more graceful for relatively little additional complexity.

@CLAassistant
Copy link

CLAassistant commented Apr 16, 2021

CLA assistant check
All committers have signed the CLA.

@jimhester
Copy link
Member

I am pretty sure this is due to processx using SIGKILL in a number of places and the finalizer that is writing the coverage is interrupted.

@gaborcsardi would know better than I if there is a way to have the improved behavior for this use case, the current behavior is unfortunate.

@gaborcsardi
Copy link
Member

Sometimes you have to use sigkill, because R in the subprocess is not interruptible.

@jimhester
Copy link
Member

Sure, but if the process is writing the coverage out in an exit finalizer the process is already in the process of exiting, so there must be some case where the SIGKILL is too eager.

@gaborcsardi
Copy link
Member

@richfitz You can add some $signal() calls to your event loop layer if you want a graceful shutdown. processx is a bit too low level to implement this, without a standard event loop.

Or, if you don't mind sequential shutdown, then you can probably add a finalizer to your processes, that sends a SIGTERM first and then after a timeout a SIGKILL.

On Windows there is no SIGTERM, but you can $interrupt() first.

@richfitz richfitz marked this pull request as ready for review April 19, 2021 07:48
@richfitz
Copy link
Author

@richfitz You can add some $signal() calls to your event loop layer if you want a graceful shutdown. processx is a bit too low level to implement this, without a standard event loop.

I don't think that this is the high level parts, unfortunately - I think that this is probably where a task has been spawned and is later cleaned up? I can't easily replicate unfortunately, and after replacing all the $kill calls I had (which I see uses SIGKILL after SIGTERM + grace period) I am still seeing intermittent failures - perhaps 1/3 to 1/4 of the time, so enough to be hard to work with but not enough to know if random fixes are helping!

I've not seen anything running covr::package_coverage() locally so I expect this is an issue with the speed at which the process is closed in the context of fairly slow GHA runners?

I do think that the current PR as it stands may be of use, but as always up to you for if you agree :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants