Divide and conquer with workaround for batch submission #717

Answered by HenrikBengtsson
scottkosty asked this question in Q&A

Using:

library(future.apply)
set.seed(0xBEEF)
y <- future_lapply(X, FUN = my_fcn, future.seed = TRUE)

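should be fully reproducible. With future.seed = TRUE, each element of X gets its own parallel-safe L'Ecuyer-CMRG random-number stream derived from the initial seed, so the results don't depend on how the elements are distributed across workers or jobs. In other words, there's no need to orchestrate the per-task random seeds (.Random.seed) yourself.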
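To illustrate why no manual seed bookkeeping is needed, here is a conceptual sketch in base R of the idea behind future.seed = TRUE: one independent L'Ecuyer-CMRG stream per element, all derived from a single initial seed via parallel::nextRNGStream(). This is not future.apply's actual implementation; make_streams() and run_with_stream() are hypothetical helpers made up for this illustration.

```r
## Conceptual sketch only, NOT future.apply's internal code.
## make_streams() and run_with_stream() are hypothetical helpers.

make_streams <- function(n, seed) {
  ## Switch to the L'Ecuyer-CMRG generator and seed it
  ## (side effect: changes the session's RNG kind and state)
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  s <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", n)
  for (i in seq_len(n)) {
    ## Jump ahead to the start of the next independent stream
    s <- parallel::nextRNGStream(s)
    streams[[i]] <- s
  }
  streams
}

run_with_stream <- function(stream, fn) {
  ## Install this task's stream as the current RNG state, then run the task
  assign(".Random.seed", stream, envir = globalenv())
  fn()
}

streams <- make_streams(3, seed = 0xBEEF)
task <- function() runif(1)

## Run the three tasks in order, then again in reverse order
a <- vapply(streams, run_with_stream, numeric(1), fn = task)
b <- vapply(rev(streams), run_with_stream, numeric(1), fn = task)
```

Because task i only ever draws from stream i, `a` and `rev(b)` are identical: the random numbers a task sees don't depend on the order in which tasks run or on which worker runs them. The same property is what makes it safe to skip elements that have already been processed.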

If you're worried about some tasks failing and don't want to have to rerun everything from scratch, you can use memoization in my_fcn(), that is, have it save each result to file and skip any element whose result file already exists. The gist:

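my_fcn <- function(x) {
  ## Result file for this element (x_to_rds() is a placeholder helper)
  file <- x_to_rds(x)

  ## Already processed? Then skip the expensive work
  if (file.exists(file)) return(file)

  ## Otherwise, run the analysis (full_run() is a placeholder that saves
  ## its result to 'file' and returns that path)
  file <- full_run(x)

  file
}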
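A minimal runnable version of that gist might look like the following, assuming each result is a small R object stored as an .rds file in a cache directory. The cache location, x_to_rds(), full_run() and the sqrt() "analysis" are hypothetical stand-ins. Plain lapply() is used so the sketch runs without a parallel backend; the same my_fcn() could be passed to future_lapply().

```r
## Hypothetical cache directory; on a cluster it should live on a file
## system shared by all compute nodes
cache_dir <- file.path(tempdir(), "my_fcn-cache")
unlink(cache_dir, recursive = TRUE)
dir.create(cache_dir, recursive = TRUE)

## Counts how often the expensive step really runs (illustration only;
## a global counter like this is not updated from parallel workers)
n_runs <- 0L

## Hypothetical helper: deterministic result file for input x
x_to_rds <- function(x) file.path(cache_dir, sprintf("result-%s.rds", x))

## Hypothetical analysis: compute, save to the result file, return its path
full_run <- function(x) {
  n_runs <<- n_runs + 1L
  file <- x_to_rds(x)
  value <- sqrt(x)  # stand-in for the real computation
  ## Write to a temporary file, then rename, so an interrupted job does not
  ## leave a partial file that later looks "already processed"
  tmp <- paste0(file, ".tmp")
  saveRDS(value, tmp)
  file.rename(tmp, file)
  file
}

my_fcn <- function(x) {
  file <- x_to_rds(x)
  ## Already processed?
  if (file.exists(file)) return(file)
  ## Otherwise, run the analysis
  full_run(x)
}

X <- 1:4
invisible(lapply(X, my_fcn))         # first pass: all 4 elements computed
invisible(file.remove(x_to_rds(3)))  # pretend element 3's job failed
files <- lapply(X, my_fcn)           # second pass: only element 3 recomputed
results <- vapply(files, readRDS, numeric(1))
```

After both passes the analysis has run 5 times rather than 8, and deleting a result file is all it takes to force that element to be recomputed. The write-then-rename step is worth keeping: without it, a job killed mid-write could leave a truncated file that file.exists() would accept as finished.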

Yes, this would be a bit wasteful on the job scheduler, because you're requesting jobs for steps that will be skipped. Right now, we don't have a mechanism to avoid this. If we could run

Answer selected by scottkosty