Future for package developers and temporary overriding of plans #450

lorenzwalthert · 2020-11-28T13:01:12Z

It's an almost 3 year long standing feature request in r-lib/stlyler to offer parallelisation and I have been very conservative on this and rejected implementations in r-lib/styler#682 and r-lib/styler#617, mainly because of your recommendations for package developers in ?future::plan(). I also read up on #263 and #274. I agree that there should be clear rules on how to use future strategies to avoid lack of control of the end user and a 'parallel chaos' with an increasing adoption of the future framework. I think one thing to consider in this discussion is that requirements for different tasks are widely different. A package for processing genomic sequences has a different user base and different scaling requirements than a tool like styler. For styler, 99% just want to have results quicker, they probably don't even know anything about future, remote workers or similar. And they should not have to. So I appreciate your note in ?future::plan():

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using: [...].

  oplan <- plan(new_set_of_strategies)
  on.exit(plan(oplan), add = TRUE)
  [...]

The problem with this solution is that there is no way for users (package developer or end user) of code that contains the above workaround to override the strategy anymore. Which is exactly what we wanted to prevent in the first place. For this reason, I suggest to extend the recommendation and give the user a way to override this with an env variable. Of course every package developer could think of their own way to override, but wouldn't it be great to have a spec for this? Along these lines:

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. In any case, enduser or package developer of your code should be able to override your decision, as they might use your code in ways you could not think of. To allow this, the recommended convention is to use an environment variable R_FUTURE_OVERRIDE_[pkgname] where [pkgname] should be substituted with the name of your package in which you place this code. If your package is called pkg1:

if (nzchar(Sys.getenv('R_FUTURE_OVERRIDE_pkgname'))) {
oplan <- plan(new_set_of_strategies)
  on.exit(plan(oplan), add = TRUE)
  [...]
}

Maybe package authors should document also that they are temporarily setting a future plan in ?pkg1::future_strategy or similar. Not sure I have enough context here for naming conventions and similar, but I guess you get the main idea. What do you think?

The text was updated successfully, but these errors were encountered:

HenrikBengtsson · 2020-11-28T19:38:38Z

Thanks - this is useful and I encourage everyone to join in a think hard about this problem. I should clarify that I've got this, and other limitations on my mind carrying them with me when I work on the future framework, i.e. the current constraints are not set in stone and I hope we'll get to expand on what can be done at some point. However, these things just take a very long time to mature and require feedback from lots of developers, end-users, and their use-cases. Even if it looks like there's an obvious quick solution in front of you, it might not be the best one because it might bring us into a dead-end (e.g. the progressr updates is a good example of that)

Some quick comments:

I think one thing to consider in this discussion is that requirements for different tasks are widely different. ... The problem with this solution is that there is no way for anyone (package developer or end-user) of code that contains the above workaround to override the strategy anymore. ...

I agree. Basically, the current plan() model can only be pushed so far. I'm still extremely hesitant to allow developers to choose the exact parallel backend for the end-user because the choices provide will then reflect what the developer care for or can test on, e.g. in the past Windows users suffered from Linux/macOS developers who were happy enough with mclapply().

I think that a framework for specifying "resources" that a future requires could be a way forward here (#172). I can also imagine optional or preferred resources, e.g. a future my declare that it prefers to run a file system where some big genomic reference file is already downloaded/cached but it could still do without (e.g. it can download it again if it runs elsewhere).

With "resources", we could imagine the user being able to declare multiple, alternative future backends that code can pick from. If the code requires "localhost", then for instance remote backends are immediately ruled out, leaving say "sequential", "multicore", "multisession", "callr", and "batchtools_local" as the options.

Maybe package authors should document also that they are temporarily setting a future plan in ?pkg1::future_strategy or similar. Not sure I have enough context here for naming conventions and similar, but I guess you get the main idea. What do you think?

I haven't had time to digest this so cannot say much right now, but I agree that whatever the solution should be it needs to be standardized so that end-user can feel familiar with it.

Back to your {styler} per se. Is the above really a problem with this type of tools-for-tools package? Isn't it mainly used occasionally by developers during development, i.e. it's unlikely that it'll clash with a user running genomic research on an HPC scheduler? Or, are there use-cases where {styler} is in constant use in the background, e.g. R GUIs? (FWIW, this far I've only used it for cleaning up legacy code)

lorenzwalthert · 2020-11-29T08:17:50Z

Thanks for your detailed answer.

If the code requires "localhost", then for instance remote backends are immediately ruled out, leaving say "sequential", "multicore", "multisession", "callr", and "batchtools_local" as the options.

Yeah, this was also an issue with styler, as the files from the local file system need to be available (r-lib/styler#277 (comment)).

The idea about preferred backends also sounds good. Probably redundant to say here but I guess it also makes sense how these challenges are addressed in other programming languages and what their communities have learned from thinking about the problem and implementing solutions.

Back to your {styler} per se. Is the above really a problem with this type of tools-for-tools package?

{styler} probably won't have the tools-for-tools problem (yet). It is suggested by 13 and imported by 7 CRAN packages, although I haven't looked into how styler is used in these. Maybe I should - and check which of these packages also have {future} in the recursive dependencies and how it is used. I would say most common use case is the RStudio Addin for interactive styling of the active file (where parallelisation does not make sense because of startup costs), periodical styling with style_file() and style_pkg() in the development workflow as well as a pre-commit hook.

lorenzwalthert mentioned this issue Dec 5, 2020

Parallel styling r-lib/styler#682

Draft

HenrikBengtsson added the feature/resources label Dec 26, 2020

HenrikBengtsson added this to the Future release (not next) milestone Apr 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Future for package developers and temporary overriding of plans #450

Future for package developers and temporary overriding of plans #450

lorenzwalthert commented Nov 28, 2020 •

edited

HenrikBengtsson commented Nov 28, 2020

lorenzwalthert commented Nov 29, 2020

Future for package developers and temporary overriding of plans #450

Future for package developers and temporary overriding of plans #450

Comments

lorenzwalthert commented Nov 28, 2020 • edited

HenrikBengtsson commented Nov 28, 2020

lorenzwalthert commented Nov 29, 2020

lorenzwalthert commented Nov 28, 2020 •

edited