Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future for package developers and temporary overriding of plans #450

Open
lorenzwalthert opened this issue Nov 28, 2020 · 2 comments
Open

Comments

@lorenzwalthert
Copy link

lorenzwalthert commented Nov 28, 2020

It's an almost 3 year long standing feature request in r-lib/stlyler to offer parallelisation and I have been very conservative on this and rejected implementations in r-lib/styler#682 and r-lib/styler#617, mainly because of your recommendations for package developers in ?future::plan(). I also read up on #263 and #274. I agree that there should be clear rules on how to use future strategies to avoid lack of control of the end user and a 'parallel chaos' with an increasing adoption of the future framework. I think one thing to consider in this discussion is that requirements for different tasks are widely different. A package for processing genomic sequences has a different user base and different scaling requirements than a tool like styler. For styler, 99% just want to have results quicker, they probably don't even know anything about future, remote workers or similar. And they should not have to. So I appreciate your note in ?future::plan():

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using: [...].

  oplan <- plan(new_set_of_strategies)
  on.exit(plan(oplan), add = TRUE)
  [...]

The problem with this solution is that there is no way for users (package developer or end user) of code that contains the above workaround to override the strategy anymore. Which is exactly what we wanted to prevent in the first place. For this reason, I suggest to extend the recommendation and give the user a way to override this with an env variable. Of course every package developer could think of their own way to override, but wouldn't it be great to have a spec for this? Along these lines:

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. In any case, enduser or package developer of your code should be able to override your decision, as they might use your code in ways you could not think of. To allow this, the recommended convention is to use an environment variable R_FUTURE_OVERRIDE_[pkgname] where [pkgname] should be substituted with the name of your package in which you place this code. If your package is called pkg1:

if (nzchar(Sys.getenv('R_FUTURE_OVERRIDE_pkgname'))) {
oplan <- plan(new_set_of_strategies)
  on.exit(plan(oplan), add = TRUE)
  [...]
}

Maybe package authors should document also that they are temporarily setting a future plan in ?pkg1::future_strategy or similar. Not sure I have enough context here for naming conventions and similar, but I guess you get the main idea. What do you think?

@HenrikBengtsson
Copy link
Owner

Thanks - this is useful and I encourage everyone to join in a think hard about this problem. I should clarify that I've got this, and other limitations on my mind carrying them with me when I work on the future framework, i.e. the current constraints are not set in stone and I hope we'll get to expand on what can be done at some point. However, these things just take a very long time to mature and require feedback from lots of developers, end-users, and their use-cases. Even if it looks like there's an obvious quick solution in front of you, it might not be the best one because it might bring us into a dead-end (e.g. the progressr updates is a good example of that)

Some quick comments:

I think one thing to consider in this discussion is that requirements for different tasks are widely different. ... The problem with this solution is that there is no way for anyone (package developer or end-user) of code that contains the above workaround to override the strategy anymore. ...

I agree. Basically, the current plan() model can only be pushed so far. I'm still extremely hesitant to allow developers to choose the exact parallel backend for the end-user because the choices provide will then reflect what the developer care for or can test on, e.g. in the past Windows users suffered from Linux/macOS developers who were happy enough with mclapply().

I think that a framework for specifying "resources" that a future requires could be a way forward here (#172). I can also imagine optional or preferred resources, e.g. a future my declare that it prefers to run a file system where some big genomic reference file is already downloaded/cached but it could still do without (e.g. it can download it again if it runs elsewhere).

With "resources", we could imagine the user being able to declare multiple, alternative future backends that code can pick from. If the code requires "localhost", then for instance remote backends are immediately ruled out, leaving say "sequential", "multicore", "multisession", "callr", and "batchtools_local" as the options.

Maybe package authors should document also that they are temporarily setting a future plan in ?pkg1::future_strategy or similar. Not sure I have enough context here for naming conventions and similar, but I guess you get the main idea. What do you think?

I haven't had time to digest this so cannot say much right now, but I agree that whatever the solution should be it needs to be standardized so that end-user can feel familiar with it.

Back to your {styler} per se. Is the above really a problem with this type of tools-for-tools package? Isn't it mainly used occasionally by developers during development, i.e. it's unlikely that it'll clash with a user running genomic research on an HPC scheduler? Or, are there use-cases where {styler} is in constant use in the background, e.g. R GUIs? (FWIW, this far I've only used it for cleaning up legacy code)

@lorenzwalthert
Copy link
Author

Thanks for your detailed answer.

If the code requires "localhost", then for instance remote backends are immediately ruled out, leaving say "sequential", "multicore", "multisession", "callr", and "batchtools_local" as the options.

Yeah, this was also an issue with styler, as the files from the local file system need to be available (r-lib/styler#277 (comment)).

The idea about preferred backends also sounds good. Probably redundant to say here but I guess it also makes sense how these challenges are addressed in other programming languages and what their communities have learned from thinking about the problem and implementing solutions.

Back to your {styler} per se. Is the above really a problem with this type of tools-for-tools package?

{styler} probably won't have the tools-for-tools problem (yet). It is suggested by 13 and imported by 7 CRAN packages, although I haven't looked into how styler is used in these. Maybe I should - and check which of these packages also have {future} in the recursive dependencies and how it is used. I would say most common use case is the RStudio Addin for interactive styling of the active file (where parallelisation does not make sense because of startup costs), periodical styling with style_file() and style_pkg() in the development workflow as well as a pre-commit hook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants