Conda discovery heuristics and environmental variables #1490

Open
cboettig opened this issue Oct 6, 2023 · 1 comment

Comments


cboettig commented Oct 6, 2023

Discovery of Conda environments often fails, specifically in containerized Linux environments. I think this could be easily fixed by:

A. extending the list of directories searched by the current heuristics
B. respecting the standard environment variables used by Conda / Anaconda when set (e.g. CONDA_PREFIX and CONDA_PYTHON_EXE; see the configuration docs on anaconda.com).

From what I can see, discovery of where Conda lives is based on a variety of heuristics:

reticulate/R/conda.R

Lines 730 to 735 in afb0a22

prefixes <- c("~/opt/", "~/", "/opt/", "/")
names <- c("anaconda", "miniconda", "miniforge")
versions <- c("", "2", "3", "4")
combos <- expand.grid(versions, names, prefixes, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
combos <- combos[rev(seq_along(combos))]
conda_locations <- unlist(.mapply(paste0, combos, NULL))
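
For reference, evaluating the quoted lines on their own shows which install roots they generate. This is only the excerpt above run in isolation; find_conda() may well do more with these paths than is visible here, so treat it as a rough illustration rather than a description of the full lookup.

```r
# The excerpt from R/conda.R, evaluated on its own.
prefixes <- c("~/opt/", "~/", "/opt/", "/")
names <- c("anaconda", "miniconda", "miniforge")
versions <- c("", "2", "3", "4")
combos <- expand.grid(versions, names, prefixes,
                      KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
# Reverse the column order so each row pastes as prefix + name + version.
combos <- combos[rev(seq_along(combos))]
conda_locations <- unlist(.mapply(paste0, combos, NULL))

length(conda_locations)                  # 4 prefixes x 3 names x 4 versions
head(conda_locations, 4)
"/opt/miniconda3" %in% conda_locations   # a root that is generated
"/opt/conda" %in% conda_locations        # a bare "conda" directory name is not
```

So /opt is covered by this first list, but only for directory names built from anaconda, miniconda, or miniforge plus an optional version suffix.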

reticulate/R/config.R

Lines 527 to 547 in afb0a22

env_dirs <- c("~/anaconda/envs",
              "~/anaconda2/envs",
              "~/anaconda3/envs",
              "~/anaconda4/envs",
              "~/miniconda/envs",
              "~/miniconda2/envs",
              "~/miniconda3/envs",
              "~/miniconda4/envs",
              "/anaconda/envs",
              "/anaconda2/envs",
              "/anaconda3/envs",
              "/anaconda4/envs",
              "/miniconda/envs",
              "/miniconda2/envs",
              "/miniconda3/envs",
              "/miniconda4/envs",
              "~/opt/anaconda/envs",
              "~/opt/anaconda2/envs",
              "~/opt/anaconda3/envs",
              "~/opt/anaconda4/envs",
              "~")

This second list includes paths in ~/opt but not /opt. Given the first list, I think this is a typo? (It is also unclear why these heuristics are defined in two different places at all.) Specifically, many containerized deployments used by universities etc. will want to map ~/ to persistent user config, and need a place outside of ~/ for default installs (/opt being a standard Linux convention for this, though you also see /srv/ and even /var being used). I think the situation would be much better if /opt paths were included in the second list.

I'd be happy to prepare a PR that adds /opt to the second list, though you may prefer to just patch that directly? It would additionally be nice to honor the CONDA_* env vars when set, before falling back on the heuristic list of paths to search. Lastly, these questions (e.g. when does reticulate look for Conda and where, and when does it fall back to looking for something else? Does it try virtualenv first or conda first if no env is forced by one of the current env vars?) do not seem to be answered by the current (otherwise very nice!) docs on discovery order in https://rstudio.github.io/reticulate/articles/versions.html.
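
To make the env-var suggestion concrete, here is a minimal sketch of the ordering I have in mind. The function name conda_candidates_sketch() and its heuristic_dirs default are hypothetical, not anything in reticulate, and the mapping from each variable to a conda binary is an assumption that only holds for typical Linux layouts.

```r
# Hypothetical helper, not part of reticulate: list candidate conda binaries,
# putting locations implied by CONDA_* variables ahead of heuristic paths.
conda_candidates_sketch <- function(
    heuristic_dirs = c("/opt/miniconda3", "~/miniconda3")) {
  from_env <- character()

  python_exe <- Sys.getenv("CONDA_PYTHON_EXE", unset = NA)
  if (!is.na(python_exe) && nzchar(python_exe)) {
    # Assumption: the conda executable sits next to the base python executable.
    from_env <- c(from_env, file.path(dirname(python_exe), "conda"))
  }

  prefix <- Sys.getenv("CONDA_PREFIX", unset = NA)
  if (!is.na(prefix) && nzchar(prefix)) {
    # Assumption: only correct when the active environment is the base install.
    from_env <- c(from_env, file.path(prefix, "bin", "conda"))
  }

  # Env-derived candidates first, heuristic fallbacks after, duplicates dropped.
  unique(c(from_env, file.path(heuristic_dirs, "bin", "conda")))
}

# A caller would then take the first candidate that actually exists:
candidates <- conda_candidates_sketch()
Find(file.exists, candidates)   # NULL if none of them exist
```

The point is only the order: anything the CONDA_* variables imply is tried before the hard-coded guesses.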

Thanks for considering this. Reticulate is amazing; we just want users of containerized RStudio setups (rocker, 2i2c/jupyterlab rstudio proxy) to get the best experience out of the box too.

@t-kalinowski
Member

Thanks for the detailed issue. Some notes:

  • The list of locations in R/config.R under python_conda_versions() appears to be dead code. So far as I can tell, that function is not used anywhere and is not exported.

  • All code paths where reticulate interacts with conda, and conda is not explicitly provided, invoke the first function you link, find_conda(). It already provides two escape hatches for pointing reticulate at a conda binary globally: the env var RETICULATE_CONDA and the R option reticulate.conda_binary. (/opt is also already a default search location.)

  • If CONDA_PREFIX and/or CONDA_PYTHON_EXE are set, then that means a conda environment had already been activated in the shell from which R was started. This is not our recommended usage, unless R is also being provided by the same conda environment. (In your experience, is that the case? Is the R binary also coming from the same conda environment?) If we were to use these variables in reticulate, it would probably be to detect whether conda has already been activated externally, and use that to override other preferences in the "Order of Discovery" used in py_discover_config() (and emit a warning, if appropriate, about binary incompatibilities if R is not from the same condaenv).
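
As a rough illustration of the two escape hatches mentioned above, either of the following could be set, presumably before reticulate first looks for conda. The path /opt/conda/bin/conda is only an assumed example; substitute wherever conda actually lives in your image.

```r
# Assumed location of the conda binary; adjust for your container image.
conda_bin <- "/opt/conda/bin/conda"

# Escape hatch 1: the RETICULATE_CONDA environment variable.
# (For a site-wide default this could instead be set in Renviron.site.)
Sys.setenv(RETICULATE_CONDA = conda_bin)

# Escape hatch 2: the reticulate.conda_binary R option.
# (For a site-wide default this could instead be set in Rprofile.site.)
options(reticulate.conda_binary = conda_bin)
```

Only one of the two should be needed; both are shown here for completeness.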

Lastly, these issues (e.g. when does reticulate look for Conda and where, and when does it fall back to looking for something else? does it try virtualenv first or conda first if no env is forced by one of the current env vars?) do not seem to be answered by the current (otherwise very nice!) docs on discovery order in rstudio.github.io/reticulate/articles/versions.html.

Thanks, this should be updated in the docs. This logic is encoded in reticulate:::py_resolve(), specifically here:

reticulate/R/install.R

Lines 182 to 188 in df1ac35

envpath <- virtualenv_path(envname)
if (file.exists(envpath))
  return(envpath)

envpath <- condaenv_path(envname)
if (file.exists(envpath))
  return(envpath)

First we check for a virtualenv of that name, and if that doesn't exist, we check for a condaenv of that name.
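
The same ordering can be tried out in isolation with a small stand-in. resolve_env_sketch() and its two root-directory arguments are hypothetical placeholders for illustration; they are not the helpers or the paths reticulate actually uses.

```r
# Hypothetical stand-in for the lookup order described above.
resolve_env_sketch <- function(envname, virtualenv_root, conda_envs_root) {
  candidates <- c(
    file.path(virtualenv_root, envname),  # a virtualenv of that name wins
    file.path(conda_envs_root, envname)   # otherwise fall back to a condaenv
  )
  for (path in candidates) {
    if (file.exists(path))
      return(path)
  }
  NULL  # neither kind of environment exists under that name
}

# Usage with throwaway directories standing in for real environments.
venv_root  <- file.path(tempdir(), "sketch-virtualenvs")
conda_root <- file.path(tempdir(), "sketch-conda-envs")
dir.create(file.path(venv_root, "demo"), recursive = TRUE)
dir.create(file.path(conda_root, "demo"), recursive = TRUE)

# Both exist, so the virtualenv path is returned.
resolve_env_sketch("demo", venv_root, conda_root)
```

With both directories present the virtualenv is returned; once it is removed, the same call falls through to the condaenv.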
