
Interpreter and Environment Discovery

Karthik Nadig edited this page Feb 2, 2021 · 2 revisions

One of the things the extension does on startup is search for installed interpreters and environments in known global and workspace locations. The results are later used to enable behaviors such as auto-selection, environment lists, and environment activation based on type.

Design (previous)

This was the original way the extension looked for interpreters and environments. There were several implementations of IInterpreterLocatorService, each focused on a particular interpreter or environment type. For example, one class specifically reads the environments.txt file created by conda and reports the environments listed in that file.

Each implementation was exposed to the rest of the extension via the service container (using inversify). In addition to the IInterpreterLocatorService interface, some classes also provided convenience wrappers for domain-specific features, such as identifying whether a given interpreter belongs to an environment of a particular type.

There are a couple of issues with this design:

  1. Each implementation of IInterpreterLocatorService requires that the Python interpreter eventually be run to extract the required information. This often slows down extension load, depending on the number of environments available on a given machine.
  2. Since each implementation also exposes additional functionality, other classes offering unrelated features have taken a dependency on these classes. This also means that when testing, we sometimes have to mock a large number of unrelated classes to test a simple feature.
  3. A large number of classes are implemented and used as singletons. This has led to the lack of a well-defined API separating the modules that do discovery from the modules that consume its results.

Design (current)

This version of the interpreter and environment discovery module provides a scoped API for getting interpreters for use in the rest of the extension. The API is exposed via the IComponentAdapter interface. IComponentAdapter was added to integrate with the rest of the extension, which uses dependency injection to acquire its dependencies. The component itself depends only on platform APIs (file system, processes, OS-specific features) and on settings passed in when the component is created. The implementation of IComponentAdapter may depend on vscode APIs if needed. It acts as the bridge between the APIs exposed by the component and the rest of the extension.

[Figure: getInterpreters call flow with and without the experiment flag set]

One issue with the old locator code was that the internals of the environment-specific locators were exposed to the rest of the extension. This made testing difficult, because the extension took dependencies on concrete implementations rather than abstractions. With the new component, all code flows through the component adapter, which exposes a well-defined API.
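The bridging role can be illustrated with a minimal sketch of a consumer-facing adapter that wraps the component. IDiscoveryApi, IComponentAdapterSketch, ComponentAdapter and the simplified PythonEnvInfo shape are hypothetical names, not the extension's actual types. Only getInterpreters, onRefreshing and onRefreshed are taken from this page.

```typescript
// Hypothetical, simplified environment shape seen by consumers.
interface PythonEnvInfo {
  executable: string;
  version?: string;
}

type Listener = () => void;

// Hypothetical API of the discovery component itself. It depends only on
// platform concerns (file system, processes), never on vscode.
interface IDiscoveryApi {
  getEnvs(): PythonEnvInfo[];
  isRefreshing(): boolean;
  onRefreshStateChanged(listener: Listener): void;
}

// Hypothetical adapter contract that the rest of the extension depends on.
interface IComponentAdapterSketch {
  getInterpreters(): Promise<PythonEnvInfo[]>;
  onRefreshing(listener: Listener): void;
  onRefreshed(listener: Listener): void;
}

// The adapter is the only code that knows how the component works internally.
class ComponentAdapter implements IComponentAdapterSketch {
  private readonly refreshingListeners: Listener[] = [];
  private readonly refreshedListeners: Listener[] = [];

  constructor(private readonly api: IDiscoveryApi) {
    // Translate the component's single state-change signal into the two
    // events that consumers subscribe to.
    api.onRefreshStateChanged(() => {
      const listeners = this.api.isRefreshing()
        ? this.refreshingListeners
        : this.refreshedListeners;
      listeners.forEach((listener) => listener());
    });
  }

  // May return cached results; consumers never talk to individual locators.
  async getInterpreters(): Promise<PythonEnvInfo[]> {
    return this.api.getEnvs();
  }

  onRefreshing(listener: Listener): void {
    this.refreshingListeners.push(listener);
  }

  onRefreshed(listener: Listener): void {
    this.refreshedListeners.push(listener);
  }
}
```

Consumers see only the adapter interface, so a unit test can supply a small fake IDiscoveryApi instead of mocking individual locators.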

The discovery component is activated as part of component activation. Once all the classes are loaded, the following APIs are available:

API: Description
getInterpreters: Returns the interpreters found. This API may return interpreters from the cache.
getInterpreterDetails: Returns environment info for a given interpreter.
onDidCreate: Registers a callback to be called when a workspace virtual environment is created.
onRefreshing: Signals that the discovery component is still looking for environments.
onRefreshed: Signals that the discovery component has finished looking for environments.
getInterpreterInformation: Temporary. Returns partial environment information as available at that point in time.
isMacDefaultPythonPath: Temporary. Returns true if the interpreter path is identified as the default macOS Python path.
isCondaEnvironment: Temporary. Returns true if the interpreter path is identified as part of a conda environment.
getCondaEnvironment: Temporary. Given the path to an interpreter that belongs to a conda environment, returns the name and path of that environment.
isWindowsStoreInterpreter: Temporary. Returns true if the interpreter path is identified as Python installed via the Windows Store.
hasInterpreters: Temporary. Returns true if at least one interpreter has been found.
getWorkspaceVirtualEnvInterpreters: Temporary. Returns environments that belong to a workspace.
getWinRegInterpreters: Temporary. Returns environments that were discovered using the Windows Registry.

APIs marked Temporary will either be removed, or already have equivalents in the new component that make them obsolete.

Caching, Reducing, and Resolving environments

On activation, the component adapter loads the known environments from the cache and performs a background refresh to find any new environments. When a new environment is discovered, it triggers a cache update once enough details about that environment are available. See CachingLocator for the implementation.
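The cache-then-refresh behavior can be sketched as follows. The sketch assumes a simple synchronous store (EnvCache) and a discoverAll callback that performs the full find, reduce and resolve pass. CachingLocatorSketch, EnvCache and discoverAll are hypothetical names; CachingLocator in the extension is the authoritative implementation.

```typescript
// Hypothetical, simplified environment shape.
interface PythonEnvInfo {
  executable: string;
  version?: string;
}

// Hypothetical persistent store, e.g. backed by extension global storage.
interface EnvCache {
  load(): PythonEnvInfo[];
  store(envs: PythonEnvInfo[]): void;
}

class CachingLocatorSketch {
  private envs: PythonEnvInfo[];
  private refreshPromise: Promise<void> | undefined;

  constructor(
    private readonly cache: EnvCache,
    // Hypothetical full discovery pass: find, reduce and resolve everything.
    private readonly discoverAll: () => Promise<PythonEnvInfo[]>,
  ) {
    // Serve what was known from the previous session right away.
    this.envs = cache.load();
  }

  // Starts a background refresh; concurrent callers share the same one.
  refresh(): Promise<void> {
    let pending = this.refreshPromise;
    if (pending === undefined) {
      const run = async (): Promise<void> => {
        try {
          const found = await this.discoverAll();
          this.envs = found;
          this.cache.store(found); // overwrite the cache with the latest results
        } finally {
          this.refreshPromise = undefined;
        }
      };
      pending = run();
      this.refreshPromise = pending;
    }
    return pending;
  }

  // Never waits for a refresh; returns whatever is currently cached.
  getEnvs(): PythonEnvInfo[] {
    return this.envs.slice();
  }
}
```

In this sketch, sharing one in-flight refresh prevents repeated requests from launching duplicate discovery passes.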

[Figure: Environment data flow through locators]

Once an environment is found, the detected environments are reduced to a distinct set. This reduction handles cases where the same interpreter binary is reachable through symlinks in the same folder under slightly different names (for example, python and python3 in a virtual environment's bin folder). The comparison rule is that if the versions and the parent directories of the Python binaries match, they are likely the same Python environment.
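The comparison rule can be sketched as a keyed de-duplication. reduceEnvs and envKey are hypothetical helpers. Treating an unknown version as "never equal" is an assumption of this sketch. It keeps unrelated binaries such as python2 and python3 in /usr/bin from being merged before their versions are known.

```typescript
import * as path from "path";

// Hypothetical, simplified environment shape.
interface PythonEnvInfo {
  executable: string;
  version?: string;
}

// Two interpreters are treated as the same environment when their versions
// and the parent directories of their binaries match. An unknown version
// never matches (an assumption of this sketch), so the full path is used.
function envKey(env: PythonEnvInfo): string {
  if (env.version === undefined) {
    return `path:${env.executable}`;
  }
  return `env:${env.version}|${path.dirname(env.executable)}`;
}

// Keeps the first interpreter seen for each key, preserving discovery order.
function reduceEnvs(envs: PythonEnvInfo[]): PythonEnvInfo[] {
  const seen = new Map<string, PythonEnvInfo>();
  for (const env of envs) {
    const key = envKey(env);
    if (!seen.has(key)) {
      seen.set(key, env);
    }
  }
  return Array.from(seen.values());
}
```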

The next step is the resolution phase, where each environment is checked for missing information. This is the resolver step: it fills in the missing information by running a Python script in that environment. The number of simultaneous Python processes executed here is throttled to avoid overusing system resources when there are a large number of environments.
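The throttling described above can be illustrated with a small concurrency limiter. mapWithConcurrency and resolveMissingInfo are hypothetical helpers. RunInfoScript stands in for spawning the interpreter with an info script. The default of 4 simultaneous processes is an arbitrary example value, not the extension's setting.

```typescript
// Hypothetical, simplified environment shape; version is "missing" when undefined.
interface PythonEnvInfo {
  executable: string;
  version?: string;
}

// Hypothetical stand-in for spawning `executable` with an info script.
type RunInfoScript = (executable: string) => Promise<{ version: string }>;

// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) {
    throw new RangeError("limit must be at least 1");
  }
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner keeps claiming the next unprocessed index until none are left.
  const runner = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  };
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Fills in missing versions, running at most `maxProcesses` scripts at once.
// The default of 4 is an arbitrary example value.
async function resolveMissingInfo(
  envs: PythonEnvInfo[],
  runInfoScript: RunInfoScript,
  maxProcesses = 4,
): Promise<PythonEnvInfo[]> {
  return mapWithConcurrency<PythonEnvInfo, PythonEnvInfo>(envs, maxProcesses, async (env) => {
    if (env.version !== undefined) {
      return env; // nothing missing, so no process is spawned
    }
    const info = await runInfoScript(env.executable);
    return { ...env, version: info.version };
  });
}
```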

After these steps, whatever remains is used to overwrite the cache, so the next call to getInterpreters pulls the latest information from the cache.

Locators by environment type

There are two groups of locators: global locators and workspace locators. Global locators look for Python installed in global locations such as ~/.venv, ~/.pyenv, the Windows Registry, etc. Workspace locators look for Python available in the workspace. They are similar to global locators but have some restrictions on running Python. Lastly, file system watchers are initialized on some global folders and workspace locations to find any environment created after the extension has started up.

Each locator does the following:

  1. Finds the environment or interpreter that the locator is responsible for.
  2. Extracts any information from files or metadata. Information such as, version, distribution name, environment name, etc.
  3. Fires an event indicating that it found an environment, if it is a file-system-watching locator.

The environments are exposed via the iterEnvs method implemented by each locator. A locator must also implement the resolveEnv method. Given an absolute path to an interpreter, resolveEnv identifies the environment and fills in any missing data, for environments of the type handled by that locator.

A locator also implements the onChanged event, which is used by locators that also watch the file system for newly created environments. When this event fires, the handler is expected to call iterEnvs to get the updated set of environments.
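Here is a simplified sketch of the locator contract described above. ILocatorSketch, StaticLocator and collect are hypothetical names. The extension's real interfaces carry more detail (for example, richer environment info and event types). This sketch only shows how iterEnvs, resolveEnv and onChanged relate.

```typescript
// Hypothetical, simplified environment shape.
interface PythonEnvInfo {
  executable: string;
  kind: string;
  version?: string;
}

type Listener = () => void;

// Hypothetical, simplified version of the locator contract.
interface ILocatorSketch {
  // Yields the environments this locator is responsible for.
  iterEnvs(): AsyncIterableIterator<PythonEnvInfo>;
  // Given an absolute interpreter path, returns filled-in info if the
  // interpreter belongs to this locator's environment type, else undefined.
  resolveEnv(executable: string): Promise<PythonEnvInfo | undefined>;
  // Fired by file-system-watching locators when their environments change.
  onChanged(listener: Listener): void;
}

// Toy locator backed by a fixed list, to show how the members fit together.
class StaticLocator implements ILocatorSketch {
  private readonly listeners: Listener[] = [];

  constructor(private envs: PythonEnvInfo[]) {}

  async *iterEnvs(): AsyncIterableIterator<PythonEnvInfo> {
    for (const env of this.envs) {
      yield env;
    }
  }

  async resolveEnv(executable: string): Promise<PythonEnvInfo | undefined> {
    return this.envs.find((e) => e.executable === executable);
  }

  onChanged(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Simulates what a watcher would do when a new environment appears.
  add(env: PythonEnvInfo): void {
    this.envs.push(env);
    this.listeners.forEach((listener) => listener());
  }
}

// Consumer side: after onChanged fires, call iterEnvs again for the new set.
async function collect(locator: ILocatorSketch): Promise<PythonEnvInfo[]> {
  const found: PythonEnvInfo[] = [];
  for await (const env of locator.iterEnvs()) {
    found.push(env);
  }
  return found;
}
```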

Windows specific python installations

Windows has two sources of Python installations that are unique to it.

  1. Windows Registry (see PEP 514 for the registry layout). A dedicated locator looks at the Windows Registry locations, in both the 32-bit and 64-bit registry views, for installed Pythons. The locator also extracts versions and other metadata from the registry keys.
  2. Windows Store installations. These are global Python environments installed via the Windows Store. There are some subtleties about which paths are valid to use and which are not; see the comments on isWindowsStoreEnvironment and getWindowsStorePythonExes for details. This location is watched, so new installations should be found by the extension even after initial discovery.
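For reference, PEP 514 registers each installation under a key of the form root\Company\Tag. The roots are HKEY_CURRENT_USER\Software\Python, HKEY_LOCAL_MACHINE\Software\Python, and HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python (32-bit installs on 64-bit Windows). The sketch below parses key paths into that structure. parseTagKey and installPathKey are hypothetical helpers, and reading the registry itself is not shown.

```typescript
// Registry roots defined by PEP 514. The Wow6432Node root holds 32-bit
// installs registered on 64-bit Windows.
const PEP514_ROOTS = [
  "HKEY_CURRENT_USER\\Software\\Python",
  "HKEY_LOCAL_MACHINE\\Software\\Python",
  "HKEY_LOCAL_MACHINE\\Software\\Wow6432Node\\Python",
];

interface RegistryTag {
  root: string;
  company: string; // e.g. "PythonCore" for python.org installers
  tag: string; // e.g. "3.9" or "3.9-32"
}

// Hypothetical helper: parses a key path of the form <root>\<Company>\<Tag>.
// Registry paths are case-insensitive, so the comparison is too. Returns
// undefined for keys that are not exactly two levels below a PEP 514 root.
function parseTagKey(keyPath: string): RegistryTag | undefined {
  const lower = keyPath.toLowerCase();
  for (const root of PEP514_ROOTS) {
    const prefix = root.toLowerCase() + "\\";
    if (!lower.startsWith(prefix)) {
      continue;
    }
    const parts = keyPath.slice(prefix.length).split("\\");
    if (parts.length === 2 && parts[0] && parts[1]) {
      return { root, company: parts[0], tag: parts[1] };
    }
  }
  return undefined;
}

// Per PEP 514, the default value of the InstallPath subkey is the install
// directory, and its ExecutablePath value (if present) is the interpreter path.
function installPathKey(t: RegistryTag): string {
  return `${t.root}\\${t.company}\\${t.tag}\\InstallPath`;
}
```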

PATH environment variable

Another way we detect installed Pythons is via the PATH environment variable. This is done on all platforms. On non-Windows platforms, we also include additional paths (see commonPosixBinPaths for details). These locations are not watched, to avoid the performance cost of watching a large number of binaries.
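Here is a rough sketch of PATH-based discovery using Node's fs and path modules. findPythonsOnPath is a hypothetical helper. The extraDirs parameter stands in for commonPosixBinPaths, whose actual contents are not reproduced here. The python[X[.Y]][.exe] file-name pattern is also an assumption of this sketch.

```typescript
import * as fs from "fs";
import * as path from "path";

// Assumed name pattern: python, python3, python3.9, and their .exe variants.
const PYTHON_EXE = /^python(\d+(\.\d+)?)?(\.exe)?$/i;

// Hypothetical helper: lists candidate interpreters in the directories on
// PATH plus any extra directories (a stand-in for commonPosixBinPaths).
function findPythonsOnPath(pathValue: string | undefined, extraDirs: string[] = []): string[] {
  const dirs = (pathValue ?? "").split(path.delimiter).filter((d) => d.length > 0);
  const unique = Array.from(new Set([...dirs, ...extraDirs]));
  const found: string[] = [];
  for (const dir of unique) {
    let entries: string[];
    try {
      entries = fs.readdirSync(dir);
    } catch {
      continue; // PATH often lists directories that do not exist
    }
    for (const name of entries) {
      if (PYTHON_EXE.test(name)) {
        found.push(path.join(dir, name));
      }
    }
  }
  return found;
}
```

This sketch only matches file names. Symlinked duplicates are left for the reduction step, and it does not check whether a match is actually executable.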

venv, pipenv, pyenv, and conda based environments

We have specific locators for each of these environment types, and the individual implementations have details on how each environment is found. In most cases, we look at known locations and known environment variables to find the environments.

Environment: Locator implementation
venv, virtualenv, virtualenvwrapper, pipenv: GlobalVirtualEnvironmentLocator
pyenv: PyenvLocator
conda: CondaEnvironmentLocator
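To illustrate "known locations and known environment variables", the sketch below derives a few candidate locations. PYENV_ROOT (default ~/.pyenv) and WORKON_HOME (default ~/.virtualenvs) follow the conventions of pyenv and virtualenvwrapper. ~/.conda/environments.txt is the file conda uses to record environments. candidateEnvRoots is a hypothetical helper, and whether the extension's locators check exactly these locations is an assumption.

```typescript
import * as os from "os";
import * as path from "path";

type EnvVars = Record<string, string | undefined>;

interface EnvRoots {
  pyenvVersions: string; // pyenv installs each version under <root>/versions
  virtualenvwrapper: string; // WORKON_HOME, default ~/.virtualenvs
  condaEnvironmentsTxt: string; // file in which conda records its environments
}

// Hypothetical helper: computes well-known locations from environment
// variables, falling back to the tools' defaults when a variable is unset or empty.
function candidateEnvRoots(env: EnvVars, home: string = os.homedir()): EnvRoots {
  const pyenvRoot = env.PYENV_ROOT || path.join(home, ".pyenv");
  return {
    pyenvVersions: path.join(pyenvRoot, "versions"),
    virtualenvwrapper: env.WORKON_HOME || path.join(home, ".virtualenvs"),
    condaEnvironmentsTxt: path.join(home, ".conda", "environments.txt"),
  };
}
```

A caller would typically pass process.env.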

Workspace virtual environments

Workspace virtual environments are found in two ways: by searching through the workspace directories, and by watching the workspace folders for newly created virtual environments. Implementation details can be found in WorkspaceVirtualEnvironmentLocator.
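One simple way to search workspace directories is to look for the pyvenv.cfg marker file that PEP 405 places at the root of a virtual environment. findWorkspaceVenvs is a hypothetical helper. This sketch makes two simplifications: it scans only the immediate children of the workspace folder, and it assumes bin/python or Scripts\python.exe inside each environment.

```typescript
import * as fs from "fs";
import * as path from "path";

// PEP 405: a virtual environment has a pyvenv.cfg file at its root, one
// level above the directory that contains the interpreter.
function isVenvRoot(dir: string): boolean {
  return fs.existsSync(path.join(dir, "pyvenv.cfg"));
}

// Hypothetical helper: scans the immediate children of a workspace folder and
// returns the expected interpreter path for each virtual environment found.
function findWorkspaceVenvs(workspaceRoot: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(workspaceRoot, { withFileTypes: true })) {
    if (!entry.isDirectory()) {
      continue;
    }
    const envRoot = path.join(workspaceRoot, entry.name);
    if (isVenvRoot(envRoot)) {
      // Assumed layout: Scripts\python.exe on Windows, bin/python elsewhere.
      found.push(
        process.platform === "win32"
          ? path.join(envRoot, "Scripts", "python.exe")
          : path.join(envRoot, "bin", "python"),
      );
    }
  }
  return found;
}
```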

Priority order for environment types

Each environment is treated as a single environment type chosen by priority, so that when we need to activate the environment we treat it the right way. For example, you could use pyenv to create a conda environment. See getPrioritizedEnvKinds for more info.
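Priority-based selection can be sketched as "first matching kind in a ranked list". The EnvKind enum and the order in PRIORITY are made up for illustration and may not match PythonEnvKind or getPrioritizedEnvKinds. In this made-up order, conda outranks pyenv, so a conda environment created with pyenv is treated as conda.

```typescript
// Hypothetical subset of environment kinds (the real enum is PythonEnvKind).
enum EnvKind {
  Conda = "conda",
  Pyenv = "pyenv",
  Venv = "venv",
  System = "system",
}

// Made-up priority order, highest first; the real order is defined by
// getPrioritizedEnvKinds and may differ.
const PRIORITY: EnvKind[] = [EnvKind.Conda, EnvKind.Pyenv, EnvKind.Venv, EnvKind.System];

// Given every kind an environment matched, pick the single kind to treat it as.
function pickKind(matched: EnvKind[]): EnvKind | undefined {
  for (const kind of PRIORITY) {
    if (matched.indexOf(kind) !== -1) {
      return kind;
    }
  }
  return undefined;
}
```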

Implementing new Locators

These are the things you should consider before implementing a new locator:

  1. Do you need file system watching for your locator?
  2. Do you need a new environment type, or does it fit within the known environment kinds (see PythonEnvKind)?

If you need file watching, extend the FSWatchingLocator class (see WindowsStoreLocator for an example). It provides convenience methods to set up file system watching for a particular location. If you don't need it, extend the Locator class (see WindowsRegistryLocator for an example).
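The division of labor between a watching base class and a concrete locator might look roughly like this. WatchingLocatorSketch, WatchFn, MyToolLocator and the /opt/mytool/envs folder are hypothetical and do not mirror FSWatchingLocator's real API. The subclass only declares where to look, while the base class turns file events into onChanged notifications.

```typescript
type Listener = () => void;

// Hypothetical watcher signature: a real one might wrap fs.watch or a
// VS Code FileSystemWatcher. It returns a function that stops watching.
type WatchFn = (dir: string, onEvent: () => void) => () => void;

// Hypothetical base class in the spirit of FSWatchingLocator (not its real API).
abstract class WatchingLocatorSketch {
  private readonly listeners: Listener[] = [];
  private stopFns: Array<() => void> = [];

  constructor(private readonly watch: WatchFn) {}

  // Subclasses only declare which folders to watch.
  protected abstract foldersToWatch(): string[];

  onChanged(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Any file event in a watched folder becomes an onChanged notification;
  // handlers then call iterEnvs again to get the updated environments.
  startWatching(): void {
    for (const dir of this.foldersToWatch()) {
      this.stopFns.push(
        this.watch(dir, () => this.listeners.forEach((listener) => listener())),
      );
    }
  }

  dispose(): void {
    this.stopFns.forEach((stop) => stop());
    this.stopFns = [];
  }
}

// A hypothetical new locator: it only says where its environments live.
class MyToolLocator extends WatchingLocatorSketch {
  protected foldersToWatch(): string[] {
    return ["/opt/mytool/envs"];
  }
}
```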

Once you have your implementation, add it to createNonWorkspaceLocators. If you had to add a new environment type, be sure to also update getPrioritizedEnvKinds.

Be sure to add tests for your implementation. You can look at the tests for the existing locators for ideas on how to add tests.
