Conda Best Practices

Conda works best at MSI when used to manage distinct environments for different workflows, rather than installing packages to a single central location. Doing so reduces the possibility of issues from conflicting version requirements, makes it easier to create upgraded workflows without breaking existing ones, and makes it easier to share workflows with other users, including your future self.

Configuring Conda

The file ~/.condarc defines how conda behaves, such as where environments are installed and which channels are used for installations. Full documentation on .condarc is available in the conda documentation.

The example configuration below relies on this directory structure:

[vega0051@ahl01 /projects/standard/msistaff/shared/vega0051 ]$ tree .conda
.conda
├── envs
└── pkgs

2 directories, 0 files

Create it with mkdir -vp $SHARED/$USER/.conda/{envs,pkgs}, then write the following .condarc file to your home directory.

# channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given.
# Do not list "defaults" here; see "Use Community-Managed Channels" below.
channels:
  - conda-forge
  - bioconda
# Environment directories are searched in the order listed for
# existing conda environments. Pointing them away from the default
# location under $HOME keeps large environments from exhausting
# your home directory's file-count and storage limits.
envs_dirs:
  - $SHARED/$USER/.conda/envs
  - $SCRATCH/$USER/.conda/envs
# The package directory can be pointed to the project space to
# avoid filling the user file/storage limits.
pkgs_dirs:
  - $SHARED/$USER/.conda/pkgs
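
As a minimal sketch of the setup described above, the following creates the two directories and writes only the envs_dirs and pkgs_dirs settings, using absolute paths, to a separate example file for review. The temporary-directory fallback for $SHARED and the file name condarc.example are assumptions made so the sketch runs anywhere; they are not part of MSI's configuration.

```shell
# Assumption: $SHARED points at your group's shared space on MSI systems.
# The mktemp fallback exists only so this sketch runs outside MSI.
shared_root="${SHARED:-$(mktemp -d)}"
user_name="${USER:-$(id -un)}"
conda_root="$shared_root/$user_name/.conda"

# Same layout as the tree output above: .conda/envs and .conda/pkgs
mkdir -p "$conda_root/envs" "$conda_root/pkgs"

# Hypothetical file name; written next to the directories so that an
# existing ~/.condarc is never overwritten by this sketch.
condarc_example="$conda_root/condarc.example"
cat > "$condarc_example" <<EOF
envs_dirs:
  - $conda_root/envs
pkgs_dirs:
  - $conda_root/pkgs
EOF

echo "Wrote $condarc_example"
```

After reviewing the generated file, merge its settings into ~/.condarc by hand alongside your channels list.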

Use Community-Managed Channels

The company Anaconda, which develops conda and provides several package repositories for it, changed its Terms of Service as of Spring 2024 to no longer accommodate free use of its licensed components by academic researchers. Discussion on how to proceed under these terms was still ongoing as of November 2024.

In the meantime, avoid using the package channels defaults, main, anaconda, msys2, or r. Make sure these package channels do not appear in your ~/.condarc file, and do not reference them in your conda install commands.
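
One way to check a configuration file for these channels is sketched below. The helper name find_forbidden_channels is hypothetical, and the pattern only recognizes plain YAML list items such as "  - defaults"; inline lists or full channel URLs would need a closer look by hand.

```shell
# Hypothetical helper: print (with line numbers) any list entries in a
# condarc-style file that name one of the channels to avoid.
find_forbidden_channels() {
    grep -nE '^[[:space:]]*-[[:space:]]*(defaults|main|anaconda|msys2|r)[[:space:]]*$' "$1"
}

condarc_file="$HOME/.condarc"
if [ ! -f "$condarc_file" ]; then
    echo "$condarc_file does not exist; nothing to check"
elif find_forbidden_channels "$condarc_file"; then
    echo "Remove the channels listed above from $condarc_file" >&2
else
    echo "No forbidden channels listed in $condarc_file"
fi
```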

Finally, use miniforge over other modules that provide conda, as it avoids using forbidden channels by default.

Create Independent Environments

The recommended command for creating a conda environment:

module load miniforge
mamba create --copy -p /path/to/my/conda/environment pandas

Important parts of this command:

  • Use mamba create rather than mamba install. The latter attempts to install packages into the currently active environment rather than creating a new one.

  • Include the --copy flag, which makes a local copy of all libraries and dependencies installed by the environment. Without this flag, conda may link to libraries available in the miniforge module, and MSI wants environments to be as self-contained as possible.

  • Use the -p or --prefix flag to install to a particular location rather than defaulting to ~/.conda.

Tip

Where you store your environment matters for quota management:

  • $SHARED/$USER — Installed environments count toward your group’s shared storage quota, making this a good default location.

  • $SCRATCH/$USER — Best for testing or disposable environments so they don’t count against permanent storage limits.

Install All Packages at Once

For more complex environments, specify all packages on the original command rather than installing them later:

mamba create --copy -p /path/to/my/conda/environment pandas tensorflow pillow scikit-learn

Including all packages in the same command allows the environment solver to do a better job of creating a consistent environment.

If you need to install packages as a second step via pip, you can usually do so safely if you run the pip install step immediately after the environment is created.

Define Environments with YAML Files

For multi-package environments, define them in a YAML file. This captures the environment name, channels, and all dependencies in a single reproducible file that can be shared or version-controlled.

Recommended environment.yml:

# The environment name (used when activating with `source activate`)
name: my-env
# Channels to search, in priority order
channels:
  - conda-forge
  - bioconda
# Dependencies — list conda packages first, then pip if needed
dependencies:
  - python=3.11
  - pandas
  - scikit-learn
  # Pip-only packages go under a nested pip key
  - pip:
    - some-pip-only-package

Build with:

module load miniforge
mamba env create --file environment.yml --copy --prefix /path/to/my/env

To generate a YAML file from an existing environment for reuse, run conda env export --no-builds > environment.yml while the environment is activated.

Don’t Modify Existing Environments - Create New Ones Instead

The guidance about installing all packages at once also applies when you later need to update or upgrade an environment. Rather than updating the existing environment, MSI recommends creating a new environment with the upgraded contents.

Reasons:

  • If something goes wrong during the installation, the old environment will still be available in its original location.

  • The status of packages and their relationships on conda servers may change significantly over time and can cause unintended errors and issues with in-place installs of older environments.
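
To make side-by-side environments easier to manage, one option is to give each new environment a dated prefix. The helper below is a hypothetical sketch, not an MSI tool: new_env_prefix and the name-YYYYMMDD naming scheme are assumptions chosen for illustration.

```shell
# Hypothetical helper: print a dated prefix under the given envs directory,
# appending -2, -3, ... if a path with that name already exists.
new_env_prefix() {
    envs_dir=$1
    base_name=$2
    stamp=$(date +%Y%m%d)
    candidate="$envs_dir/$base_name-$stamp"
    n=1
    while [ -e "$candidate" ]; do
        n=$((n + 1))
        candidate="$envs_dir/$base_name-$stamp-$n"
    done
    printf '%s\n' "$candidate"
}

# Example usage (assumes the miniforge module is loaded):
#   mamba create --copy -p "$(new_env_prefix "$SHARED/$USER/.conda/envs" analysis)" pandas
```

Because the helper never returns an existing path, the previous environment stays untouched until you decide to remove it.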

Take Snapshots of Your Important Environments

It is good practice to record the contents of your environments after installing them so that you can reproduce them exactly in the future. One way to do this is with:

conda env export --no-builds

Run this while your environment is activated. It will print all packages and their versions installed by conda into your environment. If you capture this output into a .yml file, you can use it to recreate the exact environment in the future so long as the relevant package versions are still available on the remote server. This is explored in an MSI software management tutorial.
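
The exported file normally ends with a prefix: line that records the environment's location on the machine where it was created. If you plan to share the snapshot, a small filter like the sketch below can drop that line; the helper name strip_prefix_line and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: print an exported snapshot without its
# machine-specific "prefix:" line.
strip_prefix_line() {
    grep -v '^prefix:' "$1"
}

# Example usage, with the environment activated:
#   conda env export --no-builds > snapshot-raw.yml
#   strip_prefix_line snapshot-raw.yml > environment.yml
```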

Alternative strategies include bundling the environment into a single file that can be backed up elsewhere or shared with colleagues. For this purpose, MSI recommends either conda-pack or building the environment into an Apptainer container. See the introductory Apptainer tutorial.

Don’t Use conda activate

The default activation command, conda activate, does not work cleanly in an HPC environment. If you need a direct analogue, MSI recommends using source activate, for example:

source activate /path/to/my/conda/environment/

This generally behaves better than the default command, but will be deprecated in the not-too-distant future.

A more general approach is to modify your PATH variable to include the environment’s bin directory, preferably using a modulefile. This is also explored in the MSI software management tutorial.
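
Done by hand rather than through a modulefile, the PATH approach might look like the sketch below. The path is the placeholder used earlier on this page. Note that this only changes which executables are found first; it does not run any activation scripts that individual packages may ship.

```shell
# Placeholder path from the examples above; substitute your own prefix.
env_prefix="/path/to/my/conda/environment"

# Put the environment's executables ahead of everything else on PATH.
export PATH="$env_prefix/bin:$PATH"

# Check which python (if any) would now be picked up.
command -v python || echo "no python found on PATH"
```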

Addressing Version Conflicts

Sometimes you will run into issues while building or using an environment where the package versions installed by the solver are not compatible. Build-time errors may look like this:

Solving environment: failed
LibMambaUnsatisfiableError: Encountered problems while solving:
  - package scipy-1.15.2-py310h1d65ade_0 requires python_abi 3.10.* *_cp310, but none of the providers can be installed

Here, a conflict between the package scipy and available versions of python_abi is being reported. Version conflicts of this type most commonly occur for environments with a large number of dependencies that are under active development.

Runtime errors are subtler and more varied, and will likely not report an issue with package versions directly. Consider a package version mismatch if an example from the package developer crashes with an error traceback on a freshly installed environment.

Recommended solutions:

Unpin Package Versions

If you are pinning package versions in your mamba create command, try unpinning them. This gives the solver more flexibility.

Change:

mamba create --copy -p /path/to/my/env python=2 numpy=1 scipy=1.15

Into:

mamba create --copy -p /path/to/my/env python numpy scipy

Pin an Older Version of Key Packages

If you are not pinning versions and the issue is still occurring, try pinning one or more packages to the second-most-recent version. You can browse releases for most packages on anaconda.org, for example available scipy versions on conda-forge.

For example:

mamba create --copy -p /path/to/my/env python numpy scipy=1.15.1

Pin All Packages to Known Working Versions

If you have an environment snapshot or otherwise know the package versions for a working environment, try pinning the versions of all packages you are installing to those known-working versions.

If you do not have a reference to work from, you can try to manually determine the correct combination of package versions via trial and error, though this is not practical for larger environments.

Wait Until the Issue Is Fixed

Wait a few days and try again. Often these issues arise when a package has many dependencies and one of them is updated before the others are ready.

You might also consider reporting the issue to the developers of the package you are having issues with, or to the developers maintaining the package repository you are using. For example, for conda-forge you can report issues on GitHub.