# Conda Best Practices

Conda works best at MSI when used to manage distinct environments for different workflows, rather than installing packages to a single central location. Doing so reduces the possibility of issues from conflicting version requirements, makes it easier to create upgraded workflows without breaking existing ones, and makes it easier to share workflows with other users, including your future self.

## Configuring Conda

The file `~/.condarc` defines how conda behaves, such as where environments are installed and which channels are used for installations. Full documentation on `.condarc` can be found [in the conda configuration guide](https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/index.html).

The example configuration below relies on this directory structure:

```
[vega0051@ahl01 /projects/standard/msistaff/shared/vega0051 ]$ tree .conda
.conda
├── envs
└── pkgs

2 directories, 0 files
```

It can be created with `mkdir -vp $SHARED/$USER/.conda/{envs,pkgs}`, then the file `.condarc` can be written to the user's home directory.

```
# Channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given.
# Adding "defaults" would include all default channels, which this
# guide advises against (see "Use Community-Managed Channels").
channels:
  - bioconda
  - conda-forge

# Environment directories are searched in the order listed for
# existing conda environments. Pointing them away from the default
# location under $HOME keeps larger environments from overwhelming
# file/storage limits.
envs_dirs:
  - $SHARED/$USER/.conda/envs
  - $SCRATCH/$USER/.conda/envs

# The package directory can be pointed to the project space to
# avoid filling the user file/storage limits.
pkgs_dirs:
  - $SHARED/$USER/.conda/pkgs
```

## Use Community-Managed Channels

The company Anaconda, which develops conda and provides several package repositories for it, changed its Terms of Service as of Spring 2024 to no longer accommodate free use of its licensed components by academic researchers. Discussion on how to proceed under these terms was still ongoing as of November 2024.

In the meantime, avoid using the package channels `defaults`, `main`, `anaconda`, `msys2`, or `r`. Make sure these package channels do not appear in your `~/.condarc` file, and do not reference them in your `conda install` commands.

Finally, use `miniforge` over other modules that provide conda, as it avoids using forbidden channels by default.

## Create Independent Environments

The recommended command for creating a conda environment:

```bash
module load miniforge
mamba create --copy -p /path/to/my/conda/environment pandas
```

Important parts of this command:

- Use `mamba create` rather than `mamba install`. The latter attempts to install packages into the currently active environment rather than creating a new one.
- Include the `--copy` flag, which makes a local copy of all libraries and dependencies installed by the environment. Without this flag, conda may link to libraries available in the `miniforge` module, and MSI wants environments to be as self-contained as possible.
- Use the `-p` or `--prefix` flag to install to a particular location rather than defaulting to `~/.conda`.

```{tip}
Where you store your environment matters for quota management:

- **`$SHARED/$USER`**: Installed environments count toward your group's shared storage quota, making this a good default location.
- **`$SCRATCH/$USER`**: Best for testing or disposable environments so they don't count against permanent storage limits.
```

### Install All Packages at Once

For more complex environments, specify all packages on the original command rather than installing them later:

```bash
mamba create --copy -p /path/to/my/conda/environment pandas tensorflow pillow scikit-learn
```

Including all packages in the same command allows the environment solver to do a better job of creating a consistent environment.

If you need to install packages as a second step via `pip`, you can usually do so safely if you run the `pip install` step immediately after the environment is created.

### Define Environments with YAML Files

For multi-package environments, define them in a YAML file. This captures the environment name, channels, and all dependencies in a single reproducible file that can be shared or version-controlled.

Recommended `environment.yml`:

```yaml
# The environment name (used when activating with `source activate`)
name: my-env

# Channels to search, in priority order
channels:
  - conda-forge
  - bioconda

# Dependencies: list conda packages first, then pip if needed
dependencies:
  - python=3.11
  - pandas
  - scikit-learn
  # Pip-only packages go under a nested pip key
  - pip:
      - some-pip-only-package
```

Build with:

```bash
module load miniforge
mamba env create --file environment.yml --copy --prefix /path/to/my/env
```

To generate a YAML file from an existing environment for reuse, run `conda env export --no-builds > environment.yml` while the environment is activated.

## Don't Modify Existing Environments - Create New Ones Instead

The guidance about installing all packages at once also applies when you later need to update or upgrade an environment. Rather than updating the existing environment, MSI recommends creating a new environment with the upgraded contents.

Reasons:

- If something goes wrong during the installation, the old environment will still be available in its original location.
- The status of packages and their relationships on conda servers may change significantly over time and can cause unintended errors and issues with in-place installs of older environments.

## Take Snapshots of Your Important Environments

It is good practice to record the contents of your environments after installing them so that you can reproduce them exactly in the future. One way to do this is with:

```bash
conda env export --no-builds
```

Run this while your environment is activated. It will print all packages and their versions installed by conda into your environment. If you capture this output into a `.yml` file, you can use it to recreate the exact environment in the future so long as the relevant package versions are still available on the remote server. [This is explored in an MSI software management tutorial](https://pages.github.umn.edu/dunn0404/software-management-tutorial/15-conda-pip/index.html).

Alternate strategies for snapshotting an environment to share include bundling the environment into a single file that can be backed up elsewhere or shared with colleagues. For this purpose, MSI recommends either [conda-pack](https://conda.github.io/conda-pack/) or bundling your environment into an Apptainer container. See [the introductory apptainer tutorial](https://pages.github.umn.edu/dunn0404/intro-singularity-tutorial/).

## Don't Use `conda activate`

The default command for activating a conda environment does not work cleanly in an HPC environment. If you need a direct analogue, MSI recommends using `source activate`, for example:

```bash
source activate /path/to/my/conda/environment/
```

This generally behaves better than the default command, but will be deprecated in the not-too-distant future. A more general approach is to modify your `PATH` variable to include the environment's `bin` directory, preferably using a modulefile.
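
The `PATH` approach can be sketched in plain shell as follows. This is a minimal illustration under stated assumptions, not MSI's modulefile: `ENV_PREFIX` is a hypothetical variable name, and the path is the same placeholder used elsewhere on this page.

```bash
# Hypothetical placeholder: substitute the prefix you passed to `mamba create -p`.
ENV_PREFIX=/path/to/my/conda/environment

# Prepend the environment's bin directory so its executables are found first.
export PATH="$ENV_PREFIX/bin:$PATH"
```

After this, `command -v python` should resolve to a path under `$ENV_PREFIX/bin`, provided the environment includes Python. This only changes executable lookup; it does not set variables such as `CONDA_PREFIX` or run the activation scripts that some packages install, so those packages may still need `source activate`.
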
[This is also explored in the MSI software management tutorial](https://pages.github.umn.edu/dunn0404/software-management-tutorial/15-conda-pip/index.html).

## Addressing Version Conflicts

Sometimes you will run into issues while building or using an environment where the package versions installed by the solver are not compatible. Build-time errors may look like this:

```
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package scipy-1.15.2-py310h1d65ade_0 requires python_abi 3.10.* *_cp310, but none of the providers can be installed
```

Here, a conflict between the package `scipy` and available versions of `python_abi` is being reported. Version conflicts of this type most commonly occur for environments with a large number of dependencies that are under active development.

Runtime errors are subtler and more varied, and will likely not directly report an issue with package versions. Consider the possibility of a package version mismatch if an example from the package developer crashes with an error traceback on a freshly installed environment.

Recommended solutions:

### Unpin Package Versions

If you are pinning package versions in your `mamba create` command, try unpinning them. This gives the solver more flexibility.

Change:

```bash
mamba create --copy -p /path/to/my/env python=2 numpy=1 scipy=1.15
```

Into:

```bash
mamba create --copy -p /path/to/my/env python numpy scipy
```

### Pin an Older Version of Key Packages

If you are not pinning versions and the issue is still occurring, try pinning one or more packages to the second-most-recent version. You can browse releases for most packages on anaconda.org, for example [available scipy versions on conda-forge](https://anaconda.org/conda-forge/scipy/files).

For example, to pin `scipy` to an earlier release:

```bash
mamba create --copy -p /path/to/my/env python numpy scipy=1.15.1
```

### Pin All Packages to Known Working Versions

If you have an environment snapshot or otherwise know the package versions for a working environment, try pinning the versions of all packages you are installing to those known-working versions.

If you do not have a reference to work from, you can try to manually determine the correct combination of package versions via trial and error, though this is not practical for larger environments.

### Wait Until the Issue Is Fixed

Wait a few days and try again. Often these issues arise when a package has many dependencies and one of them is updated before the others are ready.

You might also consider reporting the issue to the developers of the package you are having issues with, or to the developers maintaining the package repository you are using. For example, for conda-forge you can [report issues on GitHub](https://github.com/conda-forge/conda-forge.github.io/issues).