Package and Environment Management (Conda)
Overview
Teaching: 60 min
Exercises: 30 min
Questions
How do I create my own environment and install packages with Conda?
Objectives
Learn about the different components in Conda
Conda is an open source package management system and environment management system. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. As a package manager, Conda helps you find and install packages, and it knows the build recipes for a large number of them. Originally created for Python, it can manage packages and dependencies for any language, including Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++ and FORTRAN.
There are two major distributions of Conda: Anaconda and Miniconda. The difference between them is that Miniconda comes only with the package management system, without the bundle of pre-installed packages that Anaconda includes. Once Conda is installed, you can then install whatever packages you need from scratch.
For your desktop environment, Anaconda is probably the better option. For an HPC cluster, inside a container, or for a Continuous Integration (CI) system, Miniconda is a better solution: lighter and easier to customize.
Packages, Channels and Environments
These three concepts are critical to understanding how conda works from the user's perspective. A package is a compressed tarball file (.tar.bz2) that contains the binaries, libraries and metadata that allow Conda to manage the installation and follow the dependencies. Conda packages are downloaded from channels, which are URLs to directories containing conda packages. The conda command searches a default set of channels, and packages are automatically downloaded from the corresponding channel. Bioconda, for example, is a channel specialized in software packages for bioinformatics. Finally, a conda environment is a directory that contains a collection of conda packages that you have installed. You can activate or deactivate environments, and switch packages or their versions. Environments are created from recipes that you can also share with others. Some environments are maintained centrally, and you can also create your own.
With these concepts we can now learn how to create new environments, add channels and install packages using them.
Activating Conda
The command to activate conda on Spruce is:
source /shared/software/miniconda3/etc/profile.d/conda.sh
If you forget this path in the future, all that you have to do is execute:
module load conda
Loading this module does not change any environment variables; it just shows you the command that you have to source.
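As a minimal sketch, the activation step can be made safer in a login or job script by checking that the file exists before sourcing it. The path is the one given above; the variable names `CONDA_SH` and `conda_status` are only illustrative.

```shell
# Path taken from the text above; adjust it if conda lives elsewhere on your system.
CONDA_SH=/shared/software/miniconda3/etc/profile.d/conda.sh

if [ -f "$CONDA_SH" ]; then
    # Load the conda shell function into the current shell.
    . "$CONDA_SH"
    # Print the conda version as a quick check that the command is usable.
    conda --version
    conda_status="loaded"
else
    echo "conda.sh not found at $CONDA_SH" >&2
    conda_status="missing"
fi
```

On a machine without that file, the script only prints a message and continues, so it does not break other parts of a startup script.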
Creating Environments
To create a new environment use:
$ conda create --name myenv
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /users/username/.conda/envs/myenv
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate myenv
#
# To deactivate an active environment, use
#
# $ conda deactivate
What we just created is a new environment with no packages in it. You will add packages a bit later. For the moment, notice that you can activate and deactivate environments with the conda command itself (this is available from version 4.6 onwards).
$ conda activate myenv
(myenv) $
Notice that the prompt is modified, with the name of the environment added as a prefix. This is useful for knowing which environment is active in the shell at any given moment.
To list the environments accessible to conda, execute:
$ conda info --envs
# conda environments:
#
base /shared/software/miniconda3
qiime2-2018.8 /shared/software/miniconda3/envs/qiime2-2018.8
tpd0001 /shared/software/miniconda3/envs/tpd0001
myenv * /users/username/.conda/envs/myenv
The list includes not only the environment you just created but also environments that are centrally managed and created for general usage.
To leave the current environment execute:
(myenv) $ conda deactivate
$
When the environment is deactivated, its name is no longer shown in your prompt.
The command conda activate, given without an environment name, sends you to the default environment, called base.
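The environment created in this section started out empty. As a minimal sketch, package specifications can also be given at creation time, and `--dry-run` lets you preview the package plan without creating anything. The environment name `analysis` and the specs `python=3.8 numpy` are hypothetical choices for illustration, not part of the lesson.

```shell
ENV_NAME=analysis            # hypothetical environment name
SPECS="python=3.8 numpy"     # illustrative package specs (an assumption)

if command -v conda >/dev/null 2>&1; then
    # --dry-run prints the package plan without creating the environment.
    # $SPECS is left unquoted on purpose so each spec becomes its own argument.
    conda create --name "$ENV_NAME" --dry-run $SPECS
else
    echo "conda is not available in this shell" >&2
fi
```

Removing `--dry-run` and confirming the prompt would create the environment for real.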
Installing Packages
You can install and uninstall packages in any of your own environments. But first, let's search for a package in the channels active on the cluster:
(myenv) $ conda search hdf5
Loading channels: done
# Name Version Build Channel
hdf5 1.8.16 3 conda-forge
hdf5 1.8.16 4 conda-forge
hdf5 1.8.17 0 conda-forge
hdf5 1.8.17 10 conda-forge
hdf5 1.8.17 11 conda-forge
hdf5 1.8.17 2 conda-forge
hdf5 1.8.17 3 conda-forge
hdf5 1.8.17 4 conda-forge
hdf5 1.8.17 5 conda-forge
hdf5 1.8.17 6 conda-forge
hdf5 1.8.17 7 conda-forge
hdf5 1.8.17 8 conda-forge
hdf5 1.8.17 9 conda-forge
hdf5 1.8.18 0 conda-forge
hdf5 1.8.18 1 conda-forge
hdf5 1.8.18 2 conda-forge
hdf5 1.8.18 3 conda-forge
hdf5 1.8.18 h525d4c3_0 pkgs/main
hdf5 1.8.18 h6792536_1 pkgs/main
hdf5 1.8.19 0 conda-forge
hdf5 1.8.19 1 conda-forge
hdf5 1.8.19 2 conda-forge
hdf5 1.8.20 0 conda-forge
hdf5 1.8.20 1 conda-forge
hdf5 1.8.20 hba1933b_1 pkgs/main
hdf5 1.10.1 0 conda-forge
hdf5 1.10.1 1 conda-forge
hdf5 1.10.1 2 conda-forge
hdf5 1.10.1 h9caa474_1 pkgs/main
hdf5 1.10.1 hb0523eb_0 pkgs/main
hdf5 1.10.2 0 conda-forge
hdf5 1.10.2 hba1933b_1 pkgs/main
hdf5 1.10.2 hc401514_1 conda-forge
hdf5 1.10.2 hc401514_2 conda-forge
hdf5 1.10.2 hc401514_3 conda-forge
hdf5 1.10.3 hba1933b_1001 conda-forge
hdf5 1.10.3 hc401514_0 conda-forge
hdf5 1.10.3 hc401514_1 conda-forge
hdf5 1.10.3 hc401514_2 conda-forge
hdf5 1.10.4 hb1b8bf9_0 pkgs/main
hdf5 1.10.4 mpi_mpich_ha7d0aea_1006 conda-forge
hdf5 1.10.4 mpi_openmpi_hac320be_1006 conda-forge
hdf5 1.10.4 nompi_h3c11f04_1106 conda-forge
hdf5 1.10.5 mpi_mpich_ha7d0aea_1000 conda-forge
hdf5 1.10.5 mpi_openmpi_hac320be_1000 conda-forge
hdf5 1.10.5 nompi_h3c11f04_1100 conda-forge
As you can see above, there are quite a number of versions and builds of the same package. You can install the latest version with:
(myenv) $ conda install hdf5
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /users/username/.conda/envs/myenv
added / updated specs:
- hdf5
The following packages will be downloaded:
package | build
---------------------------|-----------------
_libgcc_mutex-0.1 | main 3 KB
hdf5-1.10.5 |nompi_h3c11f04_1100 5.2 MB conda-forge
libgcc-ng-9.1.0 | hdf63c60_0 5.1 MB
libgfortran-ng-7.3.0 | hdf63c60_0 1006 KB
libstdcxx-ng-9.1.0 | hdf63c60_0 3.1 MB
------------------------------------------------------------
Total: 14.4 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
hdf5 conda-forge/linux-64::hdf5-1.10.5-nompi_h3c11f04_1100
libgcc-ng pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
libgfortran-ng pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
zlib conda-forge/linux-64::zlib-1.2.11-h14c3975_1004
Proceed ([y]/n)? y
Downloading and Extracting Packages
libgcc-ng-9.1.0 | 5.1 MB | ###################################################################### | 100%
hdf5-1.10.5 | 5.2 MB | ###################################################################### | 100%
libgfortran-ng-7.3.0 | 1006 KB | ###################################################################### | 100%
_libgcc_mutex-0.1 | 3 KB | ###################################################################### | 100%
libstdcxx-ng-9.1.0 | 3.1 MB | ###################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
As you can see, conda decided to install conda-forge/linux-64::hdf5-1.10.5-nompi_h3c11f04_1100 and its dependencies.
It could be the case that the version conda chooses is not the one you want. You can also request a very specific version and build with the command:
$ conda install -c conda-forge hdf5=1.10.5=mpi_openmpi_hac320be_1000
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /users/username/.conda/envs/myenv
added / updated specs:
- hdf5==1.10.5=mpi_openmpi_hac320be_1000
The following packages will be downloaded:
package | build
---------------------------|-----------------
hdf5-1.10.5 |mpi_openmpi_hac320be_1000 5.9 MB conda-forge
openmpi-3.1.4 | hc99cbb1_0 4.0 MB conda-forge
------------------------------------------------------------
Total: 10.0 MB
The following NEW packages will be INSTALLED:
mpi conda-forge/linux-64::mpi-1.0-openmpi
openmpi conda-forge/linux-64::openmpi-3.1.4-hc99cbb1_0
The following packages will be DOWNGRADED:
hdf5 1.10.5-nompi_h3c11f04_1100 --> 1.10.5-mpi_openmpi_hac320be_1000
Proceed ([y]/n)? y
Downloading and Extracting Packages
hdf5-1.10.5 | 5.9 MB | ###################################################################### | 100%
openmpi-3.1.4 | 4.0 MB | ###################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
You can be very granular, declaring the channel, the package, version and build.
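The same package specifications can be tried with conda search before installing anything. The sketch below goes from a bare package name to a fully pinned version and build. The build string is the one used above, and `-c conda-forge` is assumed to be reachable from your system.

```shell
# From least to most specific: name, version prefix, exact version, version plus build.
SPECS="hdf5 hdf5=1.10 hdf5=1.10.5 hdf5=1.10.5=mpi_openmpi_hac320be_1000"

if command -v conda >/dev/null 2>&1; then
    for spec in $SPECS; do
        echo "== $spec =="
        # Show only the last few matches to keep the output short.
        conda search -c conda-forge "$spec" | tail -n 3
    done
else
    echo "conda is not available in this shell" >&2
fi
```

Each step should return a subset of the previous listing, which is a quick way to confirm that a pinned build exists before asking conda to install it.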
To list the packages in the current environment, execute:
(myenv) $ conda list
# packages in environment at /users/username/.conda/envs/myenv:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
hdf5 1.10.5 mpi_openmpi_hac320be_1000 conda-forge
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
mpi 1.0 openmpi conda-forge
openmpi 3.1.4 hc99cbb1_0 conda-forge
zlib 1.2.11 h14c3975_1004 conda-forge
You can install several packages together, but if you install too many you may end up with conflicting packages for which conda cannot find a good solution, or conda may downgrade some packages to satisfy the dependencies of others.
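One way to spot conflicts or unwanted downgrades before they happen is to ask conda for the package plan without applying it. The following is a sketch only: the package list is illustrative, and it assumes the `myenv` environment from earlier exists.

```shell
PACKAGES="numpy scipy pandas"   # illustrative list of packages to add together

if command -v conda >/dev/null 2>&1; then
    # --dry-run solves the environment and prints what would be installed,
    # updated or downgraded, without modifying myenv.
    # $PACKAGES is unquoted on purpose so each name is a separate argument.
    conda install --name myenv --dry-run $PACKAGES
else
    echo "conda is not available in this shell" >&2
fi
```

If the plan lists downgrades you do not want, you can adjust the package list or pin versions before running the command again without `--dry-run`.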
Adding channels
The file .condarc in your $HOME folder controls the list of channels that are searched when requesting new packages. The file is in YAML format and the order of channels matters. For example, if your .condarc looks like this:
channels:
- bioconda
- conda-forge
- defaults
Conda will search for packages first in the bioconda channel, the highest priority channel, and then work down the list to the defaults channel. This matters as you add more and more channels, because the same package may be provided by two or more channels and the versions can conflict.
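The ordering rule can be tried out without touching your real configuration. This sketch writes the example .condarc into a temporary directory and reads back the first channel, which is the one with the highest priority. The names `demo_dir` and `top_channel` are illustrative, and the `conda config --file ... --get channels` call is only run if conda is available.

```shell
# Work on a throwaway copy, not on $HOME/.condarc.
demo_dir=$(mktemp -d)
cat > "$demo_dir/.condarc" <<'EOF'
channels:
  - bioconda
  - conda-forge
  - defaults
EOF

# The first list entry under "channels:" is the highest priority channel.
top_channel=$(grep -m1 '^ *- ' "$demo_dir/.condarc" | sed 's/^ *- *//')
echo "highest priority channel: $top_channel"

if command -v conda >/dev/null 2>&1; then
    # Ask conda how it reads the channel list from this specific file.
    conda config --file "$demo_dir/.condarc" --get channels
fi
```

For your real configuration, `conda config --add channels NAME` places a channel at the top of the list and `conda config --append channels NAME` places it at the bottom.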
Removing Packages and Environments
Packages can be removed with:
$ conda remove -n myenv scipy
If you are inside the environment the command can be simplified as:
(myenv) $ conda remove scipy
Remove entire environments with:
$ conda remove -n myenv --all
or, equivalently:
$ conda env remove --name myenv
Creating recipes for Environments
Instead of creating environments and populating them with packages one by one, it can be better to keep a file that can be used to recreate an environment when needed.
If your environment is already created, this command writes its description to the file environment.yml:
(myenv) $ conda env export > environment.yml
The file environment.yml can also be created manually. The files exported by the command above are very detailed, with specific versions and builds for every package. However, you can create simple environment files such as:
name: stats
dependencies:
- numpy
- pandas
Save it as stats.yml and create the environment with:
$ conda env create -f stats.yml
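A hand-written environment file can sit between the two extremes: more explicit than a bare list of names, less rigid than a full export. The sketch below writes such a file into a temporary directory. The channel list and the version constraints (`python=3.8`, `pandas>=1.0`) are assumptions chosen for illustration.

```shell
demo_dir=$(mktemp -d)

# A stats.yml with an explicit channel order and a few version constraints.
cat > "$demo_dir/stats.yml" <<'EOF'
name: stats
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - numpy
  - pandas>=1.0
EOF

# Count the entries listed under "dependencies:" as a quick sanity check.
dep_count=$(sed -n '/^dependencies:/,$p' "$demo_dir/stats.yml" | grep -c '^ *- ')
echo "stats.yml declares $dep_count dependencies"

# To build the environment from this file you would run:
#   conda env create -f "$demo_dir/stats.yml"
```

For an existing environment, `conda env export --from-history` produces a similarly short file that lists only the packages you asked for explicitly, in conda versions that support that flag.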
Conda filling your quota
If you have multiple conda environments, at some point they could fill your quota for $HOME. There are several things you can do:
- Remove the packages in .conda/pkgs. Those packages are downloaded when you create environments and are kept in case you reuse them for new environments. They are safe to remove, and will be downloaded again the next time you need them.
- Move the entire .conda folder to your scratch. Keep the recipes for creating your environments in $HOME, but the environments themselves can take significant space, more than your 10 GB quota on Spruce or Thorny. You can move .conda very easily with these commands:
$ mv $HOME/.conda $SCRATCH
$ ln -s $SCRATCH/.conda $HOME/.conda
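Before moving the real folder, the move and the symbolic link can be rehearsed on throwaway directories. In this sketch, `fake_home` and `fake_scratch` are stand-ins for $HOME and $SCRATCH, and the final cache cleanup preview only runs if conda is available.

```shell
# Stand-ins for $HOME and $SCRATCH so nothing real is moved.
fake_home=$(mktemp -d)
fake_scratch=$(mktemp -d)
mkdir -p "$fake_home/.conda/envs/myenv" "$fake_home/.conda/pkgs"

# The same two steps as in the text, applied to the stand-in directories.
mv "$fake_home/.conda" "$fake_scratch"
ln -s "$fake_scratch/.conda" "$fake_home/.conda"

# The old path still resolves because it is now a symbolic link.
ls -ld "$fake_home/.conda"
du -sh "$fake_home/.conda/"

if command -v conda >/dev/null 2>&1; then
    # Preview what cleaning the package cache would remove, without deleting anything.
    conda clean --all --dry-run
fi
```

Running `du -sh $HOME/.conda/` on the real folder before and after the move is a simple way to see how much quota the environments were using.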
Exercise
- Create a couple of environments, one with Python 2.7 and another with Python 3.6.
- To each of them add the following packages: numpy, scipy, ipython, networkx and pandas.
- Now try adding the intel channel and install tensorflow and scikit-learn.
Key Points
With conda you are able to install a number of packages not available among the centrally managed packages