Introduction to Conda
“Packaging can be theater, it can create a story.” -- Steve Jobs
Example files for this post have been uploaded to: https://github.com/mday299/keypuncher/tree/main/Python/pandas
According to their website, conda is:
“…an open source package management system and environment management system that runs on Windows, macOS, Linux and z/OS... It was created for Python programs, but it can package and distribute software for any language….”
“In its default configuration, conda can install and manage the thousands of packages at repo.anaconda.com that are built, reviewed and maintained by Anaconda®.”
I have never used Anaconda before, but I have heard it mentioned in machine learning contexts since around 2017. Let’s give it a try from the perspective of a total n00b, shall we?
I’m doing this on Windows 10.
Getting Started
Let’s try Miniconda since the website claims you have to download fewer dependencies.
First, let’s verify we have the required version of Python on the machine. At the time of this writing that was Python 3.9. At a prompt in cmd or PowerShell enter:
python --version
If Python is installed, the command prints the installed version. Mine said Python 3.10.4.
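The same check can be made from inside Python, which is handy once more than one interpreter lives on the machine. This is only a minimal sketch: the (3, 9) minimum mirrors the version mentioned above, and the helper name meets_minimum is made up for illustration.

```python
import sys

# Assumed minimum, taken from the Python 3.9 mentioned in the text above.
MINIMUM = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Print the running interpreter's version and whether it is new enough.
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("Meets minimum:", meets_minimum())
```

Passing a tuple such as (3, 10, 4) to meets_minimum lets you test a version other than the one currently running.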
Next, run the installer you have downloaded. The only thing I changed from the defaults was to install for all users, but this tutorial shouldn’t require that.
There is a helpful-ish tutorial here: https://anaconda.cloud/getting-started-with-anaconda-distribution (note: it will probably require setting up a free account). I say “-ish” because it assumes you have installed the full Anaconda (rather than Miniconda), and I’m not ready to install all those dependencies.
On Windows, when the installer completes you get a Start menu item named Anaconda3 or similar. From there you can launch an Anaconda prompt, either as a regular cmd prompt or as a PowerShell prompt. If the text:
conda
is entered at this prompt it will provide a helpful list of commands. For example, to see which version of conda is installed enter:
conda --version
Mine says: conda 4.12.0
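If you would rather make this check from a script, something like the following might do. It is a sketch under assumptions: the function name conda_version is my own, and it relies on the conda executable being findable on PATH, which is normally only true inside an Anaconda prompt (or if you let the installer modify PATH).

```python
import shutil
import subprocess


def conda_version():
    """Return the text printed by `conda --version`, or None if conda is not found."""
    exe = shutil.which("conda")  # looks conda up on PATH
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Fall back to stderr in case nothing was written to stdout.
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    print(conda_version() or "conda was not found on PATH")
```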
Conda Basics
According to https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-environments conda has a default environment named base:
“You don't want to put programs into your base environment, though. Create separate environments to keep your programs isolated from each other.”
We are going to create a new environment to house a simple pandas application. Pandas is a data analysis library written in Python. I’ve borrowed the majority of this example from:
https://www.w3schools.com/python/pandas/pandas_getting_started.asp
To create a new environment enter the command:
conda create --name pandas-test
Which should create the new pandas-test environment. To activate the environment, enter:
conda activate pandas-test
The text at the very beginning of the prompt should change from (base) to (pandas-test). For the paranoid, to verify the activation was successful enter:
conda info --envs
Which on my machine yields:
# conda environments:
#
base C:\ProgramData\Miniconda3
pandas-test * C:\Users\<path>\.conda\envs\pandas-test
robot-test C:\Users\<path>\.conda\envs\robot-test
The asterisk (*) next to pandas-test marks it as the active environment, so the switch succeeded. Note that the robot-test environment belongs to another, unrelated project on my machine.
To install pandas inside the pandas-test environment, type:
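For the extra paranoid, the listing above can also be read by a small script instead of by eye. The sketch below is not part of conda; parse_env_list is a hypothetical helper, and it assumes every row has a name followed by an optional asterisk and a path, as in my output. The <path> placeholder is kept exactly as it appears above.

```python
def parse_env_list(text):
    """Parse the text printed by `conda info --envs` into a list of dicts."""
    envs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split(None, 1)  # name, then everything after it
        name = parts[0]
        rest = parts[1].strip() if len(parts) > 1 else ""
        active = rest.startswith("*")  # the asterisk marks the active environment
        if active:
            rest = rest[1:].strip()
        envs.append({"name": name, "active": active, "path": rest})
    return envs


# Sample text copied from the listing above (raw string, so backslashes stay literal).
SAMPLE = r"""
# conda environments:
#
base                     C:\ProgramData\Miniconda3
pandas-test           *  C:\Users\<path>\.conda\envs\pandas-test
robot-test               C:\Users\<path>\.conda\envs\robot-test
"""

active = [env["name"] for env in parse_env_list(SAMPLE) if env["active"]]
print(active)
```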
conda install pandas
This will download pandas along with a number of its dependencies. The packages installed on my machine were:
blas pkgs/main/win-64::blas-1.0-mkl
bottleneck pkgs/main/win-64::bottleneck-1.3.5-py310h9128911_0
bzip2 pkgs/main/win-64::bzip2-1.0.8-he774522_0
ca-certificates pkgs/main/win-64::ca-certificates-2022.4.26-haa95532_0
certifi pkgs/main/win-64::certifi-2022.6.15-py310haa95532_0
intel-openmp pkgs/main/win-64::intel-openmp-2021.4.0-haa95532_3556
libffi pkgs/main/win-64::libffi-3.4.2-hd77b12b_4
mkl pkgs/main/win-64::mkl-2021.4.0-haa95532_640
mkl-service pkgs/main/win-64::mkl-service-2.4.0-py310h2bbff1b_0
mkl_fft pkgs/main/win-64::mkl_fft-1.3.1-py310ha0764ea_0
mkl_random pkgs/main/win-64::mkl_random-1.2.2-py310h4ed8f06_0
numexpr pkgs/main/win-64::numexpr-2.8.3-py310hb57aa6b_0
numpy pkgs/main/win-64::numpy-1.22.3-py310h6d2d95c_0
numpy-base pkgs/main/win-64::numpy-base-1.22.3-py310h206c741_0
openssl pkgs/main/win-64::openssl-1.1.1q-h2bbff1b_0
packaging pkgs/main/noarch::packaging-21.3-pyhd3eb1b0_0
pandas pkgs/main/win-64::pandas-1.4.3-py310hd77b12b_0
pip pkgs/main/win-64::pip-22.1.2-py310haa95532_0
pyparsing pkgs/main/noarch::pyparsing-3.0.4-pyhd3eb1b0_0
python pkgs/main/win-64::python-3.10.4-hbb2ffb3_0
python-dateutil pkgs/main/noarch::python-dateutil-2.8.2-pyhd3eb1b0_0
pytz pkgs/main/win-64::pytz-2022.1-py310haa95532_0
setuptools pkgs/main/win-64::setuptools-61.2.0-py310haa95532_0
six pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_1
sqlite pkgs/main/win-64::sqlite-3.38.5-h2bbff1b_0
tk pkgs/main/win-64::tk-8.6.12-h2bbff1b_0
tzdata pkgs/main/noarch::tzdata-2022a-hda174b7_0
vc pkgs/main/win-64::vc-14.2-h21ff451_1
vs2015_runtime pkgs/main/win-64::vs2015_runtime-14.27.29016-h5e58377_2
wheel pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0
wincertstore pkgs/main/win-64::wincertstore-0.2-py310haa95532_2
xz pkgs/main/win-64::xz-5.2.5-h8cc25b3_1
zlib pkgs/main/win-64::zlib-1.2.12-h8cc25b3_2
As you can see, pandas has a load of dependencies!
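Each line in that list packs several pieces of information into one string: the channel, the platform subdirectory, and then the package name, version and build string joined by hyphens. Here is a rough sketch of pulling those apart; parse_spec is a name I made up, and it assumes every entry follows the channel/subdir::name-version-build shape seen above.

```python
def parse_spec(spec):
    """Split 'channel/subdir::name-version-build' into its parts."""
    location, _, filename = spec.partition("::")
    channel, _, subdir = location.rpartition("/")
    # Package names may contain hyphens (python-dateutil), so split from the
    # right: the last two fields are the version and the build string.
    name, version, build = filename.rsplit("-", 2)
    return {
        "channel": channel,
        "subdir": subdir,
        "name": name,
        "version": version,
        "build": build,
    }


if __name__ == "__main__":
    print(parse_spec("pkgs/main/noarch::python-dateutil-2.8.2-pyhd3eb1b0_0"))
```

Splitting from the right is the design choice that matters here: a naive split on the first hyphen would turn python-dateutil into a package called python.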
Now enter:
mkdir pandas
cd pandas
Note: these commands look Unix-style, but they are not special to the conda shell. Both mkdir and cd also work in a regular cmd or PowerShell prompt on Windows.
—————-
Pro Tip:
I ran into a problem in my environment where I couldn’t activate the pandas-test environment. I think this is because I rebooted my machine without fully deactivating the pandas-test environment. Should you run into a similar issue: exit from the Anaconda shell and re-enter it. Then type:
conda deactivate
conda activate pandas-test
See this post: https://stackoverflow.com/questions/49127834/removing-conda-environment
——————
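When activation misbehaves, it can help to ask Python itself which environment it thinks it is running in. The sketch below leans on the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables, which conda sets when an environment is activated; the function name describe_active_env is just for illustration.

```python
import os
import sys


def _norm(path):
    """Normalize a path so that case and separators do not cause false mismatches."""
    return os.path.normcase(os.path.normpath(path))


def describe_active_env(environ=os.environ, prefix=sys.prefix):
    """Report which conda environment, if any, this interpreter is running in."""
    conda_prefix = environ.get("CONDA_PREFIX")
    return {
        "env_name": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": conda_prefix,
        "python_prefix": prefix,
        # True when the running interpreter lives inside the activated environment.
        "python_matches_env": conda_prefix is not None
        and _norm(conda_prefix) == _norm(prefix),
    }


if __name__ == "__main__":
    for key, value in describe_active_env().items():
        print(f"{key}: {value}")
```

If env_name shows pandas-test but python_matches_env is False, the python on your PATH is probably not the one inside the environment.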
Next, enter the following text into VS Code or another text editor of your choice:
import pandas as pd
mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
Save the file as pandas_example.py (or similar). Now at the prompt enter:
python pandas_example.py
If all of the preceding steps have been followed the screen should show:
    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
That is far from all that is possible with pandas! Pandas has become quite a powerful data processor in its own right. Read more about it (for example) on the w3schools pages or at Wikipedia.
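As a small taste, here are a few more operations on the same data. This is my own illustrative sketch rather than part of the w3schools example, and the variable names total, busy and top_car are made up.

```python
import pandas as pd

mydataset = {
    "cars": ["BMW", "Volvo", "Ford"],
    "passings": [3, 7, 2],
}
myvar = pd.DataFrame(mydataset)

# Sum a numeric column.
total = int(myvar["passings"].sum())

# Keep only the rows where a condition holds, then pull one column out as a list.
busy = myvar[myvar["passings"] > 2]["cars"].tolist()

# Find the row label with the largest value and read another column at that label.
top_car = myvar.loc[myvar["passings"].idxmax(), "cars"]

print(total, busy, top_car)
```

Run it from the activated pandas-test environment, the same way as pandas_example.py.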
As noted before, example files have been uploaded at: https://github.com/mday299/keypuncher/tree/main/Python/pandas
Credits:
https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html
https://www.w3schools.com/python/pandas/default.asp
Installing conda on Ubuntu 22.04: https://linuxhint.com/install-anaconda-ubuntu-22-04/