Introduction to Conda

“Packaging can be theater, it can create a story.” -- Steve Jobs

Photo by Photoholgic on Unsplash

Example files for this post have been uploaded to: https://github.com/mday299/keypuncher/tree/main/Python/pandas

According to their website, conda is:

“…an open source package management system and environment management system that runs on Windows, macOS, Linux and z/OS... It was created for Python programs, but it can package and distribute software for any language….”

“In its default configuration, conda can install and manage the thousand packages at repo.anaconda.com that are built, reviewed and maintained by Anaconda®.”

I have never used Anaconda before, though I have heard it mentioned in machine learning contexts since around 2017. Let’s give it a try from the perspective of a total n00b, shall we?

I’m doing this on Windows 10.

Getting Started

Let’s try Miniconda since the website claims you have to download fewer dependencies.

First, let’s verify we have the required version of Python on the machine. At the time of this writing that was Python 3.9. At a prompt in cmd or PowerShell enter:

python --version

If Python is installed, the output shows which version you have. Mine said Python 3.10.4.
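The same check can be made from inside Python itself. This is only a minimal sketch: the names MINIMUM and python_is_new_enough are my own, and the 3.9 floor is simply the requirement mentioned above at the time of writing.

```python
import sys

# Assumption: 3.9 is the minimum version, as stated in the text above.
MINIMUM = (3, 9)


def python_is_new_enough(minimum=MINIMUM):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info[:2] is a (major, minor) tuple, so tuples compare naturally.
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "3.10.4 (...)".
    status = "OK" if python_is_new_enough() else "too old"
    print(sys.version.split()[0], status)
```

Comparing tuples instead of strings avoids the classic trap where "3.10" sorts before "3.9" alphabetically.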

Next, run the installer you have downloaded. The only thing I changed from the defaults was to install for all users, but this tutorial shouldn’t require that.

There is a helpful-ish tutorial here: https://anaconda.cloud/getting-started-with-anaconda-distribution (note: it will probably require setting up a free account). I say “-ish” because it assumes you have installed the full Anaconda (rather than Miniconda), and I’m not ready to install all those dependencies.

On Windows, when the installer completes you get a Start menu item named Anaconda3 or similar. From there you can launch an Anaconda prompt, either as a regular cmd prompt or as a PowerShell prompt. If the text:

conda

is entered at this prompt it will provide a helpful list of commands. For example, to see which version of conda is installed enter:

conda --version

Mine says: conda 4.12.0
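If you ever want a script to check the conda version for you, something like the following might do. It is a rough sketch under a couple of assumptions: the output looks like "conda 4.12.0" as it did on my machine, and the helper names parse_conda_version and installed_conda_version are made up for this post.

```python
import shutil
import subprocess


def parse_conda_version(text):
    """Turn a string like 'conda 4.12.0' into a tuple such as (4, 12, 0)."""
    parts = text.strip().split()
    if not parts:
        return None
    # The version is assumed to be the last whitespace-separated token.
    return tuple(int(p) for p in parts[-1].split(".") if p.isdigit())


def installed_conda_version():
    """Return the conda version tuple, or None if conda is not on PATH."""
    exe = shutil.which("conda")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr, so fall back to it if needed.
    return parse_conda_version(result.stdout or result.stderr)


if __name__ == "__main__":
    print(installed_conda_version())
```

Run it from the Anaconda prompt so that conda is on the PATH; from a plain cmd window it may simply print None.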

Conda Basics

According to https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-environments conda has a default environment named base:

“You don't want to put programs into your base environment, though. Create separate environments to keep your programs isolated from each other.”

We are going to create a new environment to house a simple pandas application. Pandas is a data analysis library written in Python. I’ve borrowed the majority of this example from:

https://www.w3schools.com/python/pandas/pandas_getting_started.asp

To create a new environment enter the command:

conda create --name pandas-test

Which should create the new pandas-test environment. To activate the environment, enter:

conda activate pandas-test

The text at the very beginning of the prompt should change from (base) to (pandas-test). For the paranoid, to verify the activation was successful enter:

conda info --envs

Which on my machine yields:

# conda environments:
#
base                    C:\ProgramData\Miniconda3
pandas-test    *  C:\Users\<path>\.conda\envs\pandas-test
robot-test           C:\Users\<path>\.conda\envs\robot-test

The asterisk (*) next to pandas-test marks it as the active environment, so the switch succeeded. Note that the robot-test environment is for another unrelated project on my machine.
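For the truly paranoid, the listing can also be read by a script. Here is a small sketch that picks out the starred environment from text shaped like the output above. SAMPLE is just my output pasted in (with the path still elided), and active_environment is a name I invented for the example.

```python
# The listing from my machine, pasted verbatim (path elided as in the post).
SAMPLE = r"""# conda environments:
#
base                    C:\ProgramData\Miniconda3
pandas-test    *  C:\Users\<path>\.conda\envs\pandas-test
robot-test           C:\Users\<path>\.conda\envs\robot-test
"""


def active_environment(listing):
    """Return the name of the environment marked with '*', or None."""
    for line in listing.splitlines():
        # Skip blank lines and the '#' comment header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # An active row looks like: <name> * <path>
        if len(fields) >= 3 and fields[1] == "*":
            return fields[0]
    return None


if __name__ == "__main__":
    print(active_environment(SAMPLE))
```

As far as I can tell, conda also sets a CONDA_DEFAULT_ENV environment variable when an environment is activated, so os.environ.get("CONDA_DEFAULT_ENV") is probably a shorter route to the same answer.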

To install pandas inside the pandas-test environment, type:

conda install pandas

This will download a number of dependencies of pandas. Dependencies on my machine were:

blas               pkgs/main/win-64::blas-1.0-mkl
bottleneck         pkgs/main/win-64::bottleneck-1.3.5-py310h9128911_0
bzip2              pkgs/main/win-64::bzip2-1.0.8-he774522_0
ca-certificates    pkgs/main/win-64::ca-certificates-2022.4.26-haa95532_0
certifi            pkgs/main/win-64::certifi-2022.6.15-py310haa95532_0
intel-openmp       pkgs/main/win-64::intel-openmp-2021.4.0-haa95532_3556
libffi             pkgs/main/win-64::libffi-3.4.2-hd77b12b_4
mkl                pkgs/main/win-64::mkl-2021.4.0-haa95532_640
mkl-service        pkgs/main/win-64::mkl-service-2.4.0-py310h2bbff1b_0
mkl_fft            pkgs/main/win-64::mkl_fft-1.3.1-py310ha0764ea_0
mkl_random         pkgs/main/win-64::mkl_random-1.2.2-py310h4ed8f06_0
numexpr            pkgs/main/win-64::numexpr-2.8.3-py310hb57aa6b_0
numpy              pkgs/main/win-64::numpy-1.22.3-py310h6d2d95c_0
numpy-base         pkgs/main/win-64::numpy-base-1.22.3-py310h206c741_0
openssl            pkgs/main/win-64::openssl-1.1.1q-h2bbff1b_0
packaging          pkgs/main/noarch::packaging-21.3-pyhd3eb1b0_0
pandas             pkgs/main/win-64::pandas-1.4.3-py310hd77b12b_0
pip                pkgs/main/win-64::pip-22.1.2-py310haa95532_0
pyparsing          pkgs/main/noarch::pyparsing-3.0.4-pyhd3eb1b0_0
python             pkgs/main/win-64::python-3.10.4-hbb2ffb3_0
python-dateutil    pkgs/main/noarch::python-dateutil-2.8.2-pyhd3eb1b0_0
pytz               pkgs/main/win-64::pytz-2022.1-py310haa95532_0
setuptools         pkgs/main/win-64::setuptools-61.2.0-py310haa95532_0
six                pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_1
sqlite             pkgs/main/win-64::sqlite-3.38.5-h2bbff1b_0
tk                 pkgs/main/win-64::tk-8.6.12-h2bbff1b_0
tzdata             pkgs/main/noarch::tzdata-2022a-hda174b7_0
vc                 pkgs/main/win-64::vc-14.2-h21ff451_1
vs2015_runtime     pkgs/main/win-64::vs2015_runtime-14.27.29016-h5e58377_2
wheel              pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0
wincertstore       pkgs/main/win-64::wincertstore-0.2-py310haa95532_2
xz                 pkgs/main/win-64::xz-5.2.5-h8cc25b3_1
zlib               pkgs/main/win-64::zlib-1.2.12-h8cc25b3_2

As you can see, pandas has a load of dependencies!
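Each line in that list appears to follow the pattern channel/platform::name-version-build. The sketch below splits one entry into those pieces; PackageSpec and parse_spec are hypothetical names of my own, and the pattern is inferred from the output above rather than taken from any conda API.

```python
from typing import NamedTuple


class PackageSpec(NamedTuple):
    channel: str
    platform: str
    name: str
    version: str
    build: str


def parse_spec(spec):
    """Split 'channel/platform::name-version-build' into its parts."""
    location, _, filename = spec.partition("::")
    # The platform (e.g. win-64 or noarch) is the last path component.
    channel, _, platform = location.rpartition("/")
    # Package names may contain hyphens (python-dateutil), so split from the right.
    name, version, build = filename.rsplit("-", 2)
    return PackageSpec(channel, platform, name, version, build)


if __name__ == "__main__":
    print(parse_spec("pkgs/main/win-64::pandas-1.4.3-py310hd77b12b_0"))
    print(parse_spec("pkgs/main/noarch::python-dateutil-2.8.2-pyhd3eb1b0_0"))
```

Read this way, the list shows that installing pandas also pulled in Python 3.10.4 itself, along with numpy 1.22.3 and Intel's MKL libraries.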

Now enter:

mkdir pandas
cd pandas

Note: mkdir and cd look like Unix commands, but both are also built into cmd and PowerShell, so they work on Windows with or without the conda shell.

—————-

Pro Tip:

I ran into a problem in my environment where I couldn’t activate the pandas-test environment. I think this is because I rebooted my machine without fully deactivating the pandas-test environment. Should you run into a similar issue: exit from the Anaconda shell and re-enter it. Then type:

conda deactivate
conda activate pandas-test

See this post: https://stackoverflow.com/questions/49127834/removing-conda-environment

——————

Next, enter the following code into VS Code or your text editor of choice:

import pandas as pd

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

Save the file as pandas_example.py (or similar). Now at the prompt enter:

python pandas_example.py

If all of the preceding steps have been followed the screen should show:

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2

That is far from all that is possible with pandas! Pandas has become quite a powerful data processor in its own right. Read more about it (for example) on the w3schools pages or at Wikipedia.
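As a small taste, here is a sketch that builds on the same DataFrame to total a column, find the top row, and sort. The variable names total, average, top_car and ranked are mine and are not part of the w3schools example.

```python
import pandas as pd

mydataset = {
    "cars": ["BMW", "Volvo", "Ford"],
    "passings": [3, 7, 2],
}
myvar = pd.DataFrame(mydataset)

# Total and average of the numeric column.
total = int(myvar["passings"].sum())
average = float(myvar["passings"].mean())

# idxmax() returns the row label of the largest value; .loc then looks up the car.
top_car = myvar.loc[myvar["passings"].idxmax(), "cars"]

# Rows ordered from most to fewest passings.
ranked = myvar.sort_values("passings", ascending=False)

print(total, average, top_car)
print(ranked)
```

Run it inside the pandas-test environment in the same way as pandas_example.py.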

As noted before, example files have been uploaded at: https://github.com/mday299/keypuncher/tree/main/Python/pandas

Credits:

https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

https://www.w3schools.com/python/pandas/default.asp

Installing conda on Ubuntu 22.04: https://linuxhint.com/install-anaconda-ubuntu-22-04/

https://www.youtube.com/watch?v=1VVCd0eSkYc
