Prepare Your Data Science Environment

Anaconda Comprehensive Guide

6 min read · Feb 28, 2021

Anaconda environments header. Built on top of a vector image from freepik

Content

Set the context
Installation
Manage your environment
Anaconda commands guide
Extras (PyCharm & EC2 instances)

Set the context

There are many cases where you need to set up your environment, such as working on a local machine, preparing a docker container, or working directly on the server, which can be cheaper and more flexible than off-the-shelf platforms.

It is not that complicated to prepare your machine learning (ML) environment with distribution tools like Anaconda, which helps to manage and distribute Python and R libraries.

Anaconda resolves dependencies: when you install a new library or framework, its dependencies are installed as well. For example, the command that installs pandas also installs NumPy implicitly.

We are going to go through the steps to prepare your Python ML environment and launch notebooks to write code. The guide also covers PyCharm integration with Anaconda and preparing Amazon’s Elastic Compute Cloud (EC2) instances for Jupyter Notebook.

Installation

Anaconda provides an installer for the major operating systems. Installation should be easy: run the installer and follow its default instructions.

The major components:

The command line, which manages Anaconda and requires some setup in some cases.
Anaconda Navigator, a UI for managing environments that does not require any setup.

Next are the steps to set up and execute Anaconda commands on macOS, Ubuntu, and Windows.

macOS

To execute Anaconda commands, you have to define the path in your terminal profile. Usually, the Anaconda installer handles this.

1. Open the terminal application.
2. Make sure the bash profile points to Anaconda.

Edit the bash profile with this command:

nano ~/.bash_profile

Make sure the Anaconda path is present, or add it yourself (the path can vary based on your installation):

export PATH="/opt/anaconda3/bin:$PATH"

3. (Optional, for the zsh shell) If your terminal uses zsh, it does not read bash_profile by default, so an extra step is required.

Use this command to edit the zsh profile:

nano ~/.zshrc

Add the path to the Anaconda bin directory if it is not already there:

export PATH="/opt/anaconda3/bin:$PATH"

Or add the following line so that zsh loads the bash profile:

if [ -f ~/.bash_profile ]; then . ~/.bash_profile; fi
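If the export line ends up in more than one profile, the same directory can appear in PATH several times. As a minimal sketch, assuming the /opt/anaconda3/bin location used in this guide, the helper below adds the directory only when it is missing. The name prepend_to_path is hypothetical and not part of Anaconda.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
prepend_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                # already present, nothing to do
    *) PATH="$1:$PATH" ;;       # otherwise put it in front
  esac
}

# Assumed install location; adjust it to match your installation.
prepend_to_path "/opt/anaconda3/bin"
export PATH
```

Calling the helper a second time leaves PATH unchanged, so it is safe to keep in both ~/.bash_profile and ~/.zshrc.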

Ubuntu

On other operating systems, it is straightforward to run the installer. On a Linux-based OS, however, you install Anaconda with these commands:

wget <installer_url_from_anaconda.com>
bash <installation_file.sh>
rm <installation_file.sh>

Then you have to define the terminal path, which the installer usually does for you:

1. Open the terminal application.

2. Open the “bashrc” file:

nano ~/.bashrc

3. Verify that the path to Anaconda is present, or add it to the end of the file:

export PATH=~/anaconda3/bin:$PATH

4. Refresh the terminal source:

source ~/.bashrc
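After reloading the profile, it helps to confirm that the shell can actually find conda. The check below is a small sketch and works the same way on macOS. The function name check_conda is hypothetical.

```shell
# Hypothetical helper: report whether the conda command is reachable from this shell.
check_conda() {
  if command -v conda >/dev/null 2>&1; then
    echo "conda found: $(command -v conda)"
  else
    echo "conda not found on PATH" >&2
    return 1
  fi
}

# Run the check once; "|| true" keeps a sourced script going if conda is missing.
check_conda || true
```

If the check reports that conda is missing, revisit the export line in your profile and reload it.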

Windows

To start writing Anaconda commands on a Windows machine, go to the Start menu and search for “Anaconda Prompt”.

For a user interface with the major environment actions, open “Anaconda Navigator”.

Desktop navigator. Source: anaconda.com

Manage your environment

Anaconda allows you to create a separate environment for each of your projects. There are two ways to manage the environments: (1) using the commands, which are flexible and portable, or (2) using a user interface to control the major actions.

Way 1: Anaconda commands

Open the terminal and execute the commands.

Create a new conda environment:

conda create -n env_name

Create a new environment and define Python version:

conda create -n env_name python=3.8

Remove conda environment:

conda env remove -n env_name

List conda environments:

conda env list

Activate conda environment: before you start executing any code make sure the relevant environment is active

conda activate env_name

Export conda environment: will export the active environment

conda env export > environment.yaml

Import conda environment: will create the environment

conda env create -f environment.yaml
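To get a feel for what an environment file contains, here is a hand-written minimal example. It is not real output from conda env export, which pins exact versions and includes a prefix line. The file name example_environment.yaml and the package list are assumptions for illustration.

```shell
# Write a minimal, hand-written environment file (illustrative, not a real export).
cat > example_environment.yaml <<'EOF'
name: env_name
channels:
  - defaults
dependencies:
  - python=3.8
  - pandas
EOF

# On a machine with conda installed, the environment could then be created with:
# conda env create -f example_environment.yaml
```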

Export the requirements: a common requirements file for most deployments

pip freeze > requirements.txt

Import the requirements: will import to the active environment

pip install -r requirements.txt

Reset the conda root environment: list its revisions, then roll back to the chosen one (reset to rev 1 instead of 0 to avoid issues).

conda list --revisions
conda install --rev REV_NUM
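When these commands go into a setup script, it can be useful to create an environment only if it does not exist yet. The sketch below parses the first column of conda env list. The function name env_exists is hypothetical, and the parsing assumes the usual "name, optional asterisk, path" layout of that output.

```shell
# Hypothetical helper: succeed if a conda environment with the given name exists.
env_exists() {
  # The first column of "conda env list" holds the environment name.
  conda env list | awk '{print $1}' | grep -qxF "$1"
}

# Example use on a machine with conda installed:
# env_exists env_name || conda create -y -n env_name python=3.8
```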

Way 2: Anaconda Navigator interface

1. Open the Anaconda Navigator application.

2. Go to the environments section to create, clone, import, or remove an environment.

3. Click “create” to start a new Python or R environment.

Anaconda commands guide

Install the major libraries

Create an environment (skip this step if you have already created it):

conda create -n env_name

Activate your environment:

conda activate env_name

Install NumPy:

conda install numpy

Install pandas (NumPy is installed with it as a dependency):

conda install pandas

Install scikit-learn:

conda install -c conda-forge scikit-learn

Install TensorFlow: there is no official conda distribution for TensorFlow, so it is better to use Python’s pip installation command

pip install --upgrade pip
pip install tensorflow

Install PyTorch:

conda install pytorch torchvision torchaudio -c pytorch

NOTE: Python 3.9 users will need to add ‘-c=conda-forge’ for installation
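Once the installs finish, a quick import test shows which libraries the active environment can load. This is a minimal sketch that assumes python resolves to the interpreter of the active conda environment. The function name check_import is hypothetical.

```shell
# Hypothetical helper: try to import a module with the active interpreter.
check_import() {
  if python -c "import $1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

# Import names differ from some package names: scikit-learn is "sklearn", PyTorch is "torch".
for module in numpy pandas sklearn tensorflow torch; do
  check_import "$module"
done
```

A "missing" line usually means either the package is not installed or a different environment is active.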

Start Jupyter Notebook

Start a Jupyter notebook from your local environment:

jupyter notebook

Start a Jupyter notebook from a server (replace 0.0.0.0 with the server IP):

jupyter notebook --ip=0.0.0.0 --no-browser

Extras

PyCharm integration

You can connect your PyCharm project to an existing Anaconda environment.

When you start a new PyCharm project:

Select “Previously configured interpreter” and click the three dots.

Select the python file for the desired environment from the interpreter menu.

If the menu is empty, you can click the three dots and navigate to “anaconda3/envs/environment_name/bin/python”.

For existing PyCharm projects:

Go to preferences and select “Python Interpreter”.

Look for the environment in the drop-down menu, or add it from the settings button if it is not listed.

Environment location: “anaconda3/envs/environment_name/bin/python”

Prepare EC2 instance

When you launch a new EC2 instance, search for the “deep learning” compute machine that comes with all the data science requirements.

Make sure to open the default port for Jupyter notebooks. You can define this in the 6th step of instance creation.

Add a custom TCP rule with port range “8888”. For the source, it is recommended to allow only your IP.

I tested all the steps before writing this article. Hopefully, you find this blog useful for starting your machine learning projects.
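The same rule can also be added from the command line instead of the console. The sketch below assumes the AWS CLI is installed and configured. The security group ID and IP address are placeholder values, and the function name open_jupyter_port is hypothetical.

```shell
# Placeholder values: replace them with your own security group ID and public IP.
SG_ID="sg-0123456789abcdef0"
MY_IP="203.0.113.10"   # address from a documentation-only range

# Hypothetical helper: allow TCP port 8888 from a single IP address (/32).
open_jupyter_port() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 8888 --cidr "$2/32"
}

# Example use once the values above are filled in:
# open_jupyter_port "$SG_ID" "$MY_IP"
```

Using a /32 range keeps the notebook port reachable from your address only, which matches the recommendation above.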