Prepare Your Data Science Environment

Anaconda Comprehensive Guide

Ahmad Albarqawi
Feb 28, 2021
Anaconda environments header. Built on top of a vector image from freepik

Content

  • Set the context
  • Installation
  • Manage your environment
  • Anaconda commands guide
  • Extras (PyCharm & EC2 instances)

Set the context

There are many cases where you need to set up your environment, such as working on a local machine, preparing a docker container, or working directly on the server, which can be cheaper and more flexible than off-the-shelf platforms.

It is not that complicated to prepare your machine learning (ML) environment with distribution tools like Anaconda, which helps to manage and distribute Python and R libraries.

Anaconda resolves dependencies: when you install a new library or framework, its dependencies are installed as well. For example, a command to install pandas will install NumPy implicitly.

We are going to go through the steps to prepare your Python ML environment and launch notebooks to write code. We also cover PyCharm integration with Anaconda and preparing Amazon’s Elastic Compute Cloud (EC2) instances for Jupyter notebooks.

Installation

Anaconda provides an installer for the major operating systems. Installation should be easy: run the installer file and follow the default instructions.

The major components:

  • The command line to manage Anaconda, which requires a setup in some cases.
  • Anaconda Navigator, a UI to manage environments that does not require any setup.

Next are the steps to set up and execute Anaconda commands on macOS, Ubuntu, and Windows.

macOS

To execute Anaconda commands, you have to define the path in your terminal profile. Usually, the Anaconda installer handles this.

  1. Open the terminal application.
  2. Make sure the bash profile points to Anaconda.

Edit the bash profile with this command:

nano ~/.bash_profile

Make sure the Anaconda path is present, or add it yourself (the path can change based on your installation):

export PATH="/opt/anaconda3/bin:$PATH"

3. (Optional, for zsh) If your terminal uses the zsh shell, then ~/.bash_profile is not the profile it reads by default, and an extra step is required.

zsh terminal

Use this command to edit the path:

nano ~/.zshrc

Add the path to the Anaconda bin directory if it is not already there:

export PATH="/opt/anaconda3/bin:$PATH"

[OR] add the following line so that zsh loads the bash profile:

if [ -f ~/.bash_profile ]; then . ~/.bash_profile; fi
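
The profile edits in this section can also be scripted. The following is a minimal sketch, not something the Anaconda installer provides: add_conda_path is a hypothetical helper name, and /opt/anaconda3/bin is assumed from the example path used here, so adjust it to your installation. It appends the export line only when the profile does not already mention the Anaconda bin directory, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper (not an Anaconda command): append the Anaconda PATH
# line to a shell profile only if the profile does not mention it yet.
add_conda_path() {
    profile="$1"                          # e.g. ~/.bash_profile or ~/.zshrc
    conda_bin="${2:-/opt/anaconda3/bin}"  # assumed install location
    touch "$profile"                      # create the profile if it is missing
    if grep -qF "$conda_bin" "$profile"; then
        echo "already present"
    else
        printf 'export PATH="%s:$PATH"\n' "$conda_bin" >> "$profile"
        echo "added"
    fi
}

# Example usage (uncomment to modify your real profile):
# add_conda_path ~/.zshrc
```

Open a new terminal window afterwards so the updated profile is loaded.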

Ubuntu

On other operating systems, you simply run the graphical installer. On a Linux-based OS, however, you install Anaconda with these commands:

wget <installer_url_from_anaconda.com>
sh <installation_file.sh>
rm <installation_file.sh>

Then you have to define the terminal path (usually already done by the installer):

  1. Open the terminal application.

2. Open the “bashrc” file:

nano ~/.bashrc

3. Verify that the path to Anaconda is present, or add it to the end of the file:

export PATH=~/anaconda3/bin:$PATH

4. Reload the file in the current terminal:

source ~/.bashrc
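
To confirm that the path setup worked, you can check whether the conda command resolves on the current PATH. This is a small sketch under the assumption that you are in a POSIX-style shell; check_cmd is a hypothetical helper name, not part of Anaconda.

```shell
# Hypothetical helper: report whether a command resolves on the current PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Example usage after reloading the profile:
# check_cmd conda && conda --version
```

If conda is reported as not found, recheck the export line in your profile against the actual installation directory.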

Windows

To start writing Anaconda commands on a Windows machine, go to the Start menu and search for “Anaconda Prompt”.

Anaconda prompt on windows machines — Source: anaconda.com

For a user interface with the major environment actions, open “Anaconda Navigator”.

Desktop navigator. Source: anaconda.com

Manage your environment

Anaconda allows you to create a separate environment for each of your projects. There are two ways to manage the environments: (1) using the commands, which are flexible and portable, or (2) using a user interface to control the major actions.

Way1 — Anaconda commands

Open the terminal and execute the commands.

  • Create a new conda environment:
conda create -n env_name
  • Create a new environment and define Python version:
conda create -n env_name python=3.8
  • Remove conda environment:
conda env remove -n env_name
  • List conda environments:
conda env list
  • Activate conda environment: before you start executing any code make sure the relevant environment is active
conda activate env_name
  • Export conda environment: will export the active environment
conda env export > environment.yaml
  • Import conda environment: will create the environment
conda env create -f environment.yaml
  • Export the requirements: a common requirements file for most deployments
pip freeze > requirements.txt
  • Import the requirements: will import to the active environment
pip install -r requirements.txt
  • Reset conda root environment: list the revisions, then roll back to one of them. Reset to rev 1 instead of 0 to avoid issues.
conda list --revisions
conda install --rev REV_NUM
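
Several of the commands above can be chained into one script. The sketch below is only an illustration under stated assumptions: run and setup_env are hypothetical helper names, the DRY_RUN variable is an invention of this example rather than a conda feature, and the default Python version is taken from the example above. With DRY_RUN=1 the script prints the conda commands instead of executing them, which lets you preview what would happen.

```shell
# Hypothetical wrapper: with DRY_RUN=1 it prints each command
# instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create an environment with a pinned Python version, then export it to
# <env_name>.yaml so it can be recreated with `conda env create -f`.
setup_env() {
    env_name="$1"
    py_version="${2:-3.8}"   # default taken from the example above
    run conda create -y -n "$env_name" "python=$py_version" &&
    run conda env export -n "$env_name" -f "$env_name.yaml"
}

# Example usage: preview the commands without touching conda.
# DRY_RUN=1 setup_env demo_env 3.8
```

Here -y answers the confirmation prompt of conda create, and -n and -f on conda env export select the environment and the output file, so the environment does not need to be active.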

Way2 — Anaconda navigator interface

  1. Open the Anaconda navigator application.
Anaconda navigator

2. Go to the Environments section to create, clone, import, or remove any environment.

Navigator actions

3. Click “create” to start a new Python or R environment.

Navigator create window

Anaconda commands guide

Install the major libraries

source: https://ahmadai.com/miner/
  • Create environment: skip this step if you already created the environment
conda create -n env_name
  • Activate your environment:
conda activate env_name
  • Install NumPy:
conda install numpy
  • Install pandas (NumPy is installed with it as a dependency):
conda install pandas
  • Install scikit-learn (sklearn):
conda install -c conda-forge scikit-learn
  • Install TensorFlow: the TensorFlow team publishes its official builds through pip rather than conda, so it’s better to use Python’s pip installation command
pip install --upgrade pip
pip install tensorflow
  • Install PyTorch:
conda install pytorch torchvision torchaudio -c pytorch

NOTE: Python 3.9 users will need to add ‘-c=conda-forge’ for installation
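
After installing, it can help to confirm that each library actually imports in the active environment. The following is a minimal sketch: check_imports is a hypothetical helper name and the PYTHON variable is an assumption of this example, which defaults to the python executable of whichever environment is active.

```shell
# Hypothetical helper: try to import each module with the Python interpreter
# named in $PYTHON (defaults to `python`, i.e. the active environment's).
check_imports() {
    status=0
    for module in "$@"; do
        if "${PYTHON:-python}" -c "import $module" >/dev/null 2>&1; then
            echo "ok: $module"
        else
            echo "missing: $module"
            status=1
        fi
    done
    return "$status"
}

# Example usage inside the activated environment. Note the import names:
# scikit-learn is imported as sklearn and PyTorch as torch.
# check_imports numpy pandas sklearn tensorflow torch
```

The function returns a non-zero status if any module fails to import, so it can also be used as a check in a setup script.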

Start jupyter notebook

  • Start a jupyter notebook from local environment:
jupyter notebook
  • Start a jupyter notebook from server: 0.0.0.0 listens on all network interfaces; replace it with the server IP to listen on that address only:
jupyter notebook --ip=0.0.0.0 --no-browser

Extras

PyCharm integration

You can connect your PyCharm project to an existing Anaconda environment.

  • When you start a new PyCharm project:

Select “Previously configured interpreter” and click the three dots.

Select the Python executable for the desired environment from the interpreter menu.

If the menu is empty, you can click the three dots and navigate to “anaconda3/envs/environment_name/bin/python”.

  • For existing PyCharm projects:

Go to Preferences and select “Python Interpreter”.

Look for the environment in the drop-down menu, or add it from the settings button if it is not listed.

Environment location: “anaconda3/envs/environment_name/bin/python”

Prepare EC2 instance

  • When you launch a new EC2 instance, search for the “deep learning” compute machine that comes with all the data science requirements.
EC2 instance from the marketplace
  • Make sure to open the default port for Jupyter notebooks. You can define this in the 6th step of instance creation.
Jupyter notebook rule

Add a custom TCP rule with port range “8888”. For the source, it is recommended to allow only your IP.

I tested all the steps before writing this article. Hopefully, you find this blog useful for starting your machine learning projects.


Written by Ahmad Albarqawi

Master’s data science scholar at UIUC. ahmadai.com
