Anaconda and Jupyter

Notes on using Anaconda

Annoyed at often forgetting some of this, I thought I'd try to document parts of my workflow here so that I could refer back to it in future.

Which Conda? Or no Conda at all?

Honestly, I fell into using Anaconda via a group project I joined a while back. I've used it sporadically since for stuff I want to do in Jupyter Notebooks. I've never really looked at alternatives (eg Miniconda, or using pip based solutions). For now, it's been nice to have a GUI, particularly on Windows, but on Linux I prefer to do stuff from the command line. That said, it's mostly meant I haven't had to think too much about it which has helped me get up and running. There is a good write-up of what Conda is and isn't here

Installing Anaconda

Just pull the latest installer from their website. Download the appropriate bundle for your OS and architecture and follow the installation instructions. I am only using the 3.7 distribution and don't have any legacy code I depend on for my projects.

Eg using Linux, wherever you've downloaded the installer (obviously change the filename to match whatever you've downloaded):

source ./Anaconda3-2019.07-Linux-x86_64.sh

I generally install it in the default location, i.e. an anaconda directory in ~/

Managing Environments

You can use the GUI (ie anaconda-navigator) but can do most things from the command line. I find the gui a bit clunky to be honest but it's nice to be able to fall back on it.

List environments

This will give you a list of environments this conda knows about.

conda env list

# conda environments:
#
base                  *  /home/alex/anaconda3
[...]
$

Create

You probably want to create a new environment for a new project and install any relevant libraries within it.

conda create -n testenv python=3.7 anaconda

Activate

Once you've created an environment, activate it and install packages of interest.

conda activate testenv

Installing libraries

Libraries can be installed with either conda or pip. As a basic rule, I search for them on conda first and install if available then install them via pip if not.

Is it in the conda repos?

This looks for a package and tells you it's available.

conda search plotly

Loading channels: ...working... done
# Name                       Version           Build  Channel
plotly                        2.0.15  py27h139127e_0  pkgs/main
plotly                        2.0.15  py35h43bf465_0  pkgs/main
plotly                        2.0.15  py36hd032def_0  pkgs/main
[...]

Great, install it from conda

We don't need to look any further, just install it.

conda install plotly

It might not exist in the Conda repo though

If you get a response like this, it's not available in the Conda distribution you have.

conda search geoplotlib

Loading channels: done
No match found for: geoplotlib. Search: *geoplotlib*
[...]

Try installing via pip instead

If it's a library that's available on PyPi, you should be able to install it by doing the following.

pip install geoplotlib

Collecting geoplotlib
[...]
Successfully installed geoplotlib-0.3.2

Deactivate

Once you are done fiddling with a given environment, you can deactivate it so that anything you do won't impact it.

conda deactivate

Managing Jupyter kernels and conda environments

I find it useful to create a single conda environment to launch Jupyter Notebooks from and leverage the ability to let them run kernels from other environments offered by installing ipykernel.

Install ipykernel in our new env

This makes a kernel based on this environment available for environments running nb_conda_kernels.

conda install ipykernel

Create the notebook env

To launch our notebook server, we want to create a separate env.

conda create -n notebook python=3.7 anaconda

Activate the notebook env and install nb_conda_kernels

Activate the new notebook env

conda activate notebook

Install nb_conda_kernels

conda install nb_conda_kernels

Launch a notebook server

jupyter notebook

Open the notebook base url in your browser

Eg navigate to http://localhost:8888

You should now be able to create a new notebook using a kernel from your new environment. From there you should be able to import all the libraries you've installed to the current environment from the running kernel.

Using Notebooks

A few notes on using notebooks themselves.

Useful key bindings

Ones I most for navigation.

Command Mode

Action	Binding
Enter	enter edit mode
Shift-Enter	run, select below
Ctrl-Enter	run
Alt-Enter	run, insert below
Y	to code
M	to markdown

Edit Mode

Action	Binding
Esc	Enter command mode
Shift-Enter	run, select below
Ctrl-Enter	run
Alt-Enter	run, insert below
Ctrl-S	Save and checkpoint

Using with Git

Repository setup

I created a new git repository with a project template that provides a consistent base to start from. I tend to work on different projects in separate branches which I merge into master once complete. Potentially, I'll move bigger projects into separate repositories.

Gitignore

I include this to ensure no checkpoints are included, useful if you iterate on the notebooks in the repo and don't want to inadvertently push private details back to Github. If you use the Python .gitignore from Github this will be included automatically.

.ipynb_checkpoints

Securing private info

I.e. don't include things like private API keys as declared variables in any notebooks you end up publishing on Github.

Read from a config file

Perhaps this isn't the best approach but it works ok for now. I create a simple notebook_config.json file containing something like this:

{
  "SECRET_API_KEY": "verysecret"
}

I can then read it in from the notebook:

import json

with open('notebook_config.json') as config_file:
 data = json.load(config_file)

key = data['SECRET_API_KEY']

Add this to .gitignore

notebook_config.json

Reading in local data

I find it helps to create a folder for datasets and load them in a standard(ish) way. For example:

dataset_path = "../datasets/"
gdf = gpd.read_file(dataset_path + 'Berlin_AL9-AL9.shp', encoding='utf-8')