Anaconda and Jupyter Cheatsheet
Annoyed at finding myself having to re-learn a good workflow for setting up dev environments for Jupyter projects, I thought I'd list my workflow as it stands and update it as it improves, so I can refer back after a hiatus.
Which Conda? Or no Conda at all?
Honestly, I fell into using Anaconda via a group project I joined a while back, and I've used it sporadically since for things I want to do in Jupyter Notebooks. I've never really looked at alternatives (e.g. Miniconda, or pip-based solutions). For now it's been nice to have a GUI, particularly on Windows, though on Linux I prefer to work from the command line. Mostly it's meant I haven't had to think too much about tooling, which has helped me get up and running. There is a good write-up of what Conda is and isn't here.
Installing Anaconda
Just pull the latest installer from their website: download the appropriate bundle for your OS and architecture and follow the installation instructions. I use the Python 3.7 distribution exclusively, as I don't have any legacy code my projects depend on.
E.g. on Linux, from wherever you've downloaded the installer (adjust the filename to match whatever you've downloaded):
bash ./Anaconda3-2019.07-Linux-x86_64.sh
I generally install it in the default location, i.e. an anaconda3 directory in ~/
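Once the installer finishes (you may need to restart your shell first), a quick way to check that conda is on your path and see where it installed:
conda --version
conda info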
Managing Environments
You can use the GUI (i.e. anaconda-navigator), but you can do most things from the command line. I find the GUI a bit clunky, to be honest, but it's nice to be able to fall back on it.
List environments
This will give you a list of the environments this conda install knows about.
conda env list
# conda environments:
#
base * /home/alex/anaconda3
[...]
$
Create
You probably want to create a new environment for a new project and install any relevant libraries within it.
conda create -n testenv python=3.7 anaconda
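The trailing anaconda installs the full Anaconda metapackage. If you'd rather start leaner, you can list just the packages you want instead (numpy and pandas here are purely examples):
conda create -n testenv python=3.7 numpy pandas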
Activate
Once you've created an environment, activate it and install packages of interest.
conda activate testenv
Installing libraries
Libraries can be installed with either conda or pip. As a basic rule, I search for them on conda first, install from there if available, and fall back to pip if not.
Is it in the conda repos?
This looks for a package and tells you whether it's available.
conda search plotly
Loading channels: ...working... done
# Name Version Build Channel
plotly 2.0.15 py27h139127e_0 pkgs/main
plotly 2.0.15 py35h43bf465_0 pkgs/main
plotly 2.0.15 py36hd032def_0 pkgs/main
[...]
Great, install it from conda
We don't need to look any further; just install it.
conda install plotly
It might not exist in the conda repo though
If you get a response like this, it's not available in the channels your conda install is configured to use.
conda search geoplotlib
Loading channels: done
No match found for: geoplotlib. Search: *geoplotlib*
[...]
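Before falling back to pip, it can also be worth searching other channels such as conda-forge; packages missing from the default channels are sometimes packaged there (no promises for geoplotlib specifically):
conda search -c conda-forge geoplotlib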
Try installing via pip instead
If it's a library that's available on PyPI, you should be able to install it by doing the following.
pip install geoplotlib
Collecting geoplotlib
[...]
Successfully installed geoplotlib-0.3.2
Deactivate
Once you are done fiddling with a given environment, you can deactivate it so that whatever you do next won't affect it.
conda deactivate
Managing Jupyter kernels and conda environments
I find it useful to create a single conda environment just for launching Jupyter notebooks, and to let that server run kernels from other environments. Installing ipykernel in each project environment and nb_conda_kernels in the notebook environment is what makes this possible.
Install ipykernel in our new env
Run this with testenv still active; it makes a kernel based on this environment discoverable by any environment running nb_conda_kernels.
conda install ipykernel
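As an aside, if you don't want to rely on nb_conda_kernels, you can register this environment's kernel with Jupyter manually instead, while testenv is still active; the --name and --display-name values below are just examples:
python -m ipykernel install --user --name testenv --display-name "Python (testenv)"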
Create the notebook env
To launch our notebook server, we want to create a separate env.
conda create -n notebook python=3.7 anaconda
Activate the notebook env and install nb_conda_kernels
Activate the new notebook env
conda activate notebook
Install nb_conda_kernels
conda install nb_conda_kernels
Launch a notebook server
jupyter notebook
Open the notebook base URL in your browser
E.g. navigate to http://localhost:8888
You should now be able to create a new notebook using a kernel from your new environment, and from there import any of the libraries installed in that environment. Running import sys; print(sys.executable) in a cell is a quick way to confirm which environment's interpreter the running kernel is using.
Using Notebooks
A few notes on use of notebooks themselves.
Useful key bindings
The ones I use most for navigation.
Command Mode
Binding | Action |
---|---|
Enter | enter edit mode |
Shift-Enter | run, select below |
Ctrl-Enter | run |
Alt-Enter | run, insert below |
Y | to code |
M | to markdown |
Edit Mode
Binding | Action |
---|---|
Esc | Enter command mode |
Shift-Enter | run, select below |
Ctrl-Enter | run |
Alt-Enter | run, insert below |
Ctrl-S | Save and checkpoint |
Using with Git
Repo setup
I created a new empty repo on GitHub, cloned it locally, and work there on notebook projects, committing completed projects back to GitHub. Bigger projects may eventually move into their own repos, and certain notebooks get published to this blog.
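A minimal sketch of that flow, assuming a repo called notebooks under your GitHub account (substitute your own user and repo names):
git clone git@github.com:<your-user>/notebooks.git
cd notebooks
# ...work on a notebook, then commit the finished version...
git add my_project.ipynb
git commit -m "Add completed project notebook"
git push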
Gitignore
I include this to ensure no checkpoints are committed, which is useful if you iterate on the notebooks in the repo and don't want to inadvertently push private details back to GitHub. If you use the standard Python .gitignore from GitHub, this is included automatically.
.ipynb_checkpoints
Securing private info
I.e. don't include things like private API keys as declared variables in any notebooks you end up publishing on GitHub.
Read from a config file
Perhaps this isn't the best approach, but it works OK for now. I create a simple notebook_config.json file containing something like this:
{
    "SECRET_API_KEY": "verysecret"
}
I can then read it in from the notebook:
import json

# Load secrets from a local config file that is kept out of git
with open('notebook_config.json') as config_file:
    data = json.load(config_file)

key = data['SECRET_API_KEY']
Add this to .gitignore
notebook_config.json
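You can confirm git really is ignoring the file; this prints the .gitignore rule that matches it:
git check-ignore -v notebook_config.json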
Reading in local data
I find it helps to create a folder for datasets and load them in a standard(ish) way. For example:
import geopandas as gpd

# Datasets live in a common folder relative to the notebook
dataset_path = "../datasets/"
gdf = gpd.read_file(dataset_path + 'Berlin_AL9-AL9.shp', encoding='utf-8')