How to Install Python Packages

Creating and saving a Nextjournal environment

Each Nextjournal code cell runs in a runtime, and each runtime has an environment, which is a Docker container with its own filesystem. In any environment we can install whatever system or language packages we need, modify configuration files, and set up directory and data file structures however we like. We can then save and export the whole environment, both for future reproducibility and for others to use.

Let's configure an environment for mapmaking with the geoplot package. We'll install packages in a runtime named geoplot, set to use Nextjournal's default Python environment. This Python 3 environment, like its Python 2 counterpart, comes with a variety of packages preinstalled, including numpy, matplotlib, and plotly:

pip freeze
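
The same information is available from inside a Python cell. As a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is part of the standard library), the hypothetical helper installed_versions below looks up a few named packages instead of listing everything:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package names are the ones mentioned in the text above.
    for name, version in installed_versions(["numpy", "matplotlib", "plotly"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A value of None signals that the package still has to be installed by one of the methods described next.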

When we need additional Python packages, there are several ways to install them. The easiest is conda, which attempts to install all packages and their dependencies consistently, including system packages and libraries. Anaconda Cloud has a searchable database of packages and channels; by default we will select only from the anaconda channel.

conda install -y descartes pysal

If we need a different version or a more esoteric package, we can add other channels with the -c flag.

conda install -y -c conda-forge cartopy

For packages and versions unavailable via conda, or for installing packages in wheel files, pip is available. For any packages that require compilation, we can install gcc first.

apt-get update > /dev/null
apt-get install -y gcc

pip install quilt

We can also use pip to install development versions from GitHub, though we have to install git first.

apt-get install -y git

pip install git+https://github.com/geopandas/geopandas
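
A development install like this tracks whatever the repository's default branch contains at install time. pip's VCS syntax also accepts a branch, tag, or commit after an @ sign, which helps keep the result reproducible. The sketch below only builds the command without running it; pip_git_command is a hypothetical helper, and the ref "main" is an assumed branch name chosen for illustration:

```python
import sys


def pip_git_command(repo_url, ref):
    """Build a pip command that installs repo_url at a given branch, tag, or commit."""
    requirement = f"git+{repo_url}@{ref}"
    # Calling pip through the current interpreter targets the same Python environment.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # "main" is an assumption; substitute the tag or commit hash you actually want.
    command = pip_git_command("https://github.com/geopandas/geopandas", "main")
    print(" ".join(command))
```

The returned list can be passed to subprocess.run to perform the install, or its last element can be pasted after pip install in a shell cell.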

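Finally, if a package ships a setup.py, we can clone its repository and install from source. Recent setuptools releases deprecate invoking setup.py directly; running pip install . from the same directory is the recommended equivalent.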

git clone https://github.com/ResidentMario/geoplot
cd geoplot
python setup.py install
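
Before relying on the new packages, it can help to confirm that they are importable from the runtime. The following is a small sketch using only the standard library; missing_modules is a hypothetical helper name, and the module list simply mirrors the packages installed above, assuming each is imported under the same name it was installed with:

```python
from importlib import util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names if util.find_spec(name) is None]


if __name__ == "__main__":
    wanted = ["descartes", "pysal", "cartopy", "quilt", "geopandas", "geoplot"]
    missing = missing_modules(wanted)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

find_spec only locates a module without executing it, so this check is quick but does not catch errors raised while a package is actually imported.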

Once everything is set up to our liking, we can save and export the runtime's end state as a new environment using its configuration panel. Using the saved geoplot environment as the environment of our Main runtime then ensures that the versions of programs and packages the article was developed with are preserved for future reproducibility, even through a Remix. Once the article is published, our exported environment will also be available for other articles to use via transclusion.

Here's an example from the geoplot gallery:

quilt install ResidentMario/geoplot_data
# Load the data (uses the `quilt` package).
import geopandas as gpd
from quilt.data.ResidentMario import geoplot_data

continental_cities = gpd.read_file(
  geoplot_data.usa_cities()).query('POP_2010 > 100000')
continental_usa = gpd.read_file(geoplot_data.contiguous_usa())


# Plot the figure.
import geoplot as gplt
import geoplot.crs as gcrs
import matplotlib.pyplot as plt

poly_kwargs = {'linewidth': 0.5, 'edgecolor': 'gray', 'zorder': -1}
point_kwargs = {'linewidth': 0.5, 'edgecolor': 'black', 'alpha': 1}
legend_kwargs = {'bbox_to_anchor': (0.9, 0.9), 'frameon': False}

ax = gplt.polyplot(continental_usa,
                   projection=gcrs.AlbersEqualArea(central_longitude=-98, 
                                                   central_latitude=39.5),
                   **poly_kwargs)

gplt.pointplot(continental_cities, projection=gcrs.AlbersEqualArea(), ax=ax,
               scale='POP_2010', limits=(1, 80),
               hue='POP_2010', cmap='Blues',
               legend=True, legend_var='scale',
               legend_values=[8000000, 6000000, 4000000, 2000000, 100000],
               legend_labels=['8 million', '6 million', '4 million', 
                              '2 million', '100 thousand'],
               legend_kwargs=legend_kwargs,
               **point_kwargs)

plt.title("Large cities in the contiguous United States, 2010")
plt.savefig("/results/map.svg", bbox_inches='tight', pad_inches=0.1)