Heiko Schmidle / Jan 20 2021

Python Tutorial

This is a short introduction to using Nextjournal with Python. Here, we will show you how to run code, how to install libraries, how to work with files (small and big), and how to re-use code.

The best way to use this tutorial is to open an account (it's free!) and then click the “Remix” button in the menu bar. Remixing will create your own copy of this notebook (to be found in your personal dashboard), so that you can start playing around with it. You can execute the code cells along the way which will help you understand the concepts.

Another option is to just start a new notebook and execute the commands described in the tutorial, performing each single step yourself. This way, you will probably learn the most.

Some of the concepts are a bit technical, but don't worry: the aim of this tutorial is to show you how to drive a car, and there is no need to fully understand how the engine works yet. We add more technical descriptions where useful, but to get started you can skip those parts if you wish.

Highlights

Before we start, let's highlight some of the unique features available in Nextjournal that will be introduced in this tutorial.

Sharing

You can share notebooks easily by creating a link or by publishing the notebook. In the former case, the reader doesn't even need a Nextjournal account. When you publish your notebook to your team (or to the public), anyone who has access to it can immediately remix it and start working where you left off. This includes the data, libraries, functions, etc. Everything will work immediately; why that is the case is explained in this tutorial.

Mixing runtimes

A very powerful concept of Nextjournal is the association of runtimes (environments that your code needs in order to be run) to code cells. This means that you can use different runtimes for different code cells within one notebook. Therefore, you can mix programming languages to run your research, import, export, and share runtimes with your co-workers and the public. We will explore that more later.

History

Every change to a notebook is saved and you can always re-create every state of your notebook from the very beginning. Always.

Locking cells

Unlike most other computational notebooks, you can lock a code cell and make sure it is not executed unintentionally.

Cloud computing

You can run your code in the cloud without setting up credentials, starting and stopping instances, etc. You can even choose fairly big machines with GPUs without the headache of installing drivers and libraries.

Production code

Technically speaking, your runtimes are Docker containers with an underlying image. You can export the environment and use it in production. This way, you can make sure that the production system uses exactly the same state as your research.

Exporting

In case you want to share your results with someone outside of Nextjournal, you can simply export the notebook as Markdown or as a Jupyter notebook, or share it as a read-only page.

Basic concepts

Please read the Quickstart Guide for a quick overview of the basic concepts.

Now you should be familiar with how to use the editor, how to add new content, and what blocks and code cells are. To use Nextjournal with Python you will probably only need the Bash and Python runtimes. The concept of runtimes is a bit trickier to wrap one's head around, but once you get it, it is very powerful. For a more detailed introduction to runtimes and environments, see Runtimes and Environments.

Let's summarise the main points you need to know about runtimes and environments. A code cell runs in a runtime; you can think of it as the machine where the code cell is executed. The environment describes the machine, e.g. installed libraries and packages, files, etc. One very important point is that the runtime (machine) shuts down after 20 minutes of idle time. When the runtime restarts, all changes to the environment are lost, e.g. installed libraries, saved files, etc. Don't worry: there are convenient ways to avoid re-installing packages and re-creating files over and over again, which I will demonstrate below.

Technical details

Runtimes correspond to a Docker container in which your code cell runs. This container is defined by an environment, the underlying Docker image. The environment gives you access to the filesystem, packages, libraries, etc. But be careful: the filesystem is transient. If you save data to a file and the runtime stops (which happens automatically after 20 minutes of idle time), your file will be gone. The same applies to packages: if you install a library via pip, the package is gone after a restart and you need to re-install it. The reason is simply that the runtime always starts from the exact same environment, and changes are not kept unless you have saved them to the environment. This concept has several advantages: you can run code cells with different runtimes in the same notebook (i.e. mix programming languages), export and import runtimes, and reuse and share runtimes within your team and with everybody else.

Running Python code

You probably saw how to run code in the Quickstart Guide, but let's recap.

Click on the small plus sign on the left side, or hit Enter twice and select

Code Cell: Python

print('Hello from Nextjournal')

0.4s

Python Tutorial (Python)

Alternatively, you can add a code cell in any language by typing ```<language> and then pressing Space, so in our case ```python and then Space will make a new Python code cell appear. If you just type ``` followed by Space, a code cell of the last used language will be added.

You should see a small yellow circle 🟡 at the bottom left of the code cell you just created. This might take a few moments, since the runtime needs to boot, but the next execution should happen very fast. Try it: just place the cursor in the cell and hit Shift + Enter or click the small play button ▶️ in the bottom right corner. All runtimes of a notebook appear above the Table of Contents on the left side. For instance, the current notebook has three different runtimes (Bash, Libraries (Bash), Python Tutorial (Python)). It is good practice to give your runtimes meaningful names, which makes it easier to keep track of different runtimes and assign the correct one to each code cell.

There are a lot of Python libraries already pre-installed and you can easily import them.

import sklearn
sklearn.__version__

1.4s

Python Tutorial (Python)

Technical details

When you run a code cell for the first time, a few things happen. First, the environment (Docker image) associated with the code cell is pulled; for Python this is Python 3 by default. Then a computing instance is started and the runtime boots using the environment (the Docker container starts). On the first start, downloading and booting will take some time, indicated by the yellow dot, but all following runs will be much faster. You should also see that the code cell displays Python 3 with a blue shading; if you click on it, you will be directed to the underlying environment (notebook) that was exported to create this runtime. We have added a default Python 3 code cell below.

0.0s

Python

Installing libraries

In case you need a library that is not part of the Nextjournal Python 3 environment, you can easily install a new package with pip.

We would like to use Bash together with our Python 3 runtime, and Nextjournal provides this feature by automatically attaching Bash to the last runtime we used.

Let's do it step-by-step. Add a Bash code cell; this will allow you to run bash commands. The Bash code cell is automatically attached to our last runtime, which is the Python Tutorial runtime, meaning that any command we run in the Bash code cell will install the library in the Python Tutorial runtime. The runtime should now look like this (the code cell below is an image):

At the bottom right, you now see Python Tutorial (Bash in Python): this means that we now have a Bash cell that accesses the Python Tutorial runtime, and we can use bash commands to install libraries or perform other tasks.

If the Bash runtime is not attached to the Python Tutorial runtime, the new runtime will look like this (the code cell below is an image):

At the bottom right, you see only Bash, which means we have a runtime that is stand-alone, like a different machine, and not connected to the Python Tutorial runtime. We can change the runtime through the options menu: click the “•••” button next to the cell or hit Command/Ctrl+Shift+O, choose Change Runtime... and then pick Python Tutorial (Python). This attaches the Bash cell to the Python Tutorial runtime and you are now able to install the library in this runtime.

pip install Delorean

7.1s

Python Tutorial (Bash in Python)

Now, we can import the library in our runtime:

import delorean

0.6s

Python Tutorial (Python)

To demonstrate some underlying mechanisms, let's add a new code cell and change the runtime. We again select Bash from the code cells. It is again attached to the Python Tutorial runtime. We can change that by clicking the “•••” button to the left of the code cell (or hitting Command/Ctrl+Shift+O), choosing Change Runtime..., then Add New Runtime..., and finally choosing Bash. Now the bottom right part of the cell tells you that this runtime is Bash and no longer (Bash in Python). You will also notice that a new runtime was added on the left side, above the Table of Contents. This runtime is fully independent from the first one, meaning that if you install a package here, it won't be accessible within the Python Tutorial runtime.

Let's demonstrate it:

pip install pendulum

3.8s

Bash

Creating a Python code cell and importing will result in an error.

import pendulum

0.2s

Python Tutorial (Python)

You see that the installed library is not available within the Python Tutorial runtime.
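The failure above is an ordinary Python ImportError. As a minimal sketch using only the standard library, a cell could report a missing library instead of stopping with a traceback. The helper name try_import and the made-up module name are hypothetical and not part of Nextjournal.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed in this runtime."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        print(f"{module_name!r} is not installed in this runtime")
        return None


json_module = try_import("json")  # standard library, always available
missing = try_import("a_module_that_does_not_exist_12345")  # hypothetical name
```

If the helper returns None, the fix is the one described above: install the package from a Bash cell that is attached to the same runtime as the Python cell.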

It is a good idea to lock the cell where we installed the library so that we do not accidentally re-run it. This also allows us to choose Run all from the run options without worrying that the installation will be triggered again. To lock the cell, just click the “•••” button next to the cell or use Command/Ctrl+Shift+O and choose Lock cell.

Another good tip is to create an Appendix section, and move the code cell where the installation happened there, so that your research notebook stays clean.

What's nice is that you can export the runtime so that it is also available to other notebooks. If, say, you want to use the installed library in another notebook, you can import the environment by choosing Import environment... in the settings where the environment is chosen. This allows us to create notebooks with pre-installed packages, which ensures we use the same versions, saves time, and makes collaboration easier. For more details on how to leverage the power of runtimes, see here.

Technical details

Saving an environment pushes the Docker image to a central registry. You can see the location and name of the image at the bottom of the runtime settings.

By creating a runtime, exporting the environment, and then using this environment as the base for other runtimes, we can create hierarchical structures of environments. This means we are able to stack environments on top of each other and build arbitrarily complex chains of environments. We can also export the Docker images and use them outside of Nextjournal, e.g. on a production server or your local machine.
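Since one purpose of sharing an environment is that everyone works with the same package versions, it can help to record them. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata). The helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# "Delorean" is the package installed earlier in this tutorial; the second
# name is deliberately made up to show the "not installed" case.
snapshot = installed_versions(["Delorean", "no-such-package-12345"])
print(snapshot)
```

Running the same cell in two notebooks that import the same environment should print identical dictionaries.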

Working with data

The simplest way to work with a file is to just drag and drop it into the notebook. The second easiest way is to use the Add content button (or, click the “+” button below the block of content you are editing), choose the “File” option, and upload it.

test.csv

1.31 KB

You can easily access the file by reference, just use the shortcut Command/Ctrl + E when you want to fill in a file path, e.g. when reading the csv with pandas.

import pandas as pd
pd.read_csv(test.csv)

0.8s

Python Tutorial (Python)

df = pd.read_csv(test.csv)

0.5s

Python Tutorial (Python)

If we want to process a large file and the download takes a while, it is a good idea to download the file into the special folder /results and lock the cell. We do this here for the sample.csv file:

wget -P /results https://raw.githubusercontent.com/datapackage-examples/sample-csv/master/sample.csv

1.1s

Bash

If you want to check out the content of the /results directory you won't find anything. E.g. we can use a Bash runtime and just execute ls. Why this directory is empty will become clear when you read the technical details of this section. One major advantage is the automated versioning of the /results directory. That means you can always restore older versions of your dataset and nothing will ever be lost again.

ls /results

0.6s

Bash

However, using the autocompletion Command/Ctrl + E will now display the newly added sample.csv and we can read it.

pd.read_csv(sample.csv)

0.3s

Python Tutorial (Python)
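Before locking the download cell, a quick sanity check of the file can be useful. The following is a minimal sketch using only the standard library csv module. The helper summarize_csv and the small demo file it reads are assumptions for illustration; in a notebook you would pass the path that the file reference resolves to.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        row_count = sum(1 for _ in reader)
    return header, row_count


# Demo on a throwaway file so that the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.csv")
with open(demo_path, "w", newline="") as handle:
    csv.writer(handle).writerows([["name", "value"], ["a", 1], ["b", 2]])

columns, rows = summarize_csv(demo_path)
print(columns, rows)
```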

It is a good idea to move the locked cell into the Appendix as well and get it out of the way of the real work.

You can also just simply copy-paste the file from one notebook to another by using Command/Ctrl + C and Command/Ctrl + V which will copy the reference including the name and you can use the file directly in another notebook. It's that easy.

Another approach is to download data from a bucket in S3 or GCS, explained here. You can also access a database using secrets, a detailed description is here.

Technical details

When a file is uploaded or saved to the special /results directory, a reference to the file is created and the file is uploaded to general cloud storage. That's why the local directory is empty. It also means that the file is available even after the notebook was shut down and restarted. Output files should also be written to the special /results directory. The content stored there is automatically versioned, allowing us to restore any previous version of our work. Every file that is written to any other directory will be gone after a restart and needs to be re-created.

But keep in mind that the files stored in /results are read-only and once written cannot be changed.
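Because files in /results cannot be changed once written, one option is to give every output a new name instead of overwriting an existing file. The following is a minimal sketch. The helper write_result is hypothetical, and the demo writes to a temporary directory so that it also runs outside Nextjournal; whether exclusive-create mode behaves the same way on /results is an assumption.

```python
import tempfile
from pathlib import Path


def write_result(text, stem, directory):
    """Write text to a new numbered file and never overwrite an existing one."""
    directory = Path(directory)
    counter = 0
    while True:
        target = directory / f"{stem}-{counter:03d}.txt"
        try:
            # Mode "x" fails with FileExistsError if the file is already there.
            with open(target, "x") as handle:
                handle.write(text)
            return target
        except FileExistsError:
            counter += 1


out_dir = tempfile.mkdtemp()  # in a notebook, one would presumably pass "/results"
first = write_result("run 1", "summary", out_dir)
second = write_result("run 2", "summary", out_dir)
print(first.name, second.name)
```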

You can also display the underlying reference by using the bash command echo

echo test.csv

1.0s

Bash

This will display the full path to the file and you can use this path in another notebook. No need to download the file again. But copy-paste is much easier.

Working with functions

Like in any other computational notebook you can define functions to reuse code.

def square(x):
    return x * x

0.2s

Python Tutorial (Python)

square(10)

0.2s

Python Tutorial (Python)

But let's say we wrote a more complex function that we would like to use in different notebooks. There are three ways to tackle this, and we describe each of them.

Installing a GitHub repository

Number one is the standard way of installing a Python package from GitHub. Basically just:

pip install git+https://github.com/user/repository

This should install the package and add the files to the filesystem. Keep in mind that this should be done in the Libraries environment, so that the package does not have to be re-installed after each shutdown.

pip install git+https://github.com/heikoschmidle/demo

4.2s

Python Tutorial (Bash in Python)

pip show demo

1.8s

Python Tutorial (Bash in Python)

from demo.mammals import Mammals
Mammals().printMembers()

0.4s

Python Tutorial (Python)

Cloning a GitHub repository

The second option is to clone a repository containing the source code and mount it to the filesystem. This is the better way if you need to access a private repository and want to use credentials that are stored in the Nextjournal Secrets Storage.

Click the Add Content button (or the “+” button) and choose GitHub Repository. You need to add the name of the repository, and maybe an access token in case the repository is private. You need to choose which runtime should mount the repository.

heikoschmidle/demo2

main

This repo is mounted by: Python Tutorial

Verifying everything worked and is mounted:

ls /demo2

0.7s

Python Tutorial (Bash in Python)

All that is left is to install the local package via pip:

pip install /demo2

3.0s

Python Tutorial (Bash in Python)

from demo2.mammals import Mammals
Mammals().printMembers()

0.4s

Python Tutorial (Python)

Now you can use the functions in your notebook. In this case, too, it is best to perform the same steps within the Libraries runtime, so you don't need to re-run the code after a shutdown. We show this in more detail here.

Code listings and files

The third option is to create a code listing and mount the resulting file to the filesystem. Let's go through an example step-by-step. First, we create a directory and add the obligatory __init__.py to be able to import from there later.

mkdir -p /opt/python_modules/
touch /opt/python_modules/__init__.py

0.7s

Python Tutorial (Bash in Python)

Next, we create a code listing by clicking the Add Content button, like before, and choosing Code Listing from the options. We add another class to this source file.

class Birds:
    def __init__(self):
        ''' Constructor for this class. '''
        # Create some member animals
        self.members = ['Sparrow', 'Robin', 'Duck', 'No Birds']

    def printMembers(self):
        print('Printing members of the Birds class')
        for member in self.members:
            print('\t%s ' % member)

Birds.py

Now we need to mount that file. We can do so by clicking the gear button of the Python Tutorial runtime, choosing Add Mount, and adding the following:

The newly created Birds.py is now available within our mounted directory.

ls /opt/python_modules

1.2s

Python Tutorial (Bash in Python)

In order to import it within our Python runtime, we need to add this new directory to the PYTHONPATH. We just click the gear sign next to the runtime and add the Environment Variable PYTHONPATH with the value /opt/python_modules, which is the location where we mounted the code listing, and Python will now find it.

from Birds import Birds

0.2s

Python Tutorial (Python)

Birds().printMembers()

0.4s

Python Tutorial (Python)

Since we mounted that file and it is part of the Docker image, it will stay where it is, even after the runtime shuts down.
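Setting PYTHONPATH works because Python adds those directories to its module search path, sys.path. The following is a minimal, self-contained sketch of the same mechanism. It uses a temporary directory and a hypothetical module name birds_demo in place of the mounted /opt/python_modules/Birds.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory containing one small module.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "birds_demo.py").write_text(
    "class Birds:\n"
    "    def __init__(self):\n"
    "        self.members = ['Sparrow', 'Robin', 'Duck']\n"
)

# Similar in effect to listing the directory in PYTHONPATH before Python starts.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

birds_demo = importlib.import_module("birds_demo")
members = birds_demo.Birds().members
print(members)
```

Unlike the environment variable, a change to sys.path only lasts for the current Python process, which is why setting PYTHONPATH in the runtime settings is the more durable choice.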

Another option is to mount a source code file and then read and execute it within a code cell. Let's create a code listing with a simple function and name it example.py. You can create a code listing by choosing Code Listing from the Add Content button.

def hello_from_listing():
    print("Hello from a code listing!")

example.py

Once created, we need to give it a name by clicking on the “•••” button next to it and Assign Name.

We need to mount the file again by clicking the gear button of our Python Tutorial runtime and then Add Mount. Now we have mounted the third file and it looks like this:

We are able to read the source file and execute it by simply doing:

exec(open("example.py").read())
hello_from_listing()

0.5s

Python Tutorial (Python)
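Note that exec runs the file's code in the current namespace, which is convenient but can silently overwrite existing names. An alternative sketch loads the file as a module object with the standard library importlib.util. The temporary file stands in for the mounted example.py, and load_module_from_path is a hypothetical helper name.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Load a Python source file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Stand-in for the mounted listing; the function returns the string
# instead of printing it so that the result can be checked.
source = Path(tempfile.mkdtemp()) / "example.py"
source.write_text(
    "def hello_from_listing():\n"
    "    return 'Hello from a code listing!'\n"
)

example = load_module_from_path("example", source)
greeting = example.hello_from_listing()
print(greeting)
```

With this approach the function lives in the example namespace (example.hello_from_listing) rather than among the notebook's global names.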

These are convenient ways to re-use your code in Nextjournal and share your definitions. All mounts to the filesystem, whether repositories, code listings, or files, are added to the environment of that runtime. This means that exporting the environment will allow other users to import the definitions and functions directly and reuse them.

Don't forget that you always import a specific snapshot of the environment in order to make everything reproducible. If you change the definition of a function or the import of a GitHub repository, you need to Save the changes of the environment and re-import the latest version of that environment in the other notebook.

Summary

This short tutorial introduced the main concepts of Nextjournal, independent of the programming language you want to use. We showed the main features by using Python to demonstrate the functionality.

If you also think the world needs more reproducibility and collaboration, Nextjournal is free for private use, so you can sign up and start using it right away.

Was there something we missed in our tutorial? Please feel free to remix, add your work and share it with us and the world!

Read more awesome notebooks created in Nextjournal related to Data Science, Machine Learning, Scientific Publishing here.

If you have questions about us, the pricing, or how we can help you to set up Nextjournal for your team, get in touch!

Happy collaborating ❤️

Appendix

Move the code cell of the Libraries runtime here by clicking on the “•••” button next to it and dragging the cell here.
