diff: Nextjournal and Jupyter

While Nextjournal can run Jupyter notebooks, it is built on fundamentally different technology that solves several of the issues notebook authors encounter.

Fewer Issues With Dependencies

📋 Summary

Nextjournal allows you to build reliable, reusable, and reproducible environments with Bash - just like you would on your local machine. Unlike Jupyter, there is no kernel middleman, no magic commands, and no installation workarounds.

👀 Take a Look

Environments are reusable, reproducible containers that include everything necessary to run valid code. This notebook (re)uses Nextjournal's default Python 3 environment.

import platform; platform.python_version()
'3.6.8'

Environments are configured using /bin/bash, just as you would on your local machine.

pip install --no-deps haishoku

Jupyter users should note: there is no kernel mediating the Bash commands in these cells. This solves two immediate issues that arise in Jupyter notebooks.

  1. The notebooks are not dependent on the system's version of Python. Use any version of Python you want. Upgrade the system without fear of breaking an unrelated notebook or its dependencies.
  2. Package installation is simplified. On any given computer, a Jupyter kernel's python executable is not always the same as the command line's python executable, and the resulting dependency errors are difficult to trace (see the sketch below).
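
To see the mismatch in practice, here is a minimal check; the comparison with the shell is illustrative, not taken from the original notebook. Run it inside a Jupyter cell:

import sys, shutil
print("Kernel runs:", sys.executable)              # the Python behind the Jupyter kernel
print("Shell would run:", shutil.which("python"))  # the first python on the shell's PATH
# If the two paths differ, `pip install` on the command line may not install
# packages into the environment the kernel actually uses.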

A more detailed explanation can be found in Which Python Do You Mean?.

Nextjournal environments offer two other significant advantages over traditional Jupyter workflows.

  1. Dependencies are automatically version controlled. Therefore, previous computational environments are reproducible.
  2. Sharing a notebook includes the computational environment. Dependency updates are propagated wherever the notebook is run.

This is true even when running supported Jupyter kernels on Nextjournal. Imported Jupyter notebooks can be run without modification, including custom magics. For example, here is a Jupyter IPython kernel using the %%bash magic to run Bash commands:

%%bash
jupyter --version

👆 Voilà. An instantly reproducible, version controlled Jupyter notebook that is easily shared with peers and readers.

Version Control Everything, Automatically

📋 Summary

Nextjournal boasts automatic, synchronous version control across all code, commentary, and data hosted on the platform. All changes are recorded. Go back in time, anytime.

Jupyter wasn't designed with version control in mind. Several libraries and extensions exist to mitigate this legacy, many of which use Git. No single solution is comprehensive across code, commentary, data, and computational environment.

👀 Details

Version control impacts every stage of research - from early collaborative efforts to peer reviewing published work. Nextjournal automatically versions notebooks in a few separate ways to provide the most flexibility.

This architecture ensures a notebook's reproducibility and provides the flexibility needed to reuse components between notebooks.

Most Jupyter version control schemes rely on existing systems like Git, which is a poor fit for versioning data and computational environments. Even the core .ipynb notebook file requires additional work to play nicely with online VCS offerings.
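
One common workaround, sketched below with the nbformat library, is to strip cell outputs before committing; this keeps diffs readable at the cost of discarding results (the file name is just an example):

import nbformat

# Read the notebook, drop every code cell's outputs, and write a clean copy.
nb = nbformat.read("simple-nb.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "simple-nb-stripped.ipynb")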

A deep dive into the problem is beyond the scope of this notebook. How to Version Control Jupyter Notebooks offers a more comprehensive look at the issue and each solution's various tradeoffs. However, a simple notebook, simple-nb.ipynb, will help illustrate why this is such a problem.

simple-nb.ipynb

Search the notebook for any images to reveal how they are stored.

grep -nri "image" simple-nb.ipynb > /tmp/notebook-image.txt
fold -s -w80 /tmp/notebook-image.txt

Binary blobs such as these don't tell the user anything about the information being committed to a code repository. Convert the .ipynb file to .html to render it in this Nextjournal notebook.

jupyter nbconvert simple-nb.ipynb --output-dir="/tmp" --to="html"
cp /tmp/simple-nb.html /results/simple-nb.html

👆 That's what information should look like! The image is actually a plot of a sine wave generated by Python.

Every change to every plot is recorded in Nextjournal's history and can be reviewed visually, taking full advantage of the notebook format. The same is true of any data stored on Nextjournal used to generate the plot, and of the computational environment used to execute the plot's code.

Collaborate on Your Work in Your Notebook

Real Time Collaboration and Fully Reproducible Environments

Sharing a notebook means sharing the computational environment as well as the data needed to run it. If the data is stored somewhere other than Nextjournal, the platform makes it easy to share keys with other members of the team.

Real Time Teamwork

Nextjournal offers a full set of collaboration tools. Invite people to help edit your work and collaborate in real-time in the notebook.

Computational environments and data travel with the notebook. Public or private databases and repositories hosted outside of Nextjournal are explicitly mounted within the notebook. For example, an S3 bucket:

nextjournal-s3-demo (public S3 bucket)
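
Once mounted, the bucket's contents can be read like local files. A minimal sketch, assuming the mount shows up as a directory (the path below is a hypothetical example, not Nextjournal's actual mount location):

import os

mount_point = "/data/nextjournal-s3-demo"   # hypothetical mount path
for name in os.listdir(mount_point):        # list the files the bucket provides
    print(name)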

And a GitHub repository:

All this can be accomplished on a per-notebook basis or by creating a group. Groups add the benefits of reuse, private group databases and repositories, and the ability to publish under a single group profile.

Share Secrets With Colleagues

Nextjournal stores your secrets fully encrypted in a vault separate from your notebook. Stored secrets can be referenced in notebooks and in runtime environment variables, and shared with your collaborators. Group management makes it even easier to set up and continually share access with select collaborators.
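
Because a stored secret can be exposed to the runtime as an environment variable, reading it from code is straightforward. A minimal sketch (the variable name MY_API_KEY is a hypothetical example):

import os

# Nextjournal injects the secret into the runtime's environment;
# the name below is a placeholder for whatever you configured.
api_key = os.environ.get("MY_API_KEY")
if api_key is None:
    raise RuntimeError("Secret MY_API_KEY is not available in this runtime")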

Jupyter

Collaboration is not a core component of the original Jupyter project. Most collaborative aspects are managed through version control using external tools like GitHub.

JupyterHub is an initiative to host Jupyter notebooks for multiple users on a remote server (or collection of servers). It requires considerable investment to set up, so distributions have been created to ease the process.

Platforms such as CoCalc have been built on top of JupyterHub and offer integrated version control and a chat window, but not Google Docs-style real-time collaboration.

The JupyterHub roadmap suggests that real-time collaboration is being considered but not in the immediate future.

Further friction is created when attempting to share data or recreate computational environments. The latter will be explored later in this notebook.

No More Compute Configuration

Single Click Allocation and Scripted Pipelines

Resourcing

Nextjournal runs on a fully managed cloud computing infrastructure — no setup or maintenance required. Compute resources can be fully customized (machine types, number of instances, etc.) with full GPU support.

Creating a new cell using Nextjournal's standard PyTorch environment is simple:

import platform, torch

print("This environment runs PyTorch version {0} on {1} {2}".format(torch.__version__, torch.cuda.device_count(), torch.cuda.get_device_name(0)))

At this point, it's also worth noting that the PyTorch runtime joins NJ, the Nextjournal Python default runtime, and the Jupyter runtime in this notebook. All were added with just a few clicks.

New PyTorch, TensorFlow, TFLearn, and Keras notebooks can be created using Nextjournal's single-click defaults.

Pipelines

Allocating computational resources is an important part of many data-driven pipelines. For example, after all data and environments are in place, a computationally intensive Bash script running on a GPU can feed a downstream plotting cell:

python -c 'import torch; print(torch.rand(3,3).cuda())' > /results/big-process.txt

Now for the plotting step:

import torch
import matplotlib.pyplot as plt

# Generate a random 100x100 tensor on the GPU.
x = torch.rand(100, 100).cuda()

def showTensor(aTensor):
    # Move the tensor back to the CPU and render it as an image.
    plt.figure()
    plt.imshow(aTensor.cpu().numpy())
    plt.colorbar()
    plt.savefig("/results/test.png")  # save before show() so the saved figure isn't blank
    plt.show()

showTensor(x)

Data flow is simplified by Nextjournal's integration of data sources and computational environments within the notebook. The result is an easier time spawning new runtimes and moving data between them. It is all available from the Nextjournal GUI - no extensions or extra notebook installations required.

Jupyter

Jupyter can be configured to work with many cloud compute providers. As with the other approaches discussed in this notebook, it requires significant configuration to work.

Generally speaking, the steps include:

  • choosing and configuring a provider like Amazon or Google,
  • configuring a local machine to work with the provider,
  • installing a remote Jupyter instance and required packages,
  • generating and configuring security certificates,
  • configuring Jupyter to work with the GPU and ensuring proper security restrictions are in place, and
  • finally, installing the latest versions of your chosen libraries.

Pipelines starting from Bash can be tricky because they require special Jupyter magic commands. A provisioned data source that works well on one machine may need to be reconfigured at some indefinite point in the future, or when run on another machine. The same is true for the computational environment on which the pipeline depends.
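
To illustrate the friction, in Jupyter a %%bash cell typically hands data to a Python cell through the filesystem. A minimal sketch (the command and file name are illustrative):

# In one cell, a %%bash magic writes its output to a file:
#   %%bash
#   python --version > version.txt
# A later Python cell then picks the file up from disk:
with open("version.txt") as f:
    print(f.read())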

Edit and Share Your Work in One Place

Technical Problems? Ask For Help and Collaborate on Drafts

Create your project in our state-of-the-art editor which includes code auto-completion and language-specific documentation. If you get stuck on an error, use the Ask for help button.

Generate, share (and revoke) secret links to your working drafts for review and collaboration. Once ready, publish versions under your Nextjournal profile on a permanent URL.

Jupyter

Jupyter notebooks are automatically rendered in GitHub repositories and nbviewer remains a popular way to share notebooks online. If the reader wants to interact with these notebooks, they will have to download the .ipynb file and install all dependencies or run it on a cloud service like Binder.

Reproducibility With a Single Click

  • Common Jupyter solutions create a plain text file that points to the source repositories, which will need to be downloaded on a new computer. Binary storage and version control require additional tools.
  • Common Jupyter cloud solutions do not offer robust version control.

Instant Reproducibility and Component Reusability

Click the remix button to create an instant copy of the notebook including all dependencies down to the operating system:

The duplicate can be explored, edited, and run without a single extra step of configuration or installation.

Remix relies on Nextjournal's synchronous version control across all code, commentary, and data. Changes to the remix do not affect the original; changes to the original do not affect the remix. The original author automatically retains all attribution.

Environments can be reused on any computer with a web browser or Docker installed locally.

civisanalytics/civis-jupyter-python3: available for download as a Docker image (imported from the latest tag)
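
Once downloaded, the image can be run locally like any other container. A minimal sketch using the Docker SDK for Python (the registry reference below is a placeholder, not the actual download link):

import docker

client = docker.from_env()
# Pull the exported environment image (placeholder reference) and run a command inside it.
image = client.images.pull("registry.example.com/exported-environment:latest")
output = client.containers.run(image, "python --version", remove=True)
print(output.decode())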

Environments can be built once and reused indefinitely. In fact, the Nextjournal team depends on the reproducible nature of our notebooks to build our default environments for Python, R, Julia, and Clojure. They are created like any other article with all the same benefits from remixing.

Jupyter

When reproducing results from a Jupyter notebook, the notebook file (.ipynb), runtime environment, and data must all be available. Jupyter cloud solutions can help smooth this process, but there are drawbacks. This notebook will continue to look at the core Jupyter experience, which will provide insight into how the cloud services actually work.

Export

Conda offers a number of command line utilities for managing environments. This is essential for building the runtime environment for the .ipynb file. The simplest, conda env export, will prepare a plain text file that can be used to build the environment on another computer.

!conda env export -n base -q > /results/environment.yml

Import

The following cell takes the Jupyter environment and imports it into the NJ runtime.

cat environment.yml > env.yml  # Move the YAML file to another part of the system
conda env create -p /opt/conda2 -f env.yml

Nextjournal automates the many commands and configuration options involved in managing Jupyter environments by hand. Keep in mind:

  • When running a project on a new machine, conda env create will download all dependencies again. If there is an issue with the repository at some point in the future, the environment will not be easily reproducible.
  • Conda cannot guarantee package parity between operating systems - Linux, macOS, and Windows. This is not an issue with Nextjournal.

conda list --explicit produces a spec file that can be used to download an identical set of dependencies on a second computer. Note that the output of the process indicates the dependency stack is for linux-64: the software spec-file.txt downloads will not be compatible with colleagues using Windows or macOS.

%%bash
conda list --explicit > spec-file.txt
head spec-file.txt

If a repository moves or disappears before spec-file.txt is referenced, the only solution will be to find the dependency using some other method.

If you use GitHub for version control, it is possible to use yet another piece of software, Binder, to turn a Git repository into a Jupyter notebook running in the browser that can reproduce results. However, it offers no security features and no direct version control integration.

Conclusion

Nextjournal was built for researchers, journalists, and scientists who want to focus on their work. The hours spent configuring a working system are better spent elsewhere. Best of all, if the convenience of Nextjournal is not self-evident after using the platform, there is no lock-in. It's easy to import existing Jupyter/IPython, RMarkdown, or Markdown notebooks and export any Nextjournal notebook to Markdown.