help / Jan 21 2019

Saving Runtime State


1. Introduction

On Nextjournal, the current process state of your runtime can be saved, including all currently declared variables and their values, functions, modules, loaded data, and so on. Nextjournal stores the whole REPL/language process and resumes it in your next session, when you continue working on your article.

The saved state also includes all hidden state, which arises e.g. when you have executed some parts of the code several times. It also allows other authors to continue immediately with your current values and results when the article is remixed.

2. Enable Saving

Saving runtimes is currently an experimental feature. To enable it, append ?checkpoints=1 to the edit URL of the article in your browser:

enable saving of runtimes
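
If you prefer to build the URL programmatically, the following minimal Python sketch appends the checkpoints=1 parameter while keeping any query parameters that are already present. The helper name with_checkpoints and the example.com URL are illustrative assumptions, not part of Nextjournal; substitute the real edit URL of your article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_checkpoints(edit_url: str) -> str:
    """Return edit_url with checkpoints=1 set, keeping other query parameters."""
    parts = urlsplit(edit_url)
    # Drop any existing "checkpoints" entry so the flag is never duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "checkpoints"
    ]
    query.append(("checkpoints", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical edit URL, for illustration only.
print(with_checkpoints("https://example.com/my-article/edit"))
# prints https://example.com/my-article/edit?checkpoints=1
```

Parsing the URL instead of concatenating strings avoids producing a second "?" when the edit URL already carries a query string.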

You can then enable autosaving for each runtime in its settings:

enable autosaving

This causes Nextjournal to save your runtime's process state and filesystem state automatically before the (idle) runner terminates. For Bash cells, only the filesystem state is saved, because they don't have a single, permanently running process.

The next time you execute any cell on that runtime, Nextjournal loads the saved process state and continues from it as if the runtime had never been stopped.

restoring on next start

If the runtime is running, you can also save the state manually…

manual saving

… or manually restore an older state.

manual restore

3. General Concept

Behind the scenes, Nextjournal uses CRIU (Checkpoint/Restore In Userspace) to save and restore the in-memory state of your software. You might already know "checkpointing" from virtual machines (VMs), where you can restore the operating system state from a checkpoint instead of booting it again from scratch.

CRIU does the same, not for the whole operating system but for a single process: the REPL process in your runtime. A VM checkpoint saves the whole state of the machine, including the state of all (emulated) hardware. A CRIU checkpoint contains only the process state, so during restore CRIU must ensure that the environment is still the same, or make adjustments to handle the changed environment.

E.g. it must ensure that no opened files have been deleted and that the same mount points exist, it must restore shared memory areas and file handles, it must be able to restore the process with a different PID, etc.

Because Nextjournal runs your environments in isolated Docker containers, we can restore your runtime without any conflicts with other software and can ensure that the filesystem state is exactly the same when restoring the checkpoint as it was during checkpointing.
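
To make the checkpoint/restore cycle more concrete, here is a minimal Python sketch that builds the command lines for CRIU's command-line tool. It only illustrates the general shape of a dump and a restore; it is not how Nextjournal invokes CRIU internally, and the PID and image directory are made-up example values. The sketch prints the commands rather than running them, since actually running CRIU requires it to be installed and typically needs root privileges.

```python
import shlex
from typing import List


def criu_dump_cmd(pid: int, images_dir: str) -> List[str]:
    """Build a `criu dump` command that checkpoints the process tree rooted at pid."""
    return [
        "criu", "dump",
        "--tree", str(pid),          # root of the process tree to checkpoint
        "--images-dir", images_dir,  # directory where the image files are written
        "--shell-job",               # the process was started from a terminal
    ]


def criu_restore_cmd(images_dir: str) -> List[str]:
    """Build a `criu restore` command that resumes the process from images_dir."""
    return ["criu", "restore", "--images-dir", images_dir, "--shell-job"]


# Example values (assumptions for illustration only).
print(shlex.join(criu_dump_cmd(1234, "/tmp/checkpoint")))
print(shlex.join(criu_restore_cmd("/tmp/checkpoint")))
```

Note that only the image directory is passed to the restore command: everything CRIU needs to rebuild the process is read from the saved images, which is why the surrounding environment (files, mount points) has to match at restore time.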

4. Known Limitations

There are still some states we cannot restore. E.g. we cannot restore open (TCP) connections, because the other end might be gone and because the connection state is held by the operating system. We also cannot guarantee that a runtime can be restored if you change its CPU requirements, because the checkpoint also includes the CPU state, which differs between CPU models and with the number of CPUs used.

If you upgrade the RAM requirements for your runtime, you will most likely be able to restore the runtime's state on the more powerful machine and continue from the current state. This is not certain if you downgrade the RAM. If you are using GPUs, runtimes can't be saved, because CRIU can't save the GPU state.

For all other uses, you, or anybody else who remixes your article, will be able to easily continue your last session with all its in-memory state, created or opened files, shared memory, etc. If the state cannot be restored, the runtime is booted from scratch.
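
The limitations above can be summarized as a small decision sketch in Python. This is only a reading aid under stated assumptions: the Resources class, the restore_outlook function, and the returned labels are hypothetical names invented for this illustration and do not correspond to any Nextjournal API. In every case where a restore fails, the runtime is simply booted from scratch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical description of the machine a runtime runs on."""
    cpu_model: str
    cpu_count: int
    ram_gb: int
    uses_gpu: bool = False


def restore_outlook(saved: Resources, requested: Resources) -> str:
    """Classify how likely a saved runtime is to restore on the requested machine."""
    if saved.uses_gpu:
        # GPU state cannot be saved by CRIU, so there is nothing to restore.
        return "no checkpoint"
    if (saved.cpu_model, saved.cpu_count) != (requested.cpu_model, requested.cpu_count):
        # The checkpoint includes CPU state, which depends on model and CPU count.
        return "not guaranteed"
    if requested.ram_gb < saved.ram_gb:
        # Downgrading RAM may or may not work.
        return "not guaranteed"
    # Same CPU setup with the same or more RAM.
    return "likely restorable"


saved = Resources(cpu_model="model-a", cpu_count=2, ram_gb=4)
print(restore_outlook(saved, Resources("model-a", 2, 8)))  # RAM upgrade
print(restore_outlook(saved, Resources("model-b", 2, 4)))  # different CPU model
```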

5. Example