Julia and Jupyter Notebooks
1. An Introduction to Julia and Jupyter Notebooks
This tutorial is based on a tutorial presented at the 2019 JUmP-dev conference
1.1. Jupyter Notebooks
1.1.1. Jupyter Notebook combines live code, graphics, visualizations, and text in shareable notebooks that run in a web browser
"Data scientists’ computational notebook of choice" - Nature
Jupyter has two parts: 1) the interface (what you are looking at now) and 2) the kernel, which is the backend that runs the code.
Cells can be Markdown or Code.
Jupyter gets its name from Julia, Python, and R.
Similar to R Markdown (R) or Weave (Julia / Python).
Usage is growing exponentially
It is portable: with Jupyter installed, you can simply share the .ipynb file.
Notebooks can be exported into a multitude of formats and easily converted into a presentation.
1.1.2. Installation
The easiest way to install Jupyter is to install Anaconda. Once that's done, run jupyter notebook to start a notebook server.
1.1.3. Usage
Write code in the [In] cells and hit Ctrl + Enter to run the code.
There are a lot of more advanced features (e.g. slideshows, exporting to different formats, interactivity, themes, etc.).
Most common source of trouble: you can execute cells in any order, but if you restart and Run All, it will execute top to bottom, so the results can differ from what you saw while working interactively.
1.1.4. Why Julia?
Speed of execution, friendly but very powerful syntax, thoughtful packages, supportive community, flexible typing, unicode, and much more!
Users: BlackRock, Aviva for risk/capital modeling, the Federal Reserve, NASA, Cisco, Conning. Google's AI Brain team and Facebook's AI chief have also shown interest in Julia as a good fit for future development.
1.1.5. Q: Where does Julia fit?
A: It's a tool in a toolbox.
Here's a subjective comparison of the languages:
"A language that doesn't affect the way you think about programming, is not worth knowing." -- Alan Perlis
1.2. How to Install Julia and Packages
To install Julia, simply download and install from JuliaLang.org
To install packages, open up a terminal and run julia. Once in the REPL, hit the ] key to enter the package manager. From there, it's as simple as typing add MortalityTables and hitting enter.
1.2.1. Note: to make things faster later on and because of using NextJournal, run this next block to get things set up. Later on, I will re-reference the related packages so you can see when/where they are used
import Pkg
# grab the packages not built-in to nextjournal
Pkg.add("MortalityTables") # not part of NextJournal's default packages :(
Pkg.add("ProgressMeter")
Pkg.add("Distributions")
using StatsBase, MortalityTables, Plots, ProgressMeter, Dates, BenchmarkTools, Test, CSV, DataFrames, Distributions
1.2.2. Other great references
Learn Julia in Y Minutes: a great way to get a quick tour of the language and how to do common things.
Julia Introductory Notebook: gets a bit more into the types, control flow, mutability, etc.
Introduction to Julia Wikibook
Intro to Julia: a YouTube introduction
1.3. Quick how-to
Syntax and basic operations
1 + -2
# Numbers/Operations
# All of the normal infix operators are available.
1 + 1 # => 2
8 - 1 # => 7
10 * 2 # => 20
35 / 5 # => 7.0
10 / 2 # => 5.0 # dividing integers always results in a Float64
div(5, 2) # => 2 # for a truncated result, use div
5 \ 35 # => 7.0
2^2 # => 4 # power, not bitwise xor
12 % 10 # => 2

# Bitwise/Boolean Operators
~true # => false # bitwise not
true & false # => false # bitwise and
true | false # => true # bitwise or
xor(true, true) # => false # bitwise xor

# Boolean operators
!true # => false
!false # => true
1 == 1 # => true
2 == 1 # => false
1 != 1 # => false
2 != 1 # => true
1 < 10 # => true
1 > 10 # => false
2 <= 2 # => true
2 >= 2 # => true
# Comparisons can be chained
1 < 2 < 3 # => true
2 < 3 < 2 # => false

# arrays/lists
a = [1,2,3,4,5]
letters = ["a","b","c","d","e"] # note that only the last evaluated thing gets 'output' in the notebook
sum(a)
1.3.1. Types
Every object has a DataType, and for the most part you don't need to actively worry about them. Having more Number types than just one allows for:
Specialization (the way to do Complex arithmetic is more, well, complex than Real)
Optimization
Multiple dispatch: cleaner code, smarter design. More on this later!
typeof(1), typeof(1.0), typeof(1+0im), typeof(π) # note the unicode support!
1 + π # julia usually knows how to make different types talk to each other
other common types
typeof(a), typeof([1,2,3]), typeof("I am a string"), typeof(true)
1.3.2. Arrays
b = [2, 3, 4, 5, 6]
Using a . tells Julia to "broadcast" the operation across each element
a .+ b # the dot syntax (aka "broadcasting")
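Broadcasting is not limited to operators: any function can be applied element by element by putting a dot after its name. Here is a minimal sketch; `double` is a hypothetical helper made up for this illustration, and `a` and `b` are repeated so the block runs on its own.

```julia
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]

# hypothetical helper, not part of the original tutorial
double(x) = 2x

doubled = double.(a)      # apply `double` to each element of `a`
scaled = a .* 10 .+ b     # several dotted operations combine element-wise
```

Without the dot, `double(a)` would call the function once on the whole array instead of once per element.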
Note that Julia's indices run from 1:n
a[1:3]
end is a keyword that means the last index
a[3:(end-1)]
1.3.3. Dictionaries
A data structure that maps a key to a value
d = Dict("a" => 1, "b" => 2 )
d["a"]
Iterate over the dictionary by going over each key and each value
for (key, val) in d
    println(key, " => ", val) # a loop does not display values on its own, so print them
end
sum(val for (key, val) in d)
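Looking up a key that isn't in the dictionary throws an error, so it helps to know a couple of safer ways to ask. This is a small sketch using the same example dictionary; the key "c" is made up for the illustration.

```julia
d = Dict("a" => 1, "b" => 2)

has_c = haskey(d, "c")               # check for a key without erroring
c_value = get(d, "c", 0)             # look up a key, falling back to a default
d["c"] = 3                           # add a new key/value pair
all_keys = sort(collect(keys(d)))    # keys are unordered, so sort them for display
```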
1.3.4. Strings
# Strings are created with "
"This is a string."

# Strings can be indexed much like arrays of characters, so you can do 'array type stuff' with them
"This is a string"[end-5:end]

# Strings can be compared lexicographically
"good" > "bye" # => true
"good" == "good" # => true
Interpolation lets you mix code with strings very easily. Julia will evaluate things with $ like code:
today = Dates.today()
y = Dates.year(today)
m = Dates.month(today)
d = Dates.day(today)
file_to_export = "output data $y-$m-$d.csv"
Working with filepaths can be tricky because filepaths contain characters like \ that are ambiguous inside strings. You can get around this by escaping such characters with \:
path = "data\\input.csv"
or you can tell Julia to interpret the string literally with raw:
path = raw"data\input.csv"
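Another way to sidestep the separator problem is to let Julia build the path for you. The sketch below uses `joinpath`, `basename` and `splitext` from Base; the folder and file names are just the ones from the example above.

```julia
# `joinpath` inserts the path separator of the current operating system
path = joinpath("data", "input.csv")

name = basename(path)        # the file name without the folder
ext = splitext(name)[2]      # the extension, including the leading dot
```

This keeps the same code working on Windows, macOS and Linux.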
1.4. A quick tour of cool language features
using MortalityTables, Dates # this is how you use packages
const mt = MortalityTables # shortening how we can reference the package
tables = mt.Tables()
cso = tables["2001 CSO Super Preferred Select and Ultimate - Male Nonsmoker, ANB"]

age = 30
duration = 5
mt.qx(cso,age,duration)
Comprehensions are really efficient, clean ways of constructing and working with data
qs = [mt.qx(cso,age,dur) for dur in 1:50]
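To show what you can do with a vector like this, here is a small sketch that doesn't need the MortalityTables package. The rates in `qs_example` are invented for illustration and do not come from any real table.

```julia
# hypothetical annual mortality rates, invented for illustration
qs_example = [0.001, 0.002, 0.004]

# probability of surviving each individual year
ps = [1 - q for q in qs_example]

# cumulative probability of surviving through the end of each year
survival = cumprod(ps)
```

The same two lines would work on the real `qs` vector from the cell above.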
1.5. Macros
Julia is homoiconic, meaning that at a very fundamental level, the code is data. This is extremely powerful and allows some really cool ways to work with your code. A few examples:
1.5.1. Adding progress meters to normal code
Simply use the ProgressMeter package and the @showprogress macro
# basic loop that will take 5 seconds
for i in 1:5
    sleep(1)
end

using ProgressMeter
@showprogress for i in 1:5
    sleep(1)
end
1.5.2. Benchmarking your code
Want to get a sense of how fast or slow your algorithm will run?
using BenchmarkTools, StatsBase

# define my own mean function that will return the average of the numbers in the vector
function my_mean(vector)
    sum(vector) / length(vector)
end

# a test vector
tv = rand(1000);

@benchmark my_mean($tv)

@benchmark mean($tv)
Also note that here we were able to write Julia code that was just as fast as what's built in! I.e. you can write super fast, optimized code very easily.
1.5.2.1. Quick comparison with R:
Script:
install.packages(c("microbenchmark"));
library(microbenchmark)
my_mean <- function(x) {
    return(sum(x) / length(x))
}
x <- runif(1000) # 1000 uniform random numbers
mbm <- microbenchmark(my_mean(x))
print(mbm)
mbm2 <- microbenchmark(mean(x))
print(mbm2)
On my home computer, the results were:
1.5.3. Testing your code
Whether you are just wanting to make sure that something works the way you expect, or you are incorporating the philosophy of Test Driven Development, Julia makes it really easy.
using Test
@test 1 == 1

@testset "simple math" begin
    @test 1 == 1
    @test 1 != 2
end
1.5.3.1. Showing code
This one is pretty simple, but sometimes it's nicer to show the full line and the result of the code, so there is the @show macro. This is great for debugging, summarizing, etc.
@show x = rand(10); # this will show the result even though I've suppressed the normal Jupyter output with `;`
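To make the "code is data" idea behind all of these macros a bit more concrete, here is a minimal sketch. The macro name `@twice` is hypothetical and only exists for this illustration; it simply repeats whatever expression it is given.

```julia
# quoting turns code into an `Expr` object instead of running it
ex = :(1 + 2 * 3)

head = ex.head        # the kind of expression, here a function call
args = ex.args        # the pieces: the function `+` and its two arguments
value = eval(ex)      # evaluating the expression runs the code

# hypothetical macro: receives its argument as an expression and emits it twice
macro twice(expr)
    return quote
        $(esc(expr))
        $(esc(expr))
    end
end

counter = 0
@twice counter += 1   # expands to two copies of `counter += 1`
```

Macros like @showprogress and @show work the same way in principle: they receive your code as data and return new code to run.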
1.6. Plotting
Plotting is easy with 'smart' packages like Plots.jl, which try to make plotting as simple and straightforward as possible, or you can get full control with a Grammar of Graphics type of syntax with Gadfly.jl
using Plots
plot(qs) # the first time you try to plot something, the computer is compiling the package. Subsequent runs are very fast.
1.7. Object-oriented thinking
1.7.1. Making the problem easier with Objects
The following are examples of how to use structs, which you can think of as objects that you can manipulate/compute things with using functions.
The 'problem' here is looking at how to use object-oriented design to make working with and performing analysis on things easier. In the example, we are going to look at very simple cases with life insurance as the context. There will be people (Life) who own policies (Policy). The insurance company shares some of the risk associated with the policy with other companies via cessions (Cession) that keep track of the other companies' responsibility.
struct SimpleCession # a cession object
    face_ceded # the amount of face ceded
end

struct SimplePolicy # a simple policy object
    face # the face amount
    issue_age # the issue age
    cessions # an array of cessions
end

cessions = [SimpleCession(2.5e5),SimpleCession(2.5e5)] # two 250k cessions
pol1 = SimplePolicy(2e6,55,cessions) # a 2m policy
Let's calculate how much this policy has ceded. By having defined the problem context the way we have, we can almost use natural language to express the problem
# the amount that pol1 has ceded to other companies
sum(cess.face_ceded for cess in pol1.cessions)
Now let's make it a function to make it more general than just pol1 and calculate how much the policy has retained
function retained(pol::SimplePolicy)
    return pol.face - sum(cess.face_ceded for cess in pol.cessions)
end
retained(pol1)
Taking it to the next level with abstract types, dictionaries, and multiple dispatch!
abstract type Cession end # create the 'idea' or abstract representation of a Cession

# another type of cession object
struct Ceded <: Cession # a cession object
    face # the amount of face ceded
end

# another type of cession object
struct Assumed <: Cession
    face # the amount of face assumed
end

struct Policy # a simple policy object
    face # the face amount
    issue_age # the age
    cessions # a dictionary of cessions
end

struct Life # represents a person
    policies
end
for each of these objects, let's define what "retained" means
function retained(cess::Ceded)
    return -cess.face
end

function retained(cess::Assumed)
    return cess.face
end

function retained(pol::Policy)
    return pol.face - sum(cess.face for (treaty, cess) in pol.cessions) # same as before
end

function retained(l::Life)
    return sum(retained(pol) for (id, pol) in l.policies)
end
Now use a dictionary to make the relationships more meaningful and create the objects
# two 1m cessions and a 500k assumed cession
cessions1 = Dict(
    "REINS2019" => Ceded(1e6),
    "REINS2018" => Ceded(1e6),
    "ASSUMED2015" => Assumed(5e5)
)

# one 3m cession
cessions2 = Dict("REINS2019" => Ceded(3e6))

# one 5m policy and one 10m policy
policies = Dict(
    "1" => Policy(5e6,30,cessions1),
    "2" => Policy(10e6,30,cessions2)
)

# one life with the two policies
alice = Life(policies)
Now let's do some analysis!
# how much does Alice have retained?
retained(alice)

# how much does each policy have retained?
Dict(pol_id => retained(pol) for (pol_id, pol) in alice.policies)

# how much have we ceded to reinsurer "REINS2019" ?
ceded = 0.0
for (p_id, pol) in alice.policies
    for (c_id, cess) in pol.cessions
        if c_id == "REINS2019"
            ceded += cess.face
        end
    end
end
ceded
1.7.1.1. Final note on object-orientation
This is on the surface very similar to Object-Oriented languages, but because of structs, functions and multiple dispatch, Julia's approach encourages separation of data (aka objects) and functions (methods in Object-Oriented-speak). This guide gets into a lot more of the details of what I mean.
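As a stripped-down sketch of that separation, the block below keeps the data (structs) and the behaviour (a generic function with one method per type) apart. The names `Product`, `Term`, `WholeLife` and `coverage_years`, and the maximum age of 120, are all hypothetical and only for illustration.

```julia
# hypothetical product types, introduced only for this illustration
abstract type Product end

struct Term <: Product
    years::Int          # length of the coverage period
end

struct WholeLife <: Product end

# one generic function, with a separate method for each concrete type;
# the maximum age of 120 is an assumption made for this example
coverage_years(p::Term, issue_age) = p.years
coverage_years(p::WholeLife, issue_age) = 120 - issue_age

products = [Term(20), WholeLife()]

# Julia picks the right method for each element based on its type
years = [coverage_years(p, 30) for p in products]
```

Adding a new product later means adding a new struct and a new method, without touching the existing ones.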
1.8. Helping you when things go inevitably wrong
Error messages are helpful and point you to where something goes wrong.
You can see what the computer thinks of your code with the @code_llvm macro.
Built-in documentation reference with ?
Of these, I'm only going to show the error messages and the built-in help right now.
1.8.1. Helpful error messages
To demonstrate, let's do something wrong with the objects we were working with earlier.
function get_cession_count(l::Life)
    total = 0
    for (id,pol) in l.policies
        for (id,cess) in pol.cession # this will error because it should be `cessions` (plural)
            total += 1
        end
    end
    return total
end
get_cession_count(alice)
Julia helpfully tells you what went wrong, what file/cell it went wrong in, and what line it was.
1.8.2. Getting help
Type ? followed by a Julia type/object/function/etc. to get help on it. It will show you similarly named functions and give you the help text and examples for most things.
?mean
1.9. Dataframes and Data IO
using CSV, DataFrames, Distributions
ENV["COLUMNS"] = 1000; # show 1000 characters-worth of columns in tables
Load the data from the file, normalize the names (e.g. eliminate whitespace, weird characters), and then "pipe" into a dataframe
df = CSV.read("Experience Data SOA Cred Example.csv", normalizenames=true) |> DataFrame
df.Exposure_by_Amount = df.Face_Amount .* df.Exposure_by_Count; # note that the dot ("broadcasting") syntax translates here
# define a function to give the percentile for a given distribution and a confidence level (with default of 95%)
estimate_interval(d,p=0.95) = quantile(d,1-(1+p)/2),quantile(d,(1+p)/2)
It's really easy to apply the Split-Apply-Combine strategy in Julia. Note the syntax :Duration - here Duration is a Symbol. Here we aren't asking to do things by passing a dataframe a string, rather we are trying to do stuff with the data that's connected to the symbol.
# create a new variable called df2 that's the result of creating a subdataframe of `df` grouped by Duration
df2 = by(df, [:Duration]) do subdf
    # mean = Σ(amount of claims) / Σ(Exposure by Amount)
    q̂ = sum(subdf.Amount_of_Claims) / sum(subdf.Exposure_by_Amount)
    # variance = Σ(exposure²) / Σ(exposure)² * q̂ * (1-q̂)
    σ² = sum(subdf.Exposure_by_Amount .^ 2) / sum(subdf.Exposure_by_Amount) ^ 2 * q̂ * (1 - q̂)
    # sum of the count
    n_claims = sum(subdf.Number_of_Claims)
    # create a confidence interval using a Normal distribution and given mean and variance
    conf_int = estimate_interval(Normal(q̂, sqrt(σ²)))
    (q = q̂,var = σ², se = sqrt(σ²),ci_05 = conf_int[1],ci_95 = conf_int[2], n = n_claims)
end
sort!(df2)
2. This is not the end
While this is the end of the lunch and learn, below is more about different parts of Julia and worth going through. It's taken from the same 2019 JUmP-dev conference mentioned earlier.
2.1. Vectors, Matrices and Arrays
Similar to Matlab, Julia has native support for vectors, matrices and tensors; all of which are represented by arrays of different dimensions.
Vectors are constructed by comma-separated elements surrounded by square brackets:
b = [5, 6]
Matrices can be constructed with spaces separating the columns, and semicolons separating the rows:
A = [1 2; 3 4]
We can do linear algebra:
x = A \ b
A * x
A * x == b
Note that when multiplying vectors and matrices, dimensions matter. For example, you can't multiply a vector by a vector:
b * b
But multiplying transposes works:
b' * b
b * b';
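To see what those two products actually are, here is a short sketch using the same `b`. It also uses `dot` from the LinearAlgebra standard library, which is an addition not used elsewhere in this tutorial.

```julia
using LinearAlgebra

b = [5, 6]

inner = b' * b      # row times column: a single number
outer = b * b'      # column times row: a 2×2 matrix
same = dot(b, b)    # `dot` computes the same inner product
```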
2.2. Tuples
Julia makes extensive use of a simple data structure called Tuples. Tuples are immutable collections of values.
For example,
t = ("hello", 1.2, :foo)
typeof(t)
Tuples can be accessed by index, similar to arrays,
t[2]
And can be "unpacked" like so,
a, b, c = t
b
The values can also be given names, which is a convenient way of making light-weight data structures.
t = (word="hello", num=1.2, sym=:foo)
Then values can be accessed using a dot syntax,
t.word
2.3. Dictionaries
Similar to Python, Julia has native support for dictionaries. Dictionaries provide a very generic way of mapping keys to values. For example, a map of integers to strings,
d1 = Dict(1 => "A", 2 => "B", 4 => "D")
Looking up a value uses the bracket syntax,
d1[2]
Dictionaries support non-integer keys and can mix data types,
Dict("A" => 1, "B" => 2.5, "D" => 2 - 3im)
Dictionaries can be nested
d2 = Dict("A" => 1, "B" => 2, "D" => Dict(:foo => 3, :bar => 4))
d2["B"]
d2["D"][:foo]
2.4. For-Each Loops
Julia has native support for for-each style loops with the syntax for <value> in <collection> end.
for i in 1:5
    println(i)
end

for i in [1.2, 2.3, 3.4, 4.5, 5.6]
    println(i)
end
This for-each loop also works with dictionaries.
for (key, value) in Dict("A" => 1, "B" => 2.5, "D" => 2 - 3im)
    println("$key: $value")
end
Note that in contrast to vector languages like Matlab and R, loops do not result in a significant performance degradation in Julia.
2.5. Control Flow
Julia control flow is similar to Matlab, using the keywords if-elseif-else-end, and the logical operators || and && for or and and respectively.
i = 10
for i in 0:3:15
    if i < 5
        println("$(i) is less than 5")
    elseif i < 10
        println("$(i) is less than 10")
    else
        if i == 10
            println("the value is 10")
        else
            println("$(i) is bigger than 10")
        end
    end
end
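The loop above doesn't actually use || or &&, so here is a small sketch that does. The function `classify` and its cut-off values are hypothetical, chosen only to exercise both operators.

```julia
# hypothetical helper showing `&&` (and) and `||` (or) inside conditions
function classify(i)
    if i >= 0 && i < 5
        return "small"
    elseif i < 0 || i > 10
        return "out of range"
    else
        return "medium"
    end
end

labels = [classify(i) for i in (-1, 3, 7, 12)]
```

Both operators short-circuit: the right-hand side is only evaluated if the left-hand side doesn't already decide the result.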
2.6. Comprehensions
Similar to languages like Haskell and Python, Julia supports the use of simple loops in the construction of arrays and dictionaries, called comprehensions.
A list of increasing integers,
[i for i in 1:5]
Matrices can be built by including multiple indices,
[i*j for i in 1:5, j in 5:10]
Conditional statements can be used to filter out some values,
[i for i in 1:10 if i%2 == 1]
A similar syntax can be used for building dictionaries
Dict("$i" => i for i in 1:10 if i%2 == 1)
2.7. Functions
A simple function is defined as follows,
function print_hello()
    println("hello")
end
print_hello()
Arguments can be added to a function,
function print_it(x)
    println(x)
end
print_it("hello")
print_it(1.234)
print_it(:my_id)
Optional keyword arguments are also possible
function print_it(x; prefix="value:")
    println("$(prefix) $x")
end
print_it(1.234)
print_it(1.234, prefix="val:")
The keyword return is used to specify the return values of a function.
function mult(x; y=2.0)
    return x * y
end
mult(4.0)
mult(4.0, y=5.0)
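Two related features are worth a quick sketch: optional positional arguments (no semicolon needed) and returning more than one value as a tuple. The function `powers` is hypothetical and only used for this illustration.

```julia
# hypothetical function with an optional positional argument `n`
# that returns two values at once as a tuple
function powers(x, n=2)
    return x^n, x^(n + 1)
end

sq, cube = powers(3)       # uses the default n = 2
p4, p5 = powers(2, 4)      # overrides the default
```

The returned tuple is unpacked the same way as in the Tuples section above.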
2.8. Other notes on types
Usually, specifying types is not required to use Julia. However, it can be helpful to understand the basics of Julia types for debugging.
For example this list has a type of Array{Int64,1} indicating that it is a one dimensional array of integer values.
[1, 5, -2, 7]
In this example, the decimal values lead to a one dimensional array of floating point values, i.e. Array{Float64,1}. Notice that the integer 7 is promoted to a Float64, because all elements in the array need to share a common type.
[1.0, 5.2, -2.1, 7]
2.9. Mutable vs immutable objects
Some types in Julia are mutable, which means you can change the values inside them. A good example is an array. You can modify the contents of an array without having to make a new array.
In contrast, types like Float64
are immutable. You can't modify the contents of a Float64
.
This is something to be aware of when passing types into functions. For example:
function mutability_example(mutable_type::Vector{Int}, immutable_type::Int)
    mutable_type[1] += 1
    immutable_type += 1
    return
end

mutable_type = [1, 2, 3]
immutable_type = 1
mutability_example(mutable_type, immutable_type)
println("mutable_type: $(mutable_type)")
println("immutable_type: $(immutable_type)")
Because Vector{Int} is a mutable type, modifying the variable inside the function changed the value outside of the function. In contrast, the change to immutable_type didn't modify the value outside the function.
You can check mutability with the isimmutable function.
isimmutable([1, 2, 3])
isimmutable(1);
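Because functions can change mutable arguments, Julia code conventionally marks such functions with a trailing `!`. Here is a minimal sketch of that convention; `increment_first!` is a hypothetical name, and `copy` is used to keep an untouched version of the data.

```julia
# by convention, a trailing `!` marks a function that modifies its argument
function increment_first!(v::Vector{Int})
    v[1] += 1
    return v
end

original = [1, 2, 3]
safe_copy = copy(original)     # an independent copy that will not be changed

increment_first!(original)     # modifies `original` in place
```

The `!` is only a naming convention, but it is followed throughout Base (e.g. `sort!` versus `sort`).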
3. Why is Julia fast?
Answer: it compiles a different version of each function depending on the type of the arguments.
Corollary: put code inside functions!
foobar(x) = x + x
Calling foobar with an Int64 compiles a method that calls add_int.
foobar(1)
Calling foobar with a Float64 compiles a method that calls add_float.
foobar(1.0)
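One way to peek at this specialization without reading compiler output is to ask what return type Julia infers for each argument type. This sketch uses `Base.return_types`, an unexported reflection helper in Base, so treat it as an exploration tool rather than something to rely on in production code.

```julia
foobar(x) = x + x

# the same source code gives a differently typed result per argument type
int_type = typeof(foobar(1))
float_type = typeof(foobar(1.0))

# ask the compiler what it infers, without calling the function
inferred_int = Base.return_types(foobar, (Int64,))
inferred_float = Base.return_types(foobar, (Float64,))
```

Because the return type is known for each specialized version, the compiler can generate tight machine code for each one.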
3.1. Using Packages and the Package Manager
No matter how wonderful Julia's base language is, at some point you will want to use an extension package. Some of these are built-in, for example random number generation is available in the Random package in the standard library. These packages are loaded with the commands using and import.
using Random
[rand() for i in 1:10]
The Package Manager is used to install packages that are not part of Julia's standard library.
For example the following can be used to install JuMP,
using Pkg
Pkg.add("JuMP")
For a complete list of registered Julia packages see the package listing at https://pkg.julialang.org/.
From time to time you may wish to use a Julia package that is not registered. In this case a git repository URL can be used to install the package.
using Pkg
Pkg.add(Pkg.PackageSpec(url="https://github.com/user-name/MyPackage.jl.git")) # a URL must be passed via `url=`, not as a package name
Note that for clarity this example uses the package manager Pkg. Julia 1.0 includes an interactive package manager that can be accessed using ]. This video gives an overview of using the interactive package manager environment.
The state of installed packages can also be saved in two files: Project.toml and Manifest.toml. If these files are stored in the same directory as a notebook, the state of the packages can be recovered by running
import Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()
We will use this in the JuMP section of the tutorial.
3.2. HELP!
3.3. Some Common Gotchas
3.3.1. MethodError
A common error in Julia is MethodError, which indicates that the function is not defined for the given value. For example, by default the ceil function is not defined for complex numbers. The "closest candidates" list suggests some Julia types that the function is defined for.
ceil(1.2 + 2.3im)
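If you want to check for this situation in code rather than by reading the error, the sketch below shows two options: `hasmethod` to ask up front, and try/catch to handle the error afterwards. The function `half` is hypothetical, defined only for integers so that the behaviour is predictable.

```julia
# hypothetical function that is only defined for integers
half(x::Int) = x ÷ 2

# `hasmethod` checks whether a method exists for a given argument signature
can_half_int = hasmethod(half, Tuple{Int})
can_half_string = hasmethod(half, Tuple{String})

# calling with an unsupported type raises a MethodError, which can be caught
message = try
    half("ten")
    "no error"
catch err
    err isa MethodError ? "MethodError caught" : rethrow()
end
```

In most cases the better fix is to define the missing method or convert the argument, rather than catching the error.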