Why I love Julia

Languages and the communities around them take a looooong time to mature. Especially ones that are centered on concepts that are not mainstream yet.

A timeline of some major programming languages

First public appearance of many more programming languages

Julia 1.0 was released Aug. 2018, and is seeing explosive growth. This notebook is an attempt at explaining why.

Walks like Python

Julia will feel familiar to Python users:

d = Dict()

for i in 1:26
  c = Char('a' + i - 1)
  d[c] = i
end

d

2.4s

Julia

Dict{Any,Any} with 26 entries: 'n' => 14 'f' => 6 'w' => 23 'd' => 4 'e' => 5 'o' => 15 'h' => 8 'j' => 10 'i' => 9 'k' => 11 'r' => 18 's' => 19 't' => 20 'q' => 17 'y' => 25 'a' => 1 'c' => 3 'p' => 16 'm' => 13 ⋮ => ⋮

for (k, v) in d
  println(k => v)
end

1.1s

Julia

for k in keys(d)
  if k ≤ 'c'
    println(k)
  end
end

0.7s

Julia

(notice the unicode character; type the latex command "\le" and tab complete.)

Comprehensions

l = [ 2 * x for x in values(d) if x < 4 ]

1.4s

Julia

3-element Array{Int64,1}: 2 6 4

would have been d.values() in python

gen = ( 2 * x for x in values(d) if x ≤ 3 )

1.0s

Julia

Generator{Filter{##8#10,ValueIterator{Dict{Any,Any}}},##7#9}(##7#9(), Filter{##8#10,ValueIterator{Dict{Any,Any}}}(##8#10(), Any[14, 6, 23, 4, 5, 15, 8, 10, 9, 11 … 1, 3, 16, 13, 26, 7, 22, 12, 21, 24, 2]))

l = collect(gen)

0.5s

Julia

3-element Array{Int64,1}: 2 6 4

python equivalent is list(gen)

l, r = zip(  ( (abs(x), -abs(x)) for x in values(d) if x ≤ 4 )... )
#collect(l)

0.8s

Julia

Zip{NTuple{4,Tuple{Int64,Int64}}}(((4, -4), (1, -1), (3, -3), (2, -2)))

Sets

u = union( Set(l), Set(r) )

0.9s

Julia

Set([4, -4, -3, 2, 3, -2, -1, 1])

u = Set(r) ∪ Set(l)

0.2s

Julia

Set([4, -4, -3, 2, -2, -1, 3, 1])

Defaults and keyword args in functions

function myfunc2(required, optional=0, optional2=7; mykwarg="6")
    println(optional * 2)
end

myfunc2("unused", 21, mykwarg="???")

0.6s

Julia

Functional tools

unioned(ls) = reduce( ∪ , map( Set, ls ) )

unioned([l, r])

0.6s

Julia

Set([4, -4, -3, 2, 3, -2, -1, 1])

and also...

dynamic / duck typing
gentleman's interfaces, by convention not contract
introspection / meta-programming (more than in Python actually)

Python constructs not in Julia

generator functions (yield statements)
classes / objects
here's a list of mostly syntactic differences

Runs like C

Under the hood, Julia generates native code that is fast.

quadratic(a, sqr_term, b) = (-b + sqr_term) / 2a

0.9s

Julia

quadratic (generic function with 1 method)

JIT-compiles native code for every tuple of arg types

@code_warntype quadratic(42.0, 42.0, 42.0)

3.1s

Julia

At JIT-compile time, all type information is available. There is no type erasure or distinction between compile time and runtime types.

Statically compiled languages need interfaces to be defined in advance for 2 modules to work together. Not so with Julia.

In most use cases of statically compiled languages, modules are separate units of compilation. Module A is typically compiled separately from module B, so the compiler cannot do optimizations knowing data types and function definitions in both modules. Julia can use all the type information to generate aggressively optimized native code.

@code_llvm quadratic(42.0, 42.0, 42.0)

0.7s

Julia

@code_llvm quadratic(42, 42, 42)

0.4s

Julia

@code_native quadratic(42, 42.0, 42.0)

0.6s

Julia

loops are fast

naively-written for loop to sum over elements of an array

function summed(a)
	result = 0
  for x in a
    result += x
  end
  return result
end

0.6s

Julia

summed (generic function with 1 method)

using BenchmarkTools
using Random

0.4s

Julia

@btime evaluates an expression multiple times until timing noise is averaged over

a = randn(42000)
@btime summed(a)

8.7s

Julia

200.737

Meanwhile in Python land...

def summed(a):
  result = 0
  for x in a:
    result += x
  return result

import timeit
import random

a = [ random.random() for _ in range(42000) ]
n = 500
t = timeit.timeit(lambda: summed(a), number=n) / n

print("  {:.3} seconds".format(t))

1.5s

Python

...or more realistically numpy land...

import numpy as np

a = np.random.randn(42000)
n = 1000
t = timeit.timeit(lambda: a.sum(), number=n) / n

print("  {:.3} seconds".format(t))

0.4s

Python

Our naive julia loop runs about as fast as numpy.

We can easily speed up our loop by letting the compiler know that the operations in the loop are associative so they are amenable to SIMD parallelism:

function sumsimd(a)
	result = zero(eltype(a))
  @simd for x in a
    result += x
  end
  return result
end

a = randn(42000)
@btime sumsimd(a)

7.1s

Julia

-6.61857

Significantly faster than numpy, and it's a simple, legible, element-type-agnostic loop!

Here's Julia's impl Base.sum, which uses a recursive binary divide-and-conquer strategy to reduce floating point approximation errors and is almost as fast:

# Base.sum uses @simd via Base.mapreduce(f, a::AbstractArray)
a = randn(42000)
@btime sum(a)

6.8s

Julia

62.1939

take a look at the actual impl https://github.com/JuliaLang/julia/blob/v1.2.0/base/reduce.jl#L155

structs like C

isbits types

any immutable type that contains no references to other values
primitive types like UInt8 and Float64 are isbits
immutable structs and tuples of isbits types are isbits
isbits types have a defined layout compatible with C
arrays of isbits types are stack-allocated, not on the heap

struct YupIsBits
  a::Int
  b::Tuple{Float64, UInt8}
end

yup = YupIsBits(23, (1.5, 42))

@show isbits(yup)

0.8s

Julia

true

Not surprisingly, Julia can call C functions directly, without any boilerplate or APIs.

struct NopeNotIsBits
  bar
  baz::Int
  qux::Float64
end

nope = NopeNotIsBits("Hello, world.", 23, 1.5)

@show isbits(nope)

0.5s

Julia

false

Random related notes:

Julia lets you declare your own primitive types
values of isbits types can be used as type parameters

Here's a parameterized immutable struct:

struct Point{T <: Number}
  x::T
  y::T
end

0.2s

Julia

Structs are invariant in their type parameters, because they have different memory layouts:

Float64 <: Real

0.2s

Julia

true

Point{Float64} <: Point{Real}

0.2s

Julia

false

Getting to know Julia part 1

JIT, multiple-dispatch, rich runtime type system

multi-dimensional arrays, meta-program like it's Lisp, fusing broadcast

multiple dispatch

h(x::Number, y::Number) = x

h(x::Float64, y::Float64) = x + y

h(42, 42.0)

0.2s

Julia

h("uh-oh", 0)

1.7s

Julia

# generic fallback
h(x,y) = println("Whoa there, Nelly.")

h("ah-ha", 1)

0.4s

Julia

methods(h)

1.0s

Julia

3 methods for generic function h:

h(x::Float64, y::Float64) in Main at cell:3
h(x::Number, y::Number) in Main at cell:1
h(x, y) in Main at cell:2

g(x::Float64, y) = 2x + y

g(x, y::Float64) = x + 2y

g(2.0, 3)

0.3s

Julia

7.0

g(2.0, 2.0)

0.8s

Julia

multiple-dispatch is used everywhere

methods(+)

0.7s

Julia

( ...161 methods, expand the view of the result above on the left to see them all... )

same_type(a, b) = false
same_type(a::T, b::T) where T = true

0.6s

Julia

same_type (generic function with 2 methods)

same_type(42.0, -42.0)

0.2s

Julia

true

same_type("nope", 0)

0.2s

Julia

false

rich runtime types

# ( this cell is a code listing that would throw errors if run, because it re-defines types defined in Julia's Base )


# abstract types cannot be instantiated
# but they are very useful in combination with multiple dispatch
abstract type Number end
abstract type Real     <: Number end
abstract type AbstractFloat <: Real end
abstract type Integer  <: Real end
abstract type Signed   <: Integer end
abstract type Unsigned <: Integer end
# and they can carry around info in their type parameters

# A primitive type is a concrete type whose data consists of plain old bits.
primitive type Float16 <: AbstractFloat 16 end
primitive type Float32 <: AbstractFloat 32 end
primitive type Float64 <: AbstractFloat 64 end

primitive type Bool <: Integer 8 end
primitive type Char <: AbstractChar 32 end
# ...

#################################################################
#
#   we've seen immutable structs and parameterized structs above
#
#################################################################

# mutable structs
mutable struct MyMut
  a
  b::Int
end

# type Unions
Union{Int,AbstractString}

# parametric abstract types
abstract type Pointy{T} end

# Tuple types are covariant in their type params
Tuple{Int, AbstractString} <: Tuple{Real,Any}       # true

# Also, UnionAll types, Vararg tuple types, NamedTuples, singleton types, parametric primitive types, type aliases...

# btw: any isbits value can be a Value type or type parameter

# example: AbstractArray{T, N}

AbstractArray{Float64, 3}  # 3rd order tensor with elements of type Float64

0.5s

Julia

AbstractArray{Float64,3}

Types are runtime constructs: no type erasure, complete access to type info at runtime!

@show isa(1, Int)

@show typeof( 4 // 3 )

@show typeof(Rational{Int})

@show typeof(Union)

@show typeof(DataType)

@show supertype(Float64)

@show supertype(Number)

@show supertype(Any)

@show subtypes(Number)

1.2s

Julia

2-element Array{Any,1}: Complex Real

What's the big deal? the Expression problem

A programming language should allow separate developers to:

extend each others' datatypes
write new functions over existing datatypes
maintaining good performance => NO python duck types / Ruby open classes
maintaining developer sanity => NO type classes or CoProducts of functors
big bonus if you can extend datatypes and functions without requiring developers of different packages to opting in or coordinate in advance

(requirements 3 and 4 were added to the original formulation of the problem because if they do not hold, developers will avoid using the language features that allow you to do the first 2 for any significant functionality, they would reach for constructs that are fast and sane instead)

Julia satisfies all these requirements, including the bonus. I'm curious to know if there are other languages out there that do the same for some reasonable notion of 'fast' and 'sane'. Shoot me a line if you find one :)

Here's an example taken from The unreasonable effectiveness of multiple dispatch https://www.juliaopt.org/meetings/santiago2019/slides/stefan_karpinski.pdf

start with a generic method

using LinearAlgebra

function f(A, vs)
  t = zero(eltype(A))
  for v in vs
    t += inner(v, A, v)         # <= multiple dispatch
  end
  return t
end

inner(v, A, w) = dot(v, A*w)    # generic fallback defintion

0.6s

Julia

inner (generic function with 1 method)

A = randn(42, 42)
vs = [ randn(42) for _ in 1:10 ]

@btime f(A, vs)

5.3s

Julia

-70.9659

define a new datatype

Now another developer wants to define a new datatype that will work with this generic method. And they want it to run fast.

import Base: size, getindex, *

struct OneHotVector <: AbstractVector{Bool}
 len :: Int
 ind :: Int
end

# define minimal set of AbstractVector methods
size(v::OneHotVector) = (v.len,)
getindex(v::OneHotVector, i::Integer) = i == v.ind

0.6s

Julia

getindex (generic function with 204 methods)

onehot = OneHotVector(42000, 42)

@show isbits(onehot)
dump(onehot)

1.0s

Julia

vs = [ OneHotVector(42, i) for i in 1:10 ]

@btime f(A, vs)

7.1s

Julia

-3.13214

Works! But this is using generic fallbacks for * and dot that are slow:

The only methods defined on our OneHotVectors are size and getindex, so that's what the generic fallbacks are using. The multiplication A * w loops through all indices of w from 1 to w.len. So does the call to dot.

Let's do better.

(*)(A::AbstractMatrix, o::OneHotVector) = A[:, o.ind]

AbstractMatrix  # btw, AbstractMatrix is an alias for AbstractArray{T,2} where T

0.5s

Julia

AbstractArray{T,2} where T

@btime f(A, vs)

2.5s

Julia

-3.13214

...or even better...

inner(v::OneHotVector, A::AbstractMatrix, w::OneHotVector) = A[v.ind, w.ind]

0.2s

Julia

inner (generic function with 2 methods)

@btime f(A, vs)

2.2s

Julia

-3.13214

Challenge: how would you do this in another language? Is it still fast and sane?

Things that get in the way:

too much privacy; closed classes (Ruby has open classes, but slow)
single dispatch is awkward for numerical ops, since the first arg is not special
static languages need type classes or similar, but what about developer sanity?

Signs that a language has solved the Expression problem:

it's fast enough that you want to use it
taking a look at the source code of the language's base and standard libs is a pleasant, fulfilling experience
people write and maintain separate packages, and they can be used together with minimal coordination between package devs

Getting to know Julia part 2

multi-dimensional arrays

A = zeros(Float64, (2, 3))
B = zeros((2,3))
C = [1 2 3; 4 5 6]

@show size.([A, B, C])

@show size( [A B C] )

M = [A; B; C]
@show size( M )

@show size( M[2:end-1, 2:end] )

# hcat, vcat, ones, ...

1.3s

Julia

(4, 2)

and a rich ecosystem of AbstractArray impls

https://github.com/JuliaArrays

https://github.com/JuliaGPU

...

fusing broadcast

add a dot to your function call to vectorize it

@show h.(randn(2), randn(2))

a, b = randn(4, 1), randn(1, 3)

h.(a, b) .* 42 .+ randn(4, 3)      # compiles to single fused kernel!

1.6s

Julia

4×3 Array{Float64,2}: 54.9949 3.00959 -86.385 74.4662 27.9268 -63.5211 52.7469 2.9036 -85.7972 61.6012 10.235 -78.7654

meta-programming

Lisp heritage: homoiconic, macros, hooks to manipulate lowered code.

Just a quick flavor here:

expr = :(a+b*c+1)

@show expr

dump(expr)

0.7s

Julia

expr = quote
  x = 1
  y = 2
  x + y
end

expr

0.5s

Julia

quote #= cell:2 =# x = 1 #= cell:3 =# y = 2 #= cell:4 =# x + y end

and a rich ecosystem of metaprogramming tools

https://github.com/jrevels/Cassette.jl

https://github.com/MikeInnes/IRTools.jl

Composable multi-threaded parallelism

As of version 1.3, Julia code can make use of all those CPU cores using simple language constructs. Just spawn tasks and don't worry where they run.

The underlying task scheduler was specifically designed to compose well, so that multi-threaded code calling multi-threaded code continues to give performance improvements. More traditional task schedulers tend to not do well in this kind of scenario. Given how the Julia ecosystem grows as smaller, often performance-oriented libraries that call each other, this is an important performance property of the multi-threading infrastructure.

Btw, Julia has had Go-style channel-based concurrency for a while, which used to all run on the same thread. Now that multi-threading has landed in 1.3, the plan is to harmonize these concurrency and parallelism constructs.

Tooling

Package manager with conda-like envs is built in, accessible through language

using Pkg
Pkg.add("DataFrames")

Search all Packages on github

Debuggers

Memory and runtime Profiling

IDEs / IDE plugins

Interop with other languages

Python and related tools

R, C++, MATLAB and others

(note: JavaCall works with Julia 1.0 but not 1.1, 1.2 and 1.3, scheduled to be fixed in 1.4)

Learning resources

The manual, it's very well written!

"Take the time to study the source code of Base and Standard libraries, it's like learning Kung-Fu from Bruce-Lee for free."

Community highlights

Julia has > 23,870 stars, > 875 contributors on

Packages registered as of Sept. 2019: 3163

Undergoing explosive growth (compare with 2462 registered packages 9 months ago)

Julia packages that are state-of-the-art

Differential Equations

Mathematical Optimization

...and more, depending on who you ask...

Dataframes

JuliaDB has some out-of-memory and distributed capabilities

QueryVerse brings various packages together for a unified data exploration experience

DataFrames.jl

...with interfaces between them via Tables.jl or TableTraits.jl

Autodiff / Differentiable programming / Deep Learning

Several implementations

Several listed https://github.com/JuliaDiff/

https://github.com/FluxML/Flux.jl elegant deep learning framework

https://github.com/denizyuret/Knet.jl GPU-centered DL framework that is very good at optimizing memory reuse

Generic GPU / TPU support

a nice blog post on leveraging AbstractArray interface, metaprogramming, multiple dispatch, for awesome GPU support

support for XLA / TPUs

Probabilistic programming

A budding field that is a particularly good match for Julia

https://github.com/TuringLang/Turing.jl

https://probcomp.github.io/Gen/

https://github.com/cscherrer/Soss.jl

https://github.com/tpapp/DynamicHMC.jl

Packages can very often be used together!

DiffEq + Flux = Neural ODEs DiffEqFlux

Most code involving multi-dim arrays, on CPU, GPU, etc, will just work if you use e.g.:

Measurements with error bars Measurements.jl
Numbers with Units Unitful.jl
(it'll work because the info is propagated using types or separate arrays)

Here an example of integrating ODEs with error bars.

... lots more cross-compatibility (happens when you've tamed the Expression problem without requiring developers to coordinate in advance...)

Near future

Priorities of the compiler team

multi-threading (mostly done, foundations will ship with 1.3, more polished interface later)
better JIT compilation latency (work starting now; maybe some improvements in 1.4, probably continual improvements later?)