by Kolia SadeghiOct 10 2019 UTC

Why I love Julia

Languages and the communities around them take a looooong time to mature. Especially ones that are centered on concepts that are not mainstream yet.

Julia 1.0 was released Aug. 2018, and is seeing explosive growth. This notebook is an attempt at explaining why.

Walks like Python

Julia will feel familiar to Python users:

d = Dict()

for i in 1:26
  c = Char('a' + i - 1)
  d[c] = i

Dict{Any,Any} with 26 entries: 'n' => 14 'f' => 6 'w' => 23 'd' => 4 'e' => 5 'o' => 15 'h' => 8 'j' => 10 'i' => 9 'k' => 11 'r' => 18 's' => 19 't' => 20 'q' => 17 'y' => 25 'a' => 1 'c' => 3 'p' => 16 'm' => 13 ⋮ => ⋮
for (k, v) in d
  println(k => v)
for k in keys(d)
  if k  'c'

(notice the unicode character; type the latex command "\le" and tab complete.)


l = [ 2 * x for x in values(d) if x < 4 ]
3-element Array{Int64,1}: 2 6 4

would have been d.values() in python

gen = ( 2 * x for x in values(d) if x  3 )
Generator{Filter{##8#10,ValueIterator{Dict{Any,Any}}},##7#9}(##7#9(), Filter{##8#10,ValueIterator{Dict{Any,Any}}}(##8#10(), Any[14, 6, 23, 4, 5, 15, 8, 10, 9, 11 … 1, 3, 16, 13, 26, 7, 22, 12, 21, 24, 2]))
l = collect(gen)
3-element Array{Int64,1}: 2 6 4

python equivalent is list(gen)

l, r = zip(  ( (abs(x), -abs(x)) for x in values(d) if x  4 )... )
Zip{NTuple{4,Tuple{Int64,Int64}}}(((4, -4), (1, -1), (3, -3), (2, -2)))


u = union( Set(l), Set(r) )
Set([4, -4, -3, 2, 3, -2, -1, 1])
u = Set(r)  Set(l)
Set([4, -4, -3, 2, -2, -1, 3, 1])

Defaults and keyword args in functions

function myfunc2(required, optional=0, optional2=7; mykwarg="6")
    println(optional * 2)

myfunc2("unused", 21, mykwarg="???")

Functional tools

unioned(ls) = reduce(  , map( Set, ls ) )

unioned([l, r])
Set([4, -4, -3, 2, 3, -2, -1, 1])

and also...

  • dynamic / duck typing
  • gentleman's interfaces, by convention not contract
  • introspection / meta-programming (more than in Python actually)

Python constructs not in Julia

  • generator functions (yield statements)
  • classes / objects
  • here's a list of mostly syntactic differences

Runs like C

Under the hood, Julia generates native code that is fast.

quadratic(a, sqr_term, b) = (-b + sqr_term) / 2a
quadratic (generic function with 1 method)

JIT-compiles native code for every tuple of arg types

@code_warntype quadratic(42.0, 42.0, 42.0)

At JIT-compile time, all type information is available. There is no type erasure or distinction between compile time and runtime types.

Statically compiled languages need interfaces to be defined in advance for 2 modules to work together. Not so with Julia.

In most use cases of statically compiled languages, modules are separate units of compilation. Module A is typically compiled separately from module B, so the compiler cannot do optimizations knowing data types and function definitions in both modules. Julia can use all the type information to generate aggressively optimized native code.

@code_llvm quadratic(42.0, 42.0, 42.0)
@code_llvm quadratic(42, 42, 42)
@code_native quadratic(42, 42.0, 42.0)

loops are fast

naively-written for loop to sum over elements of an array

function summed(a)
	result = 0
  for x in a
    result += x
  return result
summed (generic function with 1 method)
using BenchmarkTools
using Random

@btime evaluates an expression multiple times until timing noise is averaged over

a = randn(42000)
@btime summed(a)

Meanwhile in Python land...

def summed(a):
  result = 0
  for x in a:
    result += x
  return result

import timeit
import random

a = [ random.random() for _ in range(42000) ]
n = 500
t = timeit.timeit(lambda: summed(a), number=n) / n

print("  {:.3} seconds".format(t))

...or more realistically numpy land...

import numpy as np

a = np.random.randn(42000)
n = 1000
t = timeit.timeit(lambda: a.sum(), number=n) / n

print("  {:.3} seconds".format(t))

Our naive julia loop runs about as fast as numpy.

We can easily speed up our loop by letting the compiler know that the operations in the loop are associative so they are amenable to SIMD parallelism:

function sumsimd(a)
	result = zero(eltype(a))
  @simd for x in a
    result += x
  return result

a = randn(42000)
@btime sumsimd(a)

Significantly faster than numpy, and it's a simple, legible, element-type-agnostic loop!

Here's Julia's impl Base.sum, which uses a recursive binary divide-and-conquer strategy to reduce floating point approximation errors and is almost as fast:

# Base.sum uses @simd via Base.mapreduce(f, a::AbstractArray)
a = randn(42000)
@btime sum(a)

structs like C

isbits types

  • any immutable type that contains no references to other values
  • primitive types like UInt8 and Float64 are isbits
  • immutable structs and tuples of isbits types are isbits
  • isbits types have a defined layout compatible with C
  • arrays of isbits types are stack-allocated, not on the heap
struct YupIsBits
  b::Tuple{Float64, UInt8}

yup = YupIsBits(23, (1.5, 42))

@show isbits(yup)

Not surprisingly, Julia can call C functions directly, without any boilerplate or APIs.

struct NopeNotIsBits

nope = NopeNotIsBits("Hello, world.", 23, 1.5)

@show isbits(nope)

Random related notes:

  • Julia lets you declare your own primitive types
  • values of isbits types can be used as type parameters

Here's a parameterized immutable struct:

struct Point{T <: Number}

Structs are invariant in their type parameters, because they have different memory layouts:

Float64 <: Real
Point{Float64} <: Point{Real}

Getting to know Julia part 1

JIT, multiple-dispatch, rich runtime type system

multi-dimensional arrays, meta-program like it's Lisp, fusing broadcast

multiple dispatch

h(x::Number, y::Number) = x

h(x::Float64, y::Float64) = x + y

h(42, 42.0)
h("uh-oh", 0)
# generic fallback
h(x,y) = println("Whoa there, Nelly.")

h("ah-ha", 1)
3 methods for generic function h:
  • h(x::Float64, y::Float64) in Main at cell:3
  • h(x::Number, y::Number) in Main at cell:1
  • h(x, y) in Main at cell:2
g(x::Float64, y) = 2x + y

g(x, y::Float64) = x + 2y

g(2.0, 3)
g(2.0, 2.0)

multiple-dispatch is used everywhere


( ...161 methods, expand the view of the result above on the left to see them all... )

same_type(a, b) = false
same_type(a::T, b::T) where T = true
same_type (generic function with 2 methods)
same_type(42.0, -42.0)
same_type("nope", 0)

rich runtime types

# ( this cell is a code listing that would throw errors if run, because it re-defines types defined in Julia's Base )

# abstract types cannot be instantiated
# but they are very useful in combination with multiple dispatch
abstract type Number end
abstract type Real     <: Number end
abstract type AbstractFloat <: Real end
abstract type Integer  <: Real end
abstract type Signed   <: Integer end
abstract type Unsigned <: Integer end
# and they can carry around info in their type parameters

# A primitive type is a concrete type whose data consists of plain old bits.
primitive type Float16 <: AbstractFloat 16 end
primitive type Float32 <: AbstractFloat 32 end
primitive type Float64 <: AbstractFloat 64 end

primitive type Bool <: Integer 8 end
primitive type Char <: AbstractChar 32 end
# ...

#   we've seen immutable structs and parameterized structs above

# mutable structs
mutable struct MyMut

# type Unions

# parametric abstract types
abstract type Pointy{T} end

# Tuple types are covariant in their type params
Tuple{Int, AbstractString} <: Tuple{Real,Any}       # true

# Also, UnionAll types, Vararg tuple types, NamedTuples, singleton types, parametric primitive types, type aliases...

# btw: any isbits value can be a Value type or type parameter
# example: AbstractArray{T, N}

AbstractArray{Float64, 3}  # 3rd order tensor with elements of type Float64

Types are runtime constructs: no type erasure, complete access to type info at runtime!

@show isa(1, Int)

@show typeof( 4 // 3 )

@show typeof(Rational{Int})

@show typeof(Union)

@show typeof(DataType)

@show supertype(Float64)

@show supertype(Number)

@show supertype(Any)

@show subtypes(Number)
2-element Array{Any,1}: Complex Real

What's the big deal? the Expression problem

A programming language should allow separate developers to:

  • extend each others' datatypes
  • write new functions over existing datatypes
  • maintaining good performance => NO python duck types / Ruby open classes
  • maintaining developer sanity => NO type classes or CoProducts of functors
  • big bonus if you can extend datatypes and functions without requiring developers of different packages to opting in or coordinate in advance

(requirements 3 and 4 were added to the original formulation of the problem because if they do not hold, developers will avoid using the language features that allow you to do the first 2 for any significant functionality, they would reach for constructs that are fast and sane instead)

Julia satisfies all these requirements, including the bonus. I'm curious to know if there are other languages out there that do the same for some reasonable notion of 'fast' and 'sane'. Shoot me a line if you find one :)

start with a generic method

using LinearAlgebra

function f(A, vs)
  t = zero(eltype(A))
  for v in vs
    t += inner(v, A, v)         # <= multiple dispatch
  return t

inner(v, A, w) = dot(v, A*w)    # generic fallback defintion
inner (generic function with 1 method)
A = randn(42, 42)
vs = [ randn(42) for _ in 1:10 ]

@btime f(A, vs)

define a new datatype

Now another developer wants to define a new datatype that will work with this generic method. And they want it to run fast.

import Base: size, getindex, *

struct OneHotVector <: AbstractVector{Bool}
 len :: Int
 ind :: Int

# define minimal set of AbstractVector methods
size(v::OneHotVector) = (v.len,)
getindex(v::OneHotVector, i::Integer) = i == v.ind
getindex (generic function with 204 methods)
onehot = OneHotVector(42000, 42)

@show isbits(onehot)
vs = [ OneHotVector(42, i) for i in 1:10 ]

@btime f(A, vs)

Works! But this is using generic fallbacks for * and dot that are slow:

The only methods defined on our OneHotVectors are size and getindex, so that's what the generic fallbacks are using. The multiplication A * w loops through all indices of w from 1 to w.len. So does the call to dot.

Let's do better.

(*)(A::AbstractMatrix, o::OneHotVector) = A[:, o.ind]

AbstractMatrix  # btw, AbstractMatrix is an alias for AbstractArray{T,2} where T
AbstractArray{T,2} where T
@btime f(A, vs)

...or even better...

inner(v::OneHotVector, A::AbstractMatrix, w::OneHotVector) = A[v.ind, w.ind]
inner (generic function with 2 methods)
@btime f(A, vs)

Challenge: how would you do this in another language? Is it still fast and sane?

Things that get in the way:

  • too much privacy; closed classes (Ruby has open classes, but slow)
  • single dispatch is awkward for numerical ops, since the first arg is not special
  • static languages need type classes or similar, but what about developer sanity?

Signs that a language has solved the Expression problem:

  • it's fast enough that you want to use it
  • taking a look at the source code of the language's base and standard libs is a pleasant, fulfilling experience
  • people write and maintain separate packages, and they can be used together with minimal coordination between package devs

Getting to know Julia part 2

multi-dimensional arrays

A = zeros(Float64, (2, 3))
B = zeros((2,3))
C = [1 2 3; 4 5 6]

@show size.([A, B, C])

@show size( [A B C] )

M = [A; B; C]
@show size( M )

@show size( M[2:end-1, 2:end] )

# hcat, vcat, ones, ...
(4, 2)

and a rich ecosystem of AbstractArray impls


fusing broadcast

add a dot to your function call to vectorize it

@show h.(randn(2), randn(2))

a, b = randn(4, 1), randn(1, 3)

h.(a, b) .* 42 .+ randn(4, 3)      # compiles to single fused kernel!
4×3 Array{Float64,2}: 54.9949 3.00959 -86.385 74.4662 27.9268 -63.5211 52.7469 2.9036 -85.7972 61.6012 10.235 -78.7654


Lisp heritage: homoiconic, macros, hooks to manipulate lowered code.

Just a quick flavor here:

expr = :(a+b*c+1)

@show expr

expr = quote
  x = 1
  y = 2
  x + y

quote #= cell:2 =# x = 1 #= cell:3 =# y = 2 #= cell:4 =# x + y end

and a rich ecosystem of metaprogramming tools

Composable multi-threaded parallelism

As of version 1.3, Julia code can make use of all those CPU cores using simple language constructs. Just spawn tasks and don't worry where they run.

The underlying task scheduler was specifically designed to compose well, so that multi-threaded code calling multi-threaded code continues to give performance improvements. More traditional task schedulers tend to not do well in this kind of scenario. Given how the Julia ecosystem grows as smaller, often performance-oriented libraries that call each other, this is an important performance property of the multi-threading infrastructure.

Btw, Julia has had Go-style channel-based concurrency for a while, which used to all run on the same thread. Now that multi-threading has landed in 1.3, the plan is to harmonize these concurrency and parallelism constructs.


Package manager with conda-like envs is built in, accessible through language

using Pkg

Interop with other languages

Python and related tools

(note: JavaCall works with Julia 1.0 but not 1.1, 1.2 and 1.3, scheduled to be fixed in 1.4)

Learning resources

The manual, it's very well written!

"Take the time to study the source code of Base and Standard libraries, it's like learning Kung-Fu from Bruce-Lee for free."

Community highlights

Julia has > 23,870 stars, > 875 contributors on

Packages registered as of Sept. 2019: 3163

Undergoing explosive growth (compare with 2462 registered packages 9 months ago)

Julia packages that are state-of-the-art


JuliaDB has some out-of-memory and distributed capabilities

QueryVerse brings various packages together for a unified data exploration experience

...with interfaces between them via Tables.jl or TableTraits.jl

Autodiff / Differentiable programming / Deep Learning

Several implementations elegant deep learning framework GPU-centered DL framework that is very good at optimizing memory reuse

Generic GPU / TPU support

a nice blog post on leveraging AbstractArray interface, metaprogramming, multiple dispatch, for awesome GPU support

support for XLA / TPUs

Probabilistic programming

Packages can very often be used together!

DiffEq + Flux = Neural ODEs DiffEqFlux

Most code involving multi-dim arrays, on CPU, GPU, etc, will just work if you use e.g.:

  • Measurements with error bars Measurements.jl
  • Numbers with Units Unitful.jl
  • (it'll work because the info is propagated using types or separate arrays)

... lots more cross-compatibility (happens when you've tamed the Expression problem without requiring developers to coordinate in advance...)

Near future

  • multi-threading (mostly done, foundations will ship with 1.3, more polished interface later)
  • better JIT compilation latency (work starting now; maybe some improvements in 1.4, probably continual improvements later?)