Why I love Julia
Languages and the communities around them take a looooong time to mature. Especially ones that are centered on concepts that are not mainstream yet.
Julia 1.0 was released in August 2018, and the language is seeing explosive growth. This notebook is an attempt at explaining why.
Walks like Python
Julia will feel familiar to Python users:
```julia
d = Dict()
for i in 1:26
    c = Char('a' + i - 1)
    d[c] = i
end
d
```
```julia
for (k, v) in d
    println(k => v)
end
```
```julia
for k in keys(d)
    if k ≤ 'c'
        println(k)
    end
end
```
(Notice the Unicode ≤; type the LaTeX command `\le` and hit Tab to complete it.)
Comprehensions
```julia
l = [ 2 * x for x in values(d) if x < 4 ]
```
(`values(d)` would have been `d.values()` in Python.)
```julia
gen = ( 2 * x for x in values(d) if x ≤ 3 )
l = collect(gen)
```
(The Python equivalent is `list(gen)`.)
```julia
l, r = zip( ( (abs(x), -abs(x)) for x in values(d) if x ≤ 4 )... )
```
Sets
```julia
u = union( Set(l), Set(r) )
u = Set(r) ∪ Set(l)  # ∪ is union; type \cup and tab complete
```
Defaults and keyword args in functions
```julia
function myfunc2(required, optional=0, optional2=7; mykwarg="6")
    println(optional * 2)
end

myfunc2("unused", 21, mykwarg="???")
```
Functional tools
```julia
unioned(ls) = reduce( ∪, map(Set, ls) )
unioned([l, r])
```
and also...
- dynamic / duck typing
- gentleman's interfaces, by convention not contract
- introspection / meta-programming (more than in Python actually)
Python constructs not in Julia
- generator functions (yield statements); Julia's closest analogue is a Channel fed by a task, sketched after this list
- classes / objects
- here's a list of mostly syntactic differences
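Here's a minimal sketch of the Channel pattern that plays the role of a Python generator function (the squares example is hypothetical, just for illustration):

```julia
# a task lazily feeds values into the channel, like a Python generator
chan = Channel() do ch
    for i in 1:5
        put!(ch, i^2)
    end
end

collect(chan)  # [1, 4, 9, 16, 25]
```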
Runs like C
Under the hood, Julia generates native code that is fast.
```julia
quadratic(a, sqr_term, b) = (-b + sqr_term) / 2a
```
Julia JIT-compiles native code for every tuple of argument types:
```julia
quadratic(42.0, 42.0, 42.0)
```
At JIT-compile time, all type information is available. There is no type erasure or distinction between compile time and runtime types.
Statically compiled languages need interfaces to be defined in advance for 2 modules to work together. Not so with Julia.
In most statically compiled languages, modules are separate units of compilation: module A is typically compiled separately from module B, so the compiler cannot optimize using knowledge of the data types and function definitions in both modules. Julia can use all the type information to generate aggressively optimized native code.
```julia
quadratic(42.0, 42.0, 42.0)  # reuses the Float64 specialization compiled above
quadratic(42, 42, 42)        # JIT-compiles a new specialization for (Int, Int, Int)
quadratic(42, 42.0, 42.0)    # and another for (Int, Float64, Float64)
```
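You can inspect the code Julia generates for any of these specializations yourself; a quick sketch using the reflection macros from the InteractiveUtils stdlib (loaded by default in the REPL):

```julia
using InteractiveUtils

# print the machine code generated for this particular specialization
@code_native quadratic(42.0, 42.0, 42.0)
# see also: @code_llvm, @code_typed, @code_lowered
```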
loops are fast
A naively written for loop to sum over the elements of an array:
```julia
function summed(a)
    result = 0
    for x in a
        result += x
    end
    return result
end
```
```julia
using BenchmarkTools
using Random
```
`@btime` (from BenchmarkTools) evaluates an expression multiple times until timing noise is averaged over:
```julia
a = randn(42000)
@btime summed(a)
```
Meanwhile in Python land...
```python
def summed(a):
    result = 0
    for x in a:
        result += x
    return result

import timeit
import random

a = [ random.random() for _ in range(42000) ]
n = 500
t = timeit.timeit(lambda: summed(a), number=n) / n
print(" {:.3} seconds".format(t))
```
...or more realistically numpy land...
```python
import numpy as np

a = np.random.randn(42000)
n = 1000
t = timeit.timeit(lambda: a.sum(), number=n) / n
print(" {:.3} seconds".format(t))
```
Our naive Julia loop runs about as fast as numpy.
We can easily speed up our loop by letting the compiler know that the operations in the loop are associative so they are amenable to SIMD parallelism:
```julia
function sumsimd(a)
    result = zero(eltype(a))
    @simd for x in a   # @simd asserts the reduction can be reordered
        result += x
    end
    return result
end

a = randn(42000)
@btime sumsimd(a)
```
Significantly faster than numpy, and it's a simple, legible, element-type-agnostic loop!
Here's Julia's own implementation, `Base.sum`, which uses a recursive, binary divide-and-conquer strategy to reduce floating-point approximation errors and is almost as fast:
```julia
# Base.sum uses @simd via Base.mapreduce(f, a::AbstractArray)
a = randn(42000)
@btime sum(a)
```
Take a look at the actual implementation: https://github.com/JuliaLang/julia/blob/v1.2.0/base/reduce.jl#L155
structs like C
isbits types
- any immutable type that contains no references to other values
- primitive types like UInt8 and Float64 are isbits
- immutable structs and tuples of isbits types are isbits
- isbits types have a defined layout compatible with C
- arrays of isbits elements store their values inline, without pointers or boxing (and isbits values themselves can live on the stack or in registers)
```julia
struct YupIsBits
    a::Int
    b::Tuple{Float64, UInt8}
end

yup = YupIsBits(23, (1.5, 42))
isbits(yup)
```
Not surprisingly, Julia can call C functions directly, without any boilerplate or APIs.
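For example, here's a minimal sketch calling the C standard library's `strlen` directly (assuming a Unix-like system where libc symbols are resolvable):

```julia
# ccall takes the C function, its return type, its argument types, and the args
len = ccall(:strlen, Csize_t, (Cstring,), "Hello, world.")  # 13
```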
```julia
struct NopeNotIsBits
    bar          # untyped field: can reference any value, so not isbits
    baz::Int
    qux::Float64
end

nope = NopeNotIsBits("Hello, world.", 23, 1.5)
isbits(nope)
```
Random related notes (both sketched below):
- Julia lets you declare your own primitive types
- values of isbits types can be used as type parameters
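A quick sketch of both notes; `UInt24` and `Rotation` are hypothetical names, just for illustration:

```julia
# a custom 24-bit primitive type (sizes must be multiples of 8 bits)
primitive type UInt24 <: Unsigned 24 end

# an isbits value (here the Int 3) used as a type parameter
struct Rotation{N}
    angle::Float64
end
Rotation{3}(1.57)
```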
Here's a parameterized immutable struct:
```julia
struct Point{T <: Number}
    x::T
    y::T
end
```
Structs are invariant in their type parameters, because different parameters imply different memory layouts:
```julia
Float64 <: Real                # true
Point{Float64} <: Point{Real}  # false!
```
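To write a method that accepts a Point over any Real element type, use a UnionAll with `<:`. A minimal sketch (`norm1` is a hypothetical helper):

```julia
# matches Point{Float64}, Point{Int}, ... but not Point{Complex{Float64}}
norm1(p::Point{<:Real}) = abs(p.x) + abs(p.y)

norm1(Point(1.0, -2.0))  # Point{Float64} => 3.0
norm1(Point(1, -2))      # Point{Int}     => 3
```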
Getting to know Julia part 1
JIT, multiple-dispatch, rich runtime type system
multi-dimensional arrays, meta-program like it's Lisp, fusing broadcast
multiple dispatch
```julia
h(x::Number, y::Number) = x
h(x::Float64, y::Float64) = x + y

h(42, 42.0)
```
h("uh-oh", 0)
```julia
# generic fallback
h(x, y) = println("Whoa there, Nelly.")
h("ah-ha", 1)
```
```julia
methods(h)
```
```
h(x::Float64, y::Float64) in Main at cell:3
h(x::Number, y::Number) in Main at cell:1
h(x, y) in Main at cell:2
```
```julia
g(x::Float64, y) = 2x + y
g(x, y::Float64) = x + 2y

g(2.0, 3)
```
```julia
g(2.0, 2.0)  # MethodError: ambiguous, both methods match equally well
```
multiple-dispatch is used everywhere
```julia
methods(+)
```
( ...161 methods... )
```julia
same_type(a, b) = false
same_type(a::T, b::T) where T = true
```
```julia
same_type(42.0, -42.0)  # true
same_type("nope", 0)    # false
```
rich runtime types
```julia
# ( this cell is a code listing that would throw errors if run,
#   because it re-defines types defined in Julia's Base )

# abstract types cannot be instantiated,
# but they are very useful in combination with multiple dispatch
abstract type Number end
abstract type Real <: Number end
abstract type AbstractFloat <: Real end
abstract type Integer <: Real end
abstract type Signed <: Integer end
abstract type Unsigned <: Integer end
# ...and they can carry around info in their type parameters

# A primitive type is a concrete type whose data consists of plain old bits.
primitive type Float16 <: AbstractFloat 16 end
primitive type Float32 <: AbstractFloat 32 end
primitive type Float64 <: AbstractFloat 64 end
primitive type Bool <: Integer 8 end
primitive type Char <: AbstractChar 32 end
# ...

# we've seen immutable structs and parameterized structs above

# mutable structs
mutable struct MyMut
    a
    b::Int
end

# type unions
Union{Int, AbstractString}

# parametric abstract types
abstract type Pointy{T} end

# Tuple types are covariant in their type params
Tuple{Int, AbstractString} <: Tuple{Real, Any}  # true

# Also: UnionAll types, Vararg tuple types, NamedTuples, singleton types,
# parametric primitive types, type aliases...
# btw: any isbits value can be a value type or type parameter
```
```julia
# example: AbstractArray{T, N}
AbstractArray{Float64, 3}  # 3rd-order tensor with elements of type Float64
```
Types are runtime constructs: no type erasure, complete access to type info at runtime!
```julia
isa(1, Int)
typeof( 4 // 3 )
typeof(Rational{Int})
typeof(Union)
typeof(DataType)
supertype(Float64)
supertype(Number)
supertype(Any)
subtypes(Number)
```
What's the big deal? The Expression problem
A programming language should allow separate developers to:
- extend each others' datatypes
- write new functions over existing datatypes
- while maintaining good performance => NO Python duck typing / Ruby open classes
- while maintaining developer sanity => NO type classes or coproducts of functors
- big bonus if you can extend datatypes and functions without requiring developers of different packages to opt in or coordinate in advance
(Requirements 3 and 4 were added to the original formulation of the problem because, if they do not hold, developers will avoid the language features that enable the first two for any significant functionality, reaching instead for constructs that are fast and sane.)
Julia satisfies all these requirements, including the bonus. I'm curious to know if there are other languages out there that do the same for some reasonable notion of 'fast' and 'sane'. Shoot me a line if you find one :)
start with a generic method
```julia
using LinearAlgebra

function f(A, vs)
    t = zero(eltype(A))
    for v in vs
        t += inner(v, A, v)  # <= multiple dispatch
    end
    return t
end

inner(v, A, w) = dot(v, A * w)  # generic fallback definition
```
```julia
A = randn(42, 42)
vs = [ randn(42) for _ in 1:10 ]
f(A, vs)
```
define a new datatype
Now another developer wants to define a new datatype that will work with this generic method. And they want it to run fast.
```julia
import Base: size, getindex, *

struct OneHotVector <: AbstractVector{Bool}
    len :: Int
    ind :: Int
end

# define the minimal set of AbstractVector methods
size(v::OneHotVector) = (v.len,)
getindex(v::OneHotVector, i::Integer) = i == v.ind
```
```julia
onehot = OneHotVector(42000, 42)
isbits(onehot)
dump(onehot)
```
```julia
vs = [ OneHotVector(42, i) for i in 1:10 ]
f(A, vs)
```
Works! But this is using generic fallbacks for * and dot that are slow:
The only methods defined on our OneHotVectors are size and getindex, so that's all the generic fallbacks can use: the multiplication A * w loops through all indices of w from 1 to w.len, and so does the call to dot.
Let's do better.
```julia
(*)(A::AbstractMatrix, o::OneHotVector) = A[:, o.ind]

# btw, AbstractMatrix is an alias for AbstractArray{T,2} where T
AbstractMatrix
```
```julia
f(A, vs)
```
...or even better...
```julia
inner(v::OneHotVector, A::AbstractMatrix, w::OneHotVector) = A[v.ind, w.ind]
```
```julia
f(A, vs)
```
Challenge: how would you do this in another language? Is it still fast and sane?
Things that get in the way:
- too much privacy / closed classes (Ruby has open classes, but they're slow)
- single dispatch is awkward for numerical ops, since the first arg is not special
- static languages need type classes or similar, but what about developer sanity?
Signs that a language has solved the Expression problem:
- it's fast enough that you want to use it
- taking a look at the source code of the language's base and standard libs is a pleasant, fulfilling experience
- people write and maintain separate packages, and they can be used together with minimal coordination between package devs
Getting to know Julia part 2
multi-dimensional arrays
```julia
A = zeros(Float64, (2, 3))
B = zeros((2, 3))
C = [1 2 3; 4 5 6]

size.([A, B, C])
size( [A B C] )
M = [A; B; C]
size( M )
size( M[2:end-1, 2:end] )
# hcat, vcat, ones, ...
```
and a rich ecosystem of AbstractArray implementations
...
fusing broadcast
Add a dot to your function call to vectorize it:
```julia
h.(randn(2), randn(2))

a, b = randn(4, 1), randn(1, 3)
h.(a, b) .* 42 .+ randn(4, 3)  # compiles to single fused kernel!
```
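Roughly speaking, the fused expression above compiles down to a single loop with no intermediate arrays. A sketch of the equivalent hand-written version, reusing `a` and `b` from the cell above:

```julia
# roughly the single fused loop the broadcast turns into (a sketch)
noise = randn(4, 3)
out = Matrix{Float64}(undef, 4, 3)
for j in 1:3, i in 1:4
    out[i, j] = h(a[i, 1], b[1, j]) * 42 + noise[i, j]
end
out
```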
meta-programming
Lisp heritage: homoiconic, macros, hooks to manipulate lowered code.
Just a quick flavor here:
```julia
expr = :(a + b * c + 1)
expr
dump(expr)
```
```julia
expr = quote
    x = 1
    y = 2
    x + y
end
expr
```
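Macros receive these expression objects before the code runs and can transform them arbitrarily. A tiny sketch (`@show_expr` is a hypothetical macro, just for flavor):

```julia
macro show_expr(e)
    # this println runs at macro-expansion time, before the code runs
    println("got expression: ", e)
    return esc(e)  # return the expression unchanged, to be evaluated at runtime
end

@show_expr 1 + 2 * 3  # prints the expression; evaluates to 7
```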
and a rich ecosystem of metaprogramming tools
Composable multi-threaded parallelism
As of version 1.3, Julia code can make use of all those CPU cores using simple language constructs. Just spawn tasks and don't worry where they run.
The underlying task scheduler was specifically designed to compose well, so that multi-threaded code calling multi-threaded code keeps yielding performance improvements; more traditional task schedulers tend not to do well in this scenario. Given that the Julia ecosystem grows as many small, often performance-oriented libraries that call each other, this is an important property of the multi-threading infrastructure.
Btw, Julia has had Go-style channel-based concurrency for a while, which used to all run on the same thread. Now that multi-threading has landed in 1.3, the plan is to harmonize these concurrency and parallelism constructs.
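For a flavor of the task-spawning style, here's the classic Fibonacci sketch (requires Julia ≥ 1.3 started with multiple threads, e.g. JULIA_NUM_THREADS=4):

```julia
import Base.Threads: @spawn

function fib(n::Int)
    n < 2 && return n
    t = @spawn fib(n - 2)        # may run on any available thread
    return fib(n - 1) + fetch(t)
end

fib(20)
```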
Tooling
A package manager with conda-like environments is built in and accessible from the language:
```julia
using Pkg
Pkg.add("DataFrames")
```
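Environments work much like conda envs; a quick sketch (the project name is hypothetical):

```julia
using Pkg
Pkg.activate("MyProject")  # create / switch to a project-local environment
Pkg.add("DataFrames")      # dependency recorded in MyProject/Project.toml
Pkg.status()               # list the packages in the active environment
```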
Search all packages on GitHub
Interop with other languages
Python and related tools
(note: JavaCall works with Julia 1.0 but not 1.1, 1.2, or 1.3; scheduled to be fixed in 1.4)
Learning resources
The manual; it's very well written!
"Take the time to study the source code of Base and Standard libraries, it's like learning Kung-Fu from Bruce-Lee for free."
Community highlights
Julia has > 23,870 stars and > 875 contributors on GitHub.
Packages registered as of Sept. 2019: 3163
Undergoing explosive growth (compare with 2462 registered packages 9 months ago)
Julia packages that are state-of-the-art
Dataframes
JuliaDB has some out-of-memory and distributed capabilities
QueryVerse brings various packages together for a unified data exploration experience
...with interfaces between them via Tables.jl or TableTraits.jl
Autodiff / Differentiable programming / Deep Learning
Several implementations are listed at https://github.com/JuliaDiff/
https://github.com/FluxML/Flux.jl: an elegant deep learning framework
https://github.com/denizyuret/Knet.jl: a GPU-centered DL framework that is very good at optimizing memory reuse
Generic GPU / TPU support
a nice blog post on leveraging the AbstractArray interface, metaprogramming, and multiple dispatch for awesome GPU support
support for XLA / TPUs
Probabilistic programming
A budding field that is a particularly good match for Julia
Packages can very often be used together!
DiffEq + Flux = Neural ODEs (DiffEqFlux)
Most code involving multi-dimensional arrays, on CPU, GPU, etc., will just work if you use e.g.:
- Measurements with error bars Measurements.jl (see the sketch after this list)
- Numbers with Units Unitful.jl
- (it'll work because the info is propagated using types or separate arrays)
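For instance, a quick sketch with Measurements.jl, whose ± operator creates numbers that carry their uncertainties through ordinary arithmetic:

```julia
using Measurements

a = 4.2 ± 0.1
b = 2.0 ± 0.3
a + b        # ≈ 6.2 ± 0.32, uncertainty propagated automatically
sum([a, b])  # generic array code just works
```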
... lots more cross-compatibility (happens when you've tamed the Expression problem without requiring developers to coordinate in advance...)
Near future
- multi-threading (mostly done, foundations will ship with 1.3, more polished interface later)
- lower JIT compilation latency (work starting now; maybe some improvements in 1.4, probably continual improvements after that)