How Fast Is R with FastR?

benchmarking ray traced shadows for heightfields
I've been following GraalVM since the first publication so I was naturally very interested in the benchmarks posted in faster-r-with-fastr.
It shows impressive speed ups and claims that GraalVM introduces "A world where R code is as fast as (or even faster than) equivalent C, C++, or Fortran code".
That calls for reproducing the benchmarks and actually comparing it to a baseline C/C++/Fortran implementation. Nextjournal seems to be the perfect tool to freeze code, data and runtimes into one article and enable all readers to execute the code with the same, reproducible setup! You can simply click on remix to edit this article and run the code (for an account: signup).
Nextjournal already built a FastR runtime in another article, which I was able to update to work with the newest version (sdanisch/fastr). The article sets up the docker image we use here and one can use it in a new R code cell by transcluding sdanisch/fastr
into the Environment in the runner menu to the right!
print(sessionInfo())
1. FastR
Let's take the ray tracing code from the original article and benchmark it. I took the freedom to simplify the bilinear implementation to get an extra performance boost:
# Code taken from article: http://www.tylermw.com/throwing-shade/ # Author: Tyler Morgan-Wall # simple bilinear interpolation: faster_bilinear <- function (Z, x0, y0){ i = floor(x0) j = floor(y0) XT = (x0 - i) YT = (y0 - j) result = (1 - YT) * (1 - XT) * Z[i, j] nx = nrow(Z) ny = ncol(Z) if(i + 1 <= nx){ result = result + (1-YT) * XT * Z[i + 1, j] } if(j + 1 <= ny){ result = result + YT * (1-XT) * Z[i, j + 1] } if(i + 1 <= nx && j + 1 <= ny){ result = result + YT * XT * Z[i + 1, j + 1] } result } bench_rays <- function() { volcanoshadow = matrix(1, ncol = ncol(volcano), nrow = nrow(volcano)) volc = list(x=1:nrow(volcano), y=1:ncol(volcano), z=volcano) sunangle = 45 / 180*pi angle = -90 / 180 * pi diffangle = 90 / 180 * pi numberangles = 25 anglebreaks = sapply(seq(angle, diffangle, length.out = numberangles), tan) maxdistance = floor(sqrt(ncol(volcano)^2 + nrow(volcano)^2)) sinsun = sin(sunangle) cossun = cos(sunangle) for (i in 1:nrow(volcano)) { for (j in 1:ncol(volcano)) { vij = volcano[i, j] for (anglei in anglebreaks) { for (k in 1:maxdistance) { xcoord = (i + sinsun*k) ycoord = (j + cossun*k) if(xcoord > nrow(volcano) || ycoord > ncol(volcano) || xcoord < 0 || ycoord < 0) { break } else { tanangheight = vij + anglei * k if (all(c(volcano[ceiling(xcoord), ceiling(ycoord)], volcano[floor(xcoord), ceiling(ycoord)], volcano[ceiling(xcoord), floor(ycoord)], volcano[floor(xcoord), floor(ycoord)]) < tanangheight)) next if (tanangheight < faster_bilinear(volcano, xcoord, ycoord)) { volcanoshadow[i, j] = volcanoshadow[i, j] - 1 / length(anglebreaks) break } } } } } } } min_time = 99999.0 for(i in 1:10){ start.time <- Sys.time() bench_rays() end.time <- Sys.time() time.taken <- end.time - start.time print(sprintf("time taken: %f", time.taken)) min_time = min(min_time, time.taken) } min_time
2. GNU R
Now let's use the default GNU R runtime and execute exactly the same code as above using GNU R.
min_time
3. Baseline
I promised a C++/Fortran baseline implementation - I'm sorry to say, that instead I will implement it in Julia. It's much easier and Julia can be expected to offer the same speed as C++ or Fortran. That is no surprise, since Julia is compiled via LLVM which is also used in Clang to compile C++ with state of the art performance. As a matter of fact, Julia has reimplemented quite a few former C/Fortran functions in pure Julia, while preserving performance and even yielding speed ups. You can see that e.g. `sin` is implemented in Julia with `@which`. Click the link and it will take you to the implementation:
using InteractiveUtils sin(1.0)
We can upload files directly to the article, which is the easiest way to package data in a reproducible way:
I tweaked the Julia implementation a bit, since I rather work with tuples instead of single numbers for x and y. To be sure, I first made a direct port, and verified that the performance is the same.
import Pkg; Pkg.add("BenchmarkTools")
using Plots, BenchmarkTools, DelimitedFiles function faster_bilinear(Z::AbstractMatrix{T}, x0, y0) where T i = floor(Int, x0) j = floor(Int, y0) XT = (x0 - i) YT = (y0 - j) result = (1 - YT) * (1 - XT) * Z[i, j] nx, ny = size(Z) (i + 1 <= nx) && (result = result + (1-YT) * XT * Z[i + 1, j]) (j + 1 <= ny) && (result = result + YT * (1-XT) * Z[i, j + 1]) if i + 1 <= nx && j + 1 <= ny result = result + YT * XT * Z[i + 1, j + 1] end result end function shadows( Z, shadow = fill(one(eltype(Z)), size(Z)); sunangle = 45 / 180*pi, angle = -90 / 180 * pi, diffangle = 90 / 180 * pi, angles = 25, anglebreaks = range(angle, stop = diffangle, length = angles), maxdistance = floor(Int, sqrt(sum(size(Z).^2))) ) tan_angles = tan.(anglebreaks) sun = sincos(sunangle) for I in CartesianIndices(size(Z)) # iterate over all I=(i, j) aij = Z[I]; ivec = Tuple(I) for tana in tan_angles for k in 1:maxdistance tanheight = aij + tana * k coord = ivec .+ sun .* k clow = floor.(Int, coord); chigh = ceil.(Int, coord) checkbounds(Bool, Z, clow...) && checkbounds(Bool, Z, chigh...) || break if (Z[clow[1], clow[2]] < tanheight || Z[chigh[1], chigh[2]] < tanheight || Z[clow[1], chigh[2]] < tanheight || Z[chigh[1], clow[2]] < tanheight) continue end if tanheight < faster_bilinear(Z, coord...) shadow[I] = shadow[I] - 1 / angles break end end end end shadow end # just press @ and select the file volcano, to insert a file reference volcano = readdlm(volcano.csv↩, ',', Float32, header = true)[1][:, 2:end]
b = shadows($volcano) julia_time = minimum(b).time / 10^9 # time in nanoseconds
heatmap(shadows(volcano), aspect_ratio = 1)
Now let's make an animation to see what we've created:
gr(format = :png) volcano = copy(volcano') anim = for i in range(0, stop = 2π, length = 60) heatmap(shadows(volcano, sunangle = i), aspect_ratio = 1) end cp(anim.filename, "/results/output.gif")
Finally, we can plot the speed up by referencing the result values from the R cells and plotting it in Julia:
pip install pythran
cat shade_pythran.py↩ > shade_pythran.py
apt-get update && apt-get install g++ clang
printf "[compiler]\nCXX=clang\nCC=clang" > ~/.pythranrc
pythran -O3 -march=native -DUSE_BOOST_SIMD shade_pythran.py
import numpy as np import sys sys.path.append(".") import shade_pythran io = open(volcano.csv↩, 'rb') volcano = np.loadtxt(io, dtype=np.float32, delimiter=',', skiprows=1, usecols=range(1, 62)) result = shade_pythran.bench_rays(volcano) import matplotlib.pyplot as plt import plotly.tools as tls # redefine some plotting methods for Nextjournal support plt.show = plt.gcf plt.imshow(result.T, origin='lower') plt.show()
import timeit setup = """ import shade_pythran import numpy as np io = open(NJ__REF0c35941c_3771_4d34_8eca_efe866f1f807_volcano_csv, 'rb') volcano = np.loadtxt(io, dtype=np.float32, delimiter=',', skiprows=1, usecols=range(1, 62)) """ code = """ result = shade_pythran.bench_rays(volcano) """ timings = timeit.repeat(code, setup, repeat=10, number=1) print(timings) min(timings)
using Plots plotly() # use the plotly plotting backend fastr = nil↩ julia = julia_time r = nil↩ pythran = nil↩ times = [r, fastr, julia, pythran] speedup = maximum(times) ./ times labels = ["R", "FastR", "Julia", "Pythran"] println("speedup relative to R: ", speedup) println("speedup relative to FastR: ", fastr ./ times); bar(labels, speedup, legend = false, fillcolor = :grey)
using DelimitedFiles using Plots gr(format = :png) pythrandata = readdlm(pythran.csv↩, ',') heatmap(pythrandata', aspect_ratio = 1)