How Fast Is R with FastR?

benchmarking ray traced shadows for heightfields

I've been following GraalVM since the first publication so I was naturally very interested in the benchmarks posted in faster-r-with-fastr.

It shows impressive speed ups and claims that GraalVM introduces "A world where R code is as fast as (or even faster than) equivalent C, C++, or Fortran code".

That calls for reproducing the benchmarks and actually comparing it to a baseline C/C++/Fortran implementation. Nextjournal seems to be the perfect tool to freeze code, data and runtimes into one article and enable all readers to execute the code with the same, reproducible setup! You can simply click on remix to edit this article and run the code (for an account: signup).

Nextjournal already built a FastR runtime in another article, which I was able to update to work with the newest version (sdanisch/fastr). The article sets up the docker image we use here and one can use it in a new R code cell by transcluding sdanisch/fastr into the Environment in the runner menu to the right!

print(sessionInfo())

1.
FastR

Let's take the ray tracing code from the original article and benchmark it. I took the freedom to simplify the bilinear implementation to get an extra performance boost:

15.5s
# Code taken from article: http://www.tylermw.com/throwing-shade/
# Author: Tyler Morgan-Wall

# simple bilinear interpolation:
faster_bilinear <- function (Z, x0, y0){
  i = floor(x0)
  j = floor(y0)
  XT = (x0 - i)
  YT = (y0 - j)
  result = (1 - YT) * (1 - XT) * Z[i, j]
  nx = nrow(Z)
  ny = ncol(Z)
  if(i + 1 <= nx){
    result = result + (1-YT) * XT * Z[i + 1, j]
  }
  if(j + 1 <= ny){
    result = result + YT * (1-XT) * Z[i, j + 1]
  }
  if(i + 1 <= nx && j + 1 <= ny){
    result = result + YT * XT * Z[i + 1, j + 1]
  }
  result
}
bench_rays <- function() {
  volcanoshadow = matrix(1, ncol = ncol(volcano), nrow = nrow(volcano))

  volc = list(x=1:nrow(volcano), y=1:ncol(volcano), z=volcano)
  sunangle = 45 / 180*pi
  angle = -90 / 180 * pi
  diffangle = 90 / 180 * pi
  numberangles = 25
  anglebreaks = sapply(seq(angle, diffangle, length.out = numberangles), tan)
  maxdistance = floor(sqrt(ncol(volcano)^2 + nrow(volcano)^2))
  sinsun = sin(sunangle)
  cossun = cos(sunangle)
  for (i in 1:nrow(volcano)) {
    for (j in 1:ncol(volcano)) {
      vij = volcano[i, j]
      for (anglei in anglebreaks) {
        for (k in 1:maxdistance) {
          xcoord = (i + sinsun*k)
          ycoord = (j + cossun*k)
          if(xcoord > nrow(volcano) || 
             ycoord > ncol(volcano) || 
             xcoord < 0 || ycoord < 0) {
            break
          } else {
            tanangheight = vij + anglei * k 
            if (all(c(volcano[ceiling(xcoord), ceiling(ycoord)],
                      volcano[floor(xcoord), ceiling(ycoord)],
                      volcano[ceiling(xcoord), floor(ycoord)],
                      volcano[floor(xcoord), floor(ycoord)]) < tanangheight)) next
            if (tanangheight < faster_bilinear(volcano, xcoord, ycoord)) {
              volcanoshadow[i, j] =  volcanoshadow[i, j] - 1 / length(anglebreaks)
              break
            }
          }
        }
      }
    }
  }
}
min_time = 99999.0
for(i in 1:10){
  start.time <- Sys.time()
  bench_rays()
  end.time <- Sys.time()
  time.taken <- end.time - start.time
  print(sprintf("time taken: %f", time.taken))
  min_time = min(min_time, time.taken)
}
min_time

2.
GNU R

Now let's use the default GNU R runtime and execute exactly the same code as above using GNU R.

min_time

3.
Baseline

I promised a C++/Fortran baseline implementation - I'm sorry to say, that instead I will implement it in Julia. It's much easier and Julia can be expected to offer the same speed as C++ or Fortran. That is no surprise, since Julia is compiled via LLVM which is also used in Clang to compile C++ with state of the art performance. As a matter of fact, Julia has reimplemented quite a few former C/Fortran functions in pure Julia, while preserving performance and even yielding speed ups. You can see that e.g. `sin` is implemented in Julia with `@which`. Click the link and it will take you to the implementation:

using InteractiveUtils
@which sin(1.0)

We can upload files directly to the article, which is the easiest way to package data in a reproducible way:

volcano.csv

I tweaked the Julia implementation a bit, since I rather work with tuples instead of single numbers for x and y. To be sure, I first made a direct port, and verified that the performance is the same.

pkg"add BenchmarkTools"
62.8s
using Plots, BenchmarkTools, DelimitedFiles
function faster_bilinear(Z::AbstractMatrix{T}, x0, y0) where T
  i = floor(Int, x0)
  j = floor(Int, y0)
  XT = (x0 - i)
  YT = (y0 - j)
  result = (1 - YT) * (1 - XT) * Z[i, j]
  nx, ny = size(Z)
  (i + 1 <= nx) && (result = result + (1-YT) * XT * Z[i + 1, j])
  (j + 1 <= ny) && (result = result + YT * (1-XT) * Z[i, j + 1])
  if i + 1 <= nx && j + 1 <= ny
    result = result + YT * XT * Z[i + 1, j + 1]
  end
  result
end
function shadows(
    Z, shadow = fill(one(eltype(Z)), size(Z)); 
    sunangle = 45 / 180*pi, 
    angle = -90 / 180 * pi, diffangle = 90 / 180 * pi, angles = 25,
    anglebreaks = range(angle, stop = diffangle, length = angles),
    maxdistance = floor(Int, sqrt(sum(size(Z).^2)))
  )
  tan_angles = tan.(anglebreaks)
  sun = sincos(sunangle)
  for I in CartesianIndices(size(Z)) # iterate over all I=(i, j)
    aij = Z[I]; ivec = Tuple(I)
    for tana in tan_angles
      for k in 1:maxdistance
        tanheight = aij + tana * k
        coord = ivec .+ sun .* k
        clow = floor.(Int, coord); chigh = ceil.(Int, coord)
        checkbounds(Bool, Z, clow...) && 
          checkbounds(Bool, Z, chigh...) || break
        coords = (
          Z[clow...], Z[chigh...], Z[clow[1], chigh[2]], 
          Z[chigh[1], clow[2]]
        )
        all(coords .< tanheight) && continue
        if tanheight < faster_bilinear(Z, coord...)
          shadow[I] = shadow[I] - 1 / angles
          break
        end
      end
    end
  end
  shadow
end
# just press @ and select the file volcano, to insert a file reference
volcano = readdlm(volcano.csv, ',', Float32, header = true)[1][:, 2:end]
b = @benchmark shadows($volcano)
julia_time = minimum(b).time / 10^9 # time in nanoseconds

Now let's make an animation to see what we've created:

gr(format = :png)
volcano = copy(volcano')
anim = @gif for i in range(0, stop = 2π, length = 60)
  heatmap(shadows(volcano, sunangle = i), aspect_ratio = 1)
end
cp(anim.filename, "/results/output.gif")

Finally, we can plot the speed up by referencing the result values from the R cells and plotting it in Julia:

plotly() # use the plotly plotting backend
fastr = nil
julia = julia_time
r = nil
times = [r, fastr, julia]
speedup = maximum(times) ./ times
labels = ["R", "FastR", "Julia"]
println("speedup relative to R: ", speedup)
println("speedup relative to FastR: ", fastr ./ times);
bar(labels, speedup, legend = false, fillcolor = :grey)


So FastR is ~23 times faster than R and only ~7x slower than Julia. That's very impressive, as R is considered to be one of the harder languages to optimize, and FastR does it with a language agnostic VM!

© 2018 Nextjournal GmbH