Julia on Nextjournal
This computational notebook demonstrates a few features of the Julia language. It was written to accompany a Nature Toolbox article on Julia, published 30 July 2019 (Nature 572, 141-142 (2019)).
This is an executable document. To modify and run it, create an account on Nextjournal, then open this notebook and click the 'remix' button to create your own working copy. (A copy of the underlying Markdown document is available on GitHub.)
Syntactic sugar
Julia allows programmers to insert Greek characters (really, any Unicode symbol) into their code, making it easier to read. Some of these symbols are predefined, such as pi (3.141593...). We'll use that symbol to calculate the area and circumference of a circle. (Type \pi [TAB]
to create the pi symbol.)
# We use semicolons to suppress the output of certain cells r = 5; # radius of a circle A = π * r^2; C = 2 * π * r;
println("Area = ", round(A, digits = 2)) println("Circumference = ", round(C, digits = 2))
Greek characters can also represent functions. Here, we'll define two functions f(x) and g(x), and use the Greek sigma \Sigma
) to sum those functions over an interval.
function ∑(f::Function,from::Integer,to::Integer) # note that Julia allows you to specify the 'type' of parameters passed to # a function, which makes for more optimized (and error-checked) code. # Here, the parameter 'f' is defined as a function; 'from' and 'to' # are integers. sum = 0 for i = from:to sum += f(i) end return sum end
f(x) = 2x + 4 # function `f` (note that no multiplication symbol is required between 2 and x.) g(x) = x^2 + 4 # function `g` sum = ∑(f,1,10); # note the '$sum' string formatting syntax println("The sum of \'f(x) = 2x + 4\' over the interval [1,10] is: $sum") sum = ∑(g,1,10); println("The sum of \'g(x) = x^2 + 4\' over the interval [1,10] is: $sum")
If you're so inclined, you can even use emoji as variable and function names. How about a 'smiley cat'? (Type \:smiley_cat: [TAB]
)
😺= 7 for i in 1:😺 println(i) end
Distributed computing
Julia also supports distributed computing -- the distribution of computational work across multiple processor cores, whether locally or remotely. The following example was generously provided by Simon Byrne, a software engineer at Caltech and member of the Climate Modeling Alliance team.
Consider a simple Monte Carlo estimation of
Then we can estimate
Firstly we consider a simple serial implementation, where we accumulate into a variable c
:
function mc_pi(n) c = 0 for i = 1:n x = rand() y = rand() c += (x^2 + y^2 <= 1) end c/n*4 end
Julia code is compiled 'just-in-time'. The first time it runs, it is relatively slow. So, we run it twice, and time the output. We run the calculation a billion times.
mc_pi(10^9)
This second run takes advantage of precompiled code, and thus should be slightly faster
mc_pi(10^9)
To do this in parallel, first we load the Distributed
package (which comes with Julia), and add some processes
using Distributed localprocs = addprocs(4); # start 4 workers on my local computer
We need to slightly rewrite the code to be amenable to parallelism: in this case, we use the @distributed
macro, specifying a "reducer" function (in this case, +
), which will combine the outputs of all the loops.
function mc_pi_parallel(n) c = (+) for i = 1:n x = rand() y = rand() x^2 + y^2 <= 1 end c/n*4 end
Again, the first time we run this, the code is slowed due to JIT compilation overhead
mc_pi_parallel(10^9)
If we rerun the code, we should see a small speedup -- my local machine only has 2 physical cores. Still, the code is about 43% faster than the non-parallelized implementation above (2.4 sec vs 4.2 sec). And it's also faster than the first run of mc_pi_parallel()
, which had to be compiled.
mc_pi_parallel(10^9) # only a small speedup
rmprocs(localprocs...) # release the allocated workers...
Graphing
Next, we'll graph the Fibonacci sequence, where each value equals the sum of the two previous values:
function fibonacci(i = 25) # compute the Fibonacci sequence. By default, return the first 25 values. (i >= 2) # basic input check f1 = 0 f2 = 1 result = [f1, f2] for i in 1:(i-2) f3 = f1 + f2 push!(result, f3) # add the new value to the array f1 = f2 f2 = f3 end return result end
myarray = fibonacci();
We use Julia's package manager to load the libraries needed to plot our results.
using Pkg
Pkg.add("Plots") using Plots Pkg.add("PyPlot") pyplot()
plot(1:25,myarray,label="",color="blue", markershape = :circle, markercolor = :red, xlabel = "Sequence no.", ylabel = "Fibonacci no.", title = "The first 25 Fibonacci numbers")
Sequence analysis
Julia can also handle DNA sequence analysis, using the BioJulia packages.
Pkg.add("BioSequences") using BioSequences
First, we'll read in a sequence file -- a circular DNA, called a plasmid, from the bacterium, Yersinia pestis. sequence.fasta
is a FASTA-formatted file downloaded from https://www.ncbi.nlm.nih.gov/nuccore/NC_005816.1?report=fasta&log$=seqview.
reader = FASTA.Reader(open(sequence.fasta,"r")) my_dna = read(reader) close(reader)
Convert the file into a BioJulia DNASequence
object.
my_dna = DNASequence(sequence(my_dna))
From https://github.com/jperkel/example_notebook/blob/master/My_sample_notebook.ipynb, we know that there is a 'hypothetical protein' located at 3485:3857 in this plasmid. But those coordinates are from Python, which is zero-indexed (that is, the first element of an array is 0, not 1). For Julia, which is not (the first element is 1), the location would be 3486:3857. Here, we extract that gene sequence, convert it to RNA, and translate it.
my_gene=my_dna[3486:3857] my_rna=RNASequence(my_gene)
my_ptn = BioSequences.translate(my_rna)
Now let's see what info the NCBI Entrez database has on this protein. From https://github.com/jperkel/example_notebook/blob/master/My_sample_notebook.ipynb we learned that the accession number for this sequence is NP_995570.1.
Pkg.add("BioServices") using BioServices.EUtils Pkg.add("EzXML") using EzXML
res = efetch(db="protein",id="NP_995570.1",rettype="gb",retmode="xml");
We'll need to parse the XML output. What gene is this?
doc = parsexml(res.body) seq = findfirst("/GBSet/GBSeq",doc) nodecontent(findfirst("GBSeq_definition", seq))
What is its sequence?
entrez_ptn = AminoAcidSequence(nodecontent(findfirst("GBSeq_sequence",seq)))
Let's compare the Entrez sequence with our FASTA sequence using a pairwise alignment...
Pkg.add("BioAlignments") using BioAlignments
costmodel = CostModel(match=0, mismatch=1, insertion=1, deletion=1); pairalign(EditDistance(), my_ptn, entrez_ptn, costmodel)
It's a match!
Calling Python
Finally, we'll demonstrate Julia's ability to utilize code written in another language, using the PyCall
package to call the Python folium
mapping library (see our June 2018 mapping feature).
Pkg.add("PyCall") using PyCall
If you don't have folium
installed, you'll need to do so. You can do that here using the Julia Conda
package.
Pkg.add("Conda") using Conda
Conda.add_channel("conda-forge") Conda.add("folium")
Now we create a simple map: a few points in London, Oxford and Cambridge, overlaid on either a street map, or on a map of geological data provided by the Macrostrat Project. Note that the code below is written in Python. By bracketing the code with the py
macro (py"""{code}"""
or py"{code}"
), Julia knows to pass that code to PyCall
directly.
The resulting map is interactive: you can zoom, pan, click the points of interest, and alternate between the two layers (by clicking on the tiles icon in the upper-right corner of the map). If you click anywhere on the map, a popup will appear showing the latitude and longitude of that position. [NOTE: IF THE MAP RENDERS WITH NO LAYER SHOWING, CLICK THE TILES ICON IN THE UPPER-RIGHT CORNER TO ACTIVATE THEM]
py""" import folium def draw_map(): coords = { 0: { "name": "Nature", "lat": 51.533925, "long": -0.121553 }, 1: { "name": "Francis Crick Institite", "lat": 51.531877, "long": -0.128767 }, 2: { "name": "MRC Laboratory for Molecular Cell Biology", "lat": 51.524435, "long": -0.132495 }, 3: { "name": "Kings College London", "lat": 51.511573, "long": -0.116083 }, 4: { "name": "Imperial College London", "lat": 51.498780, "long": -0.174888 }, 5: { "name": "Cambridge University", "lat": 52.206960, "long": 0.115034 }, 6: { "name": "Oxford University", "lat": 51.754843, "long": -1.254302 }, 7: { "name": "Platform 9 3/4", "lat": 51.532349, "long": -0.123806 } } m = folium.Map(location = [51.8561, -0.2966], tiles = 'CartoDB positron', zoom_start = 9) # add the points of interest... for key in coords.keys(): folium.CircleMarker( location=[coords[key]['lat'], coords[key]['long']], popup=coords[key]['name'], color=('crimson' if coords[key]['name'] == 'Nature' else 'blue'), fill=False, ).add_to(m) # pull in the Macrostrat tile layer folium.TileLayer(tiles='https://tiles.macrostrat.org/carto/{z}/{x}/{y}.png', attr='Macrostrat', name='Macrostrat').add_to(m) folium.LayerControl().add_to(m) # allow user to switch between map layers folium.LatLngPopup().add_to(m) # provide a lat/long popup return m """ # draw the map py"draw_map()"
Computational reproducibility
Document session for computational reproducibility.
versioninfo()
Pkg.installed()