Plotting in Nextjournal
Data visualization is at the core of Nextjournal. With no installation required, Nextjournal allows you to create beautiful, interactive graphics across multiple languages within the same article.
hist(rnorm(200, 1), breaks=100, main = "Random Deviates of a Normal Distribution", xlab = "Events")
The output of the previous cell can be referenced in a Bash cell. The file command provides more insight into the nature of the visualization:
file -b Imagehistogram↩
The output is an svg file. Nextjournal can display other datatypes:
file -b Imagepng-output↩
Works as expected.
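As a rough Python counterpart to the file check above, the sketch below reads the first bytes of a file to tell a png from an svg. The function name sniff_image_type and the sample files are hypothetical, and the svg test is a simple heuristic, not a reimplementation of what the file command does.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header of every PNG file


def sniff_image_type(path):
    """Return 'png', 'svg', or 'unknown' based on the start of the file."""
    with open(path, "rb") as handle:
        head = handle.read(1024)
    if head.startswith(PNG_SIGNATURE):
        return "png"
    # Heuristic: SVG is XML text, so look for an <svg tag near the top.
    if b"<svg" in head.lower():
        return "svg"
    return "unknown"


# Hypothetical sample files written to a temporary directory for illustration.
with tempfile.TemporaryDirectory() as tmp:
    svg_path = os.path.join(tmp, "histogram.svg")
    png_path = os.path.join(tmp, "my-plot.png")
    with open(svg_path, "w") as f:
        f.write('<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"></svg>')
    with open(png_path, "wb") as f:
        f.write(PNG_SIGNATURE + b"\x00" * 16)  # header only, not a valid image
    detected = {
        "svg": sniff_image_type(svg_path),
        "png": sniff_image_type(png_path),
    }

print(detected)
```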
svg file and
Take note of png('/results/my-plot.png') in the code cell above. Nextjournal will attempt to work with any file added to the /results directory. This may be uploaded data or images or, as in this case, the result of a calculation. For more information on /results, refer to Understanding Results.
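The same idea can be sketched in Python: draw a figure and save it as a file in the results directory. This is a minimal sketch, assuming matplotlib and numpy are available. The helper save_histogram is hypothetical, and the fallback to a temporary directory exists only so the code also runs outside Nextjournal, where /results is unlikely to exist.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display is needed
import matplotlib.pyplot as plt
import numpy as np


def save_histogram(results_dir):
    """Draw a histogram of random normal values and save it as a PNG."""
    rng = np.random.default_rng(0)
    values = rng.normal(loc=1.0, scale=1.0, size=200)
    fig, ax = plt.subplots()
    ax.hist(values, bins=100)
    ax.set(xlabel="Events", title="Random Deviates of a Normal Distribution")
    out_path = os.path.join(results_dir, "my-plot.png")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# Assumption: /results is writable only inside a Nextjournal runner;
# elsewhere this sketch falls back to a temporary directory.
results_dir = "/results" if os.access("/results", os.W_OK) else tempfile.mkdtemp()
saved_path = save_histogram(results_dir)
print(saved_path)
```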
Working With Data
There are several ways to work with data on Nextjournal. You can
- Use the ➕ insert menu or the ··· action menu and select File to upload data
- Use command line tools like
- Use output generated from code cells
This example will use the first option. Both the ➕ insert menu and the ··· action menu are exposed when selecting or hovering over article elements like paragraphs or code cells.
The Default R Graphics Package
This first example uses the smoothScatter() function from graphics to plot the birth year of artists represented in the Tate Museum's permanent collection. Note that graphics does not require the loading of any dependencies.
artists <- read.csv(artist_data.csv↩, header=T)
born <- artists$yearOfBirth
birth_distribution = smoothScatter(born, 1:length(born), axes=FALSE,
                                   xlab="Year", ylab="",
                                   main="Distribution of Artist's Birth Years at the Tate")
axis(1, col.ticks="blue")
birth_distribution
Working With Dependencies
ggplot2 are external dependencies that offer more features than the default R graphics.
tidyverse collection of R packages, which includes two dependencies used in the upcoming sections,
plotly package provides two important plotting functions,
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
library(tidyverse)  # provides read_csv() and ggplot()

artists <- read_csv(artist_data.csv↩)
born <- artists$yearOfBirth
df <- data.frame(born)

# Plot each artist's birth year against their row position in the data.
ggplot(df, mapping=aes(x = born, y = as.numeric(row.names(df)))) +
  geom_point(size=2.2, alpha=0.4, shape=15) +
  labs(x = "Year", y=element_blank(),
       title = "Distribution of Artist's Birth Years at the Tate",
       subtitle = "From the Museum's Permanent Collection") +
  theme_bw() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid.minor=element_blank(),
        panel.grid.major.y=element_blank())
This histogram compares the number of male and female artists in the collection by year of birth. The interactivity plotly offers is especially important here because the two distributions overlap entirely. Turning off the male histogram gives a better sense of the growth of female acquisition; turning on the male histogram shows how far institutions have yet to go.
library(plotly)  # provides plot_ly(), add_histogram() and layout()

artists <- read_csv(artist_data.csv↩)
female_artists <- artists[artists$gender == "Female",]
male_artists <- artists[artists$gender == "Male",]

# Overlay the two semi-transparent histograms on a shared axis.
plot_ly(alpha=0.6) %>%
  add_histogram(data=female_artists, x=~yearOfBirth, name="Females") %>%
  add_histogram(data=male_artists, x=~yearOfBirth, name="Males") %>%
  layout(barmode="overlay", xaxis=list(title="Year of Birth"))
ggplot2 generates static plots; however, if the plotly package is loaded, you can convert your ggplot2 plots into plotly ones via the ggplotly() function. In this way you can gain some of plotly's interactive functionality.
artists <- read_csv(artist_data.csv↩)
born <- artists$yearOfBirth
df <- data.frame(born)
id <- as.numeric(row.names(df))

ggplotly(
  ggplot(df, mapping=aes(x = born, y = id)) +
    geom_point(size=1.5, alpha=0.4, shape=15) +
    labs(x = "Year", y="", title = "Distribution of Artist's Birth Years at the Tate") +
    theme_bw() +
    theme(axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          panel.grid.minor=element_blank(),
          panel.grid.major.y=element_blank()))
A Nextjournal cell can show multiple graphs—the runner will detect each new figure automatically and display them in order.
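A Python cell can create several figures in the same way. The sketch below builds two matplotlib figures from synthetic random data and lists the figures pyplot is tracking. Whether Nextjournal's Python runner discovers figures through this mechanism is an assumption; the code only shows that both figures exist at the end of the cell.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display is needed
import matplotlib.pyplot as plt
import numpy as np

plt.close("all")  # start from a clean slate so only this cell's figures are open

rng = np.random.default_rng(1)
x = rng.random(50)
y = rng.random(50)

# First figure: a scatter plot.
fig_scatter, ax_scatter = plt.subplots()
ax_scatter.scatter(x, y)
ax_scatter.set_title("Figure 1: scatter")

# Second figure: a histogram of the same x values.
fig_hist, ax_hist = plt.subplots()
ax_hist.hist(x, bins=10)
ax_hist.set_title("Figure 2: histogram")

# pyplot keeps a registry of every open figure, in creation order.
open_figures = plt.get_fignums()
print(open_figures)
```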
artworks <- read_csv(artwork_data.csv↩)
drop <- c("accession_number", "artistRole", "artistId", "dateText", "creditLine",
          "units", "inscription", "thumbnailCopyright", "thumbnailUrl", "url")
artworks_rem <- artworks[ , !(names(artworks) %in% drop)]
artworks_size <- artworks_rem[!(is.na(artworks_rem$height & artworks_rem$width & artworks_rem$year)), ]
artworks_size$size <- artworks_size$height * artworks_size$width
metal <- artworks_size[artworks_size$medium == "Steel" | artworks_size$medium=="Bronze",]

plot_ly(data=metal, x=~acquisitionYear, name="Sculptural Acquisitions")

plot_ly(data=metal, x=~year, y=~acquisitionYear, z=~size,
        color=~medium, colors = c('#BF382A', '#0C4B8E'),
        text=~artist, marker=list(size=4, opacity=0.5)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Year Created'),
                      yaxis = list(title = 'Year of Acquisition'),
                      zaxis = list(title = 'Size')),
         annotations = list(
           x = 1.13,
           y = 1.05,
           text = 'Material',
           xref = 'paper',
           yref = 'paper',
           showarrow = FALSE
         ))
matplotlib is a library for making 2D plots of arrays in Python.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

artwork_data = pd.read_csv(artwork_data.csv↩)
# drop() returns a new DataFrame, so assign the result back.
artwork_data = artwork_data.drop(columns=["accession_number", "artistRole", "artistId", "dateText",
                                          "acquisitionYear", "dimensions", "width", "height", "depth",
                                          "creditLine", "units", "inscription", "thumbnailCopyright",
                                          "thumbnailUrl", "url"])

# Drop the rows listed as NaN, otherwise indexing oil, acrylic, and watercolour artworks yields the error "ValueError: cannot index with vector containing NA / NaN values." Replace this line with something more sensible to get a more complete dataset.
artwork_data.dropna(subset=['medium'], inplace=True)
artwork_data["year"] = pd.to_numeric(artwork_data["year"], errors="coerce")

oil = artwork_data[artwork_data["medium"].str.contains("oil", case=False)]
acrylic = artwork_data[artwork_data["medium"].str.contains("acrylic", case=False)]
watercolour = artwork_data[artwork_data["medium"].str.contains("watercolour", case=False)]

fig, ax = plt.subplots()
ax.set(xlabel='year', ylabel='number of works', title='Paintings at the Tate, by Medium')
# Exclude years that could not be parsed as numbers before binning.
ax.hist([oil["year"].dropna(), acrylic["year"].dropna(), watercolour["year"].dropna()], stacked=True)
fig
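The cell above drops every row with a missing medium so that the boolean masks contain no NaN. One possible alternative, sketched here on a small made-up data frame standing in for artwork_data.csv, is to pass na=False to str.contains so that missing values simply count as non-matches and no rows need to be removed first.

```python
import pandas as pd

# Synthetic stand-in for artwork_data.csv; the rows are made up for illustration.
artworks = pd.DataFrame({
    "medium": ["Oil paint on canvas", None, "Watercolour on paper", "Acrylic paint"],
    "year": ["1850", "1900", "no date", "1975"],
})

# Non-numeric years become NaN instead of raising an error.
artworks["year"] = pd.to_numeric(artworks["year"], errors="coerce")

# na=False makes rows with a missing medium count as "no match",
# so the mask contains no NaN and can be used for indexing directly.
is_oil = artworks["medium"].str.contains("oil", case=False, na=False)
oil = artworks[is_oil]

print(len(artworks), len(oil))
```

The full data frame keeps all of its rows, which matters if later steps need records whose medium is unknown.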
plotly's Python graphing library is built on plotly.js and creates interactive, publication-quality graphs.
import pandas as pd

# plotly imports
import plotly.figure_factory as ff
# plotly.graph_objs contains all the helper classes to make/style plots
import plotly.graph_objs as go

artist_data = pd.read_csv(artist_data.csv↩)

# Display the first 12 rows and 3 columns of the dataframe
ff.create_table(artist_data.iloc[:12,:3], index=False)
Plot two histograms that compare the number of male and female artists in the Tate collection, distributed by year of birth.
import numpy as np

artist_data = pd.read_csv(artist_data.csv↩)

male = artist_data['gender'] == 'Male'
female = artist_data['gender'] == 'Female'

trace1 = go.Histogram(
    x=np.array((artist_data[female]['yearOfBirth'])),
    name='Female')
trace2 = go.Histogram(
    x=np.array((artist_data[male]['yearOfBirth'])),
    name='Male')

trace_data = [trace1, trace2]
layout = go.Layout(
    bargroupgap=0.3)

go.Figure(data=trace_data, layout=layout)
Note that the data points can be hovered over to view the data for each, both here and in the published view. Traces can also be toggled on and off by clicking in the legend.
Plots offers the most flexible way to visualize data using Julia in Nextjournal. This preinstalled library provides a unified interface to different plotting libraries, including plotly and gr. plotly graphs are interactive, while gr is faster for large data sets.
using Plots; plotly()
scatter(rand(10), rand(10), title="Plot.ly Backend")
using Plots; gr()
gr produces a png file which is displayed by Nextjournal.
scatter(rand(10), rand(10), title="GR Backend")