Working examples
Iris dataset
The Iris dataset, sometimes called Anderson's iris dataset, comprises measurements of four variables (sepal and petal width and length) from 150 plants belonging to three species: Iris setosa, Iris virginica, and Iris versicolor. Ronald Fisher analysed this dataset in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis (Gedeon, 2003).

]add Turing, RDatasets, StatsPlots, MLDataUtils, NNlib
# Load Turing.
using Turing
# Load RDatasets.
using RDatasets
# Functionality for splitting and normalizing the data.
using MLDataUtils: shuffleobs, splitobs, rescale!
# We need a softmax function which is provided by NNlib.
using NNlib: softmax
# Set a seed for reproducibility.
using Random
Random.seed!(0)
# Hide the progress prompt while sampling.
Turing.setprogress!(false);
# Load StatsPlots for visualizations and diagnostics.
using StatsPlots
# Import the "iris" dataset.
data = RDatasets.dataset("datasets", "iris");
# Show twenty random rows.
data[rand(1:size(data, 1), 20), :]
# Recode the `Species` column.
species = ["setosa", "versicolor", "virginica"]
data[!, :Species_index] = indexin(data[!, :Species], species)
# Show twenty random rows of the original and recoded species columns.
data[rand(1:size(data, 1), 20), [:Species, :Species_index]]
# Split our dataset 50%/50% into training/test sets.
trainset, testset = splitobs(shuffleobs(data), 0.5)
# Define features and target.
features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]
target = :Species_index
# Turing requires data in matrix and vector form.
train_features = Matrix(trainset[!, features])
test_features = Matrix(testset[!, features])
train_target = trainset[!, target]
test_target = testset[!, target]
# Standardize the features.
μ, σ = rescale!(train_features; obsdim = 1)
rescale!(test_features, μ, σ; obsdim = 1);
# Bayesian multinomial logistic regression
# The `@model` macro is required so that Turing can interpret the `~` statements.
@model function logistic_regression(x, y, σ)
    n = size(x, 1)
    length(y) == n ||
        throw(DimensionMismatch("number of observations in `x` and `y` is not equal"))
    # Priors of intercepts and coefficients.
    intercept_versicolor ~ Normal(0, σ)
    intercept_virginica ~ Normal(0, σ)
    coefficients_versicolor ~ MvNormal(4, σ)
    coefficients_virginica ~ MvNormal(4, σ)
    # Compute the likelihood of the observations.
    values_versicolor = intercept_versicolor .+ x * coefficients_versicolor
    values_virginica = intercept_virginica .+ x * coefficients_virginica
    for i in 1:n
        # the 0 corresponds to the base category `setosa`
        v = softmax([0, values_versicolor[i], values_virginica[i]])
        y[i] ~ Categorical(v)
    end
end;
chain = sample(logistic_regression(train_features, train_target, 1), HMC(0.05, 10), MCMCThreads(), 1500, 4)
plot(chain)
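The model above fixes the linear predictor of the base category `setosa` at 0 and passes the three values through a softmax. The following is a minimal sketch of why that works, not part of the original example: `softmax_sketch` is a hand-written stand-in for `NNlib.softmax`, and the predictor values are hypothetical numbers chosen only for illustration.

```julia
# Hand-written softmax, standing in for NNlib.softmax in this illustration.
function softmax_sketch(z)
    e = exp.(z .- maximum(z))   # subtract the maximum for numerical stability
    return e ./ sum(e)
end

# Hypothetical linear predictors for one plant; setosa is fixed at 0.
η_versicolor, η_virginica = 1.2, -0.7
p = softmax_sketch([0.0, η_versicolor, η_virginica])

# Adding the same constant to every entry leaves the probabilities unchanged,
# which is why one category can be pinned to 0 without losing generality.
p_shifted = softmax_sketch([0.0, η_versicolor, η_virginica] .+ 5.0)
```

Here `p` holds the class probabilities in the order setosa, versicolor, virginica, which is the vector the model hands to `Categorical`.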
GSoC 2021 work product
1. Violin plots
Violin plots are similar to box plots, with the addition of a rotated kernel density plot on one or both sides. To draw one, call plot(chain::Chains; kwargs...) with seriestype = :violinplot, or use the shorthand violinplot(chain::Chains; kwargs...). The kwarg colordim groups the violin plots by chain (colordim = :chain) or by parameter (colordim = :parameters).
violinplot(chain; colordim = :chain)
violinplot(chain; colordim = :parameters)
If the kwarg combined = true, chains are appended and only one plot per parameter is returned; in this case colordim is set to :chain. Otherwise (combined = false), a violin plot is returned as defined by colordim.
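To illustrate what appending chains means for the data behind each violin, here is a small sketch on synthetic draws. It only shows the grouping of samples and is not the plotting recipe's actual implementation; the array sizes and the names `samples`, `per_chain`, and `pooled` are assumptions made for this example.

```julia
# Synthetic draws with hypothetical sizes: 100 iterations, 2 parameters, 4 chains.
samples = randn(100, 2, 4)

# combined = false: each chain is kept apart, so there is one sample vector
# per (parameter, chain) pair.
per_chain = [samples[:, p, c] for p in axes(samples, 2), c in axes(samples, 3)]

# combined = true: the chains are appended, leaving one vector per parameter.
pooled = [vec(samples[:, p, :]) for p in axes(samples, 2)]
```

With these sizes, `per_chain` would feed eight violins (2 parameters times 4 chains), while `pooled` would feed two, each built from 400 draws.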
NOTE: Discrete parameters are plotted as defined in StatsPlots.jl.
For plotting multiple parameters, Ridgeline, Forest and Caterpillar plots can be useful.
2. Ridgeline
Given a chain object, ridgelineplot(chain::Chains, par_names::Vector{Symbol}; kwargs...) returns a Ridgeline plot for the sampled parameters specified in par_names.
ridgelineplot(chain, chain.name_map[:parameters])
For ridgeline plots, the following attributes are defined:
(a) Fill
The area below the curve can be filled according to a quantile interval (fill_q = true) or an HPD interval (fill_hpd = true). These two options are mutually exclusive; the defaults are fill_hpd = true and fill_q = false. If both fill_q = false and fill_hpd = false, the whole area below the curve is filled. If no fill color is desired, specify that with series attributes.
(b) Mean and median
A vertical line can be plotted representing the mean (show_mean = true) or median (show_median = true) of the estimated density (KDE). Both lines can be shown at the same time.
(c) Intervals
At the bottom of each density plot, a quantile interval (show_qi = true) or an HPD interval (show_hpdi = true) can be plotted. These options are mutually exclusive; the defaults are show_qi = false and show_hpdi = true. Quantile intervals use the values specified in q; HPD intervals use only the smaller value specified in hpd_val.
Note: When one parameter is given, it will be plotted as a density plot with all the elements described above.
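The difference between the two interval types mentioned in (a) and (c) can be sketched on raw samples. This is an illustrative approximation using only the Statistics standard library, not the code used by the plotting recipe; the function names `quantile_interval` and `hpd_interval`, the `mass` keyword, and the sample values are assumptions made for this example.

```julia
using Statistics

# Equal-tailed quantile interval covering `mass` of the samples.
function quantile_interval(samples; mass = 0.9)
    α = (1 - mass) / 2
    return quantile(samples, α), quantile(samples, 1 - α)
end

# Empirical HPD interval: the shortest window of consecutive sorted samples
# that contains at least `mass` of them.
function hpd_interval(samples; mass = 0.9)
    sorted = sort(samples)
    n = length(sorted)
    k = ceil(Int, mass * n)                        # samples inside the window
    widths = sorted[k:n] .- sorted[1:(n - k + 1)]  # width of each candidate window
    i = argmin(widths)
    return sorted[i], sorted[i + k - 1]
end

# Hypothetical, strongly right-skewed samples.
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 10.0]
qi = quantile_interval(samples; mass = 0.9)
hpd = hpd_interval(samples; mass = 0.9)
```

For skewed samples like these, the HPD interval is never wider than the equal-tailed quantile interval at the same mass, which is one reason it is a common default for summarising posterior densities.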
3. Forest and Caterpillar plots
References
Gedeon, T. D. (2003). AI 2003: Advances in Artificial Intelligence: 16th Australian Conference on AI, Perth. Springer Science & Business Media. 1075 pp.