Training neural networks using Flux.jl [Master]

using Flux
using CSV
using DataFrames
using Random
using Statistics
using StatsPlots

# Generate a temporary file path
tmp = tempname()

"/tmp/jl_3hKpZj"

# Download the data from the UCI website into the temporary file
download("https://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt", tmp)

"/tmp/jl_3hKpZj"

# Read the seeds dataset
# Values are separated by one or more tab characters
# There are no missing values
# There are no column names
# dropmissing discards any rows that were parsed with missing fields
seeds = dropmissing(CSV.read(tmp, DataFrame; header=0, delim='\t'))
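
Some rows of the file contain two consecutive tabs. A naive single-tab split therefore produces an empty field. The minimal plain-Julia sketch below shows the problem on a made-up line modelled on the file's format; it is not a verified row of the file. CSV.jl also has an `ignorerepeated` keyword intended for this situation.

```julia
# Made-up line in the same format as the seeds file: values separated by one or more tabs
line = "15.26\t14.84\t0.871\t\t5.763"

# Splitting on every single tab keeps an empty field where two tabs meet
naive = split(line, '\t')
println(length(naive))   # 5

# keepempty=false treats a run of tabs as a single separator
fields = parse.(Float64, split(line, '\t'; keepempty=false))
println(fields)          # [15.26, 14.84, 0.871, 5.763]
```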

# Name the variables (measurements of wheat kernels, i.e. grains)
# 1 = area (A)
# 2 = perimeter (P)
# 3 = compactness (C = 4*pi*A/P^2)
# 4 = length of kernel
# 5 = width of kernel
# 6 = asymmetry coefficient
# 7 = length of kernel groove
# 8 = cultivar (1, 2 or 3): variety of wheat
rename!(seeds,
  [:Column1 => :area, :Column2 => :perimeter,
  :Column3 => :compactness, :Column4 => :kernel_length,
  :Column5 => :kernel_width, :Column6 => :asymmetry,
  :Column7 => :kernel_groove, :Column8 => :cultivar]
  )
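
Compactness is derived from area and perimeter, so you can recompute it as a sanity check. The small sketch below uses the first training sample shown in the printed feature matrix: area 12.73, perimeter 13.75, stored compactness 0.8458. The small difference from the stored value presumably comes from rounding in the published data.

```julia
# Compactness as defined in the variable list: C = 4*pi*A/P^2
compactness(area, perimeter) = 4π * area / perimeter^2

# First training sample from the printed feature matrix
c = compactness(12.73, 13.75)
println(round(c; digits = 4))   # 0.8461, close to the stored 0.8458
```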

# Set seed for replicability
Random.seed!(42)

# Number of samples in the training set
# About 70% of the data
n_training = round(Int, 0.7 * size(seeds, 1))

139

# Shuffle the rows so that the first n_training rows form a random training set
# and the remaining rows form the testing set
seeds = seeds[shuffle(1:end), :]

# Training set: the first n_training shuffled rows
trn_sets = seeds[1:n_training, :]

# Testing set: the remaining rows, starting after row n_training so that no row is in both sets
tst_sets = seeds[(n_training + 1):end, :]
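
A train/test split should leave no row in both sets. Shuffling and then slicing is equivalent to drawing a random permutation of the row indices. The minimal index-based sketch below uses a hypothetical helper `split_indices` on 10 rows.

```julia
using Random

# Hypothetical helper: split the row indices 1:n into disjoint training and testing sets
function split_indices(rng::AbstractRNG, n::Integer, frac::Real)
    perm = randperm(rng, n)          # random permutation of 1:n
    n_trn = round(Int, frac * n)     # size of the training set
    return perm[1:n_trn], perm[(n_trn + 1):end]
end

trn_idx, tst_idx = split_indices(MersenneTwister(42), 10, 0.7)
println(length(trn_idx), " ", length(tst_idx))   # 7 3
```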

# Build the training set for the predictors (features)
# Flux expects features in rows and samples in columns, hence the transpose
trn_features = transpose(Matrix(trn_sets[:, 1:(end-1)]))

7×139 Transpose{Float64,Array{Float64,2}}:
 12.73   11.48   16.23   13.74   …  14.69   20.24   11.14   14.49
 13.75   13.05   15.18   14.05      14.49   16.91   12.79   14.61
  0.8458  0.8473  0.885   0.8744     0.8799  0.8897  0.8558  0.8538
  5.412   5.18    5.872   5.482      5.563   6.315   5.011   5.715
  2.882   2.758   3.472   3.114      3.259   3.962   2.794   3.113
  3.533   5.876   3.769   2.932  …   3.586   5.901   6.388   4.116
  5.067   5.002   5.922   4.825      5.219   6.188   5.049   5.396

# Build the testing set for the predictors (features)
# Same layout as the training set: features in rows, samples in columns
tst_features = transpose(Matrix(tst_sets[:, 1:(end-1)]))

7×61 Transpose{Float64,Array{Float64,2}}:
 14.49   12.74   15.99   12.44   …  17.08   13.22   15.78   13.54
 14.61   13.67   14.89   13.59      15.38   13.84   14.91   13.85
  0.8538  0.8564  0.9064  0.8462     0.9079  0.868   0.8923  0.8871
  5.715   5.395   5.363   5.319      5.832   5.395   5.674   5.348
  3.113   2.956   3.582   2.897      3.683   3.07    3.434   3.156
  4.116   2.504   3.336   4.924  …   2.956   4.157   5.593   2.587
  5.396   4.869   5.144   5.27       5.484   5.088   5.136   5.178

# 1. Build the training set for the predicted variable (cultivar)
# 2. One-hot encode the cultivar variable into a 3-row matrix
#    Rows are cultivars
#    Columns are training samples
#    Using the fixed label set 1:3 makes row i refer to cultivar i in both the training and testing sets
trn_cultivar = trn_sets[:, end]
trn_labels = Flux.onehotbatch(trn_cultivar, 1:3)

3×139 OneHotMatrix{Array{OneHotVector,1}}:
 1  0  0  1  0  1  0  0  0  0  1  0  1  …  0  0  0  0  0  1  0  1  1  0  0  1
 0  0  1  0  0  0  0  0  1  0  0  0  0     1  1  0  1  1  0  0  0  0  1  0  0
 0  1  0  0  1  0  1  1  0  1  0  1  0     0  0  1  0  0  0  1  0  0  0  1  0
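
`Flux.onehotbatch` and `Flux.onecold` convert between a label vector and this label-by-sample layout. The sketch below is a minimal plain-Julia equivalent; `onehot_columns` and `onecold_columns` are hypothetical helpers, not Flux functions. It also shows why the label set matters. If it is computed with `sort(unique(...))` on a subset that lacks a cultivar, the matrix has fewer rows and the row indices no longer line up with the cultivar numbers.

```julia
# Hypothetical helpers mirroring what Flux.onehotbatch and Flux.onecold do:
# one row per label, one column per sample
onehot_columns(samples, labels) = [s == l for l in labels, s in samples]

# Inverse: position of the largest entry in each column
onecold_columns(M) = [argmax(col) for col in eachcol(M)]

y = [1, 3, 2, 1]
Y = onehot_columns(y, 1:3)      # 3 × 4 Bool matrix
println(onecold_columns(Y))     # [1, 3, 2, 1]

# Label set computed from a subset that lacks cultivar 2: only 2 rows,
# and row 2 now stands for cultivar 3
subset = [1, 3, 3]
println(size(onehot_columns(subset, sort(unique(subset)))))   # (2, 3)
println(size(onehot_columns(subset, 1:3)))                    # (3, 3)
```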

# 1. Build the testing set for the predicted variable (cultivar)
# 2. One-hot encode the cultivar variable into a 3-row matrix
#    Rows are cultivars
#    Columns are testing samples
#    Using the fixed label set 1:3 makes row i refer to cultivar i in both the training and testing sets
tst_cultivar = tst_sets[:, end]
tst_labels = Flux.onehotbatch(tst_cultivar, 1:3)

3×61 OneHotMatrix{Array{OneHotVector,1}}:
 1  1  0  0  0  0  0  1  1  1  0  0  0  …  1  1  0  0  1  0  0  0  1  1  1  1
 0  0  1  0  0  1  0  0  0  0  0  0  0     0  0  1  1  0  1  1  0  0  0  0  0
 0  0  0  1  1  0  1  0  0  0  1  1  1     0  0  0  0  0  0  0  1  0  0  0  0

# Simple model
# Fully connected (dense) layer mapping the 7 features to 3 outputs, one per cultivar
# softmax turns the 3 outputs into probabilities; the predicted cultivar is the output with the highest probability
# Untrained model
one_layer = Chain(Dense(7, 3), softmax)

Chain(Dense(7, 3), softmax)
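
`Chain(Dense(7, 3), softmax)` computes `W*x .+ b` for each sample and then applies softmax. This is why the feature matrices were transposed to features × samples: a single matrix product then scores the whole batch. The sketch below is a rough plain-Julia version of that forward pass, using random weights and 5 made-up samples. `softmax_columns` is a hypothetical helper, not Flux's implementation.

```julia
using Random

# Numerically stable softmax applied to each column (one column per sample)
function softmax_columns(Z::AbstractMatrix)
    E = exp.(Z .- maximum(Z; dims = 1))   # subtract the column max before exp
    return E ./ sum(E; dims = 1)
end

# A dense layer computes W * X .+ b, with X stored as features × samples
rng = MersenneTwister(0)
W = randn(rng, 3, 7)        # 3 outputs, 7 input features
b = zeros(3)
X = rand(rng, 7, 5)         # 5 made-up samples, one per column

P = softmax_columns(W * X .+ b)
println(size(P))            # (3, 5)
```

Each column of `P` sums to 1, so it can be read as the class probabilities for one sample.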

# Train the model with a gradient descent optimiser
# Gradient descent is a first-order optimization algorithm: it uses the first-order derivative (gradient)
# of the loss function to decide how to change the weights so that the loss moves toward a local minimum
# Low learning rate of 0.01
optimizer = Descent(0.01)

Descent(0.01)
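
`Descent(0.01)` applies the plain update w ← w - η∇L(w) with η = 0.01. The minimal sketch below applies that rule to a hypothetical quadratic loss, not the notebook's loss, using the same learning rate and 2000 iterations. With a step this small, each update shrinks the distance to the minimum by a factor of 1 - 2η = 0.98.

```julia
# One gradient-descent step: w ← w - η * ∇L(w)
descent_step(w, grad, η) = w .- η .* grad

# Toy loss L(w) = sum((w .- 3).^2); its gradient is 2 .* (w .- 3) and its minimum is at w = 3
grad_L(w) = 2 .* (w .- 3)

function minimize(w0, η, steps)
    w = copy(w0)
    for _ in 1:steps
        w = descent_step(w, grad_L(w), η)
    end
    return w
end

w = minimize([0.0, 10.0], 0.01, 2000)
println(w)   # very close to [3.0, 3.0]
```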

# Loss function (cross-entropy)
loss(x, y) = Flux.crossentropy(one_layer(x), y)

loss (generic function with 1 method)
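
`Flux.crossentropy` compares the softmax probabilities with the one-hot labels. As far as we understand Flux's definition, it is the mean over samples (columns) of -Σ y log ŷ, up to small numerical safeguards. The plain-Julia sketch below implements that formula with a hypothetical helper `crossentropy_mean`. An uninformative model scores log(3) on three classes, and confident correct predictions score lower.

```julia
# Cross-entropy for one-hot targets Y and predicted probabilities P (classes × samples),
# averaged over samples (columns)
crossentropy_mean(P, Y) = -sum(Y .* log.(P)) / size(Y, 2)

Y = Bool[1 0; 0 1; 0 0]                    # two samples: cultivar 1 and cultivar 2
P_uniform = fill(1 / 3, 3, 2)              # an uninformative model
P_good = [0.9 0.05; 0.05 0.9; 0.05 0.05]   # confident, correct predictions

println(crossentropy_mean(P_uniform, Y))   # log(3) ≈ 1.0986
println(crossentropy_mean(P_good, Y))      # -log(0.9) ≈ 0.1054
```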

# Data iterator to handle training epochs
# Each element of data_e is the full training set, so each element represents one epoch
# One epoch = one forward and backward pass over all the training examples
data_e = Iterators.repeated((trn_features, trn_labels), 2000)

Take{Repeated{Tuple{Transpose{Float64,Array{Float64,2}},OneHotMatrix{Array{OneHotVector,1}}}}}(Repeated{Tuple{Transpose{Float64,Array{Float64,2}},OneHotMatrix{Array{OneHotVector,1}}}}(([12.73 11.48 … 11.14 14.49; 13.75 13.05 … 12.79 14.61; … ; 3.533 5.876 … 6.388 4.116; 5.067 5.002 … 5.049 5.396], Bool[1 0 … 0 1; 0 0 … 0 0; 0 1 … 1 0])), 2000)

# Train the model
Flux.train!(loss, params(one_layer), data_e, optimizer)
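
`Flux.train!` loops over `data_e`. For each element it computes the gradient of the loss with respect to the model parameters and lets the optimiser update them. The sketch below is a rough plain-Julia version of that loop for a softmax-regression model shaped like `one_layer`. It uses manual gradients instead of Flux's automatic differentiation and a tiny made-up dataset. It relies on the standard result that the gradient of softmax plus cross-entropy with respect to the pre-softmax scores is (P - Y)/n.

```julia
# Plain-Julia sketch of full-batch training for softmax regression, the same
# structure as Chain(Dense(in, k), softmax) trained with cross-entropy

function softmax_columns(Z)
    E = exp.(Z .- maximum(Z; dims = 1))   # subtract the column max for numerical stability
    return E ./ sum(E; dims = 1)
end

crossentropy_mean(P, Y) = -sum(Y .* log.(P)) / size(Y, 2)

# Tiny made-up dataset: 2 features × 4 samples, 2 classes (one-hot, classes × samples)
X = [0.0 0.1 0.9 1.0;
     1.0 0.9 0.1 0.0]
Y = [1.0 1.0 0.0 0.0;
     0.0 0.0 1.0 1.0]

W = zeros(2, 2)   # weights: classes × features
b = zeros(2)      # biases: one per class
η = 0.1           # learning rate

model_loss(W, b) = crossentropy_mean(softmax_columns(W * X .+ b), Y)
initial_loss = model_loss(W, b)

# As with Flux.train!, take one gradient step per element of the iterator;
# each element holds the full training set, so each step is one epoch
for (x, y) in Iterators.repeated((X, Y), 500)
    P = softmax_columns(W * x .+ b)
    G = (P .- y) ./ size(y, 2)           # gradient of the mean loss w.r.t. W*x .+ b
    W .-= η .* (G * x')                  # chain rule: gradient w.r.t. W
    b .-= η .* vec(sum(G; dims = 2))     # gradient w.r.t. b
end

final_loss = model_loss(W, b)
println(initial_loss, " -> ", final_loss)
```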

# Accuracy on the training set (the samples the model was trained on)
mean(Flux.onecold(one_layer(trn_features)) .== Flux.onecold(trn_labels))

0.942446
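
`Flux.onecold` picks the index of the largest entry in each column. The accuracy is therefore the share of samples whose predicted index matches the true one. The value above is measured on the training set, which the model has already seen. The sketch below is plain Julia with made-up probabilities and hypothetical helper names.

```julia
using Statistics

# Predicted class = row index of the largest probability in each column
predicted_class(P) = [argmax(col) for col in eachcol(P)]

# Fraction of samples whose predicted class matches the true class
accuracy(P, labels) = mean(predicted_class(P) .== labels)

P = [0.7 0.2 0.1 0.3;
     0.2 0.5 0.1 0.3;
     0.1 0.3 0.8 0.4]
labels = [1, 2, 3, 1]
println(accuracy(P, labels))   # 0.75 (the last sample is predicted as class 3)
```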

# Confusion matrix on the testing set
# True (reference) cultivar in rows, predicted cultivar in columns
# Most of the counts are on the diagonal, i.e. most test samples are classified correctly
function confusion_matrix(ft, lb)
  plb = Flux.onehotbatch(Flux.onecold(one_layer(ft)), 1:3)
  lb * plb'
end
confusion_matrix(tst_features, tst_labels)

3×3 Array{Int64,2}:
 20   3   2
  0  20   0
  2   0  14

# Add one hidden layer with 14 nodes
# Sigmoid activation on the hidden layer
hidden_size = 14
model = Chain(
  Dense(7, hidden_size, σ),
  Dense(hidden_size, 3),
  softmax
  )

Chain(Dense(7, 14, σ), Dense(14, 3), softmax)
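
The hidden-layer model computes softmax(W2 σ(W1 x + b1) + b2). The plain-Julia sketch below runs that forward pass with random weights and made-up inputs, using hypothetical helpers rather than Flux code. It also counts the trainable parameters: 24 for `Dense(7, 3)` versus 157 for `Dense(7, 14)` followed by `Dense(14, 3)`.

```julia
using Random

sigmoid(x) = 1 / (1 + exp(-x))

function softmax_columns(Z)
    E = exp.(Z .- maximum(Z; dims = 1))
    return E ./ sum(E; dims = 1)
end

# Trainable parameters of a dense layer: weights (out × in) plus biases (out)
dense_params(n_in, n_out) = n_in * n_out + n_out

rng = MersenneTwister(1)
W1, b1 = randn(rng, 14, 7), zeros(14)   # hidden layer: 7 features -> 14 nodes
W2, b2 = randn(rng, 3, 14), zeros(3)    # output layer: 14 nodes -> 3 cultivars

X = rand(rng, 7, 5)                     # 5 made-up samples (features × samples)
H = sigmoid.(W1 * X .+ b1)              # 14 × 5 hidden activations in (0, 1)
P = softmax_columns(W2 * H .+ b2)       # 3 × 5 class probabilities

println(dense_params(7, 3))                          # 24: single-layer model
println(dense_params(7, 14) + dense_params(14, 3))   # 157: hidden-layer model
```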

# Define the loss function
v2_loss(x, y) = Flux.crossentropy(model(x), y)

v2_loss (generic function with 1 method)

# Data iterator to handle training epochs rather than looping
# Each element of data_e represents one epoch
data_e = Iterators.repeated((trn_features, trn_labels), 2000)

Take{Repeated{Tuple{Transpose{Float64,Array{Float64,2}},OneHotMatrix{Array{OneHotVector,1}}}}}(Repeated{Tuple{Transpose{Float64,Array{Float64,2}},OneHotMatrix{Array{OneHotVector,1}}}}(([12.73 11.48 … 11.14 14.49; 13.75 13.05 … 12.79 14.61; … ; 3.533 5.876 … 6.388 4.116; 5.067 5.002 … 5.049 5.396], Bool[1 0 … 0 1; 0 0 … 0 0; 0 1 … 1 0])), 2000)

# Train the model
Flux.train!(v2_loss, params(model), data_e, optimizer)

# Accuracy on the testing set
mean(Flux.onecold(model(tst_features)) .== Flux.onecold(tst_labels))

0.836066

# Confusion matrix on the testing set
# Worse than the previous model: 51 correct predictions on the diagonal versus 54
function v2_confusion_matrix(ft, lb)
  plb = Flux.onehotbatch(Flux.onecold(model(ft)), 1:3)
  lb * plb'
end
v2_confusion_matrix(tst_features, tst_labels)

3×3 Array{Int64,2}:
 17   3   5
  0  20   0
  2   0  14
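
In the confusion-matrix cells, `lb * plb'` multiplies the true one-hot labels by the transposed predicted one-hot labels. Entry [i, j] therefore counts samples of true cultivar i that were predicted as cultivar j. The plain-Julia sketch below uses hypothetical helper names and three steps:

- It builds the same counts with a loop.
- It checks the loop result against the one-hot product.
- It reads the test accuracy off the diagonal of the two confusion matrices printed in this notebook.

```julia
using LinearAlgebra

# Hypothetical helper: entry [t, p] counts samples of true class t predicted as class p
function confusion_counts(truth, pred, k)
    M = zeros(Int, k, k)
    for (t, p) in zip(truth, pred)
        M[t, p] += 1
    end
    return M
end

# One-hot encoding with one row per class and one column per sample
onehot_matrix(v, k) = [x == c for c in 1:k, x in v]

truth = [1, 1, 2, 3, 3]
pred  = [1, 2, 2, 3, 1]
M = confusion_counts(truth, pred, 3)

# Same counts as the notebook's `lb * plb'` (true labels times transposed predictions)
println(M == onehot_matrix(truth, 3) * onehot_matrix(pred, 3)')   # true

# Accuracy = correct predictions (diagonal) / all predictions
accuracy_from_confusion(C) = sum(diag(C)) / sum(C)

# Test-set confusion matrices printed in this notebook
M_one_layer = [20 3 2; 0 20 0; 2 0 14]
M_hidden    = [17 3 5; 0 20 0; 2 0 14]

println(accuracy_from_confusion(M_one_layer))   # 54/61 ≈ 0.885
println(accuracy_from_confusion(M_hidden))      # 51/61 ≈ 0.836
```

For the single-layer model this gives 54/61 ≈ 0.885 on the testing set. That figure, not the 0.942 training accuracy, is the one to compare with the hidden-layer model's 0.836.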


1. Load packages
2. Download and clean data (seeds dataset)
3. Split dataset into testing and training sets
4. Single-layer neural network
   Build and train model
   Accuracy
5. Deep neural network
   Build and train model
   Accuracy
Acknowledgment
More open resources on neural network
