MLJTime: Benchmarking, KNN & TSF
MLJTime is an open-source Julia toolkit for time series analysis that is fully compatible with MLJ.
MLJTime was originally developed as a Google Summer of Code project in 2020. Please refer to my previous blog post for a brief introduction to MLJTime.
using Pkg
Pkg.add("Plots")
Pkg.add("CSV")
Pkg.add("StatsPlots")
Pkg.add("DataFrames")
Pkg.add(PackageSpec(url="https://github.com/alan-turing-institute/MLJTime.jl.git"))  # install MLJTime directly from GitHub
What is Time Series Data?
Time series data is everywhere. Time series can be found in applications ranging from bedside monitoring of patients in hospitals to spectroscopy wavelength data from chemical samples. What defines a time series? I take a time series to be a sequence of (value, time stamp) tuples, and I assume the time stamps to be equally spaced.
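Concretely, such a sequence could be written down like this (a toy example with made-up values):

# a univariate time series as (value, time stamp) tuples, with equally spaced time stamps
ts = [(0.52, 0.0), (0.71, 0.1), (0.43, 0.2), (0.91, 0.3)]
values = first.(ts)       # just the observed values
timestamps = last.(ts)    # just the time stamps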
Time series give rise to various learning tasks. By learning task I mean the problem we are trying to solve. In the standard tabular (or cross-sectional) context, examples of learning tasks are regression and classification. In a temporal data context, one of the most common learning tasks is forecasting. Here, the goal is to make a temporal forward prediction of a time series based on consistent patterns observed in the past.
Another common learning task is time series classification - the focus of this blog post. Here, the goal is to assign a class label to a time series based on consistent patterns across multiple independent instances of time series and their class labels. Compared to the tabular classification setting, the only difference is that features are no longer primitive values like numbers or categories but entire time series.
There is a wide variety of problems that can be formulated as time series classification problems. For example, consider the ArrowHead problem. The ArrowHead data consists of outlines of images of arrowheads. The classification of projectile points is an important topic in anthropology; the classes are based on shape distinctions, such as the presence and location of a notch in the arrow. The shapes of the projectile points are converted into a sequence using the angle-based method described in this blog post. The data is thus transformed from images to outlines to sequences/time series.
Why should you use MLJTime?
Most tabular algorithms do not take the ordering (sequence) of the observations into account, and there are many time-series-specific algorithms that tend to perform better than tabular ones.
The Great Time Series Classification Bake Off paper provides a detailed explanation of these algorithms.
Time Series Forest
Time Series Forest (TSF) is one of the simpler bespoke time series classifiers and hence was the first algorithm we implemented. TSF is an ensemble of tree classifiers, each constructed on a different feature space. For any given ensemble member, a set of random intervals is selected, and three summary statistics (mean, standard deviation and slope) are calculated for each interval. The statistics for each interval are concatenated to form the new feature space.
TSF is based on local feature extraction.
The steps are as follows:
1) Split the series into multiple random intervals,
using MLJTime
X, y = ts_dataset("Chinatown");  # load a univariate time series dataset
X = matrix(X)  # convert the table to a matrix
yplain = MLJTime.MLJModelInterface.int(y)  # encode class labels as integers
# generate random intervals and the corresponding interval features (internal API)
Interval_features, Intervals = MLJTime.IntervalBasedForest.InvFeatureGen(X, 10, 3, 0);
Intervals
2) Extract features (mean, standard deviation and slope) from each interval (random part, local feature)
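For intuition, the three summary statistics for a single interval might be computed as in the sketch below. This is a hedged illustration, not MLJTime's internal code; the slope is the least-squares slope of the values against the time index.

using Statistics

function interval_stats(x::AbstractVector, lo::Int, hi::Int)
    v = x[lo:hi]                # values inside the interval
    t = collect(1.0:length(v))  # time index within the interval
    slope = cov(t, v) / var(t)  # least-squares slope of v against t
    return (mean(v), std(v), slope)
end

interval_stats(X[1, :], 3, 10)  # features of one random interval of the first series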
3) Train a decision tree on the extracted features,
MLJTime.IntervalBasedForest.build_tree(yplain, Interval_features[1])
4) Repeat steps 1 - 3 to build an ensemble of trees.
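Putting the four steps together, the whole pipeline is available through MLJTime's MLJ-style model interface. Below is a minimal sketch following the usage shown in the package README; treat the exact keyword names (such as n_trees) as assumptions.

using MLJTime
X, y = ts_dataset("Chinatown")
train, test = partition(eachindex(y), 0.7, shuffle=true, rng=1234)  # 70/30 split
model = TimeSeriesForestClassifier(n_trees=100)  # steps 1-4 wrapped in one model
mach = machine(model, matrix(X), y)
fit!(mach, rows=train)
yhat = predict_mode(mach, rows=test)
sum(yhat .== y[test]) / length(test)  # accuracy on the held-out set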
KNN with dynamic time warping (DTW)
Another bespoke approach to time series classification is K-Nearest-Neighbors together with a dynamic time warping (DTW) distance. In time series analysis, distance-based methods work by measuring the similarity between two entire time series. For instance, DTW can detect a shift in the peak or in the timing of an event, whereas the Euclidean distance fails to do so.
The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification.
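To make the idea concrete, here is a minimal dynamic-programming sketch of the DTW distance. This is an illustration of the standard algorithm, not MLJTime's own implementation (MLJTime exposes dtw_distance, used further below).

function dtw(a::AbstractVector, b::AbstractVector)
    n, m = length(a), length(b)
    D = fill(Inf, n + 1, m + 1)  # accumulated-cost matrix
    D[1, 1] = 0.0
    for i in 1:n, j in 1:m
        cost = (a[i] - b[j])^2
        # extend the cheapest of the three possible alignments (match, insertion, deletion)
        D[i + 1, j + 1] = cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    end
    return sqrt(D[n + 1, m + 1])
end

Note that nothing here requires the two series to have equal length, which is exactly the property basic Euclidean KNN lacks.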
How is KNN with DTW different from basic KNN?
Basic KNN uses the Euclidean distance, which can't be applied to series of different lengths.
DTW also does a better job of finding similarities between two time series. Let's look at an example, where a and b are two time series taken from the first dimension of the BasicMotions dataset.
X, y = load_NDdataset("BasicMotions");  # a multivariate dataset
a, b = X[4, :, 1], X[5, :, 1]  # two series from the first dimension
using Plots
plot(a, label = "series a")
plot!(b, label = "series b")
xlabel!("Time Stamps")
ylabel!("Value")
euclidean_dist = sqrt(sum((a .- b).^2))
dtw_dist = sqrt(MLJTime.dtw_distance(a, b, -1))  # third argument is the warping-window parameter
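Putting the two pieces together, a nearest-neighbor classifier simply swaps the Euclidean metric for the DTW distance. Below is a minimal 1-NN sketch using the dtw_distance call from the example above; it is an illustration, not MLJTime's own KNN implementation.

function nn1_dtw(X_train::AbstractMatrix, y_train::AbstractVector, x_query::AbstractVector)
    best_dist = Inf
    best_label = y_train[1]
    for i in 1:size(X_train, 1)
        d = MLJTime.dtw_distance(X_train[i, :], x_query, -1)  # DTW distance to training series i
        if d < best_dist
            best_dist, best_label = d, y_train[i]  # keep the closest series seen so far
        end
    end
    return best_label
end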
Distance based methods have been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a linear sequence can be analyzed with DTW.
Benchmarking on 104 data sets
My goal for MLJTime is to provide time series classification functionality equivalent to that in a related Python package called sktime. In this section we evaluate and compare classifiers in MLJTime and their equivalents in sktime. We demonstrate correctness through equivalence of accuracy on a range of standard test problems and compare the computational time for fitting and predicting of the implementations in Julia and Python.
There are 120 univariate datasets available on timeseriesclassification.com. Here, we are using only 104 because we exclude the ones with variable length time series as the TSF implementations in MLJTime and sktime do not support variable length data yet.
using Plots
using CSV
using DataFrames
path = "Benchmarking_results.txt"  # results file produced by the benchmark runs
df = DataFrame(CSV.File(path))
In the graph below, each point represents one of the data sets:
If a point lies on the diagonal line (y = x), MLJTime and sktime achieve the same accuracy.
If a point is above the diagonal, sktime is more accurate.
If a point is below the diagonal, MLJTime is more accurate.
plot(df.MLJ_Accuracy, df.sktime_Accuracy, seriestype = :scatter, legend = false)
plot!([0,1], [0, 1]) # diagonal line
xlabel!("MLJ accuracy")
ylabel!("sktime accuracy")
In the following graph, we compare the total computational time for fitting and predicting on each data set. At a glance, we can see that Julia is faster on almost all of the datasets.
df[!, "n_datapoints"] = df.series_length .* df.test_instsnce  # total number of test data points
sort!(df, [:n_datapoints], rev=true)
# relative difference in total time: negative values mean MLJTime is faster
plot(((df.MLJ_fit + df.MLJ_predict) .- (df.sktime_fit + df.sktime_predict)) ./ (df.MLJ_fit + df.MLJ_predict), seriestype = :bar, legend = false)
xlabel!("Data Sets ordered by n_datapoints")
ylabel!("Relative difference in total time")
plot(df.MLJ_fit + df.MLJ_predict, label = "MLJTime total time")
plot!(df.sktime_fit + df.sktime_predict, label = "sktime total time")
xlabel!("Data Sets")
ylabel!("Time in seconds")
As you can see, the total time (fit plus predict) is lower for MLJTime (Julia) on almost all of the datasets.
plot(df.MLJ_predict, label = "MLJTime predict time")
plot!(df.sktime_predict, label = "sktime predict time")
xlabel!("Data Sets")
ylabel!("Time in seconds")
After sorting the benchmark by the number of test instances, we find that as the number of test instances increases, the predict-time factor decreases, and for very high numbers (data sets 100 - 104 in the graph) sktime tends to perform better.
sort!(df, [:test_instsnce])
plot(df.sktime_predict ./ df.MLJ_predict, seriestype = :scatter, label = "Factor predict time")
plot!([1, 104], [1, 1], label = "y = 1 line")
xlabel!("Data Sets (ordered by number of test instances)")
ylabel!("Factor by which MLJTime is faster")
plot(df.MLJ_predict, label = "MLJTime predict time")
plot!(df.sktime_predict, label = "sktime predict time")
xlabel!("Data Sets (ordered by number of test instances)")
ylabel!("Time in seconds")
After sorting the benchmark by series_length, we plot the predict times again:
sort!(df, [:series_length])
plot(df.MLJ_predict, label = "MLJTime predict time")
plot!(df.sktime_predict, label = "sktime predict time")
xlabel!("Data Sets (ordered by series_length)")
ylabel!("Time in seconds")
MLJTime has the lower fit time on all of the datasets, by a factor of about 3.4 on average (computed below).
plot(df.sktime_fit ./ df.MLJ_fit, seriestype = :scatter, label = "Factor fit time")
plot!([1, 104], [1, 1], label = "y = 1 line")
xlabel!("Data Sets")
ylabel!("Factor by which MLJTime is faster")
sum(df.sktime_fit ./ df.MLJ_fit) / length(df.MLJ_fit)  # average fit-time speedup factor
using StatsPlots
@df df boxplot(:MLJ_fit, marker=(0.5, :orange, stroke(2)), label = "MLJTime fit time")
@df df boxplot!(:sktime_fit, marker=(0.5, :black, stroke(2)), label = "sktime fit time")
ylabel!("Time in Seconds")
xlabel!("Packages")
@df df boxplot(:MLJ_fit + :MLJ_predict, marker=(0.5, :orange, stroke(2)), label = "MLJTime total time")
@df df boxplot!(:sktime_fit + :sktime_predict, marker=(0.5, :black, stroke(2)), label = "sktime total time")
ylabel!("Time in Seconds")
xlabel!("Packages")
@df df boxplot((:MLJ_fit + :MLJ_predict) - (:sktime_fit + :sktime_predict), marker=(0.5, :orange, stroke(2)), label = "total time difference (MLJTime - sktime)")
I find that there is no significant difference in accuracy. Still, there are differences of up to ±0.05, which cause some pain in writing integration tests. We suspect this difference is due to the randomness introduced inside the DecisionTree.jl package.
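For illustration, such a test can compare the two accuracies up to a tolerance rather than exactly. Below is a minimal sketch using Julia's Test standard library, with made-up accuracy values and the ±0.05 band from the observation above.

using Test

mlj_accuracy = 0.97     # hypothetical accuracy measured with MLJTime
sktime_accuracy = 0.95  # hypothetical accuracy measured with sktime

# the test passes as long as the two accuracies agree within the observed band
@test isapprox(mlj_accuracy, sktime_accuracy; atol = 0.05)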
Future Work
Support for multivariate time series,
Shapelet based classification algorithms,
Enhancements to KNN (KDTree and BallTree algorithms),
Forecasting framework.
Acknowledgments
This project was originally developed as part of the Google Summer of Code 2020 with the support of the Julia community and my mentors Sebastian Vollmer and Markus Löning.
Active maintainers:
Markus Löning
Sebastian Vollmer