Yash Patel / Aug 29 2019

JSoC 2019: Journey and Conclusion

Julia Season of Contributions 2019 has come to an end. In this post I want to discuss what I achieved and what I could not finish. My proposal was to implement the ULMFiT model in the Julia language for binary and fine-grained sentiment analysis. Below is a glimpse of what the implementation looks like.

Implementation Details

The whole implementation lives in the ULMFiT folder inside the TextAnalysis.jl repository. It is divided mainly into three files: language model pretraining, language model fine-tuning and classifier fine-tuning. It is still an open PR to TextAnalysis.jl (PR #168). The API is ready to train a model for any kind of text classification task. As an example, it has been trained for binary sentiment classification; since the accuracy is currently about 83% (while around 90% is expected), it is being trained further on the IMDB movie review dataset to reach the desired accuracy.
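A rough sketch of the folder layout is shown below. The file names here are illustrative assumptions based on the description above, not necessarily the exact names in the PR:

# TextAnalysis.jl/src/ULMFiT/               (file names are assumptions, for illustration)
# ├── pretrain_lm.jl             # language model pretraining
# ├── fine_tune_lm.jl            # language model fine-tuning
# └── train_text_classifier.jl   # classifier fine-tuning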

Usage

Fine-tuning a classifier on the ULMFiT pretrained language model is easy to follow, but it might take time to converge. It all depends upon your task, the desired accuracy and the hyper-parameters chosen. I will go step-wise through the training process:

Step 1:

First, initialize an instance of the LanguageModel struct with the desired hyper-parameters (dropouts and layer sizes) if you want to train a language model from scratch, or else the pretrained language model can be loaded:

"""
LanguageModel(load_pretrained::Bool=false, vocabpath::String="vocabs/lm_vocab.csv",
embedding_size::Integer=400, 
hid_lstm_sz::Integer=1150,
out_lstm_sz::Integer=embedding_size,    
embed_drop_prob::Float64 = 0.05, 
in_drop_prob::Float64 = 0.4, 
hid_drop_prob::Float64 = 0.5, 
layer_drop_prob::Float64 = 0.3, 
final_drop_prob::Float64 = 0.3)
"""
# To make an instance
lm = LanguageModel(false) # or lm = LanguageModel()

# To load pretrained model
lm = LanguageModel(true)

To train a new language model, use the pretrain_lm! function with the proper arguments. [Also refer to this blog]

pretrain_lm!(lm::LanguageModel=LanguageModel(), data_loader::Channel=load_wikitext_103; base_lr=0.004, epochs::Integer=1, checkpoint_iter::Integer=5000)
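For instance, a pretraining run using the defaults from the signature above could look like this (a sketch; the keyword values are simply the defaults shown above, and the data loader falls back to the default WikiText-103 loader):

# Pretrain `lm` on the default WikiText-103 data loader
pretrain_lm!(lm; base_lr=0.004, epochs=1, checkpoint_iter=5000)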

Step 2:

After training or loading the pretrained language model, it should be fine-tuned for some epochs so that the model gets a taste of the task it will be used for and can learn some task-related features. The training procedure is identical to pretraining; only the data used is different. However, to fine-tune the language model use the fine_tune_lm! function instead of pretrain_lm!, because it contains some optimizations which are useful for fine-tuning. [Also refer to this blog]

fine_tune_lm!(lm::LanguageModel, data_loader::Channel=imdb_fine_tune_data, stlr_cut_frac::Float64=0.1, stlr_ratio::Float32=32, stlr_η_max::Float64=4e-3; epochs::Integer=1, checkpoint_itvl::Integer=5000)
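As a sketch, fine-tuning on the default IMDB fine-tuning data with the default slanted triangular learning-rate settings could be run as (keyword values are the defaults from the signature above):

# Fine-tune `lm` on the default IMDB fine-tuning data
fine_tune_lm!(lm; epochs=1, checkpoint_itvl=5000)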

Step 3:

Here comes the last and most critical step in training a text classifier. For this step, first make an instance of the TextClassifier struct with the pretrained LanguageModel struct. Also, remember that the data used here is in a different format: since each example in the dataset has a corresponding label, we need to construct a Channel which can output text and labels separately. Use the data_loader function for this, after pre-processing and saving the data as a Vector of Vectors, each containing the text as a Document (refer to the TextAnalysis.jl docs to see what Documents are) and its label. You also need to pass a vector of all labels to data_loader; it returns a Channel which is to be given as an argument to the train_classifier! function along with the TextClassifier instance (see the sketch after the train_classifier! signature below).

"""
TextClassifier(lm::LanguageModel=LanguageModel(), 
clsfr_out_sz::Integer=1, 
clsfr_hidden_sz::Integer=50, 
clsfr_hidden_drop::Float64=0.4)
"""
# Making instance using pretrained language model (lm)
tc = TextClassifier(lm)

train_classifier!(classifier::TextClassifier=TextClassifier(), classes::Integer=1, data_loader::Channel=imdb_classifier_data, hidden_layer_size::Integer=50; stlr_cut_frac::Float64=0.1, stlr_ratio::Number=32, stlr_η_max::Float64=0.01, val_loader::Channel=nothing, cross_val_batches::Union{Colon, Integer}=:, epochs::Integer=1, checkpoint_itvl=5000)
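Putting Step 3 together, a minimal sketch for binary sentiment classification might look like the following. The exact data_loader call and the toy examples are assumptions based on the description above; check PR #168 for the precise signature:

using TextAnalysis

# All possible class labels for binary sentiment classification
labels = ["neg", "pos"]

# Toy pre-processed data: a Vector of Vectors, each holding a Document and its label
examples = [
    [StringDocument("worst movie I have ever seen"), "neg"],
    [StringDocument("a delightful and moving film"), "pos"],
]

# `data_loader` returns a Channel yielding text and labels separately
# (the exact call is an assumption based on the description above)
classifier_data = data_loader(examples, labels)

# Classifier built on top of the pretrained/fine-tuned language model `lm`
tc = TextClassifier(lm)

# 2 classes; keyword values are taken from the signature above
train_classifier!(tc, 2, classifier_data; stlr_η_max=0.01, epochs=1)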

These steps indirectly tell the story of my JSoC 2019 journey: first I started coding the language model pretraining and then trained a language model. After that, I started coding the fine-tuning part in parallel with pretraining, and then used it to fine-tune the language model for binary sentiment classification. At last, I wrote the code for the classifier. At that point the coding part was done, but afterwards I trained the model several times through the fine-tuning steps to get the desired accuracy. This took the major portion of my JSoC period because the model is big and takes a long time to train. But finally I got convergence, and for the sentiment classification I am still seeing convergence, which tells me that the implementation is a success. [Also refer to this blog]

What I couldn't do

My JSoC 2019 proposal was fully focused on getting the classifiers for sentiment classification tasks done. But later I realized that writing an API would be a better contribution, so I focused primarily on that. Maybe because of this I could not complete the fine-grained sentiment classification task. But that is only the case for the JSoC period; I will surely add a fine-grained sentiment classifier trained with the ULMFiT API in a short while.

What's next??

  1. As I have stated above, my first priority is to get a better binary sentiment classifier (90% accurate).
  2. Adding some more classifiers, for example a classifier for question classification, a topic classifier etc., as many as possible from this list.
  3. A fine-grained sentiment classifier. I am keeping this for later because I am planning to train it on the Yelp fine-grained dataset, which is too big to load on the machine I am training on, so I will have to think of something to get that done first.

In my proposal, I stated that if time permitted I would add QRNNs (Quasi-Recurrent Neural Networks) to the model in place of the regular LSTM units, and also add bi-directionality to the LSTMs used now. But since all the time was consumed by training and rectifying the model, I could not get to these parts. The tasks listed above are basically all training, which will take time, so in parallel with them I will work on QRNN and bidirectional LSTM support for the API. JSoC 2019 is over, but there is still a long way to go to reach what I actually want to see in this API.

Conclusion

It feels great that there is a working API, to which I have contributed, that makes use of such a successful model in the NLP field. The three simple steps above can give you a highly accurate text classifier. With this satisfaction, I conclude my work in this season of JSoC 2019.

I thank the whole Julia community for giving me this great opportunity to contribute to Julia and improve my skills. I learnt a lot about open source and how to work within a community to develop something meaningful. I thank my mentor Avik Sengupta for guiding and helping me throughout my JSoC period, and all the members who helped me figure out solutions whenever I got stuck.

Although the season is over, remember, this is just a trailer ;-)