GSoC 2020: Leveraging Hugging Face Transformers package in Julia

I was very lucky to be part of GSoC 2020, Google Summer of Code. It has been a year since JSoC 2019, during which I implemented Google's BERT model in JuliaLang with Flux.jl. You can find the previous blog posts here, and the result lives in the Transformers.jl package.

Since then, a great deal of research and many projects have been built on top of Transformer-like models, and new pre-training tasks and models keep coming out. Amid this trend, one company shines through: Hugging Face and its transformers package. More and more people are using huggingface/transformers and sharing their models on Hugging Face's model hub.

Unfortunately, the package is built on top of Python, mainly PyTorch. As a JuliaLang user, my preference is clear. But if I want to use a shared pretrained model, or share my own model with others, using Python seems unavoidable.

That's why I started this project. I hope it allows Julia users to join this fast-growing trend; as the old saying goes, "If you can't beat them, join them."

Main Goal

The main goal of the project has three parts:

  1. a loader for huggingface/transformers' pretrained models.

  2. reimplementations of several Transformer-like models in JuliaLang.

  3. a saver that stores a model in a Python-loadable format which huggingface/transformers can load directly.

Each part will be described in the following sections.


Loader

The first part is the loader. As mentioned above, we want to use the pretrained models from huggingface/transformers while still writing JuliaLang. To do so, we need to examine the saved model format of huggingface/transformers. Fortunately, it saves models with the standard PyTorch state_dict API, which means the file is in a Python-pickle-compatible format and contains only the model weights. There is no actual architecture inside the saved file, so we don't need to bother translating between Flux.jl and PyTorch automatically.
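Conceptually, a state_dict is just a mapping from dotted weight names to arrays, and the loader's job is to route each named array into the right field of a Julia model. A minimal sketch (the key names below are illustrative, not the exact ALBERT keys):

```julia
# A state_dict is conceptually a name => array mapping.
# These keys are made-up examples, not the real ALBERT key names.
state_dict = Dict{String,Array{Float32}}(
    "embeddings.word_embeddings.weight" => rand(Float32, 30000, 128),
    "encoder.layer.0.attention.query.weight" => rand(Float32, 768, 768),
    "encoder.layer.0.attention.query.bias" => rand(Float32, 768),
)

# The loader looks up each name and copies the array into the
# matching parameter of the reimplemented Flux.jl model.
size(state_dict["encoder.layer.0.attention.query.bias"])  # (768,)
```

Note that nothing here records *how* the layers connect; that structure has to come from our own reimplementation.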

Then the problem becomes "How can I load a Python-pickle-like object in Julia?" Using PyCall.jl? No, no. That's overkill for this situation. So I made another package named Pickle.jl. I might write another blog post introducing the pickle format, but for now let's just see how it fits into the loader.

Pickle.jl is registered, so we can simply add it with Pkg.

using Pkg
pkg"add Pickle"

Then use the Pickle.Torch submodule for loading torch pickles.

using Pickle
using Pickle.Torch

To show it works, I downloaded a saved model from Hugging Face. Here I use ALBERT for demonstration since the model is smaller.

shell> ls albert-base-v1-pytorch_model.bin
albert-base-v1-pytorch_model.bin

Then use Torch.THload to load the weights, or the state_dict in torch terms.

weights = Torch.THload("./albert-base-v1-pytorch_model.bin")

As you can see, it loads without any problem. The next part of the loader is rebuilding the model object and then assigning these weights to the correct places.
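Once loaded, the returned object can be treated as a dictionary, so we can inspect the weight names and shapes before wiring them into a model. A small sketch, assuming `weights` is the dictionary returned by `Torch.THload` above (the commented key is illustrative; real key names depend on the model):

```julia
# Assumes `weights` was loaded with Torch.THload as shown above.
for (name, w) in weights
    println(name, " => ", size(w))
end

# A single tensor can be fetched by its state_dict name, e.g.:
# emb = weights["albert.embeddings.word_embeddings.weight"]
```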

Reimplement models

This part is quite straightforward, since we don't have any existing implementation and there is no model architecture in the saved file. During GSoC 2020, I plan to reimplement the following models:

  • GPT2

  • XLM

  • RoBERTa

  • DistilBERT

  • CTRL


If we have enough time, these will also be covered:

  • Transformers-XL

  • XLNet

  • Reformer


Saver

As we saw in the Loader part, the saved model contains only the model weights and their names. Once we have the implemented model, the only thing we need to do is assign the proper name to each weight and save them in torch's pickle-like format. After that, huggingface/transformers should be able to load it as the proper model type without problems.
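A rough sketch of that flow, assuming Pickle.jl provides a `Torch.THsave` counterpart to `THload` (check the package documentation for the exact API; the key name and file name below are illustrative):

```julia
using Pickle
using Pickle.Torch

# Collect the model weights under the names huggingface/transformers
# expects for this architecture (this key is a made-up example).
state_dict = Dict{String,Any}(
    "albert.embeddings.word_embeddings.weight" => rand(Float32, 30000, 128),
)

# Save in torch's pickle-like format. The Python side could then
# load the file with torch.load and model.load_state_dict.
Torch.THsave("my_model.bin", state_dict)
```

The hard part is not the serialization itself but getting every weight name to match what the Python model class expects.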

Secondary Goal

Besides the basic functionality, I also plan to port some of the tutorials/tools from huggingface/transformers so that people can more easily switch to JuliaLang.


Conclusion

This GSoC project aims to bridge the gap between Julia users and the current NLP research trend. With it, we should be able to:

  1. get a newly released pretrained model from Hugging Face.

  2. fine-tune the model in JuliaLang or use it in another Julia project.

  3. release your custom model to the Hugging Face platform so that other people can share your results.

This is just a basic introduction to the project. Progress will be covered in the next blog post. If you have any questions or encounter any problems with Transformers.jl, you can open an issue or tag me on the Julia Slack/Discourse at @chengchingwen.
