GSoC 2020: Leveraging Hugging Face Transformers package in Julia

I was very lucky to be part of GSoC 2020, Google Summer of Code. It has been a year since JSoC 2019, during which I implemented Google's BERT model in JuliaLang with Flux.jl. You can find the previous blog posts here, and the result lives in the Transformers.jl package.

Since then, a great deal of research and many projects have been built on top of Transformer-like models, and new pre-training tasks and models keep coming out. Amid this trend, one company shines through: Hugging Face and its transformers package. More and more people are using huggingface/transformers and sharing their models on Hugging Face's model hub.

Unfortunately, the package is built on top of Python, mainly PyTorch. As a JuliaLang user, my preference is clear. But if I want to use a shared pretrained model, or share my own model with others, using Python seems unavoidable.

That's why I started this project. I hope it allows Julia users to join this fast-growing trend; as the old saying goes, "If you can't beat them, join them."

Main Goal

The main goal of the project has three parts:

  1. a loader for huggingface/transformers' pretrained models.

  2. reimplementations of several Transformer-like models in JuliaLang.

  3. a saver that stores a model in a Python-loadable format which huggingface/transformers can load directly.

Each part will be described in the following sections.


Loader

The first part is the loader. As mentioned above, we want to use the pretrained models from huggingface/transformers while still writing JuliaLang. To do so, we need to examine the saved model format of huggingface/transformers. Fortunately, it saves models with the standard PyTorch state_dict API, which means the file is in a Python-pickle-compatible format and contains only the model weights. There is no actual architecture inside the saved file, so we don't need to bother translating between Flux.jl and PyTorch automatically.
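Conceptually, a state_dict is just a mapping from dotted weight names to arrays, and the loader's job is to route each named array into the right field of a Julia model. A minimal sketch (the key names below are illustrative, not the exact ALBERT keys):

```julia
# A state_dict is conceptually a name => array mapping.
# These keys are made-up examples, not the real ALBERT key names.
state_dict = Dict{String,Array{Float32}}(
    "embeddings.word_embeddings.weight" => rand(Float32, 30000, 128),
    "encoder.layer.0.attention.query.weight" => rand(Float32, 768, 768),
    "encoder.layer.0.attention.query.bias" => rand(Float32, 768),
)

# The loader looks up each name and copies the array into the
# matching parameter of the reimplemented Flux.jl model.
size(state_dict["encoder.layer.0.attention.query.bias"])  # (768,)
```

Note that nothing here records *how* the layers connect; that structure has to come from our own reimplementation.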

Then the problem becomes "How can I load a Python-pickle-like object in Julia?" Using PyCall.jl? No, no. That's overkill for this situation. So I made another package named Pickle.jl. I might write another blog post introducing the pickle format, but for now let's just see how it fits into the loader.

Pickle.jl is registered, so we can simply add it with Pkg.

using Pkg
pkg"add Pickle"

Then use the Pickle.Torch submodule for loading torch pickles.

using Pickle
using Pickle.Torch

To show it works, I downloaded a saved model from Hugging Face. Here I use ALBERT for demonstration since the model is smaller.

shell> ls albert-base-v1-pytorch_model.bin
albert-base-v1-pytorch_model.bin

Then use Torch.THload to load the weights, or the state_dict in torch terms.

weights = Torch.THload("./albert-base-v1-pytorch_model.bin")

As you can see, it loads without any problem. The next part of the loader is rebuilding the model object and then assigning these weights to the correct places.
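Once loaded, the returned object can be treated as a dictionary, so we can inspect the weight names and shapes before wiring them into a model. A small sketch, assuming `weights` is the dictionary returned by `Torch.THload` above (the commented key is illustrative; real key names depend on the model):

```julia
# Assumes `weights` was loaded with Torch.THload as shown above.
for (name, w) in weights
    println(name, " => ", size(w))
end

# A single tensor can be fetched by its state_dict name, e.g.:
# emb = weights["albert.embeddings.word_embeddings.weight"]
```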

Reimplement models

This part is quite straightforward, since we don't have any existing implementation and there is no model architecture in the saved file. During GSoC 2020, I plan to reimplement the following models:

  • GPT2

  • XLM

  • RoBERTa

  • DistilBERT

  • CTRL


If we have enough time, these will also be covered:

  • Transformers-XL

  • XLNet

  • Reformer


Saver

As we saw in the Loader part, the saved model contains only the model weights and their names. Once we have the implemented model, the only thing we need to do is assign the proper name to each weight and save them in torch's pickle-like format. After that, huggingface/transformers should be able to load it as the proper model type without problems.
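A rough sketch of that flow, assuming Pickle.jl provides a `Torch.THsave` counterpart to `THload` (check the package documentation for the exact API; the key name and file name below are illustrative):

```julia
using Pickle
using Pickle.Torch

# Collect the model weights under the names huggingface/transformers
# expects for this architecture (this key is a made-up example).
state_dict = Dict{String,Any}(
    "albert.embeddings.word_embeddings.weight" => rand(Float32, 30000, 128),
)

# Save in torch's pickle-like format. The Python side could then
# load the file with torch.load and model.load_state_dict.
Torch.THsave("my_model.bin", state_dict)
```

The hard part is not the serialization itself but getting every weight name to match what the Python model class expects.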

Secondary Goal

Besides the basic functionality, I also plan to port some of the tutorials/tools from huggingface/transformers so that people can more easily switch to JuliaLang.


Conclusion

This GSoC project aims to bridge the gap between Julia users and the current NLP research trend. With it, we should be able to:

  1. get a newly released pretrained model from Hugging Face.

  2. fine-tune the model in JuliaLang or use it in another Julia project.

  3. release your custom model to the Hugging Face platform so that other people can share your results.

This is just a basic introduction to the project. Progress will be covered in the next blog post. If you have any questions or encounter any problems with Transformers.jl, you can open an issue or tag me on the Julia Slack/Discourse at @chengchingwen.
