GSoC 2020-Blog#1(End of Phase 1): A HuggingFace model in Julia

It has been a month since GSoC 2020 began, which means we are reaching the end of phase 1. As discussed in the previous blog post, there are 3 parts that need to be implemented. During the first month, we focused on fundamental functionality such as the model loader and the APIs for managing model-related files. I'll show and explain the available APIs and the current workflow in the following sections.

Currently the code we need is on the Transformers#gsoc2020 branch, so we need to update the package first.

]add Transformers#gsoc2020

Once the package is updated, we can load the Transformers.HuggingFace module.

using Transformers.HuggingFace

File management

The module exports two functions for managing model-related files:

  • get_or_download_hgf_config(model_name; config=DEFAULT_CONFIG_NAME):

    tries to find a model config with the given model name, and downloads it from the huggingface model hub if it is not found.

  • get_or_download_hgf_weight(model_name; weight=DEFAULT_WEIGHT_NAME):

    similar to get_or_download_hgf_config but for model weights.

These two functions also register the downloaded file in Artifacts.toml after downloading.
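The "find it, or fetch and register it" pattern can be pictured with Julia's Pkg.Artifacts API. This is only a minimal sketch and not the code used inside Transformers.HuggingFace: the helper get_or_create_file, the stand-in fake_download, and the names "toy-config" and "config.json" are all hypothetical, and the Artifacts.toml lives in a temporary directory.

```julia
using Pkg.Artifacts

# Hypothetical helper, not part of Transformers.jl: return the path of `filename`
# inside the artifact bound to `name`, creating and binding the artifact first
# if `artifacts_toml` has no usable entry for it.
function get_or_create_file(fetch!, artifacts_toml::String, name::String, filename::String)
    hash = artifact_hash(name, artifacts_toml)      # `nothing` if not registered
    if hash === nothing || !artifact_exists(hash)
        # `fetch!(path)` must write the file to `path` (download it, copy it, ...).
        hash = create_artifact() do dir
            fetch!(joinpath(dir, filename))
        end
        bind_artifact!(artifacts_toml, name, hash; force=true)
    end
    return joinpath(artifact_path(hash), filename)
end

# Stand-in for a real download: write a tiny config file.
fake_download(path) = write(path, """{"hidden_size": 768}""")

toml = joinpath(mktempdir(), "Artifacts.toml")
first_path  = get_or_create_file(fake_download, toml, "toy-config", "config.json")
second_path = get_or_create_file(fake_download, toml, "toy-config", "config.json")
```

The second call finds the entry in Artifacts.toml and returns the same path without calling fake_download again. Note that create_artifact stores the file in the user's Julia depot.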

Download file from huggingface model hub

For example, since we don't have the "bert-base-cased" config file locally, we can fetch it from the huggingface model hub with get_or_download_hgf_config.


Once the file is registered, we can use the loading APIs to load it:

  • load_config(model_name): load a registered config.

  • load_state_dict(model_name): load the model weights as a NamedTuple that preserves the model hierarchy.

cfg = load_config("bert-base-cased")
HGFBertConfig with 15 entries:
  :vocab_size => 28996
  :hidden_size => 768
  :num_hidden_layers => 12
  :num_attention_heads => 12
  :intermediate_size => 3072
  :hidden_act => "gelu"
  :hidden_dropout_prob => 0.1
  :attention_probs_dropout_prob => 0.1
  :max_position_embeddings => 512
  :type_vocab_size => 2
  :initializer_range => 0.02
  :layer_norm_eps => 1.0f-12
  :pad_token_id => 0
  :num_labels => 2
  :is_decode => false

Add local file to Artifacts

If we have a custom model that is not on the huggingface model hub, we can use another API that copies the local file and manages it with the artifact system.

For example, we download another config as an illustration.

mkdir mymodel
mv ./bert-base-uncased-config.json ./mymodel/config.json

Now that we have our own config file at "mymodel/config.json", we can call HuggingFace.find_or_register_hgf_config_hash to register it with the artifact system.

HuggingFace.find_or_register_hgf_config_hash("./mymodel/", "my-bert-uncased-model")

After that, we can call load_config directly on our own model name.

mycfg = load_config("my-bert-uncased-model")
HGFBertConfig with 15 entries:
  :vocab_size => 30522
  :hidden_size => 768
  :num_hidden_layers => 12
  :num_attention_heads => 12
  :intermediate_size => 3072
  :hidden_act => "gelu"
  :hidden_dropout_prob => 0.1
  :attention_probs_dropout_prob => 0.1
  :max_position_embeddings => 512
  :type_vocab_size => 2
  :initializer_range => 0.02
  :layer_norm_eps => 1.0f-12
  :pad_token_id => 0
  :num_labels => 2
  :is_decode => false

We also support reusing the TRANSFORMERS_CACHE from huggingface transformers. If you already have some model in the cache directory, we will copy that file directly.

Get Model weights

Similar to the process of getting a config file, we can get a model state like this:

state = load_state_dict("bert-base-cased") # too large to show
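To picture what "a NamedTuple that preserves the model hierarchy" means, here is a minimal sketch that nests a flat dictionary of dotted parameter names. The helper nest_state is hypothetical and is not how load_state_dict is implemented; the key names follow the field names used later in this post, and the array sizes are toy values.

```julia
# Hypothetical helper (not part of Transformers.jl): turn a flat dictionary with
# dotted parameter names into nested NamedTuples, one level per name segment.
function nest_state(flat::AbstractDict)
    groups = Dict{Symbol,Any}()
    for (key, value) in flat
        parts = split(key, '.'; limit=2)
        head = Symbol(parts[1])
        if length(parts) == 1
            groups[head] = value                  # leaf: an actual parameter
        else
            sub = get!(() -> Dict{String,Any}(), groups, head)
            sub[String(parts[2])] = value         # keep the rest of the path for later
        end
    end
    names = Tuple(sort!(collect(keys(groups))))   # deterministic field order
    vals = map(n -> groups[n] isa Dict{String,Any} ? nest_state(groups[n]) : groups[n], names)
    return NamedTuple{names}(vals)
end

# Toy data: real weights are far larger.
flat = Dict(
    "bert.embeddings.word_embeddings.weight" => zeros(Float32, 4, 3),
    "bert.pooler.dense.bias"                 => zeros(Float32, 4),
    "qa_outputs.bias"                        => zeros(Float32, 2),
)
toy_state = nest_state(flat)
```

With this layout a parameter is reached by plain field access, for example toy_state.bert.embeddings.word_embeddings.weight.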


Once we have the config and state, we are ready to restore a pretrained model from them. As a proof of concept, I have only implemented the Bert model classes for now. Others will be implemented during the 2nd and 3rd months. All the type names match the corresponding huggingface class names, with an extra prefix "HGF" that stands for "HuGgingFace".

Create model

For example, to build the counterpart of a transformers.BertForQuestionAnswering model, simply do:

model = HuggingFace.HGFBertForQuestionAnswering(cfg)

And you can also see the model hierarchy.


Then, we can load our state_dict with load_state:

load_state(model, state)
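The idea of loading a nested state into a model can be sketched with toy types. This is an assumption-laden illustration, not the actual load_state implementation: ToyDense, ToyModel, and toy_load_state! are hypothetical names, and the sketch simply copies arrays into fields with matching names.

```julia
# Toy stand-ins for model layers (hypothetical; not the Transformers.jl types).
struct ToyDense
    weight::Matrix{Float32}
    bias::Vector{Float32}
end

struct ToyModel
    dense::ToyDense
end

# Walk the NamedTuple and the struct together, copying arrays in place.
function toy_load_state!(layer, state::NamedTuple)
    for name in keys(state)
        target = getfield(layer, name)
        value = state[name]
        if value isa NamedTuple
            toy_load_state!(target, value)   # descend into the sub-layer
        else
            copyto!(target, value)           # leaf: overwrite the parameter
        end
    end
    return layer
end

toy_model = ToyModel(ToyDense(zeros(Float32, 2, 2), zeros(Float32, 2)))
toy_state = (dense = (weight = Float32[1 2; 3 4], bias = Float32[5, 6]),)
toy_load_state!(toy_model, toy_state)
```

Because the state mirrors the struct hierarchy, the loader needs no name parsing; it only follows field names.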

To show that the model is correctly loaded, let's get our old Bert implementation in Transformers.jl:

using Transformers
using Transformers.Pretrain
old_model = pretrain"bert:cased_L-12_H-768_A-12:bert_model"
TransformerModel{Bert}(
  embed = CompositeEmbedding(tok = Embed(768), segment = Embed(768), pe = PositionEmbedding(768, max_len=512), postprocessor = Positionwise(LayerNorm(768), Dropout(0.1))),
  transformers = Bert(layers=12, head=12, head_size=64, pwffn_size=3072, size=768),
  classifier =
    (
      pooler => Dense(768, 768, tanh)
      masklm => (
        transform => Chain(Dense(768, 768, gelu), LayerNorm(768))
        output_bias => Array{Float32,1}
      )
      nextsentence => Chain(Dense(768, 2), logsoftmax)
    )
)

You can then compare the parameter values of the two models:

@assert old_model.transformers.ts.models[6].mh.iqproj.W == model.bert.encoder.layer._modules[6].attention.self.query.weight
@assert old_model.embed.embeddings.tok.embedding == model.bert.embeddings.word_embeddings.weight

Get our state_dict

We also provide a get_state_dict function for our model. This is useful for the saver part.
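The reverse direction, from a model back to a nested state, can be sketched in the same spirit. The types ToyLayerNorm and ToyBlock and the function toy_state_dict are hypothetical, and the real get_state_dict may differ in how it names and selects parameters.

```julia
# Toy layers (hypothetical; not the Transformers.jl types).
struct ToyLayerNorm
    weight::Vector{Float32}
    bias::Vector{Float32}
end

struct ToyBlock
    norm::ToyLayerNorm
end

# Arrays are leaves; any other value is treated as a layer and expanded by field.
toy_state_dict(x::AbstractArray) = x
function toy_state_dict(layer)
    names = fieldnames(typeof(layer))
    return NamedTuple{names}(map(n -> toy_state_dict(getfield(layer, n)), names))
end

block = ToyBlock(ToyLayerNorm(ones(Float32, 3), zeros(Float32, 3)))
sd = toy_state_dict(block)
```

The returned NamedTuple references the model's arrays rather than copying them, so a saver built on such a function would serialize the current parameter values.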



Currently the model is only a struct that holds the parameters, which means it cannot yet be used for training or testing. However, it shows that the workflow runs smoothly. The forward implementation will be done in the following month.
