GSoC 2020 Blog #1 (End of Phase 1): A HuggingFace model in Julia
It has been a month since GSoC 2020 began, which means we are reaching the end of phase 1. As discussed in the previous blog, there are 3 parts that need to be implemented. During the first month, we focused on the fundamental functionality, like the model loader, as well as the APIs for managing model-related files. I'll show and explain the available APIs and the current workflow in the following sections.
Currently the code we need is under the Transformers#gsoc2020 branch, so we need to update the package first.
]add Transformers#gsoc2020
Once the package is updated, we can use the Transformers.HuggingFace module.
using Transformers.HuggingFace
File management
The module exports two functions for model-related files:
get_or_download_hgf_config(model_name; config=DEFAULT_CONFIG_NAME): try to find a model config with the given model name, or download it from the huggingface model hub if not found.
get_or_download_hgf_weight(model_name; weight=DEFAULT_WEIGHT_NAME): similar to get_or_download_hgf_config, but for model weights.
These two functions also register the downloaded file in Artifacts.toml after downloading.
Download a file from the huggingface model hub
For example, we try to get the "bert-base-cased" config file from the huggingface model hub since we don't have it locally.
get_or_download_hgf_config("bert-base-cased")
Once the file is registered, we can use the loading APIs to load it:
load_config(model_name): load a registered config.
load_state_dict(model_name): load the model weights as a NamedTuple that preserves the model hierarchy.
cfg = load_config("bert-base-cased")
Add a local file to Artifacts
If we have a custom model that is not on the huggingface model hub, we can use another API that copies the local file and manages it with the artifact system.
For example, we download another config as an illustration.
wget https://cdn.huggingface.co/bert-base-uncased-config.json
mkdir mymodel
mv ./bert-base-uncased-config.json ./mymodel/config.json
Here we have our own config file at "mymodel/config.json". We can then call HuggingFace.find_or_register_hgf_config_hash to register our own file with the artifact system.
HuggingFace.find_or_register_hgf_config_hash("./mymodel/", "my-bert-uncased-model")
After that, we can call load_config directly on our own model name.
mycfg = load_config("my-bert-uncased-model")
We also support reusing the TRANSFORMERS_CACHE from huggingface transformers: if you already have some models in the cache directory, we will copy those files directly.
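For example, if you already use the Python library, something along the following lines should make the Julia side pick up the cached files. The cache path is the Python library's usual default at the time of writing, and reading the environment variable on the Julia side is my reading of the description above rather than a documented guarantee:
# Point at an existing huggingface transformers cache (adjust the path to
# wherever your cache actually lives).
ENV["TRANSFORMERS_CACHE"] = expanduser("~/.cache/torch/transformers")
# With the cache set, the download call should copy the local file instead
# of fetching it again (assumed behaviour based on the paragraph above).
get_or_download_hgf_config("bert-base-uncased")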
Get model weights
Similar to the process of getting a config file, we can get a model state like this:
get_or_download_hgf_weight("bert-base-cased")
state = load_state_dict("bert-base-cased") # too large to show
summary(state)
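Since the state is just nested NamedTuples with arrays at the leaves, we can walk it with a small helper to list every tensor it holds. This helper is only for illustration and is not part of the package:
# Recursively print the name and size of every leaf tensor in the state.
function print_state(prefix, x)
    if x isa NamedTuple
        for k in keys(x)
            print_state(string(prefix, ".", k), x[k])
        end
    else
        println(lstrip(prefix, '.'), " => ", summary(x))
    end
end

print_state("", state)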
Model
Once we have the config and state, we are ready to restore a pretrained model from them. As a proof of concept, I have only implemented the Bert model classes for now; the others will be implemented during the 2nd and 3rd months. All the type names are aligned with the corresponding class names, with an extra prefix "HGF", which stands for "HuGgingFace".
Create model
For example, we want to build a transformers.BertForQuestionAnswering model. Simply do:
model = HuggingFace.HGFBertForQuestionAnswering(cfg)
And you can also see the model hierarchy.
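Individual submodules can be reached with the same property paths as in the Python implementation; the paths below are the ones used in the comparison further down (the printed output is omitted here):
# Inspect a few submodules of the freshly built model.
model.bert.embeddings
model.bert.encoder.layer._modules[6].attention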
Loading
Then, we can load our state_dict with load_state:
load_state(model, state)
To show that the model is correctly loaded, let's get our old Bert implementation in Transformers.jl:
using Transformers
using Transformers.Pretrain
ENV["DATADEPS_ALWAYS_ACCEPT"] = true
old_model = pretrain"bert:cased_L-12_H-768_A-12:bert_model"
You can compare the values between the two models:
old_model.transformers.ts.models[6].mh.iqproj.W == model.bert.encoder.layer._modules[6].attention.self.query.weight
old_model.embed.embeddings.tok.embedding == model.bert.embeddings.word_embeddings.weight
Get our state_dict
We also provide a get_state_dict function for our model. This is useful for the saver part.
HuggingFace.get_state_dict(model)
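As a quick sanity check, the extracted state can be pushed back into a freshly constructed model, which is essentially what a future saver/loader pair would do. That get_state_dict's output can be fed straight into load_state is my assumption based on the APIs shown above:
# Extract the parameters of the loaded model ...
new_state = HuggingFace.get_state_dict(model)
# ... and restore them into a freshly built model of the same architecture
# (assuming the extracted state is accepted by load_state as-is).
model2 = HuggingFace.HGFBertForQuestionAnswering(cfg)
load_state(model2, new_state)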
Conclusion
Currently, the model is only a struct holding the parameters, which means it cannot yet be used for training or testing. However, it shows that the workflow works smoothly. The forward implementation will surely be done in the following month.