Hello, I saw "Specifically, bert-base-uncased/bert-base-cased/bert-base-german-cased are equipped with vocabularies containing 30k/29k/30k tokens, while the dictionary of bert-base-multilingual-cased contains 119k tokens, which is much larger because it consists of the common tokens among 104 languages. For each low-resource language considered in our experiments, directly loading the whole embedding matrix of the multilingual BERT model will waste a lot of GPU memory. Therefore we only consider tokens that appear in the training and validation set, and manually modify the checkpoint of the multilingual BERT to omit the embeddings of unused tokens. In this way, we obtain dictionaries that contain 24k/16k/17k/16k tokens for Ro/It/Es/Nl respectively, which ultimately save around 77M parameters in average." in Appendix C of the original paper.
How is this done in the code? What does "manually modifying a checkpoint" mean in practice?
Looking forward to your reply, thank you very much.
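To make the question concrete, here is my understanding of what the paper might mean, as a minimal sketch: collect the set of tokens that appear in the training/validation data, keep only the corresponding rows of the word-embedding matrix (plus special tokens), and remap the vocabulary ids. All names below are illustrative, not taken from your repository; in the real setup I assume the same slicing would be applied to the `bert.embeddings.word_embeddings.weight` tensor of the saved bert-base-multilingual-cased state dict, together with a trimmed `vocab.txt`.

```python
import numpy as np

def prune_embeddings(embedding, vocab, corpus_tokens,
                     specials=("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")):
    """Return (pruned_embedding, new_vocab), keeping only the rows of
    `embedding` whose tokens appear in `corpus_tokens` (plus specials)."""
    used = set(corpus_tokens) | set(specials)
    # Preserve the original id order so relative positions stay stable.
    kept = [(i, tok) for i, tok in enumerate(vocab) if tok in used]
    old_ids = [i for i, _ in kept]
    new_vocab = {tok: new_id for new_id, (_, tok) in enumerate(kept)}
    return embedding[old_ids], new_vocab

# Toy demo: 7-token vocabulary, 4-dim embeddings, corpus uses 2 tokens.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "casa", "gatto", "huis"]
emb = np.random.rand(len(vocab), 4).astype(np.float32)
corpus = ["casa", "gatto", "casa"]

pruned, new_vocab = prune_embeddings(emb, vocab, corpus)
print(pruned.shape)       # (6, 4): 4 specials present in vocab + 2 corpus tokens
print(new_vocab["casa"])  # 4
```

Is this roughly what the paper does, and if so, where in the code is the checkpoint rewritten and saved?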