T5 support #1

Open
davidkaczer wants to merge 15 commits into main from t5

@davidkaczer
Owner

What does this PR do?

This PR adds support for pretrained LongT5 encoder-decoder models from HuggingFace.

General Changes

  • Add HuggingFacePretrainedEncoderDecoderModel class for loading LongT5
  • Add span masking collator ported from the official Google implementation
  • Add test for span masking collator
  • Add example training configs for LongT5
  • Add logic to distinguish between decoder-only and encoder-decoder models when passing inputs
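
For reviewers unfamiliar with the span denoising objective, the idea behind the span masking collator can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code added in this PR: the function names (`random_span_mask`, `corrupt_spans`), the default `noise_density` and `mean_span_length` values, and the convention that sentinel ids count down from `vocab_size - 1` are all assumptions made for the example. It also omits EOS handling, padding, and batching.

```python
import random
from typing import List, Optional, Tuple


def _random_segmentation(num_items: int, num_segments: int,
                         rng: random.Random) -> List[int]:
    """Split num_items into num_segments positive lengths, chosen at random."""
    cuts = sorted(rng.sample(range(1, num_items), num_segments - 1))
    bounds = [0] + cuts + [num_items]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def random_span_mask(length: int, noise_density: float = 0.15,
                     mean_span_length: float = 3.0,
                     rng: Optional[random.Random] = None) -> List[bool]:
    """Return a boolean mask where True marks tokens inside a noise span.

    Hypothetical helper: kept and noise segments are interleaved, so the mask
    always starts with a kept segment and ends with a noise segment.
    """
    if length < 2:
        raise ValueError("need at least 2 tokens to mask a span")
    rng = rng or random.Random()
    num_noise = min(max(round(length * noise_density), 1), length - 1)
    num_keep = length - num_noise
    num_spans = max(round(num_noise / mean_span_length), 1)
    num_spans = min(num_spans, num_noise, num_keep)
    noise_lens = _random_segmentation(num_noise, num_spans, rng)
    keep_lens = _random_segmentation(num_keep, num_spans, rng)
    mask: List[bool] = []
    for keep, noise in zip(keep_lens, noise_lens):
        mask += [False] * keep + [True] * noise
    return mask


def corrupt_spans(tokens: List[int], mask: List[bool],
                  vocab_size: int) -> Tuple[List[int], List[int]]:
    """Build encoder inputs and decoder targets from a noise mask.

    Each noise span is replaced by one sentinel in the inputs; the targets
    list each sentinel followed by the tokens it replaced. Sentinel ids are
    assumed to count down from vocab_size - 1.
    """
    inputs: List[int] = []
    targets: List[int] = []
    sentinel = vocab_size - 1
    prev_noise = False
    for tok, is_noise in zip(tokens, mask):
        if is_noise:
            if not prev_noise:  # first token of a new noise span
                inputs.append(sentinel)
                targets.append(sentinel)
                sentinel -= 1
            targets.append(tok)
        else:
            inputs.append(tok)
        prev_noise = is_noise
    return inputs, targets


if __name__ == "__main__":
    toks = [10, 11, 12, 13, 14, 15, 16, 17]
    msk = [False, False, True, True, False, False, False, True]
    enc, dec = corrupt_spans(toks, msk, vocab_size=100)
    print(enc)  # [10, 11, 99, 14, 15, 16, 98]
    print(dec)  # [99, 12, 13, 98, 17]
```

In this sketch the encoder sees the sequence with each masked span collapsed to a single sentinel, and the decoder is trained to emit the sentinels followed by the missing tokens. The collator in this PR may differ in details such as how spans are sampled and whether a trailing sentinel or EOS token is appended.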

Breaking Changes

  • None intended

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py). Note: some tests appear to have been failing in upstream already, and multi-GPU was not tested.
  • I have updated the internal changelog (CHANGELOG_DEV.md)

davidkaczer and others added 15 commits August 8, 2024 10:53
- add SpanMaskingCollateFn for span denoising objective
- support loading T5 checkpoint from huggingface
- support passing decoder inputs to T5 model
- add example config for pretraining T5 checkpoint from HF
…ew huggingface model wrapper for encoder decoder architecture
- add logic to model_predict_batch
- fix example config
- cleanup HF model implementation
- remove dependency in span masking collator
- refactor t5 config
- refactor collator test
