T5 support by davidkaczer · Pull Request #1 · davidkaczer/modalities

davidkaczer · 2024-08-16T09:48:26Z

What does this PR do?

This PR adds support for pretrained LongT5 encoder-decoder models from HuggingFace.

General Changes

Add HuggingFacePretrainedEncoderDecoderModel class for loading LongT5
Add span masking collator ported from the official Google implementation
Add test for span masking collator
Add example training configs for LongT5
Add logic to distinguish between decoder-only and encoder-decoder models when passing inputs
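
For readers unfamiliar with the span denoising objective, here is a minimal sketch of what a span masking step produces. It is not the collator from this PR: the function name `corrupt_spans`, the explicit `spans` argument, and the value of `first_sentinel_id` are assumptions made for illustration, and the real collator samples spans randomly rather than taking them as input. No EOS token is appended, in line with the "don't add EOS token to span masking batch" fix.

```python
from typing import List, Tuple


def corrupt_spans(
    tokens: List[int],
    spans: List[Tuple[int, int]],
    first_sentinel_id: int,
) -> Tuple[List[int], List[int]]:
    """Replace each (start, length) span with one sentinel and collect the removed tokens as targets.

    Hypothetical helper. Spans must be sorted and non-overlapping.
    Sentinel ids count down from first_sentinel_id (an assumed convention).
    """
    inputs: List[int] = []
    targets: List[int] = []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        if start < cursor or length <= 0 or start + length > len(tokens):
            raise ValueError(f"invalid span {(start, length)}")
        sentinel = first_sentinel_id - i
        inputs.extend(tokens[cursor:start])  # keep the uncorrupted tokens
        inputs.append(sentinel)              # one sentinel stands in for the whole span
        targets.append(sentinel)             # the target names the span it fills in
        targets.extend(tokens[start:start + length])
        cursor = start + length
    inputs.extend(tokens[cursor:])           # keep the tail after the last span
    return inputs, targets


inputs, targets = corrupt_spans(
    [10, 11, 12, 13, 14, 15, 16],
    spans=[(1, 2), (5, 1)],
    first_sentinel_id=99,
)
print(inputs)   # encoder input with sentinels in place of the spans
print(targets)  # decoder target: each sentinel followed by the tokens it replaced
```

The encoder sees `inputs`, and the decoder is trained to emit `targets`, so only the masked tokens contribute to the loss.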

Breaking Changes

None intended

Checklist before submitting final PR

My PR is minimal and addresses one issue in isolation
I have merged the latest version of the target branch into this feature branch
I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
I have run a sample config for model training
I have checked that all tests run through (python tests/tests.py). Note: some tests already appear to fail upstream, and multi-GPU runs were not tested.
I have updated the internal changelog (CHANGELOG_DEV.md)

davidkaczer and others added 15 commits August 8, 2024 10:53

feat(t5): WIP: add support for T5 model

697d515

- add SpanMaskingCollateFn for span denoising objective
- support loading T5 checkpoint from huggingface
- support passing decoder inputs to T5 model
- add example config for pretraining T5 checkpoint from HF

fix(t5): shift target tokens for input to decoder

be4d931
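
As background for this fix, the usual teacher-forcing convention for encoder-decoder models can be sketched as below. The helper name `shift_right` and the start token id `0` are assumptions for illustration, not code taken from this commit.

```python
from typing import List


def shift_right(target_ids: List[int], decoder_start_token_id: int) -> List[int]:
    """Build decoder inputs by prepending a start token and dropping the last target token.

    Hypothetical helper: at position t the decoder then sees targets[:t] and predicts targets[t].
    """
    if not target_ids:
        return []
    return [decoder_start_token_id] + target_ids[:-1]


labels = [99, 11, 12, 98, 15]
decoder_input_ids = shift_right(labels, decoder_start_token_id=0)  # start id 0 is an assumption
print(decoder_input_ids)
```

The shifted sequence has the same length as the labels, so logits and labels line up position by position.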

Merge branch 'Modalities:main' into t5

a90838b

chore: add span masking collator test

16b3690

Merge branch 't5' of https://github.com/davidkaczer/modalities into t5

0cc59dc

update training yaml file using T5 model and tokenizer, also create new huggingface model wrapper for encoder decoder architecture

8fe753a

fix(t5): encoder and decoder model handling logic

41df9bf

- add logic to model_predict_batch
- fix example config
- cleanup HF model implementation
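
The kind of branching this commit refers to might look roughly like the following sketch. The stub classes, the `predict_batch` function, and the batch keys are all hypothetical stand-ins; the actual `model_predict_batch` in the repository may be structured differently.

```python
from typing import Dict, List, Tuple

Batch = Dict[str, List[int]]


class DecoderOnlyStub:
    """Hypothetical decoder-only model: consumes a single token sequence."""

    def forward(self, input_ids: List[int]) -> Tuple[str, ...]:
        return ("input_ids",)


class EncoderDecoderStub:
    """Hypothetical encoder-decoder model: needs encoder and decoder inputs."""

    def forward(self, input_ids: List[int], decoder_input_ids: List[int]) -> Tuple[str, ...]:
        return ("input_ids", "decoder_input_ids")


def predict_batch(model: object, samples: Batch) -> Tuple[str, ...]:
    """Pass decoder inputs only to models that accept them (type-based dispatch)."""
    if isinstance(model, EncoderDecoderStub):
        return model.forward(samples["input_ids"], samples["decoder_input_ids"])
    return model.forward(samples["input_ids"])


batch = {"input_ids": [10, 99, 13], "decoder_input_ids": [0, 99, 11]}
print(predict_batch(DecoderOnlyStub(), batch))
print(predict_batch(EncoderDecoderStub(), batch))
```

A single `isinstance` check keeps the decoder-only path unchanged while letting encoder-decoder models receive the extra input.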

chore: pre-commit hook compliance

578e9e6

docs: add docstrings and type hints

e32f825

refactor: t5 config files

8699d46

fix: don't add EOS token to span masking batch

edfdc28

fix: set excluded weight groups in t5 config

6be5b34

fix: remove tokenizer dependency

1ba5006

- remove dependency in span masking collator
- refactor t5 config
- refactor collator test

refactor: cleaner typecheck in model_predict_batch

30ffca2

refactor: change parent class of HuggingFacePretrainedEncoderDecoderModel

81f43e0

- simplify future type checks

davidkaczer wants to merge 15 commits into main from t5
