I'm trying to adapt the bitdistiller code for encoder-decoder models.
Are there any plans to add support for this? If not, could you provide some guidance on which parts need adaptation? A rough sketch of what I have in mind is below.
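To make the question concrete, this is roughly what I'm attempting for the encoder-decoder case. The checkpoints (`t5-small`/`t5-base`), tokenizer, and the plain KL objective are just placeholders of mine, not BitDistiller's actual implementation; the point is only that both the teacher and student forward passes now need encoder inputs plus decoder labels, and the KD loss is taken over the decoder logits.

```python
# Minimal sketch of a seq2seq KD step (placeholder models and loss, not BitDistiller's code).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
teacher = AutoModelForSeq2SeqLM.from_pretrained("t5-base").eval()

src = tokenizer("translate English to German: The house is small.",
                return_tensors="pt")
labels = tokenizer("Das Haus ist klein.", return_tensors="pt").input_ids

# Unlike the decoder-only path, both forward passes take encoder inputs *and*
# decoder labels; logits are produced per decoder position.
student_out = student(**src, labels=labels)
with torch.no_grad():
    teacher_out = teacher(**src, labels=labels)

# Plain token-level KL between teacher and student decoder distributions
# (placeholder for whatever divergence the released code uses).
kd_loss = F.kl_div(
    F.log_softmax(student_out.logits, dim=-1),
    F.softmax(teacher_out.logits, dim=-1),
    reduction="batchmean",
)
loss = student_out.loss + kd_loss
loss.backward()
```

My main uncertainty is which other parts of the training and quantization pipeline assume a decoder-only architecture.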
We're running a project to test the finding in Table 5, where Llama 7B performed better as a teacher than 13B. We're testing the hypothesis you put forward across OPT models and are now expanding the experiment to encoder-decoder models. We're also running an experiment that introduces progressively larger teachers, i.e., self-distillation followed by a bigger model serving as the teacher for the self-distilled student, roughly as sketched below.
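For clarity, the sequential-teacher schedule we mean is the following two-stage loop. The `distill` helper and the specific OPT checkpoints are hypothetical stand-ins for whatever distillation run we end up using, not part of your code.

```python
# Sketch of the sequential-teacher schedule (hypothetical helper and checkpoints).
from transformers import AutoModelForCausalLM

def distill(student, teacher):
    """Placeholder for one distillation run; the actual training loop is elided."""
    ...
    return student

student = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# Stage 1: self-distillation, with a frozen copy of the student as teacher.
self_teacher = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b").eval()
stage1 = distill(student, self_teacher)

# Stage 2: the self-distilled student continues training under a bigger teacher.
big_teacher = AutoModelForCausalLM.from_pretrained("facebook/opt-2.7b").eval()
stage2 = distill(stage1, big_teacher)
```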