
Guidance/plans to add encoder/decoder model support (e.g. T5)? #10

@mustavikhan05

Description


I'm trying to adapt the BitDistiller code for encoder-decoder models.

Are there any plans to add support for this? Could you provide some guidance on which parts need adaptation?

We're running a project to test the findings in Table 5, where Llama 7B performed better as the teacher than 13B. We're testing the hypothesis you put forward across OPT models and are now expanding our experiments to encoder-decoder models. We're also running an experiment that introduces larger teachers sequentially, i.e. self-distillation followed by distillation from a bigger teacher on the self-distilled model.
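For concreteness, here is a minimal sketch of the kind of change we expect is needed: computing the distillation loss over the decoder's output logits only, with padded target positions masked out. This is a generic temperature-scaled KL distillation loss, not BitDistiller's actual objective; the function name, signature, and the `-100` padding convention are our assumptions.

```python
import torch
import torch.nn.functional as F

def encdec_kd_loss(student_logits, teacher_logits, labels, T=2.0, pad_id=-100):
    """Hypothetical KD loss for encoder-decoder models (sketch, not
    BitDistiller's objective). The KL term is computed per decoder
    position over the vocabulary, then averaged over non-padded
    target tokens; encoder hidden states carry no vocabulary
    distribution, so only decoder logits are matched.

    student_logits, teacher_logits: (batch, tgt_len, vocab)
    labels: (batch, tgt_len), with pad_id marking ignored positions.
    """
    mask = (labels != pad_id).float()                  # (batch, tgt_len)
    s = F.log_softmax(student_logits / T, dim=-1)      # student log-probs
    t = F.softmax(teacher_logits / T, dim=-1)          # teacher probs
    kl = F.kl_div(s, t, reduction="none").sum(dim=-1)  # per-token KL
    # Standard T^2 scaling keeps gradient magnitudes comparable across temperatures.
    return (kl * mask).sum() / mask.sum() * (T * T)
```

With a HuggingFace-style T5, the `student_logits`/`teacher_logits` would come from two forward passes over the same `(input_ids, labels)` batch, the teacher under `torch.no_grad()`.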
