
Add functionality to support IA3 #578

Merged
pacman100 merged 39 commits into huggingface:main from SumanthRH:ia3
Jul 13, 2023

Conversation

@SumanthRH
Copy link
Contributor

Hi,

We've added some code to support IA3 from the T-few paper. Most of the code is inspired by the LoRA implementation. Currently, the implementation supports multiple adapters, int-8 training, and merging/unmerging. I've only added a minimal set of models for now. IA3 introduces learned vectors that rescale the key, value, and feed-forward activations, and the forward pass differs for feed-forward vs. non-feed-forward layers.
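To illustrate the two forward variants described above, here is a minimal sketch (not the PR's actual code; the function name `ia3_forward` is hypothetical): for feed-forward layers the learned vector rescales the input activations, while for key/value projections it rescales the output activations.

```python
import torch

def ia3_forward(x, weight, ia3_vector, is_feedforward):
    """Hypothetical sketch of the two IA3 forward variants.

    Feed-forward layers: rescale the *input* activations before the linear.
    Key/value projections: rescale the *output* activations after the linear.
    """
    if is_feedforward:
        return torch.nn.functional.linear(x * ia3_vector, weight)
    return torch.nn.functional.linear(x, weight) * ia3_vector

# With the vector initialized to all ones (as in the paper), the layer
# computes exactly the same function as the base linear layer:
x = torch.randn(2, 8)
w = torch.randn(4, 8)
out = ia3_forward(x, w, torch.ones(8), is_feedforward=True)
assert torch.allclose(out, x @ w.T)
```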

I hope this can be merged into the repo!


@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 15, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@pacman100 pacman100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @SumanthRH for adding IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) PEFT method 🤗🚀✨! This will provide more options for the community to explore for their specific problems.

Could you please remove changes from existing example notebooks and only have new notebook examples specific to IA3?

I left a comment regarding refactoring some code.

@pacman100
Copy link
Contributor

Also, please run make style and make quality to resolve the quality issues

@pacman100
Copy link
Contributor

Hello, it would be great if you could add a test file for testing the minimal core components only (forward, save_pretrained, generate). cc @younesbelkada

@SumanthRH
Copy link
Contributor Author

Hi @pacman100,

Thanks for checking out the pull request. I can add tests similar to what you have for LoRA/Prefix tuning. I wasn't sure if you would be merging this, so I hadn't added those changes yet.

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SumanthRH
Thank you very much for your great work
I second what @pacman100 said: it would be great if you can add a test file to make sure your implementation won't get broken by future PRs. For that, please have a look at https://github.com/huggingface/peft/blob/main/tests/test_adaption_prompt.py and adapt it to your needs. Let us know if you need any help designing the tests.

@SumanthRH
Copy link
Contributor Author

SumanthRH commented Jun 26, 2023

Hi @pacman100 @younesbelkada

I looked through the tests folder and thought it was better to add IA3 to the existing set of common tests in tests_common. Currently, here are all the tests that are supported:

  • _test_model_attr
  • _test_prepare_for_training
  • _test_save_pretrained
  • _test_merge_layers (Caveat here: With LoRA, the initialization is random, so the results before and after merging differ slightly. With IA3, the learned vectors are initialized to all ones, as per the paper, so the results don't change. Not sure if I should add any additional tests here?)
  • _test_generate
  • _test_training

All of these tests are run for the encoder-decoder models (T5 and BART) and the decoder models (GPT2, OPT, BLOOM, GPT-NeoX, GPT-Neo). Also, the original authors only specify the IA3 idea for the T0 model, so for other architectures I've simply made an appropriate choice (there could be other choices that are better for a given architecture). Let me know if this works!

TODO:

  • Half-precision support
  • Gradient checkpointing
  • Run training with GPT-2 and compare performance (an additional sanity check for the fan_in_fan_out implementation). I'm able to get better performance than prefix tuning on the data in peft_prefix_tuning_clm.ipynb
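For context on the fan_in_fan_out sanity check mentioned in the TODO: GPT-2 uses transformers' Conv1D modules, which store weights transposed relative to nn.Linear, so a tuner has to transpose before applying or merging its update. A small sketch of the layout difference (illustrative only):

```python
import torch

# nn.Linear stores weights as (out_features, in_features); GPT-2's Conv1D
# stores them as (in_features, out_features). fan_in_fan_out flags the
# transposed layout so the adapter math lines up either way.
x = torch.randn(2, 8)
w_linear = torch.randn(4, 8)   # nn.Linear layout
w_conv1d = w_linear.T          # Conv1D layout (transposed storage)

out_linear = x @ w_linear.T
out_conv1d = x @ w_conv1d      # same computation despite different storage
assert torch.allclose(out_linear, out_conv1d)
```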

@SumanthRH
Copy link
Contributor Author

Hello @pacman100 , @younesbelkada . I cleaned up the code a bit more. The example notebooks might still need some edits as there's some code that's specific to my username (model saving part). But apart from that, I hope this looks good!

Copy link
Member

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @SumanthRH for this addition. It looks really nice for the most part and is consistent with the existing code base, well done.

I'm not an expert on this topic (yet), so I cannot comment on every detail. I made a few suggestions, please take a look if they make sense.

In general, I think a docs entry would be good to have, since this introduces a completely new method to the library.

A question more in the direction of the other maintainers: There is a lot of code duplication between this and LoRA. Are we fine with keeping it as or would it be better to add some abstraction? As always, there are pros and cons to both. I just want to ensure that this duplication is indeed the desired state.

@SumanthRH SumanthRH requested a review from BenjaminBossan July 4, 2023 19:57
@SumanthRH
Copy link
Contributor Author

Also @BenjaminBossan, I noticed your comment on documentation a bit late, sorry about that! I've added some documentation for $\text{IA}^3$ now, with a basic conceptual guide. Docs should get updated here. In terms of supported methods/models, I've only indicated those covered by tests or in examples. Let me know if that works!

@BenjaminBossan
Copy link
Member

I've added some documentation for IA3 now, with a basic conceptual guide

Fantastic, well written, thank you.

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your great work @SumanthRH , looking forward to merging this! Thanks also for adding the testing suite for that method!
I left a few open questions; let me know what you think.
Also, @BenjaminBossan raised a great question. Regarding code de-duplication, we should maybe start with the same approach as in transformers (# Copied from ..) and test whether the method/class is effectively copied from somewhere else. I would say let's take care of that in a follow-up PR.

self.assertTrue(torch.allclose(logits_lora, logits_merged, atol=1e-4, rtol=1e-4))
self.assertFalse(torch.allclose(logits_merged, logits_transformers, atol=1e-10, rtol=1e-10))
self.assertTrue(torch.allclose(logits_unmerged, logits_merged, atol=1e-4, rtol=1e-4))
if config_cls == LoraConfig: # merge does not change logits for IA3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on that? What do you mean specifically by "merge does not change logits for IA3"? If merging is not supported, I think we should just raise an error. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So merging is supported, but the specific check here is testing for a difference between logits_merged (after merging IA3 vectors) and logits_transformers (without IA3 vectors). With IA3, the initialization is all ones (from the paper: "...[the learned vectors] are all initialized with ones so that the overall function computed by the model does not change when they are added."). In the case of LoRA, the random initialization will change model outputs ever so slightly after adding LoRA weights, which is what the assertFalse statement tests. This isn't going to be true for IA3, so I have simply skipped that check. I'm not sure what other tests could be added for merging, though.
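A tiny sketch of why the all-ones initialization makes merging a no-op (illustrative, not the PR's code): merging folds the IA3 vector into the base weight by a row-wise rescale, and rescaling by ones leaves the weight, and therefore the logits, unchanged.

```python
import torch

# Merging an IA3 vector rescales the rows of the base weight.
weight = torch.randn(4, 8)
ia3_vector = torch.ones(4, 1)    # the paper's all-ones initialization

merged = weight * ia3_vector     # fold the vector into the weight
# With all-ones init, the merged weight equals the base weight, so
# logits_merged == logits_transformers and the assertFalse check fails.
assert torch.allclose(merged, weight)
```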

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks a lot for the detailed explanation! Do you think it is possible to force the initialization to be something different from all ones? I think in the past I faced the same issue with LoRA because the B matrix was initialized to all zeros. I added a boolean here:

init_lora_weights: bool = field(
let me know if you need help on that !
Otherwise all good I think, it can be done in a follow up PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay! There's actually an argument already called init_ia3_weights that's supposed to give this control, but even with init_ia3_weights set to False, the initialization is currently all ones:
https://github.com/SumanthRH/peft/blob/5ed92fad11aeec3ec94c8dfd908e81a3cb09d4e4/src/peft/tuners/ia3.py#L399

If this is going to be different, then the initialization should be such that the merged version is close enough to the pre-trained weights, which means that the entries in the learned vectors should be close to one (because they rescale activations). I'm not sure what standard initialization fits here. Maybe something like PyTorch's nn.Linear initialization for matrices, but shifted by one: $U(-\sqrt{k}, \sqrt{k}) + 1$, where $k = 1/\text{vector-length}$. I also don't want to make it look like we conjured up an initialization, so I'm open to hearing your thoughts on this!
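The shifted-uniform idea above could be sketched as follows (hypothetical; the function name `init_near_one` is made up for illustration, and this is not what the PR implements):

```python
import math
import torch

def init_near_one(vector_length: int) -> torch.Tensor:
    """Sample from U(-sqrt(k), sqrt(k)) + 1 with k = 1 / vector_length.

    Entries stay close to 1, so the merged weights stay close to the
    pre-trained weights while still breaking the exact identity.
    """
    bound = math.sqrt(1.0 / vector_length)
    return torch.empty(vector_length).uniform_(-bound, bound) + 1.0

v = init_near_one(64)
# Every entry lies within sqrt(1/64) = 0.125 of 1.0:
assert ((v - 1.0).abs() <= 0.125).all()
```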

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for explaining. Yes, let's leave it as is and we'll take care of it in a follow-up PR!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younesbelkada Could you explain why we want to require the outputs to be different each time when initializing the model? Is it not actually a good thing if by default, the added weights lead to an identity operation? I'm running into the same test error when using LoRA with certain custom models.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @BenjaminBossan, LoRA initializes lora_B to all zeros so that we start with an identity function; so, for all normal usage, init_lora_weights=True. Only during testing, when we can't actually train the model but want to test the functionality of merge_and_unload, is init_lora_weights set to False; otherwise, even after merging, it would be an identity op and we couldn't test whether the merge was successful.
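A minimal sketch of the LoRA point above (illustrative shapes and names, not PEFT's code): because lora_B starts at zero, the low-rank update B @ A is zero and the adapted layer is exactly the base layer until training moves B.

```python
import torch

# LoRA's update is delta_W = lora_B @ lora_A. With lora_B initialized to
# zeros (the init_lora_weights=True default), delta_W is zero, so
# W + delta_W == W: an identity op at initialization.
r, d_in, d_out = 4, 16, 16
lora_A = torch.randn(r, d_in)       # random init, as in LoRA
lora_B = torch.zeros(d_out, r)      # zero init -> no change at start

delta_W = lora_B @ lora_A
assert torch.count_nonzero(delta_W) == 0
```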

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, thanks for explaining. This should probably be added as a comment on that line, since I don't think I'm the only one who was confused.

Regarding the testing of the merging feature, would it be possible to check the weights directly, instead of the model outputs?

@SumanthRH SumanthRH requested a review from younesbelkada July 8, 2023 06:47
Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great on my side! Thanks for all your great work on this!
Let's see what @pacman100 and @BenjaminBossan say; let me know if I can help you address the last comments and any incoming ones.
Thanks again!
PS: can you run the styling checks? make style && make quality

Copy link
Member

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this LGTM and I don't have much to add besides what Younes said.

It is not quite clear to me why _check_target_module_exists differs from the LoRA implementation, but I think we'll refactor some of that soon, so maybe we can keep it as is.

Also @BenjaminBossan raised a great question, regarding code de-duplication we should maybe start by doing the same approach as what we have in transformers # Copied from .. and test if the method/ class is effectively copied from somewhere else. I would say let's take care of that in a follow up PR

Okay, let's deal with that later. I'm not sure the initial reasons why # Copied from was introduced in transformers necessarily apply here as well, but we can discuss it then.

Copy link
Contributor

@pacman100 pacman100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @SumanthRH for all the effort put in to add IA3. Left a suggestion; other than that, LGTM!

Remove unused attribute merge_weights

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
@SumanthRH SumanthRH requested a review from pacman100 July 12, 2023 03:32
@pacman100 pacman100 merged commit c33c42f into huggingface:main Jul 13, 2023
@SumanthRH
Copy link
Contributor Author

Happy to see this merged! A small note to the maintainers @pacman100 @younesbelkada: The example notebooks peft_ia3_seq2seq.ipynb and IA3.ipynb might need some minor changes (specifically, in the "Share adapters to the Hub" sections at the end). Other than that, we should be okay! This was also my first major OSS contribution, so thanks for helping me out!

Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
* Added initial ia3 code

* Implemented ia3 correctly for feedforward layers; Fixed regex matching

* Fixed module mapping for mt5

* Merged changes from huggingface:main

* Merged changes

* Fixed lora merge conflicts

* Different bloom config

* Added save option for ia3

* Added loading code for ia3

* Added feedforward implementation in utils and seq cls example

* Added feedforward implementation in utils and seq cls example

* Implemented merge, unmerge, enable/disable adapters functionality

* Fixed feedforward during merge

* Debugging Merge

* Removing debug messages

* Cleaned up repo

* Removed non-IA3 changes

* Refactor save and load

* Added support to all models in tests; Added IA3Config for common tests

* Added half-precision support and test for gradient checkpointing; Formatted jupyter notebooks

* Added target modules for new models GPTBigCode and LLama

* Cleaned up code

* Cleaned up code

* Cleaned up example notebook

* Cleaned up  seq2seq notebook

* Corrected function docstrings; refactored find_and_replace

* Corrected function docstrings; refactored find_and_replace

* Added basic docs for IA3

* Added new conceptual guide in source tree for documentation

* Minor fix to documentation

* Minor fixes to docstrings; Added error handling for 4bit quantization; Cleaned unused merge/unmerge methods

* styling changes after merge from main

* Update src/peft/tuners/ia3.py

Remove unused attribute merge_weights

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: Abhishek2304 <abhishekgupta2304@gmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>