
Conversation

@lucylq (Contributor) commented Nov 21, 2025

Summary

LoraLinears contain:

  1. base weight (nn.Linear)
  2. lora_a (nn.Linear)
  3. lora_b (nn.Linear)

(2) and (3) are caught by the filter, but (1) is not: the weight and bias are pulled out of the nn.Linear and registered as plain nn.Parameters, and the linear is applied manually in the forward pass. This is done for checkpoint compatibility; otherwise we'd have to remap the weights for every LoRA model.

See examples/models/llama/lora.py (https://github.com/pytorch/executorch/blob/b4d72f1e271915e9c0e1d313753a1eec840fbdee/examples/models/llama/lora.py#L31-L37):

```
linear = nn.Linear(in_dim, out_dim, bias=use_bias)
weight = linear.weight
bias = linear.bias if self.use_bias else None
self.register_parameter("weight", nn.Parameter(weight))
self.register_parameter(
    "bias", nn.Parameter(bias) if bias is not None else None
)
```

This PR adds LoRA linears to the quantization filter.
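As an illustration of what widening the filter means, a predicate along the following lines would match both plain nn.Linear modules and the LoRA wrapper class. This is a minimal sketch assuming a torchao-style `filter_fn(module, fqn)` signature and a wrapper class named `LoRALinear`; the actual filter used by export_llm may differ:

```
import torch.nn as nn

def linear_or_lora_filter(module: nn.Module, fqn: str) -> bool:
    # Plain nn.Linear covers lora_a and lora_b; matching the wrapper class
    # by name lets the base weight (a bare nn.Parameter) get quantized too.
    return isinstance(module, nn.Linear) or type(module).__name__ == "LoRALinear"
```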

Test plan

```
python -m extension.llm.export.export_llm \
    base.checkpoint="${DOWNLOADED_PATH}/consolidated.00.pth" \
    base.params="${DOWNLOADED_PATH}/params.json" \
    base.adapter_checkpoint="../et_docs_7_epoch/adapter_model.safetensors" \
    base.adapter_config="../et_docs_7_epoch/adapter_config.json" \
    base.tokenizer_path="../et_docs_7_epoch/" \
    model.use_kv_cache=true \
    model.use_sdpa_with_kv_cache=true
```

Confirm output model size is ~1.7GB instead of 5.1GB.

```
(executorch) [lfq@devvm311.ldc0 /data/users/lfq/executorch (lfq.quantize-lora-linears)]$ ls -la *.pte
-rw-r--r-- 1 lfq users 5106135168 Nov 20 15:59 et_lora.pte
-rw-r--r-- 1 lfq users 1733835776 Nov 20 17:07 et_lora_fix.pte
```

@meta-cla meta-cla bot added the CLA Signed label Nov 21, 2025
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@lucylq lucylq force-pushed the lfq.quantize-lora-linears branch from 089addc to c13f720 on November 21, 2025 01:09
@lucylq lucylq changed the title from "Lfq.quantize lora linears" to "Quantize lora linears" Nov 21, 2025
@lucylq lucylq requested a review from metascroy November 21, 2025 01:09
@lucylq lucylq marked this pull request as ready for review November 21, 2025 01:09
@lucylq lucylq requested a review from jackzhxng as a code owner November 21, 2025 01:09
@pytorch-bot bot commented Nov 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15935

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit c13f720 with merge base b4d72f1:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@lucylq lucylq merged commit fee1b2d into main Nov 21, 2025
169 of 172 checks passed
@lucylq lucylq deleted the lfq.quantize-lora-linears branch November 21, 2025 17:33
jirioc pushed a commit to nxp-upstream/executorch that referenced this pull request Dec 19, 2025
