Quantize lora linears #15935
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15935

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit c13f720 with merge base b4d72f1:

- NEW FAILURE - The following job has failed:
- FLAKY - The following job failed but was likely due to flakiness present on trunk:
- BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
### Summary

LoraLinears contain:

1. base weight (nn.Linear)
2. lora_a (nn.Linear)
3. lora_b (nn.Linear)

(2) and (3) are caught by the filter, but (1) is not, as the weight and bias are pulled out of the nn.Linear and placed into nn.Parameters, and the linear is performed manually. This is for checkpoint compatibility - otherwise we'd have to map the weights for any lora model.

See: https://github.com/pytorch/executorch/blob/b4d72f1e271915e9c0e1d313753a1eec840fbdee/examples/models/llama/lora.py#L31-L37

This PR adds lora linears into the quantization filter.

### Test plan

```
python -m extension.llm.export.export_llm \
  base.checkpoint="${DOWNLOADED_PATH}/consolidated.00.pth" \
  base.params="${DOWNLOADED_PATH}/params.json" \
  base.adapter_checkpoint="../et_docs_7_epoch/adapter_model.safetensors" \
  base.adapter_config="../et_docs_7_epoch/adapter_config.json" \
  base.tokenizer_path="../et_docs_7_epoch/" \
  model.use_kv_cache=true \
  model.use_sdpa_with_kv_cache=true \
```

Confirm output model size is ~1.7GB instead of 5.1GB:

```
(executorch) [lfq@devvm311.ldc0 /data/users/lfq/executorch (lfq.quantize-lora-linears)]$ ls -la *.pte
-rw-r--r-- 1 lfq users 5106135168 Nov 20 15:59 et_lora.pte
-rw-r--r-- 1 lfq users 1733835776 Nov 20 17:07 et_lora_fix.pte
```
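The sketch below is a minimal illustration of the problem described in the summary, not the actual ExecuTorch code: `LoRALinearSketch` and `quantize_filter` are hypothetical names, and only `nn.Linear`, `nn.Parameter`, and `F.linear` are real PyTorch APIs. Because the base weight lives in a bare `nn.Parameter` and the projection is done manually, a filter that only matches `nn.Linear` quantizes the adapters but leaves the large base weight in full precision; extending the filter to also match the LoRA linear module type is the kind of change this PR makes.

```python
import torch
from torch import nn
import torch.nn.functional as F


class LoRALinearSketch(nn.Module):
    """Sketch of the module layout described in examples/models/llama/lora.py:
    the base weight is kept as a plain nn.Parameter (for checkpoint
    compatibility), while only the adapters are nn.Linear submodules."""

    def __init__(self, in_dim: int, out_dim: int, rank: int):
        super().__init__()
        # (1) base weight: a bare parameter, not wrapped in an nn.Linear
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim))
        # (2) and (3): LoRA adapters, ordinary nn.Linear modules
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The base projection is performed manually with F.linear, so no
        # nn.Linear submodule ever owns `self.weight`.
        return F.linear(x, self.weight) + self.lora_b(self.lora_a(x))


def quantize_filter(module: nn.Module, fqn: str) -> bool:
    """Hypothetical module filter of the kind this PR extends. Matching only
    nn.Linear catches lora_a/lora_b but misses the base weight above; adding
    the LoRA linear type lets the quantizer handle it as well."""
    return isinstance(module, (nn.Linear, LoRALinearSketch))
```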