Conversation

@chenweng-quic chenweng-quic commented Nov 13, 2025

Summary

Add Granite 3.3-2B support.

Source model:
(screenshot: source model, https://github.com/user-attachments/assets/d17dd15c-ffc1-43e9-9e57-7794a63d8a5d)
(screenshot: source model, https://github.com/user-attachments/assets/45f1e80d-95e7-4865-9dfc-1e04d3eb90e4)

Static llama:
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -H mlgtw-linux -s c3b39f15 -m SM8650 --temperature 0 --model_mode kv --max_seq_len 1024 --prefill_ar_len 128 --decoder_model granite_3_3-2b_instruct --prompt "I would like to learn python, could you teach me with a simple example?" --run_lm_eval --task hellaswag --limit 10 --artifact llama_qnn --kv_updater shift_pointer

Accuracy (hellaswag, limit=10)

prepare_pt2e: {'acc_norm,none': 0.5}
convert_pt2e: {'acc_norm,none': 0.3}
device: {'acc_norm,none': 0.2}
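For context on how these numbers are produced: lm-eval's hellaswag task scores each candidate ending by model loglikelihood, and for the acc_norm metric normalizes each score by the ending's byte length before taking the argmax. A minimal plain-Python sketch of that metric (the example loglikelihoods below are made up for illustration):

```python
def acc_norm(examples):
    """Fraction of examples where the byte-length-normalized
    loglikelihood argmax picks the gold ending."""
    correct = 0
    for endings, logliks, gold in examples:
        # lm-eval's acc_norm divides each ending's loglikelihood by the
        # ending's UTF-8 byte length before taking the argmax.
        scores = [ll / len(e.encode("utf-8")) for e, ll in zip(endings, logliks)]
        if scores.index(max(scores)) == gold:
            correct += 1
    return correct / len(examples)

# Two toy examples with made-up loglikelihoods; the second is scored
# correctly and the first is not, giving acc_norm = 0.5.
examples = [
    (["a short ending", "a much longer ending here"], [-12.0, -20.0], 0),
    (["yes", "no"], [-2.0, -3.0], 0),
]
print(acc_norm(examples))  # 0.5
```

This also explains why the metric can move between prepare_pt2e, convert_pt2e, and on-device runs: the ranking of ending loglikelihoods shifts as quantization error accumulates.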

Statistics on SM8650 (16a4w_block64)

(screenshot: SM8650 statistics, https://github.com/user-attachments/assets/42fcd93f-546f-4884-9540-07a89729acb2)

Statistics on SM8750 (16a4w_block64)

(screenshot: SM8750 statistics, https://github.com/user-attachments/assets/7ab11855-60a8-4b09-9e20-1f93069ccc11)
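The 16a4w_block64 label denotes 16-bit activations with 4-bit weights, where each block of 64 weights shares a quantization scale. A framework-free sketch of symmetric per-block 4-bit weight quantization (function names and details are illustrative, not the QNN backend's implementation):

```python
def quantize_blockwise_4bit(weights, block_size=64):
    """Symmetric per-block 4-bit quantization: every `block_size` weights
    share one scale, and values map to integers in [-8, 7]."""
    q, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) / 7 or 1.0  # avoid a zero scale
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in block)
    return q, scales

def dequantize_blockwise_4bit(q, scales, block_size=64):
    # Each quantized value is rescaled by its block's scale.
    return [q[i] * scales[i // block_size] for i in range(len(q))]

w = [0.05 * i for i in range(128)]  # two blocks of 64 weights
q, scales = quantize_blockwise_4bit(w)
w_hat = dequantize_blockwise_4bit(q, scales)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(len(scales), max_err)
```

Smaller blocks give each scale fewer outliers to absorb, which is why blockwise 4-bit typically loses less accuracy than per-channel 4-bit at the cost of storing more scales.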

Test plan

python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_granite_3_3_2b_instruct --device c3b39f15 --host mlgtw-linux --model SM8650 --build_folder build-android --executorch_root . --artifact_dir ./llama_qnn --llama_artifacts llama_qnn

cc @cccclai @cbilgin


pytorch-bot bot commented Nov 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15808

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 33576ac with merge base 3e90b44:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 13, 2025
@chenweng-quic chenweng-quic added the module: qnn Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/ label Nov 13, 2025
@pytorch-bot pytorch-bot bot removed the module: qnn Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/ label Nov 13, 2025

pytorch-bot bot commented Nov 13, 2025

The label module: qnn is only applicable to issues and has been removed. Please only use this label on issues.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

model_architecture: str = (
    "LlamaForCausalLM"  # This setting is currently only supported for the QNN backend
)
attention_multiplier: Optional[float] = None
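For readers unfamiliar with Granite-style configs: attention_multiplier replaces the usual 1/sqrt(head_dim) attention-score scaling, residual_multiplier scales each attention/MLP branch output before the residual add, and logits_scaling divides the final logits. A hedged sketch of where each knob applies (the multiplier values below are made up; this is not the ExecuTorch/QNN implementation):

```python
import math

def attention_scale(head_dim, attention_multiplier=None):
    # Granite substitutes a fixed attention_multiplier for the default
    # 1/sqrt(head_dim) scaling of attention scores when it is set.
    if attention_multiplier is not None:
        return attention_multiplier
    return 1.0 / math.sqrt(head_dim)

def residual_add(residual, block_output, residual_multiplier=1.0):
    # Each branch output is scaled before being added back to the residual.
    return [r + residual_multiplier * b for r, b in zip(residual, block_output)]

def final_logits(raw_logits, logits_scaling=1.0):
    # The lm_head output is divided by logits_scaling before softmax.
    return [x / logits_scaling for x in raw_logits]

print(attention_scale(64))                                  # 0.125 (Llama default)
print(attention_scale(64, attention_multiplier=0.0078125))  # fixed override
print(residual_add([1.0, 2.0], [0.5, 0.5], residual_multiplier=0.22))
print(final_logits([8.0, 4.0], logits_scaling=8.0))
```

With attention_multiplier=None the model falls back to standard Llama behavior, which is why the fields default to None/1.0-style values here.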
Contributor


Can you add some comments to explain the new arguments for attention_multiplier, logits_scaling and residual_multiplier? I understand we don't have them for the other params, but let's start document them to make it more scalable and readable.

Collaborator Author


Done, thanks for your suggestion.


meta-codesync bot commented Nov 14, 2025

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D87034494.


@cccclai cccclai left a comment


Thank you for the contribution!

@cccclai cccclai merged commit 76d43bc into pytorch:main Nov 16, 2025
144 of 145 checks passed
jirioc pushed a commit to nxp-upstream/executorch that referenced this pull request Dec 19, 2025