Qualcomm AI Engine Direct - GA Static Granite3.3-2b #15808
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15808

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure as of commit 33576ac with merge base 3e90b44. The following job has failed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
examples/models/llama/model_args.py
Outdated
```python
model_architecture: str = (
    "LlamaForCausalLM"  # This setting is currently only supported for the QNN backend
)
attention_multiplier: Optional[float] = None
```
Can you add some comments to explain the new arguments attention_multiplier, logits_scaling, and residual_multiplier? I understand we don't have them for the other params, but let's start documenting them to make the file more scalable and readable.
Done, thanks for your suggestion.
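For readers unfamiliar with these arguments, here is a hedged sketch of how Granite-family scaling hyperparameters are commonly applied in a decoder forward pass. This is an illustration only, not the static llama implementation in this PR; the function names below are hypothetical.

```python
import math

# Illustrative sketch (NOT this repo's code) of how Granite-style scaling
# hyperparameters are commonly used. Values are plain Python floats/lists
# to keep the example self-contained.

def attention_scale(head_dim, attention_multiplier=None):
    # Granite-style models replace the usual 1/sqrt(head_dim) softmax scale
    # with a fixed attention_multiplier from the model config when provided.
    if attention_multiplier is not None:
        return attention_multiplier
    return 1.0 / math.sqrt(head_dim)

def residual_add(x, block_out, residual_multiplier=1.0):
    # Each attention/MLP block output is damped by residual_multiplier
    # before being added back onto the residual stream.
    return [xi + residual_multiplier * bi for xi, bi in zip(x, block_out)]

def scale_logits(logits, logits_scaling=1.0):
    # Final lm_head logits are divided by logits_scaling before sampling.
    return [v / logits_scaling for v in logits]
```

Documenting the three fields in ModelArgs along these lines should make the intent clear to future backends that reuse them.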
cccclai left a comment
Thank you for the contribution!
Summary
Add Granite3.3-2b support.
Source model:


Static llama:

```shell
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -H mlgtw-linux -s c3b39f15 -m SM8650 --temperature 0 --model_mode kv --max_seq_len 1024 --prefill_ar_len 128 --decoder_model granite_3_3-2b_instruct --prompt "I would like to learn python, could you teach me with a simple example?" --run_lm_eval --task hellaswag --limit 10 --artifact llama_qnn --kv_updater shift_pointer
```

Accuracy (hellaswag, limit=10)
prepare_pt2e: {'acc_norm,none': 0.5}
convert_pt2e: {'acc_norm,none': 0.3}
device: {'acc_norm,none': 0.2}
Statistics on SM8650 (16a4w_block64)

Statistics on SM8750 (16a4w_block64)

Test plan
```shell
python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_granite_3_3_2b_instruct --device c3b39f15 --host mlgtw-linux --model SM8650 --build_folder build-android --executorch_root . --artifact_dir ./llama_qnn --llama_artifacts llama_qnn
```

cc @cccclai @cbilgin