Qualcomm AI Engine Direct - GA Static Granite3.3-2b #15808
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15808

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure as of commit 33576ac with merge base 3e90b44. The following job has failed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
examples/models/llama/model_args.py
Outdated
```python
model_architecture: str = (
    "LlamaForCausalLM"  # This setting is currently only supported for the QNN backend
)
attention_multiplier: Optional[float] = None
```
Can you add some comments to explain the new arguments attention_multiplier, logits_scaling, and residual_multiplier? I understand we don't have them for the other params, but let's start documenting them to make the file more scalable and readable.
Done, thanks for your suggestion.
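For readers unfamiliar with these arguments, here is a hedged sketch of how Granite-family scaling hyperparameters are commonly applied in a decoder forward pass. This is an illustration only, not the static llama implementation in this PR; the function names below are hypothetical.

```python
import math

# Illustrative sketch (NOT this repo's code) of how Granite-style scaling
# hyperparameters are commonly used. Values are plain Python floats/lists
# to keep the example self-contained.

def attention_scale(head_dim, attention_multiplier=None):
    # Granite-style models replace the usual 1/sqrt(head_dim) softmax scale
    # with a fixed attention_multiplier from the model config when provided.
    if attention_multiplier is not None:
        return attention_multiplier
    return 1.0 / math.sqrt(head_dim)

def residual_add(x, block_out, residual_multiplier=1.0):
    # Each attention/MLP block output is damped by residual_multiplier
    # before being added back onto the residual stream.
    return [xi + residual_multiplier * bi for xi, bi in zip(x, block_out)]

def scale_logits(logits, logits_scaling=1.0):
    # Final lm_head logits are divided by logits_scaling before sampling.
    return [v / logits_scaling for v in logits]
```

Documenting the three fields in ModelArgs along these lines should make the intent clear to future backends that reuse them.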
cccclai left a comment
Thank you for the contribution!
Summary
Add Granite3.3-2b support.
Source model:


Static llama:

```shell
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -H mlgtw-linux -s c3b39f15 -m SM8650 --temperature 0 --model_mode kv --max_seq_len 1024 --prefill_ar_len 128 --decoder_model granite_3_3-2b_instruct --prompt "I would like to learn python, could you teach me with a simple example?" --run_lm_eval --task hellaswag --limit 10 --artifact llama_qnn --kv_updater shift_pointer
```

Accuracy (hellaswag, limit=10)
prepare_pt2e: {'acc_norm,none': 0.5}
convert_pt2e: {'acc_norm,none': 0.3}
device: {'acc_norm,none': 0.2}
Statistics on SM8650 (16a4w_block64)

Statistics on SM8750 (16a4w_block64)

Test plan
```shell
python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_granite_3_3_2b_instruct --device c3b39f15 --host mlgtw-linux --model SM8650 --build_folder build-android --executorch_root . --artifact_dir ./llama_qnn --llama_artifacts llama_qnn
```

cc @cccclai @cbilgin