Add Qwen3.5 MoE (35B-A3B) model export and runner for CUDA backend #18169
Draft
mergennachin wants to merge 1 commit into main from
Conversation
Dr. CI: ❌ 2 new failures, 1 unrelated (flaky) failure as of commit 78a940d with merge base e458023.
Memory-efficient loading using meta-device construction + lazy
safetensors shard-by-shard loading + assign=True state dict loading,
following the voxtral_realtime pattern. Peak CPU memory during loading
is ~1x model size instead of ~3x.
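
As a rough illustration of why this saves memory: the naive flow materializes randomly initialized weights, then the full checkpoint, then copies one into the other, which is where the ~3x peak comes from. A minimal sketch of the meta-device + assign=True pattern (TinyBlock stands in for the real model class; this is not this PR's actual code):

```python
import torch
from torch import nn

# TinyBlock stands in for the Qwen3.5 MoE model class.
class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8, bias=False)

# 1) Construct on the meta device: shapes and dtypes are recorded,
#    but no parameter storage is allocated.
with torch.device("meta"):
    model = TinyBlock()
assert model.proj.weight.is_meta  # no memory used yet

# 2) Load shard by shard. Each shard is a dict[str, Tensor]; at scale
#    this would come from safetensors.torch.load_file(shard_path).
shards = [{"proj.weight": torch.randn(8, 8)}]
for shard in shards:
    # 3) assign=True swaps the meta parameters for the loaded tensors
    #    instead of copying into preallocated buffers, so peak CPU
    #    memory stays near one shard plus the final weights (~1x).
    model.load_state_dict(shard, strict=False, assign=True)

assert not model.proj.weight.is_meta  # weights now materialized
```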
Expert weights are structured as grouped nn.Linear modules (16 groups
of 16 experts each) so quantize_model_() handles them automatically.
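
A sketch of that layout (the hidden/intermediate sizes are made up; only the 16x16 grouping scheme follows the description above):

```python
import torch.nn as nn

NUM_GROUPS = 16
EXPERTS_PER_GROUP = 16
HIDDEN, INTERMEDIATE = 2048, 768  # illustrative sizes, not the real config

class GroupedExperts(nn.Module):
    def __init__(self):
        super().__init__()
        # One plain nn.Linear per group, stacking its 16 experts'
        # projections along the output dimension; routing code slices
        # out the rows for whichever expert a token is sent to.
        self.groups = nn.ModuleList(
            nn.Linear(HIDDEN, EXPERTS_PER_GROUP * INTERMEDIATE, bias=False)
            for _ in range(NUM_GROUPS)
        )

moe = GroupedExperts()
# Because every group is an ordinary nn.Linear, a quantizer that walks
# the module tree (as quantize_model_() does per the description) picks
# up expert weights with no MoE-specific handling.
print(sum(isinstance(m, nn.Linear) for m in moe.modules()))  # -> 16
```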
Layer-by-layer quantization on CUDA avoids loading the full bf16 model
onto GPU at once.
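
One plausible shape for that loop, with quantize_fn standing in for however quantize_model_() is applied per layer (an assumed flow, not this PR's code):

```python
import torch
import torch.nn as nn

def quantize_layerwise(layers: nn.ModuleList, quantize_fn) -> None:
    """Quantize one decoder layer at a time so GPU memory peaks at a
    single bf16 layer rather than the whole 35B-parameter model."""
    for layer in layers:
        layer.to("cuda")          # stage one layer's bf16 weights on GPU
        quantize_fn(layer)        # in-place weight quantization on device
        layer.to("cpu")           # park the now-smaller quantized layer
        torch.cuda.empty_cache()  # release the freed bf16 storage
```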
Includes C++ runner using the shared TextLLMRunner, Makefile target,
and CMake presets.
Reference implementations:
- https://github.com/mergennachin/nano_qwen35_moe/
- vLLM: vllm/model_executor/models/qwen3_5.py