diff --git a/README.md b/README.md
index 072c13b26f7..0a7e8bddb26 100644
--- a/README.md
+++ b/README.md
@@ -192,7 +192,7 @@ See [examples/models/llama](examples/models/llama/README.md) for complete workfl
 | macOS | XNNPACK, MPS, Metal *(experimental)* |
 | Embedded / MCU | XNNPACK, ARM Ethos-U, NXP, Cadence DSP |
 
-See [Backend Documentation](https://docs.pytorch.org/executorch/main/backends-overview.html) for detailed hardware requirements and optimization guides.
+See [Backend Documentation](https://docs.pytorch.org/executorch/main/backends-overview.html) for detailed hardware requirements and optimization guides. For desktop/laptop GPU inference with CUDA and Metal, see the [Desktop Guide](desktop/README.md). For Zephyr RTOS integration, see the [Zephyr Guide](zephyr/README.md).
 
 ## Production Deployments
 
@@ -204,9 +204,9 @@ ExecuTorch powers on-device AI at scale across Meta's family of apps, VR/AR devi
 
 **Multimodal:** [Llava](examples/models/llava/README.md) (vision-language), [Voxtral](examples/models/voxtral/README.md) (audio-language), [Gemma](examples/models/gemma3) (vision-language)
 
-**Vision/Speech:** [MobileNetV2](https://github.com/meta-pytorch/executorch-examples/tree/main/mv2), [DeepLabV3](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3), [Whisper](https://github.com/meta-pytorch/executorch-examples/tree/main/whisper/android/WhisperApp)
+**Vision/Speech:** [MobileNetV2](https://github.com/meta-pytorch/executorch-examples/tree/main/mv2), [DeepLabV3](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3), [Whisper](examples/models/whisper/README.md)
 
-**Resources:** [`examples/`](examples/) directory • [executorch-examples](https://github.com/meta-pytorch/executorch-examples) out-of-tree demos • [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch) for HuggingFace models
+**Resources:** [`examples/`](examples/) directory • [executorch-examples](https://github.com/meta-pytorch/executorch-examples) out-of-tree demos • [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch) for HuggingFace models • [Unsloth](https://docs.unsloth.ai/new/deploy-llms-phone) for fine-tuned LLM deployment
 
 ## Key Features
diff --git a/docs/source/backends-cadence.md b/docs/source/backends-cadence.md
index 667e71ea5a4..7fbf00c9f5f 100644
--- a/docs/source/backends-cadence.md
+++ b/docs/source/backends-cadence.md
@@ -10,6 +10,8 @@ In this tutorial we will walk you through the process of getting setup to build
 In addition to the chip, the HiFi4 Neural Network Library ([nnlib](https://github.com/foss-xtensa/nnlib-hifi4)) offers an optimized set of library functions commonly used in NN processing that we utilize in this example to demonstrate how common operations can be accelerated.
 
+For an overview of the Cadence ExecuTorch integration, including performance benchmarks, see the blog post: [Running Optimized PyTorch Models on Cadence DSPs with ExecuTorch](https://community.cadence.com/cadence_blogs_8/b/ip/posts/running-optimized-pytorch-models-on-cadence-dsps-with-executorch).
+
 On top of being able to run on the Xtensa HiFi4 DSP, another goal of this tutorial is to demonstrate how portable ExecuTorch is and its ability to run on a low-power embedded device such as the Xtensa HiFi4 DSP. This workflow does not require any delegates, it uses custom operators and compiler passes to enhance the model and make it more suitable to running on Xtensa HiFi4 DSPs.
 A custom [quantizer](https://pytorch.org/tutorials/prototype/quantization_in_pytorch_2_0_export_tutorial.html) is used to represent activations and weights as `uint8` instead of `float`, and call appropriate operators. Finally, custom kernels optimized with Xtensa intrinsics provide runtime acceleration.
 
 ::::{grid} 2
diff --git a/docs/source/success-stories.md b/docs/source/success-stories.md
index 5b876437580..2845a0fd3f6 100644
--- a/docs/source/success-stories.md
+++ b/docs/source/success-stories.md
@@ -30,6 +30,7 @@ Powers Instagram, WhatsApp, Facebook, and Messenger with real-time on-device AI
 **Hardware:** Quest 3, Ray-Ban Meta Smart Glasses, Meta Ray-Ban Display
 
 Enables real-time computer vision, hand tracking, voice commands, and translation on power-constrained wearable devices.
+[Read Blog →](https://ai.meta.com/blog/executorch-reality-labs-on-device-ai/)
 :::
 
 :::{grid-item-card} **Liquid AI: Efficient, Flexible On-Device Intelligence**
@@ -106,14 +107,39 @@ PyTorch-native quantization and optimization library for preparing efficient mod
 Optimize LLM fine-tuning with faster training and reduced VRAM usage, then deploy efficiently with ExecuTorch.
 
-[Example Model →](https://huggingface.co/metascroy/Qwen3-4B-int8-int4-unsloth) • [Blog →](https://docs.unsloth.ai/new/quantization-aware-training-qat)
+[Example Model →](https://huggingface.co/metascroy/Qwen3-4B-int8-int4-unsloth) • [Blog →](https://docs.unsloth.ai/new/quantization-aware-training-qat) • [Docs →](https://docs.unsloth.ai/new/deploy-llms-phone)
 :::
 
 :::{grid-item-card} **Ultralytics**
 :class-header: bg-secondary text-white
 
 Deploy on-device inference for Ultralytics YOLO models using ExecuTorch.
 
-[Explore →](https://docs.ultralytics.com/integrations/executorch/)
+
+[Explore →](https://docs.ultralytics.com/integrations/executorch/) • [Blog →](https://www.ultralytics.com/blog/deploy-ultralytics-yolo-models-using-the-executorch-integration)
+:::
+
+:::{grid-item-card} **Arm ML Embedded Evaluation Kit**
+:class-header: bg-secondary text-white
+
+Build and deploy ML applications on Arm Cortex-M CPUs (M55, M85) and Ethos-U NPUs (U55, U65, U85) using ExecuTorch.
+
+[Explore →](https://gitlab.arm.com/artificial-intelligence/ethos-u/ml-embedded-evaluation-kit)
+:::
+
+:::{grid-item-card} **Alif Semiconductor Ensemble**
+:class-header: bg-secondary text-white
+
+Run generative AI on Ensemble E4/E6/E8 MCUs with Arm Ethos-U85 NPU acceleration.
+
+[Learn More →](https://alifsemi.com/press-release/alif-semiconductor-elevates-generative-ai-with-support-for-executorch-runtime/)
+:::
+
+:::{grid-item-card} **Digica AI SDK**
+:class-header: bg-secondary text-white
+
+Automate PyTorch model deployment to iOS, Android, and edge devices with an ExecuTorch-powered SDK.
+
+[Blog →](https://www.digica.com/blog/effortless-edge-deployment-of-ai-models-with-digicas-ai-sdk-feat-executorch.html)
 :::
 
 ::::
@@ -126,8 +152,12 @@ Deploy on-device inference for Ultralytics YOLO models using ExecuTorch.
 - **Voxtral** - Deploy audio-text-input LLM on CPU (via XNNPACK) and on CUDA. [Try →](https://github.com/pytorch/executorch/blob/main/examples/models/voxtral/README.md)
 
+- **Whisper** - Deploy OpenAI's Whisper speech recognition model on CUDA and Metal backends. [Try →](https://github.com/pytorch/executorch/blob/main/examples/models/whisper/README.md)
+
 - **LoRA adapter** - Export two LoRA adapters that share a single foundation weight file, saving memory and disk space.
 [Try →](https://github.com/meta-pytorch/executorch-examples/tree/main/program-data-separation/cpp/lora_example)
 
 - **OpenVINO from Intel** - Deploy [Yolo12](https://github.com/pytorch/executorch/tree/main/examples/models/yolo12), [Llama](https://github.com/pytorch/executorch/tree/main/examples/openvino/llama), and [Stable Diffusion](https://github.com/pytorch/executorch/tree/main/examples/openvino/stable_diffusion) on [OpenVINO from Intel](https://www.intel.com/content/www/us/en/developer/articles/community/optimizing-executorch-on-ai-pcs.html).
+- **Audio Generation** - Generate audio from text prompts using Stable Audio Open Small on Arm CPUs with XNNPACK and KleidiAI. [Try →](https://github.com/Arm-Examples/ML-examples/tree/main/kleidiai-examples/audiogen-et) • [Video →](https://www.youtube.com/watch?v=q2P0ESVxhAY)
+
 
 *Want to showcase your demo? [Submit here →](https://github.com/pytorch/executorch/issues)*
diff --git a/examples/models/whisper/README.md b/examples/models/whisper/README.md
index 329ef55e8b6..2376cda21d7 100644
--- a/examples/models/whisper/README.md
+++ b/examples/models/whisper/README.md
@@ -166,3 +166,7 @@ cmake-out/examples/models/whisper/whisper_runner \
   --processor_path whisper_preprocessor.pte \
   --temperature 0
 ```
+
+## Mobile Demo
+
+For an Android demo app, see the [Whisper Android App](https://github.com/meta-pytorch/executorch-examples/tree/main/whisper/android/WhisperApp) in the executorch-examples repository.
diff --git a/examples/qualcomm/oss_scripts/llama/README.md b/examples/qualcomm/oss_scripts/llama/README.md
index 8b1c188f8a4..489a866db5e 100644
--- a/examples/qualcomm/oss_scripts/llama/README.md
+++ b/examples/qualcomm/oss_scripts/llama/README.md
@@ -1,6 +1,9 @@
 # Summary
 
 ## Overview
+
+**Video Tutorial:** [Build Along: Run LLMs Locally on Qualcomm Hardware Using ExecuTorch](https://www.youtube.com/watch?v=41PKDlGM3oU)
+
 This file provides you the instructions to run LLM Decoder model with different parameters via Qualcomm HTP backend. We currently support the following models:
 1. LLAMA2 Stories 110M
diff --git a/website/index.html b/website/index.html
index ace59ed8b5e..c15069c634a 100644
--- a/website/index.html
+++ b/website/index.html
@@ -918,7 +918,7 @@
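The Cadence passage touched in this patch (backends-cadence.md) describes a PT2E export quantization workflow: capture the model graph, let a backend-specific quantizer annotate activations and weights, calibrate with representative inputs, then convert the graph to quantized operators. Below is a minimal illustrative sketch of that capture → annotate → calibrate → convert flow, not the Cadence backend's actual code: it assumes PyTorch ≥ 2.5, uses `XNNPACKQuantizer` as a stand-in for the Cadence backend's own quantizer, and these import paths have moved between PyTorch/ExecuTorch releases.

```python
# Sketch of the PT2E quantization flow referenced in backends-cadence.md.
# Assumptions: PyTorch >= 2.5; XNNPACKQuantizer stands in for the
# backend-specific (e.g. Cadence) quantizer; import paths vary by release.
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

# 1. Capture the model as an ATen-level graph for quantization.
graph_module = torch.export.export_for_training(model, example_inputs).module()

# 2. Annotate the graph. The quantizer decides which ops get quantized and
#    with what dtypes; a backend like Cadence supplies its own quantizer here.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(graph_module, quantizer)

# 3. Calibrate: run representative inputs so observers record value ranges.
prepared(*example_inputs)

# 4. Convert: rewrite the graph with quantize/dequantize ops around
#    quantized kernels.
quantized = convert_pt2e(prepared)
```

In the Cadence flow described above, the converted graph is then further lowered with custom operators and compiler passes so that execution maps onto the nnlib-optimized HiFi4 kernels.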