
Large Language Models

| Model | Context Length | Quantization | First Token Latency (ms) | Generation Speed (tokens/s) | Running Device |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 128 | W8A16 | 1075.04 | 3.57 | Module LLM Kit / LLM630 Compute Kit |
| DeepSeek-R1-Distill-Qwen-1.5B | 256 | W8A16 | 3056.86 | 3.57 | Module LLM Kit / LLM630 Compute Kit |
| DeepSeek-R1-Distill-Qwen-1.5B | 256 | W4A16 | - | 13.29 | LLM8850 |
| Llama-3.2-1B-Instruct | 128 | W8A16 | 891 | 4.48 | Module LLM Kit / LLM630 Compute Kit |
| Llama-3.2-1B-Instruct | 256 | W8A16 | 2601.11 | 4.49 | Module LLM Kit / LLM630 Compute Kit |
| MiniCPM4-0.5B | 512 | W8A16 | 212.91 | 21.05 | LLM8850 |
| openbuddy-llama3.2-1b-v23.1-131k | 128 | W8A16 | 891.02 | 4.52 | Module LLM Kit / LLM630 Compute Kit |
| Qwen2.5-0.5B-Instruct | 128 | W8A16 | 359.8 | 10.32 | Module LLM Kit / LLM630 Compute Kit |
| Qwen2.5-0.5B-Instruct | 256 | W8A16 | 1126.19 | 10.3 | Module LLM Kit / LLM630 Compute Kit |
| Qwen2.5-0.5B-Instruct | 128 | W4A16 | 442.95 | 12.52 | Module LLM Kit / LLM630 Compute Kit |
| Qwen2.5-0.5B-Instruct | 128 | W4A16 | 140.17 | 37.11 | AI Pyramid |
| Qwen2.5-0.5B-Instruct | 128 | W4A16 | - | 27.05 | LLM8850 |
| Qwen2.5-1.5B-Instruct | 128 | W8A16 | 3056.54 | 3.57 | Module LLM Kit / LLM630 Compute Kit |
| Qwen2.5-1.5B-Instruct | 128 | W4A16 | 1219.54 | 4.63 | Module LLM Kit / LLM630 Compute Kit |
| Qwen2.5-1.5B-Instruct | 128 | W4A16 | 289.06 | 16.77 | AI Pyramid |
| Qwen2.5-1.5B-Instruct | 128 | W4A16 | - | 15.06 | LLM8850 |
| Qwen2.5-3B-Instruct | 128 | W4A16 | 550.3 | 9.46 | AI Pyramid |
| Qwen2.5-0.5B-Instruct | 1024 | W8A16 | 533.19 | 9.76 | Module LLM Kit / LLM630 Compute Kit |
| Qwen2.5-0.5B-Instruct | 1024 | W8A16 | 143.02 | 25.5 | AI Pyramid |
| Qwen2.5-0.5B-Instruct | - | - | 8210 | 1.54 | RaspberryPi5 CPU (ollama) |
| Qwen3-0.6B | 128 | W8A16 | 361.81 | 10.28 | Module LLM Kit / LLM630 Compute Kit |
| Qwen3-0.6B | 2048 | W8A16 | 670.51 | 12.88 | LLM8850 |
| Qwen3-1.7B | 2048 | W8A16 | 796.38 | 7.38 | LLM8850 |
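The two latency columns can be combined into a rough end-to-end estimate: time to first token, plus the remaining tokens at the steady decode rate. A minimal sketch (the helper name and the 100-token reply length are illustrative, not from the benchmark):

```python
def estimated_response_ms(first_token_latency_ms: float,
                          tokens_per_s: float,
                          n_tokens: int) -> float:
    """Rough end-to-end latency for an n_tokens reply: TTFT plus
    the remaining (n_tokens - 1) tokens at the decode rate."""
    return first_token_latency_ms + (n_tokens - 1) / tokens_per_s * 1000.0

# Example: Qwen2.5-0.5B-Instruct (W4A16) on AI Pyramid, values from the table
ttft_ms, speed = 140.17, 37.11
print(f"{estimated_response_ms(ttft_ms, speed, 100):.0f} ms for a 100-token reply")
```

This ignores tokenizer and scheduling overhead, so treat it as a lower bound.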

Multimodal Large Language Models

| Model | Context Length | Quantization | First Token Latency (ms) | Image Size | Image Encoding Time (ms) | Generation Speed (tokens/s) | Running Device |
| --- | --- | --- | --- | --- | --- | --- | --- |
| InternVL2_5-1B-MPO | 256 | W8A16 | 1117.27 | 364 | 1164.61 | 10.56 | Module LLM Kit / LLM630 Compute Kit |
| InternVL2_5-1B-MPO | 256 | W8A16 | 433.87 | 448 | 362.22 | 29.48 | AI Pyramid |
| InternVL3-1B | 1024 | W8A16 | 534.95 | 448 | 2267.89 | 9.78 | Module LLM Kit / LLM630 Compute Kit |
| InternVL3-1B | 2048 | W8A16 | 142.32 | 448 | 393.08 | 26.67 | AI Pyramid |
| InternVL3-1B | 1024 | W8A16 | - | 448 | - | - | LLM8850 |
| Qwen2.5-VL-3B-Instruct | 512 | W8A16 | 558.68 | 308 | 773.95 | 4.81 | LLM8850 |
| Qwen3-VL-2B-Instruct | 1152 | W8A16 | 159.79 | 384 | 190.73 | 11.93 | AI Pyramid |
| Qwen3-VL-2B-Instruct | 1152 | W8A16 | - | 384 | 191.65 | 7.8 | LLM8850 |
| Qwen3-VL-2B-Instruct | - | - | 24909 | - | - | 0.42 | RaspberryPi5 CPU (ollama) |

Speech Models

| Model | Input Audio Length (s) | Real-Time Factor | Running Device |
| --- | --- | --- | --- |
| SenseVoiceSmall | 10 | 0.061 | AI Pyramid |
| SenseVoiceSmall | 10 | 0.015 | LLM8850 |

| Model | Real-Time Factor | Running Device |
| --- | --- | --- |
| CosyVoice2-0.5B | 1.36 | AI Pyramid |
| CosyVoice2-0.5B | 1.73 | LLM8850 |
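Real-time factor is processing time divided by audio duration, so values below 1 mean the model keeps up with the audio stream and values above 1 mean it falls behind. A minimal sketch (the 0.15 s processing time is back-derived from the table's 10 s × 0.015, not a separate measurement):

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration.
    RTF < 1: faster than real time; RTF > 1: slower than real time."""
    return processing_s / audio_s

# Example: SenseVoiceSmall on LLM8850 handling 10 s of audio in 0.15 s
print(real_time_factor(0.15, 10.0))
```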

Vision Models

| Model | Resolution | Inference FPS | Running Device |
| --- | --- | --- | --- |
| YOLO26n | 640 | 118 | Module LLM Kit / LLM630 Compute Kit |
| YOLO26n | 640 | 649 | AI Pyramid |
| YOLO26n | 640 | 645 | LLM8850 |
| YOLO26n | 640 | 3.47 | RaspberryPi5 CPU (torch) |
| YOLO26n | 640 | 7.4 | RaspberryPi5 CPU (onnx) |
| YOLO26n | 640 | 15.8 | RaspberryPi5 CPU (ncnn) |
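For latency budgeting it can help to invert the throughput figures into per-frame time; a minimal sketch using two values from the table:

```python
def ms_per_frame(fps: float) -> float:
    """Per-frame inference latency implied by a throughput figure."""
    return 1000.0 / fps

# Example: YOLO26n at 640 on AI Pyramid vs. Raspberry Pi 5 CPU (ncnn)
print(f"{ms_per_frame(649):.2f} ms vs {ms_per_frame(15.8):.2f} ms per frame")
```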