
feat: add CPU offload toggle to performance settings#97

Draft
omnificate wants to merge 1 commit into Overworldai:main from omnificate:feat/cpu-quantize-setting

Conversation


@omnificate omnificate commented Apr 12, 2026

Adds a CPU Model Loading checkbox in Performance settings that sends cpu_offload in the WebSocket init message. When enabled, the world_engine server builds the model on CPU before moving to GPU, reducing peak VRAM during model initialization. Essential for systems with limited GPU memory.

Changes (11 files, +46/−12):

| Layer | What changed |
| --- | --- |
| Settings schema | Top-level `cpu_offload: z.boolean().default(false)` |
| WS protocol | `cpu_offload?: boolean` on `InitMessage` |
| i18n | `cpuOffload` + `cpuOffloadDescription` in en/ja/zh/goose |
| Settings UI | Checkbox in Performance section between Quantization and Cap FPS |
| Change detection | Toggling `cpu_offload` during streaming shows the mode-switch confirmation modal |
| Lifecycle model key | Encodes `+cpu0`/`+cpu1` so toggling triggers an intentional reconnect |
| StreamingContext | Passes `cpu_offload` in `sendInit`; adds it to dependency arrays |
| Python server | Extracts `cpu_offload` from the init message and passes it to `load_engine()` |
| Engine manager | Forwards `cpu_offload` to the `WorldEngine()` constructor |
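The schema and protocol rows above can be sketched roughly as follows. The real settings schema uses zod; to keep this sketch dependency-free it uses plain TypeScript types instead, and the `Settings` shape, the extra `InitMessage` fields, and the `buildInit` helper are illustrative assumptions, not the repository's exact code:

```typescript
// Sketch of the setting flowing into the WS init message (names assumed).
interface Settings {
  cpu_offload: boolean; // new top-level setting, defaults to false
}

interface InitMessage {
  type: "init";
  cpu_offload?: boolean; // optional flag read by the Python server
}

// StreamingContext's sendInit would build something like this and send it
// over the WebSocket; the server extracts cpu_offload and forwards it to
// load_engine() and ultimately the WorldEngine() constructor.
function buildInit(settings: Settings): InitMessage {
  return { type: "init", cpu_offload: settings.cpu_offload };
}

const msg = buildInit({ cpu_offload: true });
// ws.send(JSON.stringify(msg)) would carry the flag to the server
```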

Companion PR: Overworldai/world_engine#40
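The lifecycle model-key row can be illustrated with a small helper; the key format and function name here are assumptions for illustration, not the exact code in this PR:

```typescript
// Hypothetical helper showing the encoding described in the table:
// cpu_offload is folded into the model key as "+cpu0"/"+cpu1", so flipping
// the toggle yields a different key and the lifecycle layer treats the
// change as an intentional reconnect rather than a spurious one.
function modelKey(model: string, quantization: string, cpuOffload: boolean): string {
  return `${model}+${quantization}+cpu${cpuOffload ? 1 : 0}`;
}
```

For example, `modelKey("overworld", "int8", true)` produces `"overworld+int8+cpu1"`, which differs from the `+cpu0` key for the same model and quantization, so toggling the checkbox forces a reconnect with the new setting.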

@omnificate omnificate marked this pull request as draft on April 12, 2026 at 21:23
Adds a 'CPU Model Loading' checkbox in Performance settings that sends
cpu_offload in the WebSocket init message. When enabled, the world_engine
server builds the model on CPU before moving to GPU, reducing peak VRAM
during initialization. Essential for systems with limited GPU memory.

Changes:
- Top-level cpu_offload setting (default: false)
- Checkbox in Performance section with i18n (en/ja/zh/goose)
- WebSocket init message includes cpu_offload flag
- Lifecycle model key encodes cpu_offload so toggling triggers reconnect
- Mode-switch modal shown when toggling during active streaming
- Server passes cpu_offload through to WorldEngine constructor

Companion PR: Overworldai/world_engine#40
@omnificate omnificate force-pushed the feat/cpu-quantize-setting branch from c73f5b9 to d629fb3 on April 12, 2026 at 21:42
@omnificate omnificate changed the title from "feat: add CPU Quantize toggle to experimental settings" to "feat: add CPU offload toggle to performance settings" on Apr 12, 2026

lapp0 commented Apr 13, 2026

I'm curious - could you please share VRAM / CPU memory utilization metrics? The model is eventually loaded into VRAM regardless, but this might save some memory since we'd patch the model on CPU.

3 participants