[Bugfix] Skip missing parameters during GGUF Gemma2 weight loading#30699

Open
kitaekatt wants to merge 1 commit into vllm-project:main from kitaekatt:fix/gemma2-skip-missing-params

Conversation

@kitaekatt
Contributor

Summary

Skip parameters not present in the model during GGUF weight loading, fixing KeyError: 'embed_tokens.qweight_type' when loading GGUF Gemma2 models.

Root Cause

The GGUF loader yields quantization metadata parameters (qweight_type) for all quantized tensors, including embeddings. However, VocabParallelEmbedding does not register these parameters, so the params_dict lookup in load_weights raises a KeyError during engine core initialization.
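
To make the mismatch concrete, here is a minimal, self-contained sketch of the failure mode. The names are illustrative only (not actual vLLM objects); the point is that the loader emits a metadata name the model never registers.

```python
# Illustrative sketch only -- made-up names, not real vLLM objects.
# The GGUF loader yields metadata names such as "embed_tokens.qweight_type",
# but the embedding layer only registers "embed_tokens.weight", so an
# unguarded params_dict[name] lookup fails.
model_params = {
    "embed_tokens.weight",
    "layers.0.mlp.gate_up_proj.qweight",
    "layers.0.mlp.gate_up_proj.qweight_type",
}
gguf_names = [
    "embed_tokens.weight",
    "embed_tokens.qweight_type",          # metadata with no matching parameter
    "layers.0.mlp.gate_up_proj.qweight",
    "layers.0.mlp.gate_up_proj.qweight_type",
]
missing = [name for name in gguf_names if name not in model_params]
print(missing)  # ['embed_tokens.qweight_type'] -> KeyError without a guard
```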

Changes

  • Add a safety check in Gemma2Model.load_weights() that skips checkpoint entries with no matching key in params_dict (see the sketch below)
  • Matches the existing pattern in llama.py (lines 502-503)
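
A simplified, standalone sketch of that check, assuming a plain PyTorch module; the real Gemma2Model.load_weights also handles details such as stacked parameter mappings, which are omitted here.

```python
from torch import nn


def load_weights_skipping_missing(model: nn.Module, weights):
    """Copy (name, tensor) pairs into the model, skipping checkpoint entries
    (e.g. GGUF quantization metadata for embeddings) that the model does not
    define."""
    params_dict = dict(model.named_parameters())
    for name, loaded_weight in weights:
        if name not in params_dict:
            continue  # the safety check this PR adds
        param = params_dict[name]
        # vLLM parameters may carry a custom weight_loader; fall back to a
        # plain in-place copy for ordinary nn.Parameters.
        weight_loader = getattr(param, "weight_loader",
                                lambda p, w: p.data.copy_(w))
        weight_loader(param, loaded_weight)
```

Any entry whose name is missing from params_dict is silently dropped, which is the intended behavior for the embedding qweight_type metadata here.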

Testing

  • Tested with GGUF Gemma2 models that previously failed with KeyError
  • Model loads successfully and generates coherent output

🤖 Generated with Claude Code


@gemini-code-assist bot left a comment


Code Review

This pull request addresses a KeyError during GGUF weight loading for Gemma2 models. The issue arises when the GGUF file contains quantization metadata (like qweight_type) for embedding layers, but the corresponding parameters are not defined in the vLLM model, which is initialized to expect unquantized embedding weights.

The proposed fix is to add a check in Gemma2Model.load_weights to skip any weight from the checkpoint that does not have a corresponding parameter in the model. This is a robust and clean solution that prevents the crash and aligns with existing defensive patterns in the codebase for loading weights.

I've analyzed the potential side-effects, such as whether this could lead to embedding weights not being loaded. Given the author's confirmation that the model produces coherent output after this change, I'm confident that the fix is correct and complete for the issue at hand. The implementation is straightforward and I have no further suggestions for improvement.

@kitaekatt kitaekatt marked this pull request as ready for review December 15, 2025 20:29
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@kitaekatt kitaekatt marked this pull request as draft December 15, 2025 20:37
@kitaekatt kitaekatt force-pushed the fix/gemma2-skip-missing-params branch from df8cfb1 to 0cd5abe on December 29, 2025 20:46
@mergify mergify bot added the bug Something isn't working label Jan 14, 2026
@kitaekatt kitaekatt force-pushed the fix/gemma2-skip-missing-params branch from 0cd5abe to 68dba93 on January 19, 2026 17:27
The GGUF loader yields quantization metadata parameters (qweight_type)
for all quantized tensors, including embeddings. However,
VocabParallelEmbedding doesn't have these parameters, causing a
KeyError when loading GGUF Gemma2 models.

This adds a safety check to skip parameters not present in the model,
matching the pattern already used in llama.py (lines 502-503).

Fixes KeyError: 'embed_tokens.qweight_type' during engine core init.

Signed-off-by: Christina <truffle@gmail.com>
@kitaekatt kitaekatt marked this pull request as ready for review February 4, 2026 22:46
@kitaekatt kitaekatt force-pushed the fix/gemma2-skip-missing-params branch from 68dba93 to d2b03c6 on February 4, 2026 22:46
