[Bugfix] Skip missing parameters during GGUF Gemma2 weight loading #30699
kitaekatt wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request addresses a KeyError during GGUF weight loading for Gemma2 models. The issue arises when the GGUF file contains quantization metadata (like qweight_type) for embedding layers, but the corresponding parameters are not defined in the vLLM model, which is initialized to expect unquantized embedding weights.
The proposed fix is to add a check in Gemma2Model.load_weights to skip any weight from the checkpoint that does not have a corresponding parameter in the model. This is a robust and clean solution that prevents the crash and aligns with existing defensive patterns in the codebase for loading weights.
I've analyzed the potential side-effects, such as whether this could lead to embedding weights not being loaded. Given the author's confirmation that the model produces coherent output after this change, I'm confident that the fix is correct and complete for the issue at hand. The implementation is straightforward and I have no further suggestions for improvement.
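For illustration, here is a minimal sketch of the defensive pattern the review describes. The standalone load_weights helper below is a simplified stand-in for Gemma2Model.load_weights, not vLLM's actual implementation; it only shows the "skip names the model does not define" check.

```python
from collections.abc import Iterable

import torch
from torch import nn


def load_weights(model: nn.Module,
                 weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
    """Copy checkpoint tensors into the model, skipping checkpoint names
    that have no corresponding parameter (e.g. GGUF quantization metadata
    such as embed_tokens.qweight_type for unquantized embeddings)."""
    params_dict = dict(model.named_parameters())
    loaded: set[str] = set()
    for name, tensor in weights:
        # Without this check, params_dict[name] raises KeyError for
        # metadata-only entries emitted by the GGUF loader.
        if name not in params_dict:
            continue
        param = params_dict[name]
        param.data.copy_(tensor)
        loaded.add(name)
    return loaded
```

This mirrors the pattern the PR references in llama.py: unknown checkpoint names are skipped rather than treated as a hard error.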
Force-pushed from df8cfb1 to 0cd5abe
Force-pushed from 0cd5abe to 68dba93
The GGUF loader yields quantization metadata parameters (qweight_type) for all quantized tensors, including embeddings. However, VocabParallelEmbedding doesn't have these parameters, causing a KeyError when loading GGUF Gemma2 models. This adds a safety check to skip parameters not present in the model, matching the pattern already used in llama.py (lines 502-503).

Fixes KeyError: 'embed_tokens.qweight_type' during engine core init.

Signed-off-by: Christina <truffle@gmail.com>
Force-pushed from 68dba93 to d2b03c6
Summary
Skip parameters not present in the model during GGUF weight loading, fixing KeyError: 'embed_tokens.qweight_type' when loading GGUF Gemma2 models.

Root Cause
The GGUF loader yields quantization metadata parameters (qweight_type) for all quantized tensors, including embeddings. However, VocabParallelEmbedding doesn't have these parameters, causing a KeyError during engine core initialization.

Changes
- Add a check in Gemma2Model.load_weights() to skip parameters not in params_dict
- Matches the pattern already used in llama.py (lines 502-503)

Testing
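As a rough smoke test (not part of this PR), loading a GGUF Gemma2 checkpoint and generating a few tokens exercises the fixed path; previously, engine construction failed with KeyError: 'embed_tokens.qweight_type'. The GGUF file path and tokenizer name below are placeholders.

```python
from vllm import LLM, SamplingParams

# Hypothetical local GGUF checkpoint and matching HF tokenizer;
# substitute your own paths/names.
llm = LLM(model="./gemma-2-9b-it-Q4_K_M.gguf",
          tokenizer="google/gemma-2-9b-it")

outputs = llm.generate(["The capital of France is"],
                       SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```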
🤖 Generated with Claude Code