[GGUF] Add attn_logit_softcapping to Gemma2/Gemma3 config mapping #42881
kitaekatt wants to merge 2 commits into huggingface:main
Conversation
Testing Summary

Unit test added: `test_gemma_softcap_config_mapping` in `tests/quantization/ggml/test_ggml.py`.

Manual verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma2", {})
False # ❌ Missing
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma3", {})
False # ❌ Missing
# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma3"]
True # ✅ Present

The test follows the existing `test_deci_config_mapping` pattern.
maybe @hmellor could review this one?
hmellor left a comment
This seems reasonable to me.
Could you provide an example reproducer that I can run before/after the fix?
@hmellor Here's a reproducer. While testing, I found and fixed an issue with the mapping keys.

The Problem: the original commit used `attention.logit_softcapping` as the GGUF-side mapping key, but after stripping the architecture prefix (e.g. `gemma2.`), the key actually present in the GGUF metadata is `attn_logit_softcapping`, so the lookup never matched.

Reproducer:

from gguf import GGUFReader
from huggingface_hub import hf_hub_download
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
gguf_path = hf_hub_download("bartowski/gemma-2-2b-it-GGUF", "gemma-2-2b-it-Q4_K_M.gguf")
reader = GGUFReader(gguf_path)
# Show actual GGUF keys (after stripping architecture prefix)
for key in reader.fields:
    if 'softcap' in key.lower():
        suffix = key.split('.', 1)[1]  # Strip 'gemma2.'
        print(f"GGUF: {key} -> mapping key: '{suffix}'")
# Output:
# GGUF: gemma2.attn_logit_softcapping -> mapping key: 'attn_logit_softcapping'
# GGUF: gemma2.final_logit_softcapping -> mapping key: 'final_logit_softcapping'
# Check mapping
print('attn_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2']) # Should be True after fix
print('final_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix

Fix pushed (d86c30c): changed the mapping keys to match the actual GGUF metadata.
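As an additional end-to-end check (not part of the PR itself), the config can also be loaded through Transformers' GGUF path; this is a sketch assuming `AutoConfig.from_pretrained` accepts the `gguf_file` argument on a GGUF-enabled install:

```python
from transformers import AutoConfig

# Same repo/file as the reproducer above; requires the `gguf` package.
repo_id = "bartowski/gemma-2-2b-it-GGUF"
filename = "gemma-2-2b-it-Q4_K_M.gguf"

config = AutoConfig.from_pretrained(repo_id, gguf_file=filename)

# Without the mapping this falls back to the Gemma2Config default (50.0);
# with the fix it should reflect the value stored in the GGUF metadata.
print(config.attn_logit_softcapping)
```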
force-pushed from d86c30c to 8e69ed1
force-pushed from a1127d1 to 0ecda9c
Thanks @kitaekatt, did you mean to mark the PR as draft?
I have been doing additional testing and validation, let me wrap that up! But if you want the fix now, feel free to change the status to open, or if you can't do that, I can do so.
I'm happy to wait for your testing to be complete
force-pushed from 0ecda9c to c402f38
Add "attention.logit_softcapping" -> "attn_logit_softcapping" mapping for Gemma2 and Gemma3 architectures in GGUF_CONFIG_MAPPING. This enables proper extraction of the attention logit softcapping parameter from GGUF metadata, which is critical for correct attention score scaling in these models. Without this mapping, GGUF models use the default softcap value (50.0) instead of the actual value stored in the GGUF file. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test_gemma_softcap_config_mapping to verify that GGUF_CONFIG_MAPPING includes the attention.logit_softcapping -> attn_logit_softcapping mapping for both Gemma2 and Gemma3 architectures. Follows existing test_deci_config_mapping pattern.
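A minimal sketch of what such a test could look like (the class name is illustrative; to stay neutral on the exact GGUF-side key discussed earlier, this version only asserts on the HF-side attribute name):

```python
import unittest

from transformers.integrations.ggml import GGUF_CONFIG_MAPPING


class GgufGemmaSoftcapMappingTest(unittest.TestCase):
    def test_gemma_softcap_config_mapping(self):
        for arch in ("gemma2", "gemma3"):
            mapping = GGUF_CONFIG_MAPPING[arch]
            # Some GGUF metadata key must map onto the HF config attribute
            # `attn_logit_softcapping` for the value to be extracted at load time.
            self.assertIn("attn_logit_softcapping", mapping.values())
```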
force-pushed from c402f38 to 2a6d5b8
[For maintainers] Suggested jobs to run (before merge): run-slow: ggml
Summary

Add `attn_logit_softcapping` extraction to the GGUF config mapping for Gemma2 and Gemma3 architectures.

Problem
When loading Gemma2/Gemma3 GGUF models, the `attn_logit_softcapping` parameter is not extracted from GGUF metadata. This causes models to use the default value instead of the actual value stored in the GGUF file.

This parameter is critical for attention score scaling and affects model output quality. The llama.cpp GGUF exporter stores this value in the `attention.logit_softcapping` field, but Transformers' GGUF loader doesn't map it to the HuggingFace config attribute.
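Schematically, the extraction step works roughly like this (a simplified sketch with hypothetical names, not Transformers' actual implementation): the architecture prefix is stripped from each metadata key, and the remainder is looked up in `GGUF_CONFIG_MAPPING` to decide which config attribute to set, so a key without a mapping entry is silently ignored:

```python
def extract_config_overrides(arch: str, gguf_metadata: dict, config_mapping: dict) -> dict:
    """Simplified illustration of how GGUF metadata becomes HF config attributes."""
    overrides = {}
    arch_mapping = config_mapping.get(arch, {})
    for full_key, value in gguf_metadata.items():
        if not full_key.startswith(arch + "."):
            continue  # skip general.*, tokenizer.*, etc.
        suffix = full_key[len(arch) + 1:]  # "gemma2.attn_logit_softcapping" -> "attn_logit_softcapping"
        hf_attr = arch_mapping.get(suffix)
        if hf_attr is None:
            continue  # no mapping entry -> the value never reaches the config
        overrides[hf_attr] = value
    return overrides
```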
"attention.logit_softcapping": "attn_logit_softcapping"togemma2mapping inGGUF_CONFIG_MAPPING"attention.logit_softcapping": "attn_logit_softcapping"togemma3mapping inGGUF_CONFIG_MAPPINGtest_gemma_softcap_config_mappingtest (followstest_deci_config_mappingpattern)Testing
Unit Test Added: `test_gemma_softcap_config_mapping` in `tests/quantization/ggml/test_ggml.py`

Manual Verification (before/after comparison): see the before/after REPL session in the Testing Summary comment above.
Related
This fix enables proper GGUF model loading in downstream projects like vLLM that rely on Transformers' GGUF config extraction.