[GGUF] Add attn_logit_softcapping to Gemma2/Gemma3 config mapping #42881

Draft

kitaekatt wants to merge 2 commits into huggingface:main from kitaekatt:fix/gemma-gguf-attn-logit-softcapping

Conversation

kitaekatt commented Dec 15, 2025

Summary

Add attn_logit_softcapping extraction to GGUF config mapping for Gemma2 and Gemma3 architectures.

Problem

When loading Gemma2/Gemma3 GGUF models, the attn_logit_softcapping parameter is not extracted from GGUF metadata. This causes models to use the default value instead of the actual value stored in the GGUF file.

This parameter is critical for attention score scaling and affects model output quality. The llama.cpp GGUF exporter stores this value in the attention.logit_softcapping field, but Transformers' GGUF loader doesn't map it to the HuggingFace config attribute.
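
For context, here is a simplified, hypothetical sketch of the extraction step described above; it is not the actual Transformers implementation, and the function name is illustrative.

# Hypothetical sketch, not the real loader: GGUF metadata keys are looked up in
# GGUF_CONFIG_MAPPING and copied onto the HF config under the mapped attribute name.
def extract_config(gguf_metadata: dict, arch: str, mapping: dict, config) -> None:
    for gguf_key, hf_attr in mapping.get(arch, {}).items():
        if gguf_key in gguf_metadata:
            setattr(config, hf_attr, gguf_metadata[gguf_key])
    # With no entry for the softcapping key in mapping[arch], the value stored in
    # the GGUF file never reaches config.attn_logit_softcapping, so the config
    # keeps its default.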

Changes

  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma2 mapping in GGUF_CONFIG_MAPPING
  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma3 mapping in GGUF_CONFIG_MAPPING
  • Add test_gemma_softcap_config_mapping test (follows test_deci_config_mapping pattern)
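
A sketch of the entries described in the list above (illustrative; the real GGUF_CONFIG_MAPPING in transformers/integrations/ggml.py contains many more keys, and the key format was refined later in this thread):

# Illustrative sketch of the two additions; existing entries are elided.
GGUF_CONFIG_MAPPING = {
    "gemma2": {
        # ... existing gemma2 entries ...
        "attention.logit_softcapping": "attn_logit_softcapping",
    },
    "gemma3": {
        # ... existing gemma3 entries ...
        "attention.logit_softcapping": "attn_logit_softcapping",
    },
}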

Testing

Unit Test Added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py
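
The following is only a rough sketch of such a check; the actual test follows the test_deci_config_mapping pattern in tests/quantization/ggml/test_ggml.py, which may be structured differently.

# Rough, illustrative sketch; the real test mirrors test_deci_config_mapping.
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING


def test_gemma_softcap_config_mapping():
    for arch in ("gemma2", "gemma3"):
        mapping = GGUF_CONFIG_MAPPING[arch]
        assert mapping["attention.logit_softcapping"] == "attn_logit_softcapping"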

Manual Verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'

Related

This fix enables proper GGUF model loading in downstream projects like vLLM that rely on Transformers' GGUF config extraction.

kitaekatt (Author)

Testing Summary

Unit test added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py

Manual verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma2", {})
False  # ❌ Missing
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma3", {})
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma3"]
True   # ✅ Present

Test follows the existing test_deci_config_mapping pattern.

ydshieh (Collaborator) commented Dec 16, 2025

maybe @hmellor could review this one?

hmellor (Member) left a comment

This seems reasonable to me.

Could you provide an example reproducer that I can run before/after the fix?

kitaekatt (Author)

@hmellor Here's a reproducer. While testing, I found and fixed an issue with the mapping keys.

The Problem:

This PR originally mapped attention.logit_softcapping, but that key doesn't exist in the GGUF metadata. The actual keys are:

gemma2.attn_logit_softcapping = 50.0
gemma2.final_logit_softcapping = 30.0

After stripping the gemma2. prefix, the mapping keys should be attn_logit_softcapping and final_logit_softcapping.
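
As a small illustration of that normalization (purely illustrative values):

# Illustrative only: a raw GGUF key has the architecture prefix stripped before
# it is looked up in the per-architecture mapping.
raw_key = "gemma2.attn_logit_softcapping"
architecture = "gemma2"
mapping_key = raw_key[len(architecture) + 1:]
print(mapping_key)  # attn_logit_softcapping -> this must be the key in the mapping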

Reproducer:

from gguf import GGUFReader
from huggingface_hub import hf_hub_download
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

gguf_path = hf_hub_download("bartowski/gemma-2-2b-it-GGUF", "gemma-2-2b-it-Q4_K_M.gguf")
reader = GGUFReader(gguf_path)

# Show actual GGUF keys (after stripping architecture prefix)
for key in reader.fields:
    if 'softcap' in key.lower():
        suffix = key.split('.', 1)[1]  # Strip 'gemma2.'
        print(f"GGUF: {key} -> mapping key: '{suffix}'")

# Output:
#   GGUF: gemma2.attn_logit_softcapping -> mapping key: 'attn_logit_softcapping'
#   GGUF: gemma2.final_logit_softcapping -> mapping key: 'final_logit_softcapping'

# Check mapping
print('attn_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix
print('final_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix

Fix pushed (d86c30c): Changed mapping keys to match actual GGUF metadata and added final_logit_softcapping.
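
For reference, a sketch of what the corrected per-architecture entries amount to (illustrative; the actual diff is authoritative):

# Illustrative: both the "gemma2" and "gemma3" mappings gain these entries,
# keyed by the prefix-stripped GGUF field names.
corrected_entries = {
    "attn_logit_softcapping": "attn_logit_softcapping",
    "final_logit_softcapping": "final_logit_softcapping",
}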

kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from d86c30c to 8e69ed1 on December 16, 2025 17:45
kitaekatt marked this pull request as draft on December 16, 2025 17:45
kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from a1127d1 to 0ecda9c on December 16, 2025 17:59
hmellor (Member) commented Dec 18, 2025

Thanks @kitaekatt, did you mean to mark the PR as draft?

kitaekatt (Author) commented Dec 18, 2025

> Thanks @kitaekatt, did you mean to mark the PR as draft?

I have been doing additional testing and validation, let me wrap that up!

But if you want the fix now feel free to change the status to open or if you can't do that I can do so.

hmellor (Member) commented Dec 18, 2025

I'm happy to wait for your testing to be complete

kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from 0ecda9c to c402f38 on February 4, 2026 20:33
kitaekatt and others added 2 commits on February 4, 2026 14:33
Add "attention.logit_softcapping" -> "attn_logit_softcapping" mapping
for Gemma2 and Gemma3 architectures in GGUF_CONFIG_MAPPING.

This enables proper extraction of the attention logit softcapping
parameter from GGUF metadata, which is critical for correct attention
score scaling in these models.

Without this mapping, GGUF models use the default softcap value (50.0)
instead of the actual value stored in the GGUF file.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add test_gemma_softcap_config_mapping to verify that GGUF_CONFIG_MAPPING
includes the attention.logit_softcapping -> attn_logit_softcapping mapping
for both Gemma2 and Gemma3 architectures.

Follows existing test_deci_config_mapping pattern.
kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from c402f38 to 2a6d5b8 on February 4, 2026 20:33
github-actions bot (Contributor) commented Feb 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: ggml
