[GGUF] Add attn_logit_softcapping to Gemma2/Gemma3 config mapping #42881

Draft

kitaekatt wants to merge 2 commits into huggingface:main from kitaekatt:fix/gemma-gguf-attn-logit-softcapping

Conversation

kitaekatt commented Dec 15, 2025

Summary

Add attn_logit_softcapping extraction to GGUF config mapping for Gemma2 and Gemma3 architectures.

Problem

When loading Gemma2/Gemma3 GGUF models, the attn_logit_softcapping parameter is not extracted from GGUF metadata. This causes models to use the default value instead of the actual value stored in the GGUF file.

This parameter is critical for attention score scaling and affects model output quality. The llama.cpp GGUF exporter stores this value in the attention.logit_softcapping field, but Transformers' GGUF loader doesn't map it to the HuggingFace config attribute.
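
For context, here is a simplified, hypothetical sketch of the extraction step described above; it is not the actual Transformers implementation, and the function name is illustrative.

# Hypothetical sketch, not the real loader: GGUF metadata keys are looked up in
# GGUF_CONFIG_MAPPING and copied onto the HF config under the mapped attribute name.
def extract_config(gguf_metadata: dict, arch: str, mapping: dict, config) -> None:
    for gguf_key, hf_attr in mapping.get(arch, {}).items():
        if gguf_key in gguf_metadata:
            setattr(config, hf_attr, gguf_metadata[gguf_key])
    # With no entry for the softcapping key in mapping[arch], the value stored in
    # the GGUF file never reaches config.attn_logit_softcapping, so the config
    # keeps its default.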

Changes

  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma2 mapping in GGUF_CONFIG_MAPPING
  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma3 mapping in GGUF_CONFIG_MAPPING
  • Add test_gemma_softcap_config_mapping test (follows test_deci_config_mapping pattern)
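
A sketch of the entries described in the list above (illustrative; the real GGUF_CONFIG_MAPPING in transformers/integrations/ggml.py contains many more keys, and the key format was refined later in this thread):

# Illustrative sketch of the two additions; existing entries are elided.
GGUF_CONFIG_MAPPING = {
    "gemma2": {
        # ... existing gemma2 entries ...
        "attention.logit_softcapping": "attn_logit_softcapping",
    },
    "gemma3": {
        # ... existing gemma3 entries ...
        "attention.logit_softcapping": "attn_logit_softcapping",
    },
}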

Testing

Unit Test Added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py
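
The following is only a rough sketch of such a check; the actual test follows the test_deci_config_mapping pattern in tests/quantization/ggml/test_ggml.py, which may be structured differently.

# Rough, illustrative sketch; the real test mirrors test_deci_config_mapping.
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING


def test_gemma_softcap_config_mapping():
    for arch in ("gemma2", "gemma3"):
        mapping = GGUF_CONFIG_MAPPING[arch]
        assert mapping["attention.logit_softcapping"] == "attn_logit_softcapping"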

Manual Verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'

Related

This fix enables proper GGUF model loading in downstream projects like vLLM that rely on Transformers' GGUF config extraction.

kitaekatt (Author)

Testing Summary

Unit test added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py

Manual verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma2", {})
False  # ❌ Missing
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma3", {})
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma3"]
True   # ✅ Present

Test follows the existing test_deci_config_mapping pattern.

ydshieh (Collaborator) commented Dec 16, 2025

maybe @hmellor could review this one?

hmellor (Member) left a comment

This seems reasonable to me.

Could you provide an example reproducer that I can run before/after the fix?

kitaekatt (Author)

@hmellor Here's a reproducer. While testing, I found and fixed an issue with the mapping keys.

The Problem:

This PR originally mapped attention.logit_softcapping, but that key doesn't exist in the GGUF metadata. The actual keys are:

gemma2.attn_logit_softcapping = 50.0
gemma2.final_logit_softcapping = 30.0

After stripping the gemma2. prefix, the mapping keys should be attn_logit_softcapping and final_logit_softcapping.
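
As a small illustration of that normalization (purely illustrative values):

# Illustrative only: a raw GGUF key has the architecture prefix stripped before
# it is looked up in the per-architecture mapping.
raw_key = "gemma2.attn_logit_softcapping"
architecture = "gemma2"
mapping_key = raw_key[len(architecture) + 1:]
print(mapping_key)  # attn_logit_softcapping -> this must be the key in the mapping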

Reproducer:

from gguf import GGUFReader
from huggingface_hub import hf_hub_download
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

gguf_path = hf_hub_download("bartowski/gemma-2-2b-it-GGUF", "gemma-2-2b-it-Q4_K_M.gguf")
reader = GGUFReader(gguf_path)

# Show actual GGUF keys (after stripping architecture prefix)
for key in reader.fields:
    if 'softcap' in key.lower():
        suffix = key.split('.', 1)[1]  # Strip 'gemma2.'
        print(f"GGUF: {key} -> mapping key: '{suffix}'")

# Output:
#   GGUF: gemma2.attn_logit_softcapping -> mapping key: 'attn_logit_softcapping'
#   GGUF: gemma2.final_logit_softcapping -> mapping key: 'final_logit_softcapping'

# Check mapping
print('attn_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix
print('final_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix

Fix pushed (d86c30c): Changed mapping keys to match actual GGUF metadata and added final_logit_softcapping.
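
For reference, a sketch of what the corrected per-architecture entries amount to (illustrative; the actual diff is authoritative):

# Illustrative: both the "gemma2" and "gemma3" mappings gain these entries,
# keyed by the prefix-stripped GGUF field names.
corrected_entries = {
    "attn_logit_softcapping": "attn_logit_softcapping",
    "final_logit_softcapping": "final_logit_softcapping",
}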

kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from d86c30c to 8e69ed1 on December 16, 2025 17:45
kitaekatt marked this pull request as draft on December 16, 2025 17:45
kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from a1127d1 to 0ecda9c on December 16, 2025 17:59
hmellor (Member) commented Dec 18, 2025

Thanks @kitaekatt, did you mean to mark the PR as draft?

kitaekatt (Author) commented Dec 18, 2025

> Thanks @kitaekatt, did you mean to mark the PR as draft?

I have been doing additional testing and validation, let me wrap that up!

But if you want the fix now feel free to change the status to open or if you can't do that I can do so.

hmellor (Member) commented Dec 18, 2025

I'm happy to wait for your testing to be complete

kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from 0ecda9c to c402f38 on February 4, 2026 20:33
kitaekatt and others added 2 commits on February 4, 2026 14:33
Add "attention.logit_softcapping" -> "attn_logit_softcapping" mapping
for Gemma2 and Gemma3 architectures in GGUF_CONFIG_MAPPING.

This enables proper extraction of the attention logit softcapping
parameter from GGUF metadata, which is critical for correct attention
score scaling in these models.

Without this mapping, GGUF models use the default softcap value (50.0)
instead of the actual value stored in the GGUF file.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add test_gemma_softcap_config_mapping to verify that GGUF_CONFIG_MAPPING
includes the attention.logit_softcapping -> attn_logit_softcapping mapping
for both Gemma2 and Gemma3 architectures.

Follows existing test_deci_config_mapping pattern.
kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from c402f38 to 2a6d5b8 on February 4, 2026 20:33
github-actions bot (Contributor) commented Feb 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: ggml
