-
Notifications
You must be signed in to change notification settings - Fork 22
Description
I’m reading both the SIGCOMM’24 paper and the codebase. The paper (§5.2 and Fig. 6) describes a change-based / delta encoding strategy: grouping tokens and storing the KV of the first token in each group as an anchor, then encoding only the deltas relative to that anchor.
However, in the current implementation, I could not locate such delta computation or storage. It seems the KV tensors are directly quantized and compressed without this differential step.
Is delta-based encoding still part of the latest method?
Was it removed/changed for practical reasons (accuracy / complexity / GPU optimization)?
If removed — does the reported performance in the paper include delta encoding or reflect the current code?
If I missed where it’s implemented, could you please point me to the right place?
Thanks again for the excellent work and for open-sourcing CacheGen.