Question about delta-based encoding missing in implementation

I’m reading both the SIGCOMM’24 paper and the codebase. The paper (§5.2 and Fig. 6) describes a change-based / delta encoding strategy: grouping tokens and storing the KV of the first token in each group as an anchor, then encoding only the deltas relative to that anchor.

However, in the current implementation, I could not locate such delta computation or storage. It seems the KV tensors are directly quantized and compressed without this differential step.

Is delta-based encoding still part of the latest method?
Was it removed/changed for practical reasons (accuracy / complexity / GPU optimization)?
If removed — does the reported performance in the paper include delta encoding or reflect the current code?

If I missed where it’s implemented, could you please point me to the right place?

Thanks again for the excellent work and for open-sourcing CacheGen.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about delta-based encoding missing in implementation #6

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Question about delta-based encoding missing in implementation #6

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions