
2bit e2e example error #2

@123tab


I tried to run a 2-bit test on the e2e branch, but found that the e2e branch only supports 4-bit quantization. Following the code on the main branch, I uncommented the 2-bit quantization path in bit_decode and then ran evaluation/example.py.
The exact command was: python example.py --model_path Llama-3.1-8B-Instruct --max_length 131072 --num_bits 2 --quant_mode k-channel --group_size 128 --attn_backend bit_decoding.
The result is that the model's output quality after 2-bit quantization is extremely poor, whereas the 4-bit model's answers are good by comparison.
This is puzzling, because residual_block_size is set to 256 in the 2-bit case, and that window should preserve some context for the model (see the sketch below for a rough sense of the error gap between 2-bit and 4-bit).
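For context on the size of that gap, here is a minimal, self-contained sketch of group-wise asymmetric uniform quantization. This is my own illustration, not the BitDecoding kernel; it only assumes a standard min/max uniform quantizer per group, with group size 128 to mirror the --group_size flag above:

```python
# Illustrative sketch only -- NOT the BitDecoding kernel. It shows why a
# 2-bit quantizer (4 levels per group) loses far more information than a
# 4-bit one (16 levels), even when a recent window stays in full precision.
import torch

def fake_quant(x: torch.Tensor, num_bits: int, group_size: int) -> torch.Tensor:
    """Asymmetric uniform quantize/dequantize along the last dim, per group."""
    orig_shape = x.shape
    g = x.reshape(-1, group_size)
    qmax = 2**num_bits - 1
    lo = g.min(dim=-1, keepdim=True).values
    hi = g.max(dim=-1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = ((g - lo) / scale).round().clamp(0, qmax)
    return (q * scale + lo).reshape(orig_shape)

torch.manual_seed(0)
k = torch.randn(1024, 128)  # stand-in for one head's K cache
for bits in (4, 2):
    err = (fake_quant(k, bits, group_size=128) - k).abs().mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
# The 2-bit error is typically ~4-5x the 4-bit error, so attention scores
# over the quantized (older) tokens degrade sharply; a 256-token
# full-precision residual window alone cannot compensate on long prompts.
```

Even so, "degraded but coherent" is what I would expect from that error gap alone; the near-total collapse shown below makes me suspect a bug rather than pure quantization noise.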
Is there a problem with the 2-bit quantization code on the e2e branch?
A separate question: can BitDecoding support BF16 precision? Models such as Qwen2.5-7B/14B/32B can hit numerical overflow when run with FP16 inference.
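To make the FP16 concern concrete, a quick generic PyTorch demonstration (not tied to this repo):

```python
# FP16's largest finite value is 65504, so large intermediate activations
# (as reported for Qwen2.5 models) overflow to inf; BF16 shares FP32's
# 8-bit exponent and keeps the same magnitude finite, at coarser precision.
import torch

x = torch.tensor(70000.0)
print(x.to(torch.float16))   # tensor(inf, dtype=torch.float16)
print(x.to(torch.bfloat16))  # tensor(70144., dtype=torch.bfloat16) -- coarse but finite
```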

2-bit output:
Answer to 1
Let's assume you get a nice reward you for a 1
In the next wave of 0 The question is a tricky one to answer, as a general rule that for any given, 0
A question like, "Whoa, No, Not now Let's get you Let's see, you're doing a very bad way of doing it Let's give you 5 Any number of 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

4-bit output:
Answer: Arnel gave his friends 5 * 8 = 40 pencils in total.
He had 10 pencils for himself, so in total, he had 40 + 10 = 50 pencils.
Since he had 10 boxes, there were 50 / 10 = 5 pencils in each box.
The final answer is 5.
