Description
I tried running 2-bit tests on the e2e branch, but found that it only supports 4-bit quantization. Following the main branch code, I un-commented the 2-bit quantization path in bit_decode and ran the test with evaluation/example.py.
The exact command was:
python example.py --model_path Llama-3.1-8B-Instruct --max_length 131072 --num_bits 2 --quant_mode k-channel --group_size 128 --attn_backend bit_decoding
The 2-bit quantized model's output quality turned out to be extremely poor, while the 4-bit model answered well by comparison.
This is puzzling, because residual_block_size is set to 256 in the 2-bit case, and that full-precision window should preserve some recent context for the model (see the sketch below for how I understand the scheme).
Is there a bug in the 2-bit quantization code in the e2e branch?
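For reference, here is my understanding of the scheme as a minimal fake-quantization sketch in plain PyTorch (the function names are my own, and this is not Bitdecoding's actual kernel): the K-cache prefix is quantized per channel in groups of group_size tokens, while the most recent residual_block_size tokens stay in full precision.

```python
import torch

def fake_quant_k_channel(x, num_bits=2, group_size=128):
    # Per-channel fake quantization over groups of tokens.
    # x: [num_tokens, head_dim], num_tokens divisible by group_size.
    qmax = 2 ** num_bits - 1
    t, d = x.shape
    g = x.reshape(t // group_size, group_size, d)
    lo = g.min(dim=1, keepdim=True).values
    hi = g.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-6) / qmax
    q = ((g - lo) / scale).round().clamp(0, qmax)  # 2-bit codes: {0, 1, 2, 3}
    return (q * scale + lo).reshape(t, d)          # dequantized values

def quantize_with_residual(k_cache, num_bits=2, residual_block_size=256, group_size=128):
    # Quantize the prefix; keep the last residual_block_size tokens in full precision.
    t = k_cache.shape[0]
    cut = max((t - residual_block_size) // group_size * group_size, 0)
    return torch.cat(
        [fake_quant_k_channel(k_cache[:cut], num_bits, group_size), k_cache[cut:]],
        dim=0,
    )
```

With only four quantization levels per group, some degradation is expected, but the collapse shown in the 2-bit output below seems worse than ordinary quantization noise.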
A separate question: can Bitdecoding support BF16? Models such as Qwen2.5-7B/14B/32B can run into numerical overflow under FP16 inference; a small demonstration follows.
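A minimal illustration of the overflow concern, in plain PyTorch: FP16's largest finite value is 65504, while BF16 keeps FP32's 8-bit exponent, so the same large value stays finite in BF16 (at coarser precision). The value 70000 below is just a hypothetical large activation.

```python
import torch

# FP16 saturates to inf beyond 65504; BF16 shares FP32's exponent range,
# so the same value remains finite, only with a coarser mantissa.
x = torch.tensor([70000.0])  # hypothetical large activation value
print(x.to(torch.float16))   # tensor([inf], dtype=torch.float16)
print(x.to(torch.bfloat16))  # tensor([70144.], dtype=torch.bfloat16)
```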
2-bit output:
Answer to 1
Let's assume you get a nice reward you for a 1
In the next wave of 0 The question is a tricky one to answer, as a general rule that for any given, 0
A question like, "Whoa, No, Not now Let's get you Let's see, you're doing a very bad way of doing it Let's give you 5 Any number of 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4-bit output:
Answer: Arnel gave his friends 5 * 8 = 40 pencils in total.
He had 10 pencils for himself, so in total, he had 40 + 10 = 50 pencils.
Since he had 10 boxes, there were 50 / 10 = 5 pencils in each box.
The final answer is 5.