Description
System Info
I can run the evaluation scripts without errors.
I downloaded the model weights with the huggingface_hub Python package, using the following code.
`
from huggingface_hub import snapshot_download

# Define model paths
models = {
    "math-shepherd-mistral-7b-prm": "/workspace/openr/model/math-shepherd-mistral-7b-prm",
    "mistral-7b-sft": "/workspace/openr/model/mistral-7b-sft",
}

# Download each model
for model_name, local_dir in models.items():
    print(f"Downloading {model_name} to {local_dir}")
    snapshot_download(repo_id=f"peiyi9979/{model_name}", local_dir=local_dir, local_files_only=False)
`
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the codebase (such as scripts/, ...)
- My own task or dataset (give details below)
Reproduction
- I downloaded the models from Hugging Face.
- After changing only the model path and name, I ran `sh reason/llm_service/create_service_math_shepherd.sh` to launch the FastChat serving system.
- I modified `/reason/evaluate.py` to run a single request instead of the whole dataset, using the following code.
`
from pathlib import Path
from typing import Any, Callable, Dict, Optional

def evaluate_single_request(
    method_name: str, solver_fn: Callable, save_dir: Optional[Path] = None
) -> Dict[str, Any]:
    # task, llm_gen_fn, and rm_call are taken from the enclosing scope in evaluate.py
    test_ds = task.test_ds
    # Select a single request (index 10)
    problem_inst = test_ds[10]
    print(f"Evaluating single request: {problem_inst['question']}")
    # Solve the problem instance
    output = solver_fn(problem_inst, llm_gen_fn, rm_call)
    print(f"Result: {output.solutions}")
    return output
`
- Then I ran the provided scripts to check that they work: `sh scripts/eval/vanila_mcts.sh` and `sh scripts/eval/beam_search.sh`.
Expected behavior
Generation by VLLMRemoteCaller behaves strangely.
It produces the whole output at once instead of one step at a time.
As a result, I get many warnings saying that the PRM value length does not match the action history.
It also does not show the expected reasoning behavior.
In my opinion, the LM does not recognize the lm_step_tag properly.
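To make the symptom easier to check, here is a minimal sketch that counts how many steps a single generated action contains and compares that count with the number of PRM scores. It is not code from the repository: `STEP_TAG`, `split_steps`, and `scores_match_steps` are hypothetical names, and the tag value `"ки\n"` is only assumed from the `lm_step_tag ки` line and the `ки\nStep N:` pattern in my log.

```python
from typing import List

# Assumption: the step tag is "ки" followed by a newline, as the log suggests.
STEP_TAG = "ки\n"


def split_steps(text: str, step_tag: str = STEP_TAG) -> List[str]:
    """Split generated text into steps, keeping the tag at the end of each step."""
    steps = []
    rest = text
    while step_tag in rest:
        head, rest = rest.split(step_tag, 1)
        steps.append(head + step_tag)
    # Keep any trailing text that was not terminated by the tag.
    if rest.strip():
        steps.append(rest)
    return steps


def scores_match_steps(text: str, scores: List[float], step_tag: str = STEP_TAG) -> bool:
    """Return True if there is exactly one score per step found in the text."""
    return len(split_steps(text, step_tag)) == len(scores)


# Illustrative input only, not taken from a real run.
sample = "Intro sentence. ки\nStep 1: first step. ки\nStep 2: second step. ки\n\n"
steps = split_steps(sample)
print(len(steps))  # more than 1 means the action holds several steps
```

If generation stopped at the step tag, I would expect `split_steps` to return a single step for each action. In my run each action appears to contain the full solution instead.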
This is my log.
`
root@f7320dc3b737:/workspace/openr.code# sh scripts/eval/beam_search.sh
/usr/local/lib/python3.10/dist-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
lm_step_tag ки
Evaluating single request: What is the least positive integer multiple of 30 that can be written with only the digits 0 and 2?
2025-01-03 04:11:42.457 | WARNING | reason.guided_search.tree:_expand_leaf_node:692 - PRM value length not match with action history. len(prm)=4, len(act_hist)=0 s:
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
<|im_start|>user
What is the least positive integer multiple of 30 that can be written with only the digits 0 and 2?<|im_end|>
<|im_start|>assistant
a:
{'action': 'We want to find the least positive integer multiple of 30 that can be written with only the digits 0 and 2. ки\nStep 1: The smallest multiple of 30 is 30 itself, but it contains the digit 0. ки\nStep 2: The next multiple of 30 is 60, which contains the digit 0 again. ки\nStep 3: The next multiple of 30 is 90, which contains the digit 0 again. ки\nStep 4: The next multiple of 30 is 120, which contains the digit 0 again. ки\nStep 5: Finally, the next multiple of 30 is 150, which only contains the digits 0 and 2. ки\nStep 6: Therefore, the least positive integer multiple of 30 that can be written with only the digits 0 and 2 is
rs:[0.8916015625, 0.60400390625, 0.607421875, 0.54296875, 0.4765625, 0.611328125, 0.57763671875]
2025-01-03 04:11:42.457 | WARNING | reason.guided_search.tree:_expand_leaf_node:692 - PRM value length not match with action history. len(prm)=4, len(act_hist)=0 s:
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
<|im_start|>user
What is the least positive integer multiple of 30 that can be written with only the digits 0 and 2?<|im_end|>
<|im_start|>assistant
a:
{'action': 'We know that a number is divisible by 30 if it is divisible by both 2 and 3. ки\nStep 1: Since we can only use the digits 0 and 2, we need to find the smallest multiple of 30 that only contains these digits. ки\nStep 2: To be divisible by 2, the number must end in 0 or 2. ки\nStep 3: To be divisible by 3, the sum of its digits must be divisible by 3. ки\nStep 4: The smallest multiple of 30 that satisfies these conditions is 20. ки\nStep 5: Therefore, the least positive integer multiple of 30 that can be written with only the digits 0 and 2 is
rs:[0.5849609375, 0.5390625, 0.640625, 0.64794921875, 0.23095703125, 0.06658935546875]
2025-01-03 04:11:42.457 | WARNING | reason.guided_search.tree:_expand_leaf_node:692 - PRM value length not match with action history. len(prm)=4, len(act_hist)=0 s:
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
<|im_start|>user
What is the least positive integer multiple of 30 that can be written with only the digits 0 and 2?<|im_end|>
<|im_start|>assistant
a:
{'action': 'We want to find the smallest multiple of 30 that can be written with only the digits 0 and 2. ки\nStep 1: The smallest multiple of 30 that we can think of is 30 itself, which is not a valid answer since it contains the digit 1. ки\nStep 2: The next multiple of 30 is 60, which only contains the digits 0 and 2. ки\nStep 3: Therefore, the least positive integer multiple of 30 that can be written with only the digits 0 and 2 is
rs:[0.8759765625, 0.57373046875, 0.46484375, 0.259765625]
2025-01-03 04:11:42.457 | WARNING | reason.guided_search.tree:_expand_leaf_node:692 - PRM value length not match with action history. len(prm)=4, len(act_hist)=0 s:
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{{}}.<|im_end|>
<|im_start|>user
What is the least positive integer multiple of 30 that can be written with only the digits 0 and 2?<|im_end|>
<|im_start|>assistant
a:
{'action': 'We want to find the smallest multiple of 30 that can be written using only the digits 0 and 2. ки\nStep 1: Since 30 is divisible by 2 and 3, our number must also be divisible by 2 and 3. ки\nStep 2: To be divisible by 2, the units digit must be even, so it must be 0 or 2. ки\nStep 3: To be divisible by 3, the sum of its digits must be divisible by 3. ки\nStep 4: The only possible sum of digits using only 0 and 2 is 0, 2, or 4. ки\nStep 5: However, we want the smallest multiple, so we choose the sum of digits to be 0. ки\nStep 6: The units digit must be 0, and the tens digit must be 2. ки\nStep 7: Therefore, the least positive integer multiple of 30 that can be written with only the digits 0 and 2 is \boxed{20}. The answer is: 20 ки\n\n', 'prob': 0.2400245956047538, 'num_token': 248, 'finish_reason': 'stop'}
rs:[0.8759765625, 0.74609375, 0.71875, 0.7373046875, 0.60009765625, 0.60400390625, 0.6513671875, 0.1143798828125]
Result: ['We want to find the least positive integer multiple of 30 that can be written with only the digits 0 and 2. ки\nStep 1: The smallest multiple of 30 is 30 itself, but it contains the digit 0. ки\nStep 2: The next multiple of 30 is 60, which contains the digit 0 again. ки\nStep 3: The next multiple of 30 is 90, which contains the digit 0 again. ки\nStep 4: The next multiple of 30 is 120, which contains the digit 0 again. ки\nStep 5: Finally, the next multiple of 30 is 150, which only contains the digits 0 and 2. ки\nStep 6: Therefore, the least positive integer multiple of 30 that can be written with only the digits 0 and 2 is
`