Confused by your two evaluation methods #2

@yunzqq

I am quite confused by your two evaluation methods.
Can I directly use MiniLongBench as the test samples for evaluating LLMs,
and then run minilongbench_scorer.py to obtain the final scores?
Or is any other post-processing necessary?
Also, are the final scores (for one of the two methods) stored in eval_data//example_minilongbench_scores.pkl?
