A large-scale 7B pretrained language model developed by BaiChuan-Inc.
Updated Jul 18, 2024 (Python)
A series of large language models developed by Baichuan Intelligent Technology.
A 13B large language model developed by Baichuan Intelligent Technology.
[NeurIPS 2023 Spotlight] In-Context Impersonation Reveals Large Language Models' Strengths and Biases
[NeurIPS 2025] AGI-Elo: How Far Are We From Mastering A Task?
CLI tool to evaluate LLM factuality on MMLU benchmark.
Code and data accompanying the article "The impact of quantising a small open source LLM". This repository explores how quantisation affects performance, VRAM usage, and inference speed in Qwen3 1.7B.
An easy-to-use, standardised framework for evaluating Large Language Models (LLMs) on the Massive Multitask Language Understanding (MMLU) dataset. It currently supports Hugging Face Transformers models and Amazon Bedrock models.
Enterprise-grade LLM evaluation framework | Multi-model benchmarking, honest dashboards, system profiling | Academic metrics: MMLU, TruthfulQA, HellaSwag | Zero fake data | PyPI: llm-benchmark-toolkit | Blog: https://dev.to/nahuelgiudizi/building-an-honest-llm-evaluation-framework-from-fake-metrics-to-real-benchmarks-2b90