Variations in nucleotide sequences often lead to significant changes in fitness. Nucleotide Foundation Models (NFMs) have emerged as a new paradigm in fitness prediction, enabling increasingly accurate estimation of fitness directly from sequence. However, assessing the relative merits of these models is difficult: published evaluations rely on diverse, assay-specific experimental datasets, and model performance often varies markedly across nucleic acid families, complicating fair comparison.
To address this challenge, we introduce NABench, a large-scale, systematic benchmark specifically designed for nucleic acid fitness prediction. NABench integrates 2.6 million mutant sequences from 162 high-throughput assays, covering a wide range of DNA and RNA families. Within a standardized and unified evaluation framework, we rigorously assess 29 representative nucleotide foundation models.
NABench's evaluation covers a variety of complementary scenarios: zero-shot prediction, few-shot adaptation, supervised training, and transfer learning. Our experimental results quantify the heterogeneity in model performance across different tasks and nucleic acid families, revealing the strengths and weaknesses of each model. This curated benchmark lays the groundwork for the development of next-generation nucleotide foundation models, poised to drive impactful applications in cellular biology and nucleic acid drug discovery.
Figure 1: The NABench Benchmark Framework.
Our comprehensive evaluation reveals a nuanced performance landscape in which no single model or architectural family dominates across all settings. The most striking finding is a clear performance dichotomy between architectural families in the zero-shot versus supervised settings.
- In the zero-shot setting, autoregressive models (e.g., GPT-like) and state-space models (e.g., Hyena/Evo series) show a clear advantage.
- When labeled data is introduced, in supervised and few-shot scenarios, many BERT-like models demonstrate a remarkable ability to learn, often outperforming the generative models.
This suggests fundamental differences in the nature of the representations learned by these architectures. Detailed performance files and more in-depth analyses (e.g., breakdowns by nucleic acid type, mutational depth) can be found in the benchmarks folder.
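For the generative (autoregressive and state-space) models, the standard zero-shot fitness score is the log-likelihood ratio between a mutant and its wild-type sequence. A minimal sketch of that scoring rule, using a toy stand-in for the model's log-probability (`toy_logp` is purely illustrative, not an actual NFM):

```python
import math

def zero_shot_fitness(logp_fn, wildtype: str, mutant: str) -> float:
    """Score a mutant as log P(mutant) - log P(wildtype), where logp_fn
    returns a sequence's total log-probability under a generative model."""
    return logp_fn(mutant) - logp_fn(wildtype)

def toy_logp(seq: str) -> float:
    """Toy stand-in model: independent per-base probabilities with a
    mild GC preference. A real NFM would return the model's sequence
    log-likelihood here."""
    probs = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
    return sum(math.log(probs[base]) for base in seq)

# A1G substitution: swapping A for G raises the toy likelihood,
# so the mutant receives a positive fitness score.
score = zero_shot_fitness(toy_logp, "ACGT", "GCGT")
```

A positive score means the model considers the mutant more probable (and, under the zero-shot assumption, fitter) than the wild type.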
Our benchmark evaluates a total of 29 nucleotide foundation models, which are categorized into four main architectural classes: BERT-like, GPT-like, Hyena, and LLaMA-based.
| Model | Params | Max Length | Tokenization | Architecture |
|---|---|---|---|---|
| LucaVirus | 1.8B | 1280 | Single | BERT |
| Evo2-7B-base | 7B | 8192 | Single | Hyena |
| Evo2-7B | 7B | 131072 | Single | Hyena |
| Evo-1-8k | 6.45B | 8192 | Single | Hyena |
| Evo-1-8k-base | 6.45B | 131072 | Single | Hyena |
| GENA-LM | 336M | 512 | k-mer | BERT |
| N.T.v2 | 500M | 2048 | k-mer | BERT |
| N.T.v2 | 50M | 2048 | k-mer | BERT |
| CRAFTS | 161M | 1024 | Single | GPT |
| LucaOne | 1.8B | 1280 | Single | BERT |
| AIDO.RNA | 1.6B | 1024 | Single | BERT |
| BiRNA-BERT | 117M | dynamic | BPE | BERT |
| Evo-1.5 | 6.45B | 131072 | Single | Hyena |
| GenSLM | 2.5B | 2048 | Codon | BERT |
| HyenaDNA | 54.6M | up to 1M | Single | Hyena |
| N.T. | 500M | 1000 | k-mer | BERT |
| RFAMLlama | 88M | 2048 | Single | GPT |
| RNA-FM | 99.52M | 1024 | Single | BERT |
| RNAErnie | 105M | 1024 | Single | BERT |
| GenerRNA | 350M | dynamic | BPE | GPT |
| DNABERT | 117M | dynamic | k-mer | BERT |
| RINALMo | 650M | 1022 | Single | BERT |
| Enformer | 251M | 196608 | Single | BERT |
| SPACE | 588M | 131072 | Single | BERT |
| GENERator | 3B | 16384 | 6-mer | GPT |
| RESM | 150M | dynamic | Single | BERT |
| RESM | 650M | dynamic | Single | BERT |
| structRFM | 86M | 512 | Single | BERT |
The DMS assay data used in the paper are available directly in the data directory. For the SELEX data used in the manuscript, we are still finalizing the organization and cleaning of the processed results and plan to release them publicly within the next few months.
If you would like to suggest a new fitness dataset to be included in NABench, please open an issue with the `new_assay` label. We typically consider the following criteria for inclusion:
- The corresponding raw dataset must be publicly available.
- The assay must be related to nucleic acids (DNA/RNA).
- The dataset needs to have a sufficient number of variant measurements.
- The assay should have a sufficiently high dynamic range.
- The assay must be relevant to fitness prediction.
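As a rough illustration of the quantitative criteria above, a hypothetical pre-screen might check the variant count and dynamic range of an assay's fitness measurements. The threshold values below are placeholders for illustration, not the cutoffs we actually apply:

```python
def meets_inclusion_criteria(fitness_values,
                             min_variants=100,
                             min_dynamic_range=1.0):
    """Hypothetical pre-screen mirroring two of the criteria above:
    enough variant measurements, and a sufficiently wide spread between
    the lowest- and highest-fitness variants. Thresholds are placeholders."""
    if len(fitness_values) < min_variants:
        return False
    dynamic_range = max(fitness_values) - min(fitness_values)
    return dynamic_range >= min_dynamic_range
```

Assays failing such checks tend to yield noisy rank correlations, which is why variant count and dynamic range matter for benchmarking.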
If you would like to include a new baseline model in NABench, please follow these steps:
- Submit a Pull Request containing:
  - A new subfolder under `scripts/` named after your model. This folder should contain a scoring script `seq_emb.py` and a run script `seq_emb.sh`, similar to other models in the repository.
  - All code dependencies required for the scoring script to run properly.
- Open an issue with the `new_model` label, providing instructions on how to download relevant model checkpoints and reporting your model's performance on the relevant benchmark using our performance scripts.
Currently, we are only considering models that meet the following conditions:
- The model is able to score all mutants in the relevant benchmark.
- The corresponding model is open-source to allow for reproducibility.
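For orientation, a `seq_emb.py` scoring script typically reads a CSV of mutant sequences and writes one embedding per sequence. The skeleton below is an illustrative sketch only: the toy composition embedding and pickle output stand in for a real model forward pass and a `.pt` file:

```python
import csv
import pickle

def embed_sequence(seq: str) -> list:
    """Placeholder embedding: a real seq_emb.py would tokenize the
    sequence, run the model's forward pass, and pool the hidden states.
    Here we return a toy 4-dim base-composition vector for illustration."""
    n = max(len(seq), 1)
    return [seq.count(base) / n for base in "ACGT"]

def run(in_csv: str, out_path: str) -> None:
    """Read a CSV with a 'sequence' column and save {sequence: embedding}.
    Real scripts in scripts/<model>/ save torch tensors to a .pt file."""
    embeddings = {}
    with open(in_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            embeddings[row["sequence"]] = embed_sequence(row["sequence"])
    with open(out_path, "wb") as fh:
        pickle.dump(embeddings, fh)
```

The column name `sequence` and the output format are assumptions; match whatever convention the existing model subfolders use.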
**Environment Setup**

We recommend using Conda to create and manage your Python environment:

```shell
# (Recommended) Create environment with conda
conda create -n nabench python=3.9
conda activate nabench
# Install dependencies with pip
pip install -r requirements.txt
```

**Download Data**

Download the necessary data from the Resources section above and unzip it into your project's root directory or a specified path.
**Generate Sequence Embeddings**

Our `scripts` directory provides a standardized embedding extraction pipeline for each model. To generate embeddings for a specific model, run:

```shell
# Example for DNABERT
bash scripts/dnabert/seq_emb.sh path/to/input/data.csv path/to/output/embeddings.pt
```

Please refer to the README or script comments in each model's directory for detailed parameters.
**Evaluate Model Performance**

After generating embeddings/scores for all models, you can use our evaluation scripts to compute performance metrics:

```shell
# Example command (specific script to be provided by you)
python evaluate.py --scores_dir path/to/scores --output_dir benchmarks/
```

This script will generate detailed performance reports, including metrics aggregated by different dimensions (e.g., nucleic acid type, evaluation setting).
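The headline metric in most fitness-prediction benchmarks is the Spearman rank correlation between predicted scores and measured fitness; `evaluate.py` is assumed to report something similar. A dependency-free sketch of the metric (assuming no tied ranks):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1))
    formula. Valid when there are no ties; with ties, rank-average the
    values first or use scipy.stats.spearmanr."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

Because the metric depends only on ranks, it is insensitive to monotone rescaling of model scores, which is why it suits comparing heterogeneous models on the same assay.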
If you find this codebase useful for your research, please consider citing our paper.
@article{nabench,
  title={{NABench}: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction},
  author={Zhongmin Li and Runze Ma and Jiahao Tan and Chengzi Tan and Shuangjia Zheng},
  journal={arXiv preprint arXiv:2511.02888},
  year={2025}
}
We thank all the researchers and experimentalists who developed the original assays and foundation models that made this benchmark possible. We also acknowledge the invaluable contributions of the communities behind ProteinGym and RNAGym, which heavily inspired this work.
Please consider citing the corresponding papers of the models and datasets you use from this benchmark.
