NABench

Large-scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction

Overview

Variations in nucleotide sequences often lead to significant changes in fitness. Nucleotide Foundation Models (NFMs) have emerged as a new paradigm in fitness prediction, enabling increasingly accurate estimation of fitness directly from sequence. However, assessing the advantages of these models remains challenging: evaluations rely on diverse, assay-specific experimental datasets, and model performance often varies markedly across nucleic acid families, complicating fair comparisons.

To address this challenge, we introduce NABench, a large-scale, systematic benchmark specifically designed for nucleic acid fitness prediction. NABench integrates 2.6 million mutant sequences from 162 high-throughput assays, covering a wide range of DNA and RNA families. Within a standardized and unified evaluation framework, we rigorously assess 29 representative nucleotide foundation models.

NABench's evaluation covers a variety of complementary scenarios: zero-shot prediction, few-shot adaptation, supervised training, and transfer learning. Our experimental results quantify the heterogeneity in model performance across different tasks and nucleic acid families, revealing the strengths and weaknesses of each model. This curated benchmark lays the groundwork for the development of next-generation nucleotide foundation models, poised to drive impactful applications in cellular biology and nucleic acid drug discovery.

Figure 1: The NABench Benchmark Framework.


Leaderboard

Our comprehensive evaluation reveals a nuanced performance landscape in which no single model or architectural family dominates across all settings. The most striking finding is a clear dichotomy between architectural families in the zero-shot versus supervised settings.

  • In the zero-shot setting, autoregressive models (e.g., GPT-like) and state-space models (e.g., Hyena/Evo series) show a clear advantage.
  • When labeled data is introduced, in supervised and few-shot scenarios, many BERT-like models demonstrate a remarkable ability to learn, often outperforming the generative models.

This suggests fundamental differences in the nature of the representations learned by these architectures. Detailed performance files and more in-depth analyses (e.g., breakdowns by nucleic acid type, mutational depth) can be found in the benchmarks folder.
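The zero-shot advantage of generative models is usually realized through likelihood scoring. As an illustration only (the exact recipe varies per model, and `LOG_PROBS` below is a toy per-position distribution standing in for a real NFM, not any model in the benchmark), a mutant can be scored by its log-likelihood ratio against the wild type:

```python
import math

# Toy per-position nucleotide distribution standing in for a real NFM's
# likelihoods (illustrative only, not an actual model from the benchmark).
LOG_PROBS = [
    {"A": math.log(0.7), "C": math.log(0.1), "G": math.log(0.1), "T": math.log(0.1)},
    {"A": math.log(0.1), "C": math.log(0.7), "G": math.log(0.1), "T": math.log(0.1)},
]

def seq_log_likelihood(seq, log_probs=LOG_PROBS):
    """Sum of per-position log-probabilities of a sequence under the model."""
    return sum(log_probs[i][nt] for i, nt in enumerate(seq))

def zero_shot_fitness(wild_type, mutant, log_probs=LOG_PROBS):
    """Score a mutant as log P(mutant) - log P(wild_type);
    higher means the model finds the mutant more plausible."""
    return seq_log_likelihood(mutant, log_probs) - seq_log_likelihood(wild_type, log_probs)
```

With a real NFM, the toy table would be replaced by the model's own (conditional or masked) likelihoods over the full sequence.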

Baseline Models

Our benchmark evaluates a total of 29 nucleotide foundation models, which are categorized into four main architectural classes: BERT-like, GPT-like, Hyena, and LLaMA-based.

| Model | Params | Max Length | Tokenization | Architecture |
|---|---|---|---|---|
| LucaVirus | 1.8B | 1280 | Single | BERT |
| Evo2-7B-base | 7B | 8192 | Single | Hyena |
| Evo2-7B | 7B | 131072 | Single | Hyena |
| Evo-1-8k | 6.45B | 8192 | Single | Hyena |
| Evo-1-8k-base | 6.45B | 131072 | Single | Hyena |
| GENA-LM | 336M | 512 | k-mer | BERT |
| N.T.v2 | 500M | 2048 | k-mer | BERT |
| N.T.v2 | 50M | 2048 | k-mer | BERT |
| CRAFTS | 161M | 1024 | Single | GPT |
| LucaOne | 1.8B | 1280 | Single | BERT |
| AIDO.RNA | 1.6B | 1024 | Single | BERT |
| BiRNA-BERT | 117M | dynamic | BPE | BERT |
| Evo-1.5 | 6.45B | 131072 | Single | Hyena |
| GenSLM | 2.5B | 2048 | Codon | BERT |
| HyenaDNA | 54.6M | up to 1M | Single | Hyena |
| N.T. | 500M | 1000 | k-mer | BERT |
| RFAMLlama | 88M | 2048 | Single | GPT |
| RNA-FM | 99.52M | 1024 | Single | BERT |
| RNAErnie | 105M | 1024 | Single | BERT |
| GenerRNA | 350M | dynamic | BPE | GPT |
| DNABERT | 117M | dynamic | k-mer | BERT |
| RINALMo | 650M | 1022 | Single | BERT |
| Enformer | 251M | 196608 | Single | BERT |
| SPACE | 588M | 131072 | Single | BERT |
| GENERator | 3B | 16384 | 6-mer | GPT |
| RESM | 150M | dynamic | Single | BERT |
| RESM | 650M | dynamic | Single | BERT |
| structRFM | 86M | 512 | Single | BERT |

Resources

The DMS assay data used in the paper are available directly in the data directory. For the SELEX data used in the manuscript, we are still finalizing the organization and cleaning of the processed results and plan to release them publicly within the next few months.

How to Contribute

New Assays

If you would like to suggest a new fitness dataset to be included in NABench, please open an issue with the new_assay label. We typically consider the following criteria for inclusion:

  1. The corresponding raw dataset must be publicly available.
  2. The assay must be related to nucleic acids (DNA/RNA).
  3. The dataset needs to have a sufficient number of variant measurements.
  4. The assay should have a sufficiently high dynamic range.
  5. The assay must be relevant to fitness prediction.

New Baselines

If you would like to include a new baseline model in NABench, please follow these steps:

  1. Submit a Pull Request containing:
  • A new subfolder under scripts/ named after your model, containing a scoring script seq_emb.py and a run script seq_emb.sh, following the pattern of the other models in the repository.
  • All code dependencies required for the scoring script to run.
  2. Open an issue with the new_model label, providing instructions for downloading the relevant model checkpoints and reporting your model's performance on the relevant benchmark using our performance scripts.
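The authoritative I/O contract for a scoring script is defined by the existing scripts/<model>/seq_emb.py files; as a stdlib-only sketch of the expected shape (embed_one is a hypothetical stand-in for model inference, and the pickle output stands in for whatever serialized format the repository uses):

```python
import csv
import pickle
import sys

def embed_one(sequence):
    # Hypothetical stand-in: a real seq_emb.py would run the model's
    # tokenizer and encoder here and return a fixed-size vector.
    return [sequence.count(nt) / max(len(sequence), 1) for nt in "ACGT"]

def main(in_csv, out_path):
    # Read sequences from the input CSV (assumes a "sequence" column),
    # embed each one, and serialize the mapping to disk.
    with open(in_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    embeddings = {row["sequence"]: embed_one(row["sequence"]) for row in rows}
    with open(out_path, "wb") as f:
        pickle.dump(embeddings, f)

if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2])
```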

Currently, we are only considering models that meet the following conditions:

  1. The model is able to score all mutants in the relevant benchmark.
  2. The corresponding model is open-source to allow for reproducibility.

Usage and Reproducibility

Environment Setup

We recommend using Conda to create and manage your Python environment:

# (Recommended) Create environment with conda
conda create -n nabench python=3.9
conda activate nabench

# Install dependencies with pip
pip install -r requirements.txt

Download Data

Download the necessary data from the Resources section above and unzip it into your project's root directory or a specified path.

Generate Sequence Embeddings

Our scripts directory provides a standardized embedding extraction pipeline for each model. To generate embeddings for a specific model, run:

# Example for DNABERT
bash scripts/dnabert/seq_emb.sh path/to/input/data.csv path/to/output/embeddings.pt

Please refer to the README or script comments in each model's directory for detailed parameters.
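Once embeddings are on disk, the supervised and few-shot settings map them to fitness with a learned head. As a hypothetical illustration only (not the repository's actual evaluation head), a dependency-free k-nearest-neighbors regressor over embeddings:

```python
import math

def knn_predict(train_embs, train_fitness, query_emb, k=3):
    """Predict fitness for a query embedding as the mean fitness of its
    k nearest training neighbors under Euclidean distance."""
    # Pair each training embedding's distance to the query with its label.
    dists = sorted(
        (math.dist(e, query_emb), f) for e, f in zip(train_embs, train_fitness)
    )
    nearest = dists[:k]
    return sum(f for _, f in nearest) / len(nearest)
```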

Evaluate Model Performance

After generating embeddings/scores for all models, you can use our evaluation scripts to compute performance metrics.

# Example command (script and option names are illustrative)
python evaluate.py --scores_dir path/to/scores --output_dir benchmarks/

This script will generate detailed performance reports, including metrics aggregated by different dimensions (e.g., nucleic acid type, evaluation setting).
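Fitness benchmarks in this vein typically report Spearman rank correlation between predicted scores and measured fitness per assay. The repository's metric computation is authoritative; as a dependency-free illustration, a tie-aware Spearman correlation:

```python
from statistics import mean

def ranks(values):
    """Average ranks (1-based), assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based rank of the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0
```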

Citation

If you find this codebase useful for your research, please consider citing our paper.

@article{nabench,
    title={{NABench}: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction},
    author={Li, Zhongmin and Ma, Runze and Tan, Jiahao and Tan, Chengzi and Zheng, Shuangjia},
    journal={arXiv preprint arXiv:2511.02888},
    year={2025}
}

Acknowledgements

We thank all the researchers and experimentalists who developed the original assays and foundation models that made this benchmark possible. We also acknowledge the invaluable contributions of the communities behind ProteinGym and RNAGym, which heavily inspired this work.

Please consider citing the corresponding papers of the models and datasets you use from this benchmark.
