NMD variant effect prediction

NMD-Scanner is a Python-based variant effect annotation tool that predicts the likelihood of transcript degradation through nonsense-mediated decay (NMD). It reconstructs reference and alternative coding sequences (and, in some cases, transcript sequences), identifies premature termination codons (PTCs), and evaluates canonical and non-canonical NMD escape rules. It handles single-nucleotide variants, multi-base substitutions, short and long deletions and duplications, and frameshift variants.
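To make the idea of PTC detection concrete, here is a minimal sketch for the simplest case, a single-base substitution in a coding sequence. It is not NMD-Scanner's implementation: the function names `apply_snv`, `first_stop_codon` and `find_ptc` are hypothetical, positions are 0-based CDS offsets, and a stop codon is treated as premature when it appears in frame before the reference stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_snv(cds: str, pos: int, alt: str) -> str:
    """Return the CDS with the base at 0-based position `pos` replaced by `alt`."""
    return cds[:pos] + alt + cds[pos + 1:]


def first_stop_codon(cds: str):
    """Return the 0-based position of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in STOP_CODONS:
            return i
    return None


def find_ptc(ref_cds: str, alt_cds: str):
    """Return the position of a stop in `alt_cds` that precedes the reference stop."""
    ref_stop = first_stop_codon(ref_cds)
    alt_stop = first_stop_codon(alt_cds)
    if ref_stop is None or alt_stop is None:
        return None
    return alt_stop if alt_stop < ref_stop else None


# Toy CDS: ATG CAG TGG AAA TAA
ref = "ATGCAGTGGAAATAA"
alt = apply_snv(ref, 8, "A")  # TGG -> TGA introduces a stop in the third codon
ptc_position = find_ptc(ref, alt)
```

Indels and frameshifts need the alternative sequence to be rebuilt across exon boundaries first, which this sketch deliberately leaves out.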

Features

Reconstructs the reference and alternative CDS, the reference transcript sequence and (in some cases) the alternative transcript sequence, with metadata
Detects start-loss, stop-loss and premature termination codons (PTCs), reporting the exact position in the CDS and the exon in which each PTC lies
Computes different NMD-related features:
- Total, upstream and downstream exon count
- Distance of PTC to original stop codon
- Distance of PTC to start codon
- Transcript length
- 3' and 5' UTR lengths
Evaluates five canonical NMD escape rules:
- Last exon rule
- 50nt penultimate rule
- Long exon rule
- Start-proximal rule
- Single-exon rule
Outputs all annotations as a structured DataFrame (CSV)
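The five escape rules listed above can be sketched as simple predicates. This is an illustration under stated assumptions, not the package's own logic: `PtcContext` and its field names are hypothetical, the 50 nt threshold comes from the rule name, and the defaults of 407 nt (long exon) and 150 nt (start-proximal) are values commonly used in the NMD literature that may differ from what NMD-Scanner applies.

```python
from dataclasses import dataclass


@dataclass
class PtcContext:
    """Hypothetical container; these are not NMD-Scanner's column names."""
    exon_count: int        # total number of exons in the transcript
    ptc_exon: int          # 1-based index of the exon containing the PTC
    ptc_exon_length: int   # length of that exon in nt
    dist_to_exon_end: int  # nt from the PTC to the 3' end of its exon
    dist_to_start: int     # nt from the start codon to the PTC


def evaluate_escape_rules(ctx, penultimate_nt=50, long_exon_nt=407,
                          start_proximal_nt=150):
    """Return one boolean per rule plus a combined flag (assumed thresholds)."""
    rules = {
        "single_exon": ctx.exon_count == 1,
        # Here the last-exon rule is only evaluated for multi-exon transcripts.
        "last_exon": ctx.exon_count > 1 and ctx.ptc_exon == ctx.exon_count,
        "penultimate_50nt": (
            ctx.exon_count > 1
            and ctx.ptc_exon == ctx.exon_count - 1
            and ctx.dist_to_exon_end <= penultimate_nt
        ),
        "long_exon": ctx.ptc_exon_length > long_exon_nt,
        "start_proximal": ctx.dist_to_start < start_proximal_nt,
    }
    # A PTC is predicted to escape NMD if any single rule applies.
    rules["nmd_escape"] = any(rules.values())
    return rules


near_junction = PtcContext(exon_count=5, ptc_exon=4, ptc_exon_length=120,
                           dist_to_exon_end=30, dist_to_start=600)
mid_transcript = PtcContext(exon_count=5, ptc_exon=2, ptc_exon_length=120,
                            dist_to_exon_end=60, dist_to_start=300)
escaping = evaluate_escape_rules(near_junction)
triggering = evaluate_escape_rules(mid_transcript)
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the combined flag is to each cutoff.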

Installation

git clone https://github.com/gagneurlab/NMD-Scanner.git
cd NMD-Scanner
pip install .

Usage

Option 1: Annotating a VCF on the command line

# if running the script directly
python -m nmd_scanner.cli --vcf input.vcf --gtf annotation.gtf --fasta reference.fa --output results/

# optional: fix exon numbering (recommended for hg19)
python -m nmd_scanner.cli --vcf input.vcf --gtf annotation.gtf --fasta reference.fa --output results/ --reassign_exons

Arguments:

--vcf: Path to input VCF (SNVs / Indels supported; frameshifts handled)
--gtf: Path to gene annotation (GTF)
--fasta: Path to reference genome FASTA
--output: Path to an existing directory (or a file path whose parent exists)
--reassign_exons: (flag) Recompute exon numbers (useful for hg19)

Output:

A CSV named <vcf_basename>_final_nmd_results.csv saved to --output, containing:
- reconstructed reference / alternative CDS and transcript sequences (+ metadata)
- PTC detection and start / stop-loss flags
- NMD escape rules
- extra features (UTR lengths, exon counts, distances, etc.)
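For scripting around the CLI, the output file name can be derived from the input. The helper below is a hypothetical sketch, not part of NMD-Scanner: it assumes `--output` is a directory and that `<vcf_basename>` is the VCF file name without its final extension, which the README does not spell out (compressed `.vcf.gz` inputs may be named differently).

```python
from pathlib import Path


def expected_result_path(vcf_path: str, output_dir: str) -> Path:
    """Build the expected CSV path for a given VCF and output directory."""
    # Assumption: the basename is the file name with its last suffix removed.
    basename = Path(vcf_path).stem
    return Path(output_dir) / f"{basename}_final_nmd_results.csv"


result_path = expected_result_path("data/input.vcf", "results/")
```

The returned path can then be passed to `pandas.read_csv` once the run has finished.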

Option 2: Import as a Python module

Instead of running the entire pipeline, you can import NMD-Scanner in Python and call only specific components. This is useful if you want to:

only reconstruct transcript / CDS sequences
only compute NMD escape rules
integrate NMD-Scanner into a larger workflow
build custom features

For reconstructing reference and alternative coding and transcript sequences, PTC detection and start / stop-loss information:

import pandas as pd
import pyranges as pr
from pyfaidx import Fasta

import nmd_scanner

vcf = nmd_scanner.read_vcf("input.vcf")
gtf_pr = nmd_scanner.read_gtf("annotation.gtf")
fasta = Fasta("reference.fa")

# Optional: fix exon numbering (recommended for hg19)
gtf_pr = nmd_scanner.compute_exon_numbers(gtf_pr)

gtf_df = gtf_pr.df
cds_df = gtf_df[gtf_df["Feature"] == "CDS"]
exons_df = gtf_df[gtf_df["Feature"] == "exon"].copy()
exons_df["exon_length"] = exons_df["End"] - exons_df["Start"]

# extract_ptc is called through the package namespace, like the other functions above
results = nmd_scanner.extract_ptc(cds_df, vcf, fasta, exons_df, output="tmp/")

Add the NMD escape rules (last exon rule, 50 nt penultimate rule, long exon rule, start-proximal rule, single-exon rule, and the resulting NMD escape status) to the results computed above:

nmd_results = results.apply(nmd_scanner.evaluate_nmd_escape_rules, axis=1, result_type='expand')
results = pd.concat([results, nmd_results], axis=1)

Add extra NMD-related features (UTR lengths, exon counts, PTC-related features) to the results computed above:

extra_features = results.apply(nmd_scanner.add_nmd_features, axis=1, result_type='expand')
results = pd.concat([results, extra_features], axis=1)
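To illustrate what exon-count features of this kind involve, the sketch below locates a PTC among a transcript's exons and derives the upstream and downstream exon counts. It is a hypothetical, standalone helper (`locate_in_exons` is not an NMD-Scanner function) that assumes exon lengths are given in transcript order, computed as `End - Start` as in the example above, and that the PTC position is a 0-based offset into the spliced transcript.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_in_exons(exon_lengths, tx_pos):
    """Map a 0-based transcript position to its exon and count flanking exons."""
    # Exclusive end offset of each exon within the spliced transcript.
    ends = list(accumulate(exon_lengths))
    if not ends or not 0 <= tx_pos < ends[-1]:
        raise ValueError("position lies outside the transcript")
    idx = bisect_right(ends, tx_pos)  # 0-based index of the containing exon
    return {
        "ptc_exon": idx + 1,
        "total_exon_count": len(exon_lengths),
        "upstream_exon_count": idx,
        "downstream_exon_count": len(exon_lengths) - idx - 1,
        "dist_to_exon_end": ends[idx] - tx_pos,
    }


features = locate_in_exons([100, 200, 150], tx_pos=250)
```

Because exon ends are cumulative, a binary search finds the containing exon without iterating over every exon per variant.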

License

All source code in this repository is licensed under the MIT License.

Citation

Schröder, C.H. (2025). Enhanced Aberrant Gene Expression Prediction across Human Tissues. Master's Thesis, Technical University of Munich / Ludwig-Maximilians-Universität München.
