The Tandem Repeat Analysis and Comparison Kit (TRACK) is an automated, Linux-based Snakemake workflow designed to identify and compare tandem repeats (TRs) across species and to genotype TR catalogs in population-wide data. The pipeline includes scripts for creating and filtering TR catalogs from reference genomes, generating catalogs of putative homologous TRs between species pairs, and performing population-level genotyping and basic population genetics analyses. Additionally, TRACK includes tools for visualizing TR length comparisons between species and key population genetic metrics, such as genetic diversity and observed heterozygosity.
Installation and setup
# Clone TRACK repository:
git clone https://github.com/caroladam/track.git
cd track
# Create and activate the conda environment:
conda env create -f environment.yml
conda activate track_env
# Get necessary files to run examples:
bash ./setup.sh
Updating TRACK
If you already have track_env in your conda environments but need to update to a new version:
conda env update --name track_env --file environment.yml --prune
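The install and update commands above can be combined into one script that picks the right one. This is a minimal sketch rather than part of TRACK: `has_env` is a hypothetical helper that scans the output of `conda env list`, and the script assumes it is run from the root of the cloned repository.

```shell
#!/usr/bin/env bash
# Sketch: create track_env on first install, update it otherwise.

ENV_NAME="track_env"          # environment name used in this README
ENV_FILE="environment.yml"    # shipped at the repository root

# has_env NAME: hypothetical helper. Reads `conda env list` output on
# stdin and succeeds if an environment called NAME is listed.
has_env() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Only act when conda is available and the environment file is present.
if command -v conda >/dev/null 2>&1 && [ -f "$ENV_FILE" ]; then
    if conda env list | has_env "$ENV_NAME"; then
        conda env update --name "$ENV_NAME" --file "$ENV_FILE" --prune
    else
        conda env create -f "$ENV_FILE"
    fi
fi
```

Matching on the first column of `conda env list` avoids false positives from environments whose path merely contains the string `track_env`.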
TRACK is a Linux-based tool. Most required dependencies should work on macOS via conda-forge or bioconda, but some may require installation via Homebrew or manual setup. macOS is not formally supported, but the guidelines below are intended to help macOS users run TRACK.
# Clone TRACK repository
git clone https://github.com/caroladam/track.git
cd track
# Create and activate the conda environment:
conda env create -f environment.yml
conda activate track_env
# Get necessary files to run examples:
bash ./setup.sh
# Install Homebrew if it is not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Libraries that require Homebrew installation
brew install gawk gcc coreutils postgresql libpq gnu-sed gnu-tar
# Software that requires Homebrew installation
brew install ucsc-kent-tools emboss
Tools that require manual installation
- Tandem Repeats Finder (TRF): required for building TR catalogs. Pre-compiled versions and installation instructions are provided by the TRF project.
- Tandem Repeat Genotyping Tool (TRGT): required for genotyping TRs from long-read data. Source code and instructions are provided by the TRGT project.
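After installing these tools manually, it can help to confirm that they are reachable before starting a workflow. The sketch below assumes the executables are named `trf` and `trgt` on your `PATH`. That is an assumption: TRF binaries in particular are often distributed under versioned file names and may need to be renamed or symlinked. `missing_tools` is a hypothetical helper, not part of TRACK.

```shell
#!/usr/bin/env bash
# Sketch: report manually installed tools that are not on PATH.

# missing_tools NAME...: print each NAME that `command -v` cannot resolve.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Executable names are assumptions; adjust them to match your installation.
missing=$(missing_tools trf trgt)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing >&2
else
    echo "TRF and TRGT found on PATH"
fi
```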
Each directory within the repository contains example input data, allowing you to perform test runs and familiarize yourself with TRACK's functionalities.
track/
├── environment.yml
├── LICENSE
├── README.md
├── setup.sh
├── genotype
│   ├── config.yaml
│   ├── data
│   ├── scripts
│   └── Snakefile
├── homology
│   ├── config.yaml
│   ├── data
│   ├── scripts
│   └── Snakefile
├── manual
│   ├── example_plots
│   ├── track_workflow.png
│   └── user_manual.md
├── popgen_analysis
│   ├── config.yaml
│   ├── data
│   ├── scripts
│   └── Snakefile
└── tr_catalog
    ├── config.yaml
    ├── data
    ├── scripts
    └── Snakefile
14 directories, 14 files
To perform a test run, enter one of the workflow subdirectories (genotype, homology, popgen_analysis, or tr_catalog) and run:
snakemake --cores <integer>
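Before a real test run, Snakemake's `--dry-run` option lists the jobs a workflow would execute without running them. The sketch below does this for every workflow directory in the layout above. It assumes it is started from the repository root with `track_env` active. `CORES` and `runnable_workflows` are hypothetical names introduced for this example, and each workflow is checked independently, so the loop order carries no meaning.

```shell
#!/usr/bin/env bash
# Sketch: dry-run each TRACK workflow from the repository root.

# Workflow directories taken from the repository layout.
WORKFLOWS="tr_catalog homology genotype popgen_analysis"

# Hypothetical variable for this sketch; override with e.g. CORES=8.
CORES="${CORES:-2}"

# runnable_workflows ROOT: print each workflow under ROOT that has a Snakefile.
runnable_workflows() {
    for wf in $WORKFLOWS; do
        [ -f "$1/$wf/Snakefile" ] && echo "$wf"
    done
    return 0
}

for wf in $(runnable_workflows .); do
    echo "Dry run: $wf"
    # Subshell keeps the caller's working directory unchanged.
    ( cd "$wf" && snakemake --dry-run --cores "$CORES" ) \
        || echo "Dry run failed: $wf" >&2
done
```

Dropping `--dry-run` turns the loop into an actual test run of each workflow.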
For detailed instructions on setting up configuration files and executing the pipeline with your own data, please refer to the user manual (manual/user_manual.md).
Catalogs of TRs identified with TRACK in T2T genomes of ape species can be downloaded from the links below. Reference genomes used to create the TR catalogs were obtained from the T2T Consortium Primate Project v2.0 and CHM13 Project v2.0.
Filtered catalogs can be downloaded here:
- Homo sapiens
- Pan troglodytes
- Pan paniscus
- Gorilla gorilla
- Pongo abelii
- Pongo pygmaeus
- Symphalangus syndactylus
An additional TR catalog for the Rhesus macaque is now available. The reference genome used to create this catalog was obtained from the T2T-MMU8 QV100 project:
This repository is under active development, so users may encounter changes and updates. We recommend regularly checking for updates and reviewing the documentation to ensure optimal pipeline usage.
Send your questions or suggestions to carolinaladam@gmail.com
