This repository provides a pipeline for analyzing RNA-seq data to identify and quantify splice site usage (SSU), map genetic variants to transcripts, and train neural networks to predict splice sites and their usage.
- Python: 3.7-3.10
- GCC: Tested with GCC 11.1.0.
- Note: Older GCC versions might not support the C++ standard required by RegTools (e.g., C++11).
- CUDA: 11.2 (for TensorFlow 2.10.0)
- cuDNN: 8.1
- Note: Ensure your CUDA/cuDNN versions are compatible with your TensorFlow version. Refer to the TensorFlow GPU support guide for compatibility details.
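The version requirements above can be checked before installing anything else. The following is a minimal sketch, not part of this repository: the helper names `python_supported` and `tensorflow_report` are hypothetical, and the script assumes only the Python range listed above (3.7-3.10). If TensorFlow is already installed, it also reports the CUDA and cuDNN versions that the TensorFlow build was compiled against, which can be compared with the values listed above.

```python
import sys

# Supported interpreter range taken from the requirements list above.
SUPPORTED_MIN = (3, 7)
SUPPORTED_MAX = (3, 10)


def python_supported(version=None):
    """Return True if the (major, minor) version is within 3.7-3.10.

    `version` is an optional tuple such as (3, 9); it defaults to the
    running interpreter's version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def tensorflow_report():
    """Return TensorFlow build details, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # get_build_info() describes the build; the CUDA/cuDNN keys are only
    # present on GPU builds, so .get() is used to tolerate their absence.
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "cuda": build.get("cuda_version"),
        "cudnn": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }
```

Calling `python_supported()` with no arguments checks the running interpreter, and `tensorflow_report()` returns `None` when TensorFlow has not been installed yet. An empty `gpus` list usually means the driver or CUDA libraries are not visible to TensorFlow.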
Install SpliSER and the Python requirements:
conda env create -f environment.yml
conda activate proc-rnaseq
cd pipeline
git clone git@github.com:NNeuralDynamics/SpliSER.git
Contains the modified SpliceAI model integrated with SSU regression. Includes training and testing scripts, as well as tools for calculating evaluation metrics.
Nextflow pipeline and supporting scripts that process RNA-seq data in BAM format to find splice sites and SSU values in each sample, then combine the per-sample results into a dataset for machine learning.
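To illustrate the combining step conceptually, here is a minimal sketch that merges per-sample SSU values into a single site-by-sample table. It is not the pipeline's actual code: the function name `combine_samples`, the `(chromosome, position, strand)` site key, and the in-memory dictionary input are all assumptions made for illustration, and the real scripts may use a different layout and file format.

```python
def combine_samples(per_sample):
    """Merge per-sample SSU values into a site-by-sample table.

    `per_sample` maps a sample name to a dict of
    {(chromosome, position, strand): ssu_value} (hypothetical layout).
    Returns (samples, sites, rows), where rows[i][j] is the SSU of
    sites[i] in samples[j], or None if the site was not seen there.
    """
    samples = sorted(per_sample)
    # Union of all splice sites observed in any sample.
    sites = sorted({site for values in per_sample.values() for site in values})
    rows = [[per_sample[s].get(site) for s in samples] for site in sites]
    return samples, sites, rows


# Toy values for illustration only, not real measurements.
example = {
    "sampleA": {("chr1", 100, "+"): 0.9, ("chr1", 250, "+"): 0.4},
    "sampleB": {("chr1", 100, "+"): 0.8},
}
samples, sites, rows = combine_samples(example)
```

Sites missing from a sample are kept as `None` rather than set to zero, so that "not observed" stays distinguishable from "observed with zero usage" when the table is later turned into training data.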