
PredRAD

High-throughput sequencing of reduced representation libraries obtained through digestion with restriction enzymes, generally known as restriction-site associated DNA sequencing (RAD-seq), is now one of the most commonly used strategies for generating single nucleotide polymorphism (SNP) data in eukaryotes. The choice of restriction enzyme is critical in the design of any RAD-seq study, as it determines the number of genetic markers that can be obtained for a given species and, ultimately, the success of the project.

When designing a study that uses RAD-seq or a related method, researchers face two fundamental questions: i) which restriction enzyme is best for obtaining a desired number of RAD tags in the organism of interest? and ii) how many markers can be obtained with a particular enzyme in that organism? This software pipeline lets researchers obtain approximate answers to these questions and helps guide the design of studies that use RAD sequencing and related methods.

This repository contains the software code and output results from Herrera S., P.H. Reyes-Herrera & T.M. Shank (2014) Genome-wide predictability of restriction sites across the eukaryotic tree of life. bioRxiv preprint doi: http://dx.doi.org/10.1101/007781

Requirements

- Python 2.7 or later
- Biopython
- Bowtie

Install

Download the Python and shell scripts.

For the shell script, grant execute permission with chmod u+x:

chmod u+x restriction_site_search.sh

Usage

restriction_site_search.sh. This shell script searches every genome listed in the input file (genomefilename) for every restriction site listed in the input file (patternfilename). It produces the following files:
- ALL.aligned.txt, ALL.failed.txt, ALL.processed.txt, ALL.suppressed.txt - tables summarizing the Bowtie output (reads aligned, failed, processed and suppressed) for each genome.
- ALL.count.txt - a table with the number of restriction sites found in each genome.
- ALL.size.txt - a table with the size of each genome.
The input arguments are:
- genomefilename - name of a file containing a two-column table: (1) species code and (2) link to the whole-genome FASTA file (see test/genomeFileExample.txt)
- patternfilename - name of a file containing a two-column table: (1) restriction site regular expression and (2) restriction site name (see test/Patterns_list.txt)
To run, type in the shell:

./restriction_site_search.sh genomefilename patternfilename
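The script summarizes Bowtie output, so the search itself is done with Bowtie. The pure-Python sketch below is not the pipeline's method. It only illustrates what counting a site pattern means: it reads a pattern table and counts matches of a regular expression along one sequence.

The sketch makes several assumptions:
- The two columns of the pattern file are separated by whitespace.
- `load_patterns` and `count_sites` are hypothetical helper names.
- Only the forward strand is scanned. This is enough for palindromic sites such as EcoRI (GAATTC) but not for asymmetric ones.

```python
import re

def load_patterns(lines):
    """Parse rows of 'regular_expression name' into (regex, name) tuples.

    Whitespace-separated columns are an assumption about the pattern file.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        regex, name = line.split(None, 1)
        patterns.append((regex, name.strip()))
    return patterns

def count_sites(sequence, regex):
    """Count possibly overlapping matches of regex on the forward strand."""
    # Wrapping the pattern in a zero-width lookahead lets finditer report a
    # match at every start position, including overlapping ones.
    finder = re.compile("(?=" + regex + ")", re.IGNORECASE)
    return sum(1 for _ in finder.finditer(sequence))
```

For example, `count_sites("GCAGCTGC", "GC[AT]GC")` returns 2 because the two ApeKI sites overlap by two bases. A plain `re.findall` would report only the first of them.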

obtain_nucleotides_model.py. This Python script computes the nucleotide, dinucleotide and trinucleotide frequency distributions of each genome listed in the input file (genomefilename).

The input arguments are:
- genomefilename - name of a file containing a two-column table: (1) species code and (2) link to the whole-genome FASTA file (see test/genomeFileExample.txt)
- resultsfile - name of the output file
To run, type in the shell:

python obtain_nucleotides_model.py genomefilename resultsfile
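Here is a minimal sketch of what these distributions are, assuming they are relative frequencies of overlapping k-mers counted on one strand. `kmer_frequencies` and `nucleotide_model` are hypothetical names. The real script may treat ambiguous bases, soft-masked (lowercase) sequence or the reverse strand differently.

```python
from collections import Counter

def kmer_frequencies(sequence, k):
    """Relative frequencies of overlapping k-mers in one sequence.

    Windows containing anything other than A, C, G or T (e.g. N) are skipped;
    this handling of ambiguous bases is an assumption of the sketch.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # float() avoids integer division under Python 2.7.
    return dict((kmer, float(n) / total) for kmer, n in counts.items())

def nucleotide_model(sequence):
    """Nucleotide (k=1), dinucleotide (k=2) and trinucleotide (k=3) frequencies."""
    return dict((k, kmer_frequencies(sequence, k)) for k in (1, 2, 3))
```

For a whole genome, counts would need to be accumulated across all FASTA records before normalizing. The records could be read, for example, with Biopython's `SeqIO.parse(handle, "fasta")`.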

sequence_probability.py. This Python script computes the probability of each restriction site listed in the input file (patternfilename) in every genome, based on the nucleotide (nt), dinucleotide (dint) and trinucleotide (trint) frequencies in distributionfile. It produces the following files:
- $distributionfile$_nt - a table with the site sequence probabilities based on nucleotide probabilities
- $distributionfile$_dint - a table with the site sequence probabilities based on dinucleotide probabilities
- $distributionfile$_trint - a table with the site sequence probabilities based on trinucleotide probabilities
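One common way to turn these frequencies into site probabilities treats the genome as a Markov chain of order 0, 1 or 2. This formulation is offered as an assumption, not a transcription of sequence_probability.py.

Take a site $s = s_1 s_2 \dots s_k$, genome k-mer frequencies $p(\cdot)$ and genome length $L$. The three probabilities and the expected number of sites are then approximately as follows. For a degenerate site such as GC[AT]GC, $P(s)$ is summed over its concrete sequences.

```latex
\begin{aligned}
P_{\mathrm{nt}}(s)    &= \prod_{i=1}^{k} p(s_i) \\
P_{\mathrm{dint}}(s)  &= p(s_1)\prod_{i=2}^{k} \frac{p(s_{i-1}s_i)}{p(s_{i-1})} \\
P_{\mathrm{trint}}(s) &= p(s_1 s_2)\prod_{i=3}^{k} \frac{p(s_{i-2}s_{i-1}s_i)}{p(s_{i-2}s_{i-1})} \\
\mathbb{E}[N_s]       &\approx (L - k + 1)\, P(s)
\end{aligned}
```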
The input arguments are:
- distributionfile - output from genome_nucleotide_distrib_paper (see test/DistributionFile.txt)
- patternfilename - name of a file containing a two-column table: (1) restriction site regular expression and (2) restriction site name (see test/Patterns_list.txt)
To run, type in the shell:

python sequence_probability.py distributionfile patternfilename
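The sketch below shows how a site probability and an expected site count could be computed from nt, dint and trint frequencies. It assumes a Markov chain of order 0 (nucleotides), 1 (dinucleotides) or 2 (trinucleotides), which may not be exactly what sequence_probability.py does.

Further assumptions:
- `expand_site`, `site_probability` and `expected_sites` are hypothetical names.
- Frequencies are passed as `{1: {...}, 2: {...}, 3: {...}}`.
- Only literal bases and `[...]` classes are understood in patterns, not the full regular-expression syntax.

```python
import itertools
import re

def expand_site(pattern):
    """Expand a simple pattern such as 'GC[AT]GC' into concrete sequences.

    Only literal A/C/G/T and [...] classes are supported (an assumption).
    """
    pattern = pattern.upper()
    if re.sub(r"\[[ACGT]+\]|[ACGT]", "", pattern):
        raise ValueError("unsupported pattern: %s" % pattern)
    tokens = re.findall(r"\[([ACGT]+)\]|([ACGT])", pattern)
    choices = [cls if cls else base for cls, base in tokens]
    return ["".join(combo) for combo in itertools.product(*choices)]

def site_probability(site, freqs, order):
    """P(site) under a Markov chain of the given order (0, 1 or 2).

    order 0: product of nucleotide frequencies.
    order 1 or 2: frequency of the first `order` bases, times the conditional
    P(next base | previous `order` bases) = p(context + base) / p(context).
    """
    if order == 0:
        prob = 1.0
        for base in site:
            prob *= freqs[1].get(base, 0.0)
        return prob
    prob = freqs[order].get(site[:order], 0.0)
    for i in range(order, len(site)):
        denom = freqs[order].get(site[i - order:i], 0.0)
        if denom == 0.0:
            return 0.0
        prob *= freqs[order + 1].get(site[i - order:i + 1], 0.0) / denom
    return prob

def expected_sites(pattern, freqs, order, genome_length):
    """Expected site count: (L - k + 1) times P summed over expansions."""
    sites = expand_site(pattern)
    k = len(sites[0])
    total_prob = sum(site_probability(s, freqs, order) for s in sites)
    return (genome_length - k + 1) * total_prob
```

With uniform base frequencies, all three orders give P(GAATTC) = 0.25^6 = 1/4096. In that case a genome of 4,101 bp is expected to contain about one EcoRI site on the forward strand.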

License

PredRAD is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
