DNAMutationReport

A Tool for analysis of tumor mutation burden

We present here the program and the results of our research on constructing an algorithm with a feasible running time for the following task: given a collection of unorganised reads (strings of the nucleobases A, C, G, T) from tumour tissue and healthy tissue of the same specimen, search for mutations of certain types. Counting mutations, categorising them and identifying the precise changes relative to a normal genome can, in turn, support better detection, diagnosis and treatment. The algorithm uses the reads to partially assemble the genome with the help of an existing external de novo assembly program. Once the reads are assembled into longer sequences, called ‘contigs’, the algorithm uses a dictionary-type data structure to reduce the number of comparisons between them.
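The README does not say how the dictionary is keyed. As a minimal sketch, assuming the keys are k-mers (short substrings of length k) of the healthy contigs, the idea could look like this: each tumour contig is compared only against healthy contigs that share at least one k-mer with it, instead of against every healthy contig. The function names `build_kmer_index` and `candidate_pairs` and the choice of k-mer keys are hypothetical, not the project's actual implementation.

```python
from collections import defaultdict


def build_kmer_index(contigs, k):
    """Map every k-mer to the set of contig indices that contain it."""
    index = defaultdict(set)
    for i, seq in enumerate(contigs):
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].add(i)
    return index


def candidate_pairs(healthy, tumor, k):
    """Yield (tumor_idx, healthy_idx) pairs that share at least one k-mer.

    Only these pairs need a full (expensive) edit-distance comparison,
    instead of all len(healthy) * len(tumor) pairs.
    """
    index = build_kmer_index(healthy, k)
    for t_idx, seq in enumerate(tumor):
        matches = set()
        for start in range(len(seq) - k + 1):
            # .get() avoids inserting empty entries into the defaultdict
            matches |= index.get(seq[start:start + k], set())
        for h_idx in sorted(matches):
            yield t_idx, h_idx


healthy = ["ACGTACGTTT", "GGGCCCAAAT"]
tumor = ["ACGTACCTTT", "TTTTTTTTTT"]
print(list(candidate_pairs(healthy, tumor, k=4)))  # [(0, 0)]
```

A smaller k makes the filter more tolerant of mutations inside the shared region but produces more candidate pairs; a larger k prunes more aggressively.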

Tools we used:

Minia assembler - A short-read assembler based on a de Bruijn graph; its output is a set of contigs. See more at the Minia page.
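For readers unfamiliar with de Bruijn graphs, the following simplified sketch shows the textbook idea: every k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are read off non-branching paths. This is not how Minia represents the graph internally (it uses a far more memory-efficient structure); the function names are hypothetical, and the walk only checks out-degree, ignoring reverse complements and sequencing errors.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ACGTTG", "GTTGCA", "TGCAAT"]
g = de_bruijn_graph(reads, k=4)
print(walk_unambiguous(g, "ACG"))  # ACGTTGCAAT
```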

Minia command line we used:

./minia -in reads.fa -kmer-size 24 -abundance-min 3 -out output_prefix

The main parameters are:

reads.fa - the input file(s)
kmer-size 24 - k-mer length (integer); the value may be changed as needed
abundance-min 3 - hard cut-off that removes likely erroneous, low-abundance k-mers
output_prefix - any prefix string used to name the output contigs as well as the temporary files of this assembly
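To illustrate what `abundance-min` does, here is a minimal sketch that counts k-mers across reads and keeps only those seen at least `abundance_min` times. It assumes forward-strand k-mers only (real assemblers typically merge each k-mer with its reverse complement) and uses k = 4 instead of 24 for readability; the function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads, k, abundance_min):
    """Keep only k-mers seen at least `abundance_min` times (cf. -abundance-min)."""
    return {kmer for kmer, c in count_kmers(reads, k).items() if c >= abundance_min}


# Three identical reads plus one read carrying a likely sequencing error.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
print(sorted(solid_kmers(reads, k=4, abundance_min=3)))  # ['ACGT', 'CGTA', 'GTAC']
```

The k-mers from the erroneous read occur only once, so a cut-off of 3 discards them before they can create spurious branches in the graph.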

edit-distance - Python module for computing edit distances and alignments between sequences. See more at the edit-distance page.
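The project relies on the edit-distance module for distances and alignments; its API is not reproduced here. As a self-contained stand-in, the sketch below computes the Levenshtein distance by dynamic programming and backtraces one optimal alignment into replace, insert and delete operations, which is the kind of information needed to categorise point mutations. The name `align_ops` and its return format are hypothetical.

```python
def align_ops(healthy, tumor):
    """Return (distance, ops), where ops lists (op, healthy_base, tumor_base)."""
    n, m = len(healthy), len(tumor)
    # dp[i][j] = edit distance between healthy[:i] and tumor[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if healthy[i - 1] == tumor[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or replacement
    # Backtrace one optimal alignment from the bottom-right corner.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (healthy[i - 1] != tumor[j - 1]):
            if healthy[i - 1] != tumor[j - 1]:
                ops.append(("replace", healthy[i - 1], tumor[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", healthy[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", tumor[j - 1]))
            j -= 1
    ops.reverse()
    return dp[n][m], ops


print(align_ops("ACGT", "AGGT"))  # (1, [('replace', 'C', 'G')])
```

The table costs O(len(healthy) x len(tumor)) time and memory per pair, which is why reducing the number of compared pairs matters.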

How to Use:

In the 'main-code' folder, run:

python3.6 run_compare_tissues healthy_file_path tumor_file_path output_prefix(optional) test(optional) test_num(optional)

The parameters are:

healthy_file_path, tumor_file_path - paths to the contig files (FASTA format) of the healthy and tumor tissues, respectively
test - intended to assist during software development. If the 'test' argument is given, the program processes at most test_num contigs.
test_num - integer (optional); used only when the 'test' argument is given
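The optional test mode can be pictured with a minimal sketch, assuming the contig files are standard FASTA (a `>` header line followed by one or more sequence lines) and that test mode simply keeps the first `test_num` contigs. `parse_fasta` and `load_contigs` are hypothetical helpers, not functions from 'main-code'; they accept any iterable of lines, so an open file handle works as well as a list.

```python
from itertools import islice


def parse_fasta(lines):
    """Yield (header, sequence) records from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def load_contigs(lines, test=False, test_num=None):
    """Return contig sequences; in test mode keep at most `test_num` of them."""
    records = parse_fasta(lines)
    if test and test_num is not None:
        records = islice(records, test_num)
    return [seq for _, seq in records]


fasta = [">contig_1", "ACGT", "ACGT", ">contig_2", "GGCC", ">contig_3", "TTAA"]
print(load_contigs(fasta, test=True, test_num=2))  # ['ACGTACGT', 'GGCC']
```

With a real file this becomes `with open(healthy_file_path) as f: contigs = load_contigs(f, test=True, test_num=1000)`.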

Results:

The program outputs three reports:

A mutation_distance object containing the counts of point mutations broken down by type and nucleotide, as well as the mutation percentage relative to string length.
Diagrams for each mutation type (PNG files).
A sampling file (TXT) containing a selection of already-compared string pairs and the distance between them, illustrating mutations found in the genome.
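The exact fields of `mutation_distance` are not documented here. As a hypothetical sketch, assuming mutations arrive as (type, healthy base, tumour base) tuples from an alignment, a summary that counts them by type and by nucleotide change and reports the mutation percentage relative to the compared length could look like this. The class name, the tuple format and the '-' gap marker are assumptions; the sketch avoids `dataclasses` so it also runs on Python 3.6.

```python
from collections import Counter


class MutationSummary:
    """Hypothetical aggregate of point mutations; NOT the project's mutation_distance class."""

    def __init__(self):
        self.by_type = Counter()    # e.g. {'replace': 2, 'delete': 1}
        self.by_change = Counter()  # e.g. {'AC': 1, 'A-': 1}; '-' marks a gap (assumed notation)
        self.compared_length = 0    # total number of bases compared

    def add(self, ops, length):
        """Record the (op, healthy_base, tumor_base) tuples found in one compared pair."""
        self.compared_length += length
        for op, healthy_base, tumor_base in ops:
            self.by_type[op] += 1
            self.by_change[(healthy_base or "-") + (tumor_base or "-")] += 1

    def mutation_percentage(self):
        """Mutations per 100 compared bases."""
        if not self.compared_length:
            return 0.0
        return 100.0 * sum(self.by_type.values()) / self.compared_length


summary = MutationSummary()
summary.add([("replace", "A", "C")], length=100)
summary.add([("replace", "G", "T"), ("delete", "A", "")], length=100)
print(summary.mutation_percentage())  # 1.5
```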

For example, consider the replacements diagram obtained in an experiment on 10,000 contigs. The string 'AC' indicates that 'A' in the healthy tissue was replaced by 'C' in the tumor tissue, and so on.
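The 'AC' notation maps naturally onto a counter over the 12 ordered pairs of distinct nucleotides. A minimal sketch, assuming each healthy/tumor pair is already aligned position by position (equal length, substitutions only; indels would need an alignment step first), with hypothetical function names. `text_histogram` prints a plain-text version of the replacements diagram rather than a PNG.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGT"
# All 12 ordered substitutions: 'AC' means A (healthy) -> C (tumor).
REPLACEMENT_KEYS = ["".join(p) for p in permutations(BASES, 2)]


def count_replacements(pairs):
    """Count substitutions between position-aligned healthy/tumor strings."""
    counts = Counter({key: 0 for key in REPLACEMENT_KEYS})  # keep zero bars
    for healthy, tumor in pairs:
        for h, t in zip(healthy, tumor):
            if h != t and h in BASES and t in BASES:
                counts[h + t] += 1
    return counts


def text_histogram(counts, width=20):
    """Render one bar per replacement type, scaled to the most frequent one."""
    peak = max(counts.values()) or 1
    return "\n".join(
        f"{key} {'#' * round(width * counts[key] / peak)} {counts[key]}"
        for key in REPLACEMENT_KEYS
    )


counts = count_replacements([("ACGT", "CCGA"), ("AAAA", "ACAA")])
print(text_histogram(counts))
```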
