This repository was archived by the owner on May 30, 2024. It is now read-only.

WDL-based analysis of paired-end Illumina SARS-CoV-2 reads using the SIGNAL pipeline

DNAstack/Illumina_SIGNAL

Illumina SARS-CoV-2 data processing using the SIGNAL pipeline

This repository provides a WDL wrapper for running the SIGNAL pipeline to process Illumina paired-end SARS-CoV-2 sequencing data.

Workflow inputs

An input template file with some defaults pre-defined can be found here.

Input | Description
--- | ---
accession | Sample ID
fastq_R1s, fastq_R2s | Arrays of paired FASTQ file locations; the two mates of a pair must be at the same index in each array
scheme_bed | The BED-format primer scheme used to prepare the library
viral_reference_genome | The SARS-CoV-2 reference genome
viral_reference_feature_coords | Feature coordinates for the SARS-CoV-2 reference genome
viral_reference_contig_name | Name of the reference genome contig [MN908947.3]
primer_pairs_tsv | Primer pair TSV file used for iVar's amplicon filter. This headerless TSV contains one row per primer pair, with the LEFT primer names in column 1 and the RIGHT primer names in column 2.
amplicon_bed | BED-formatted amplicon locations
container_registry | Registry that hosts the workflow containers. All containers are hosted in DNAstack's Docker Hub [dnastack]
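
Because fastq_R1s and fastq_R2s are matched by position, a quick pre-flight check can catch misordered mates before a run is launched. The function below is a minimal sketch and is not part of this repository: check_fastq_pairs is a hypothetical helper, and it assumes Illumina-style file names in which the two mates differ only by _R1 vs _R2 (for example sample_L001_R1_001.fastq.gz).

```shell
#!/usr/bin/env bash
# Sketch: verify that index-aligned R1/R2 FASTQ lists hold mates of the same pair.
# ASSUMPTION: mate file names differ only by "_R1" vs "_R2"
# (e.g. sample_L001_R1_001.fastq.gz / sample_L001_R2_001.fastq.gz).

# check_fastq_pairs R1... -- R2...
# Prints "OK <n>" when every position pairs matching names; otherwise prints
# the first problem found and returns 1.
check_fastq_pairs() {
  local -a r1s=() r2s=()
  local in_r2=0 f i b1 b2
  # Split the arguments at "--" into the R1 list and the R2 list.
  for f in "$@"; do
    if [[ "$f" == "--" ]]; then in_r2=1; continue; fi
    if (( in_r2 )); then r2s+=("$f"); else r1s+=("$f"); fi
  done

  if (( ${#r1s[@]} != ${#r2s[@]} )); then
    echo "count mismatch: ${#r1s[@]} R1 vs ${#r2s[@]} R2"
    return 1
  fi

  for (( i = 0; i < ${#r1s[@]}; i++ )); do
    # Compare base names after replacing the first mate marker with a common token.
    b1=${r1s[i]##*/}
    b2=${r2s[i]##*/}
    if [[ "${b1/_R1/_RX}" != "${b2/_R2/_RX}" ]]; then
      echo "mismatch at index $i: ${r1s[i]} vs ${r2s[i]}"
      return 1
    fi
  done
  echo "OK ${#r1s[@]}"
}
```

For example, bash expands globs in sorted order, so `r1=(fastqs/*_R1_*.fastq.gz); r2=(fastqs/*_R2_*.fastq.gz); check_fastq_pairs "${r1[@]}" -- "${r2[@]}"` checks a whole directory (the fastqs/ layout is hypothetical). The two arrays can then be written into fastq_R1s and fastq_R2s in the same order.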

Primer schemes

Primer schemes will differ based on the protocols used by the sequencing lab. Some common schemes can be downloaded from the official artic-network GitHub repository. Additional schemes can be found in the SIGNAL repository. Files from these locations map to the workflow inputs as follows:

  • scheme_bed: ends in .primer.bed
  • amplicon_bed: ends in .scheme.bed
  • primer_pairs_tsv: this file is not provided directly, but can be generated from the amplicon_bed file

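Example command to generate primer_pairs_tsv from the ARTIC V3 scheme BED (process substitution requires bash or a similar shell):

# Take the primer names (column 4), sort them numerically by amplicon number
# (the second "_"-separated field), then pair LEFT and RIGHT primers line by line.
paste \
	<(cut -f 4 nCoV-2019.scheme.bed | sort -t _ -k 2 -g | grep LEFT) \
	<(cut -f 4 nCoV-2019.scheme.bed | sort -t _ -k 2 -g | grep RIGHT) \
> nCoV-2019.primer_pairs.tsv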
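
paste joins its inputs line by line, so if a scheme lists a different number of LEFT and RIGHT primers, or they sort into a different order, rows can silently pair primers from different amplicons. A minimal sketch of a sanity check on the generated file follows. It assumes ARTIC-style names in which the amplicon number is the second "_"-separated field (the same assumption the sort -t _ -k 2 step makes), and check_primer_pairs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a generated primer-pair TSV.
# ASSUMPTION: primer names look like "<scheme>_<amplicon>_<LEFT|RIGHT>[...]",
# so the amplicon number is the second "_"-separated field.

# check_primer_pairs FILE
# Prints "OK <n> pairs" if every row is a LEFT/RIGHT pair from the same amplicon;
# otherwise prints one line per problem row and returns 1.
check_primer_pairs() {
  awk -F '\t' '
    NF != 2 { printf "line %d: expected 2 columns, found %d\n", NR, NF; bad = 1; next }
    {
      split($1, l, "_"); split($2, r, "_")
      if ($1 !~ /LEFT/ || $2 !~ /RIGHT/) {
        printf "line %d: expected LEFT then RIGHT\n", NR; bad = 1
      } else if (l[2] != r[2]) {
        printf "line %d: amplicon %s paired with amplicon %s\n", NR, l[2], r[2]; bad = 1
      }
    }
    END { if (!bad) printf "OK %d pairs\n", NR; exit bad }
  ' "$1"
}
```

Running `check_primer_pairs nCoV-2019.primer_pairs.tsv` prints either a summary line or one line per suspicious row, and exits non-zero in the latter case, so it can gate a batch submission script.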

Workflow outputs

Output | Description
--- | ---
ivar_vcf, ivar_vcf_index | Variants and index output by iVar
ivar_assembly | Genome assembly generated by iVar
freebayes_vcf, freebayes_vcf_index | Variants and index output by FreeBayes
freebayes_assembly | Genome assembly generated by FreeBayes
summary | Pipeline metrics
lineage_metadata | Pangolin lineage assignment metadata
bam | Reads aligned to the SARS-CoV-2 reference genome
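
Since the workflow reports variants from both iVar and FreeBayes, one quick way to compare the two call sets is to count the records in each VCF. A minimal sketch follows. It assumes the VCFs are either plain text or gzip/bgzip-compressed (bgzip output is gzip-compatible, so gzip -cd can read it), and count_vcf_records and compare_callers are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: count variant records (non-header lines) in the workflow's VCF outputs.
# ASSUMPTION: inputs are plain-text VCFs or gzip/bgzip-compressed *.gz files.

# count_vcf_records FILE: print the number of lines that do not start with "#".
count_vcf_records() {
  local vcf="$1"
  case "$vcf" in
    *.gz) gzip -cd "$vcf" ;;  # bgzip output is gzip-compatible
    *)    cat "$vcf" ;;
  esac | grep -vc '^#' || true  # grep exits 1 when the count is 0; keep the printed "0"
}

# compare_callers IVAR_VCF FREEBAYES_VCF: print one "caller<TAB>count" line per caller.
compare_callers() {
  printf 'ivar\t%s\nfreebayes\t%s\n' \
    "$(count_vcf_records "$1")" "$(count_vcf_records "$2")"
}
```

For example, `compare_callers sample.ivar.vcf.gz sample.freebayes.vcf.gz` (hypothetical file names) prints one tab-separated count per caller. A large discrepancy is a cue to inspect the records directly rather than a verdict on either caller.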

Containers

Docker image definitions can be found in our bioinformatics-public-docker-images repo.

All containers are publicly hosted in DNAstack's container registry.

N.B. The SIGNAL Docker container is ~10 GB so that it can be used at scale in AWS, where EBS auto-scaling sometimes cannot expand quickly enough to accommodate hundreds of samples running in parallel. Including the reference data in the Docker container appears to solve this issue, but makes the image somewhat unwieldy.
