Bacterial Antimicrobial Resistance annOtation of Genomes - ISOlate whole genome
BALROG-ISO (Bacterial Antimicrobial Resistance annOtation of Genomes - ISOlate whole genome) is a comprehensive, high-throughput Nextflow pipeline that uses next-generation short reads to investigate bacterial antimicrobial resistance (AMR) and its mobility in whole genome sequences of bacterial isolates. While AMR characterization is the main goal of BALROG-ISO, it also provides the taxonomic classification, gene identities, and assignment of gene origin (i.e. plasmid or chromosome) for the submitted isolate(s).
Note
BALROG-ISO may be updated periodically to keep improving the pipeline. If you have any requests or recommended changes you'd like to see (e.g. usage with other data types), please reach out via email (edwardbirdlab@gmail.com | edwardbird@ksu.edu) or open a feature request.
If you experience any trouble or find bugs when running BALROG-ISO, please report issues or bugs and they will be addressed as soon as possible.
BALROG-MSR: Bacterial Antimicrobial Resistance annOtation of Genomes - Metagenomic Short Read
BALROG-MON: Bacterial Antimicrobial Resistance annOtation of Genomes - Metagenomic Oxford Nanopore
*See sections below for details on subworkflows
Before you get too far along, familiarize yourself with this section to make sure this is the right pipeline for you and that your equipment and samples meet the requirements. (Don't worry, there isn't too much to do.)
In its current form, BALROG-ISO expects Illumina/Aviti paired-end, short-read data. In its standard configuration, it requires 100 GB of RAM.
Note
If you would like to run BALROG-ISO with long-read data, feel free to open a feature request.
All dependencies are managed via Docker containers hosted on DockerHub. In addition to Nextflow, one of the container runtimes listed below (Docker, Singularity, or Apptainer) is required:
- Nextflow (>= 23.04.0.5857) - Install Nextflow
- Docker/Singularity/Apptainer - Install Docker - Install Singularity - Install Apptainer
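As a quick pre-flight check, the requirements above can be verified from a shell. The sketch below is not part of BALROG-ISO: `version_ge` and `find_container_runtime` are hypothetical helper names, and the version comparison assumes GNU `sort -V` is available on your system.

```shell
#!/usr/bin/env bash
# Pre-flight check (illustrative only, not shipped with BALROG-ISO).

# Succeed if version $1 is greater than or equal to version $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the first container runtime found on PATH; fail if none is found.
find_container_runtime() {
    for tool in docker singularity apptainer; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

min_nextflow="23.04.0.5857"   # minimum version listed above

if command -v nextflow >/dev/null 2>&1; then
    # `nextflow -v` prints a line such as "nextflow version 23.04.0.5857".
    found="$(nextflow -v 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_ge "$found" "$min_nextflow"; then
        echo "Nextflow $found: OK"
    else
        echo "Nextflow $found is older than $min_nextflow"
    fi
else
    echo "Nextflow not found on PATH"
fi

if runtime="$(find_container_runtime)"; then
    echo "Container runtime: $runtime"
else
    echo "No container runtime (Docker/Singularity/Apptainer) found on PATH"
fi
```

The script only reports what it finds; it does not install anything or check available RAM.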
Preferred Method - Download Release
wget https://github.com/edwardbirdlab/BALROG-ISO/archive/refs/tags/1.0.0.tar.gz
tar -xzf 1.0.0.tar.gz
Method 2 - Clone Repo
git clone https://github.com/edwardbirdlab/BALROG-ISO
BALROG-ISO takes a CSV (comma-separated values) sheet as its input. Note that the "sample" column will be used as the prefix of all output files for that sample. This version does not automatically combine reads that share a sample name, so please combine sequencing runs manually before starting the pipeline.
Example Format:
sample,r1,r2
Sample_Name_1,/absolute/path/to/sample1_R1.fastq.gz,/absolute/path/to/sample1_R2.fastq.gz
Sample_Name_2,/absolute/path/to/sample2_R1.fastq.gz,/absolute/path/to/sample2_R2.fastq.gz
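Both manual steps described here (combining sequencing runs and writing the samplesheet) can be scripted. The following is a minimal sketch, not part of BALROG-ISO: `merge_runs` and `make_samplesheet` are hypothetical helper names, and the `_R1.fastq.gz` / `_R2.fastq.gz` suffix convention is an assumption based on the example paths above.

```shell
#!/usr/bin/env bash
# Illustrative helpers (not shipped with BALROG-ISO).

# Concatenate several gzipped FASTQ files from the same sample into one file.
# Gzip members can be joined with plain `cat`; the result is still valid gzip.
# Usage: merge_runs OUTPUT.fastq.gz RUN1.fastq.gz RUN2.fastq.gz ...
merge_runs() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Write a samplesheet with one row per *_R1.fastq.gz / *_R2.fastq.gz pair.
# The _R1/_R2 suffix convention is an assumption; adjust it to your file names.
# Usage: make_samplesheet READS_DIR OUTPUT.csv
make_samplesheet() {
    reads_dir="$(cd "$1" && pwd)" || return 1   # resolve to an absolute path
    sheet="$2"
    echo "sample,r1,r2" > "$sheet"
    for r1 in "$reads_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # no matches: the glob stays literal
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Skipping $r1: no matching R2 file" >&2
            continue
        fi
        sample="$(basename "$r1" _R1.fastq.gz)"
        echo "$sample,$r1,$r2" >> "$sheet"
    done
}
```

When merging, pass the runs in the same order for R1 and R2 so that read pairs stay in sync, then call, for example, `make_samplesheet /absolute/path/to/reads samplesheet.csv`.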
When creating a Nextflow config, ensure a container runtime is enabled (Singularity/Apptainer/Docker). If you are using Slurm, you can use the included Beocat Slurm config as a template. Most nf-core configs will also be supported. If you have never created a Nextflow config, or are having issues, reach out to your local system administrator.
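For orientation, a config that enables a container runtime can be very small. The sketch below writes one from the shell. It uses standard Nextflow config scopes (`singularity`, `docker`, `apptainer`, `process.executor`), but the file name `balrog_site.config` and the choice of Singularity on Slurm are assumptions for illustration. It is not the Beocat config included with the pipeline.

```shell
#!/usr/bin/env bash
# Write a minimal Nextflow config (illustrative; the file name is a placeholder).
config_file="balrog_site.config"

cat > "$config_file" <<'EOF'
// Enable exactly one container runtime.
singularity {
    enabled    = true
    autoMounts = true
}
// For Docker instead, use:  docker { enabled = true }
// For Apptainer instead:    apptainer { enabled = true }

process {
    executor = 'slurm'   // assumption: a Slurm cluster; use 'local' on a single machine
}
EOF

echo "Wrote $config_file"
```

The resulting file is passed to Nextflow with `-c`. Cluster-specific settings such as queues or account names vary by site and are deliberately left out.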
Nextflow Configuration - nf-core configs
If you want to change any BALROG-ISO parameters from their defaults, you can do so in the "nextflow.config" file or on the command line. Configurable parameters are outlined in the detailed sections below, as well as in the config file.
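As a sketch of the config-file route, parameters can be set in a Nextflow `params` block. The parameter names and values below are the documented optional parameters and their defaults from this README. The file name `balrog_params.config` is a placeholder, and keeping overrides in a separate file rather than editing "nextflow.config" directly is an assumption, not a requirement.

```shell
#!/usr/bin/env bash
# Write parameter overrides to a small config file (illustrative; placeholder file name).
params_file="balrog_params.config"

cat > "$params_file" <<'EOF'
params {
    fastp_minlen  = 100               // minimum read length
    fastp_q       = 20                // minimum q-score threshold
    busco_lineage = 'bacteria_odb10'  // change if you have an expected taxon
}
EOF

echo "Wrote $params_file"
```

The same `params` block can instead be placed in the config file you already pass with `-c`. Values given on the command line take precedence over values set in config files.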
Required Parameters
--samplesheet /path/to/samplesheet
--run_name "NameOfRun"
Optional Parameters
--sequencing_adapter_type illumina
Defines which adapter set to use.
Default: illumina (options = illumina, aviti, custom)
--custom_sequencing_adapter_r1 "ATGCATGC"
Sequence of the read 1 adapter.
Default: NaN
--custom_sequencing_adapter_r2 "ATGCATGC"
Sequence of the read 2 adapter.
Default: NaN
--fastp_minlen 100
The minimum read length.
Default: 100
--fastp_q 20
The minimum q-score threshold.
Default: 20
--busco_lineage bacteria_odb10
Sets which BUSCO lineage to use. Recommend changing if you have an expected taxon.
Default: bacteria_odb10
--plasmer_min_len 500
Sets the minimum sequence length to be included in plasmid prediction. Lowering this below 500 is not recommended.
Default: 500
--plasmer_max_len 500000
Sets the length above which sequences are automatically predicted to be chromosomal in origin.
Default: 500000
--amrfinder_lineage Escherichia
Enables species-specific models in AMRFinderPlus. See the AMRFinderPlus documentation for the supported species and how to specify their names.
Default: NaN
--resfinder_lineage "Escherichia coli"
Enables species-specific models in ResFinder. See the ResFinder documentation for the supported species and how to specify their names.
Default: NaN
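To see how the required and optional parameters combine into a single invocation, here is a hedged sketch of a small wrapper. `run_balrog` is a hypothetical helper, not part of BALROG-ISO, and `PIPELINE_DIR`, `CONFIG_FILE`, and `DRY_RUN` are illustrative variable names. By default it only prints the command so it can be reviewed before anything is launched.

```shell
#!/usr/bin/env bash
# Print (DRY_RUN=1, the default here) or execute a BALROG-ISO run.
# Illustrative wrapper only; paths are placeholders.
# Usage: run_balrog SAMPLESHEET RUN_NAME [extra --param value pairs...]
run_balrog() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_balrog SAMPLESHEET RUN_NAME [--param value ...]" >&2
        return 1
    fi
    samplesheet="$1"
    run_name="$2"
    shift 2
    # Rebuild the argument list: required parameters first, then any extras.
    set -- nextflow run "${PIPELINE_DIR:-/path/to/BALROG-ISO}" \
        -c "${CONFIG_FILE:-/path/to/config.cfg}" \
        --samplesheet "$samplesheet" \
        --run_name "$run_name" \
        "$@"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example: documented defaults passed explicitly, printed rather than executed.
run_balrog /path/to/samplesheet.csv NameOfRun \
    --fastp_minlen 100 --fastp_q 20 --busco_lineage bacteria_odb10
```

Setting `DRY_RUN=0` makes the wrapper execute the command instead of printing it.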
- Running the Whole Pipeline
nextflow run /path/to/edwardbirdlab/BALROG-ISO -c /path/to/config.cfg
- Generate Multi-QC
nextflow run /path/to/edwardbirdlab/BALROG-ISO -c /path/to/config.cfg --workflow-opt multiqc
Raw QC
- FastQC : Raw Read
Human Read Removal Tool
- sra-human-scrubber : Masks human sequences in data
Trimming
Final QC
- FastQC : Trimmed Read
Genome Assembly
Assembly Stats
- QUAST : Assembly Metrics Report
Genome Completeness
- BUSCO: Single-Copy Ortholog "Completeness"
Sequence Origin Assignment
- Plasmer : Plasmid prediction
Functional Genome Annotation
MultiAMR Resistance Gene Annotation
- hAMRonization : Unified ARG Results Report from...
1) CARD using RGI
2) AMRFinderPlus
3) ResFinder
As there is currently no paper associated with BALROG-ISO, please cite this GitHub page. Also, feel free to contact me (edwardbirdlab@gmail.com | edwardbird@ksu.edu) to let me know!
BALROG-ISO relies on many third-party tools. See 'CITATION.md' for the list of all tools used in this pipeline.
Distributed under the MIT License. See LICENSE for more information.
Edward Bird - edwardbirdlab@gmail.com | edwardbird@ksu.edu