
A Java-based tool for generating simulated FASTQ files with configurable sequence lengths and file sizes. Ideal for testing bioinformatics pipelines, benchmarking sequence analysis tools, and NGS application development.


david-ta-ming/fastqgen


FastqGen

A Java-based FASTQ file generator for bioinformatics testing and development.

Overview

FastqGen is a lightweight tool that generates simulated FASTQ files containing random DNA sequences. FASTQ is a text-based format that stores a biological sequence (usually a nucleotide sequence) together with a quality score for each base. This tool is particularly useful for:

  • Testing bioinformatics pipelines
  • Benchmarking sequence analysis tools
  • Development and debugging of NGS data processing applications
  • Educational purposes in bioinformatics

Features

  • Generates valid FASTQ format files
  • Command-line configurable sequence length and file size
  • Random DNA sequence generation using standard nucleotides (A, T, C, G)
  • Phred quality score simulation
  • Automatic timestamp-based file naming

Usage

The program requires two command-line arguments:

java -jar fastqgen.jar <sequenceLength> <fileSizeInMB>

Parameters:

  • sequenceLength: Length of each DNA sequence in base pairs (e.g., 100)
  • fileSizeInMB: Desired output file size in megabytes (e.g., 10)

Example:

java -jar fastqgen.jar 150 20    # Generates sequences of 150bp with a total file size of 20MB
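The README does not say how FastqGen converts fileSizeInMB into a number of reads. The sketch below shows one hypothetical way to do it: add up the bytes of each record until the next one would exceed the target. It assumes "@SEQ<n>" headers (as in the Output Format example), a bare "+" separator, '\n' line endings, one byte per character and 1 MB = 1,048,576 bytes; the class and method names are illustrative only.

```java
// Hypothetical sketch: estimates how many FASTQ records fit in a target size.
// Assumptions: "@SEQ<n>" headers, bare "+" separator, '\n' line endings,
// one byte per ASCII character, and 1 MB = 1,048,576 bytes.
public class RecordEstimate {

    // Bytes used by record number n (1-based) for reads of length seqLen.
    static long recordBytes(long n, int seqLen) {
        long header = 4 + Long.toString(n).length() + 1; // "@SEQ" + n + '\n'
        long sequence = seqLen + 1;                      // bases + '\n'
        long separator = 2;                              // "+\n"
        long quality = seqLen + 1;                       // one char per base + '\n'
        return header + sequence + separator + quality;
    }

    // Counts whole records that fit without exceeding the target size.
    static long recordsNeeded(int seqLen, int fileSizeInMB) {
        long target = fileSizeInMB * 1024L * 1024L;
        long total = 0;
        long n = 0;
        while (true) {
            long next = recordBytes(n + 1, seqLen);
            if (total + next > target) {
                break;
            }
            total += next;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java RecordEstimate <sequenceLength> <fileSizeInMB>");
            System.exit(1);
        }
        int seqLen = Integer.parseInt(args[0]);
        int sizeMb = Integer.parseInt(args[1]);
        if (seqLen <= 0 || sizeMb <= 0) {
            System.err.println("Both arguments must be positive integers.");
            System.exit(1);
        }
        System.out.println(recordsNeeded(seqLen, sizeMb) + " records");
    }
}
```

Under these assumptions, `java RecordEstimate.java 100 1` (single-file launch, Java 11 or later) prints `4928 records`. Because header length grows with the read number, summing per-record sizes is more accurate than dividing the target by a fixed record size.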

The output file name is generated automatically in the format simulated_YYYYMMDDHHMMSS.fastq, i.e. the year, month, day, hour, minute and second at which the run started.
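A name in that pattern can be produced with `java.time.format.DateTimeFormatter`. This is a minimal sketch, not FastqGen's actual code; `OutputName` and `fileNameFor` are hypothetical names. Note that in Java pattern letters `MM` is the month and `mm` is the minute, so the informal `HHMMSS` above becomes `HHmmss`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds simulated_YYYYMMDDHHMMSS.fastq from a timestamp.
public class OutputName {

    // yyyy = year, MM = month, dd = day, HH = 24-hour clock, mm = minute, ss = second
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String fileNameFor(LocalDateTime time) {
        return "simulated_" + time.format(STAMP) + ".fastq";
    }

    public static void main(String[] args) {
        // Uses the local system clock and time zone.
        System.out.println(fileNameFor(LocalDateTime.now()));
    }
}
```

For example, `fileNameFor(LocalDateTime.of(2024, 3, 5, 14, 7, 9))` returns `simulated_20240305140709.fastq`.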

Output Format

Each entry in the generated FASTQ file consists of four lines:

  1. Sequence identifier (starts with '@')
  2. Raw sequence letters
  3. Separator line (starts with '+')
  4. Quality scores, one ASCII-encoded character per base (so this line is as long as the sequence line)

Example:

@SEQ1
ATCGATCG...
+
IIIIIII...
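The four-line layout above can be sketched in Java as follows. This is an illustrative example, not the project's source; `FastqRecordWriter`, `formatRecord` and `writeRecord` are hypothetical names, and the check that the quality line matches the sequence length reflects the FASTQ rule of one quality character per base.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical sketch of emitting one four-line FASTQ record.
public class FastqRecordWriter {

    // Builds "@" + id, the bases, a bare "+" separator and the qualities.
    static String formatRecord(String id, String sequence, String quality) {
        if (sequence.length() != quality.length()) {
            throw new IllegalArgumentException(
                    "quality line must have exactly one character per base");
        }
        return "@" + id + "\n" + sequence + "\n+\n" + quality + "\n";
    }

    // Appends the record to any Writer (e.g. a BufferedWriter over the output file).
    static void writeRecord(Writer out, String id, String sequence, String quality)
            throws IOException {
        out.write(formatRecord(id, sequence, quality));
    }

    public static void main(String[] args) throws IOException {
        Writer out = new PrintWriter(System.out);
        writeRecord(out, "SEQ1", "ATCGATCG", "IIIIIIII");
        out.flush(); // flush without closing System.out
    }
}
```

For large files, `out` would typically be a `BufferedWriter` from `java.nio.file.Files.newBufferedWriter(path)`, so that many small writes are batched.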

Building and Running

Prerequisites

  • Java JDK 8 or higher
  • Maven (for building)

Building the Project

mvn clean package

Running the Application

After building, run the JAR that Maven writes to the target directory (adjust the path, and the file name if your build produces a versioned JAR, in the command below):

java -jar fastqgen.jar <sequenceLength> <fileSizeInMB>

Example:

java -jar fastqgen.jar 100 10    # Generates 100bp sequences in a 10MB file

Technical Details

  • Quality scores are generated using Phred-like quality values
  • Sequences are randomly generated using equal probabilities for A, T, C, G
  • Each output file name includes a timestamp with one-second resolution for easy identification (two runs started within the same second would produce the same name)
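The README does not specify which encoding "Phred-like" refers to. A common convention is Phred+33, where a score Q (with Q = -10 log10 p for base-call error probability p) is stored as the ASCII character Q + 33; under it, Q40 (p = 0.0001) is `I`, the character used in the Output Format example. The sketch below assumes Phred+33 and a uniform quality range of Q20 to Q40; both are illustrative choices, and `ReadSimulator` and its methods are hypothetical names.

```java
import java.util.Random;

// Hypothetical sketch: uniform random bases plus Phred+33-encoded qualities.
public class ReadSimulator {

    private static final char[] BASES = {'A', 'T', 'C', 'G'};

    // Each position draws one of the four bases with probability 1/4.
    static String randomSequence(int length, Random rng) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(BASES[rng.nextInt(BASES.length)]);
        }
        return sb.toString();
    }

    // Phred+33 maps Q in [0, 93] to the printable characters '!' through '~'.
    static char encodePhred33(int q) {
        if (q < 0 || q > 93) {
            throw new IllegalArgumentException("Q out of range: " + q);
        }
        return (char) (q + 33);
    }

    // Draws each quality uniformly from [minQ, maxQ] (an assumed range).
    static String randomQualities(int length, int minQ, int maxQ, Random rng) {
        if (minQ > maxQ) {
            throw new IllegalArgumentException("minQ must not exceed maxQ");
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int q = minQ + rng.nextInt(maxQ - minQ + 1);
            sb.append(encodePhred33(q));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed for reproducible output
        System.out.println(randomSequence(20, rng));
        System.out.println(randomQualities(20, 20, 40, rng));
    }
}
```

Passing a seeded `Random` makes test files reproducible, which helps when comparing pipeline output across runs.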

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
