FastqGen

A Java-based FASTQ file generator for bioinformatics testing and development.

Overview

FastqGen is a lightweight tool that generates simulated FASTQ files containing random DNA sequences. FASTQ is a text-based format that stores a biological sequence (usually a nucleotide sequence) together with its corresponding quality scores. This tool is particularly useful for:

Testing bioinformatics pipelines
Benchmarking sequence analysis tools
Development and debugging of NGS data processing applications
Educational purposes in bioinformatics

Features

Generates valid FASTQ format files
Command-line configurable sequence length and file size
Random DNA sequence generation using standard nucleotides (A, T, C, G)
Phred quality score simulation
Automatic timestamp-based file naming

Usage

The program requires two command-line arguments:

java -jar fastqgen.jar <sequenceLength> <fileSizeInMB>

Parameters:

sequenceLength: Length of each DNA sequence in base pairs (e.g., 100)
fileSizeInMB: Desired output file size in megabytes (e.g., 10)

Example:

java -jar fastqgen.jar 150 20    # Generates sequences of 150bp with a total file size of 20MB
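
To get a feel for how the two arguments relate, the sketch below estimates how many records fit in a target size. It is not taken from FastqGen's source: the class `FastqSizeEstimate`, its method names, the 8-character identifier length, and the convention 1 MB = 1,048,576 bytes are all assumptions made for illustration, and the tool itself may count or round differently.

```java
// Hypothetical helper, not part of FastqGen: estimates record counts for a target size.
final class FastqSizeEstimate {

    /** Bytes in one four-line record, assuming each line ends with a single '\n'. */
    static long bytesPerRecord(int idLength, int sequenceLength) {
        long headerLine = 1L + idLength + 1L;     // '@' + identifier + newline
        long sequenceLine = sequenceLength + 1L;  // bases + newline
        long separatorLine = 2L;                  // '+' + newline
        long qualityLine = sequenceLength + 1L;   // one quality character per base + newline
        return headerLine + sequenceLine + separatorLine + qualityLine;
    }

    /** Whole records that fit in sizeInMB, assuming 1 MB = 1,048,576 bytes. */
    static long recordsForSize(int idLength, int sequenceLength, long sizeInMB) {
        long targetBytes = sizeInMB * 1024L * 1024L;
        return targetBytes / bytesPerRecord(idLength, sequenceLength);
    }

    public static void main(String[] args) {
        // 150 bp reads in a 20 MB file; the identifier length of 8 is an assumption.
        System.out.println(recordsForSize(8, 150, 20));
    }
}
```

Under these assumptions a 150 bp record takes 314 bytes, so the sequence and quality lines dominate the file size and the record count scales roughly inversely with `sequenceLength`.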

The output filename is generated automatically in the format simulated_YYYYMMDDHHMMSS.fastq
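
A name in that format could be produced with `java.time`, as in the minimal sketch below. The class `FastqFileName` and the method `forTimestamp` are hypothetical names, and the use of local time is an assumption; FastqGen's own implementation may differ.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of FastqGen: builds a timestamped output name.
final class FastqFileName {

    // yyyyMMddHHmmss: 4-digit year, then month, day, hour (00-23), minute, second.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Returns simulated_<timestamp>.fastq for the given time. */
    static String forTimestamp(LocalDateTime time) {
        return "simulated_" + time.format(STAMP) + ".fastq";
    }

    public static void main(String[] args) {
        // Local time is an assumption; the tool may use a different clock or zone.
        System.out.println(forTimestamp(LocalDateTime.now()));
    }
}
```

Because the timestamp has one-second resolution, two runs started within the same second would map to the same name in this sketch.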

Output Format

Each entry in the generated FASTQ file consists of four lines:

Sequence identifier (starts with '@')
Raw sequence letters
Separator line (starts with '+')
Quality scores (encoded in ASCII)

Example:

@SEQ1
ATCGATCG...
+
IIIIIII...
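
The four-line layout above can be sketched in Java as follows. The class `FastqRecord` and its methods are hypothetical and written only to illustrate the format; they are not FastqGen's API. The check that the sequence and quality lines have equal length follows from the FASTQ format, which pairs one quality character with each base.

```java
// Hypothetical helper, not part of FastqGen: formats and checks a single FASTQ record.
final class FastqRecord {

    /** Builds the four lines of one record, each terminated by '\n'. */
    static String format(String id, String sequence, String quality) {
        if (sequence.length() != quality.length()) {
            throw new IllegalArgumentException("sequence and quality must have equal length");
        }
        return "@" + id + "\n" + sequence + "\n+\n" + quality + "\n";
    }

    /** Checks the basic shape of a record: four lines, '@' and '+' markers, equal lengths. */
    static boolean isValid(String record) {
        String[] lines = record.split("\n");
        return lines.length == 4
                && lines[0].startsWith("@")
                && lines[2].startsWith("+")
                && lines[1].length() == lines[3].length();
    }

    public static void main(String[] args) {
        String record = format("SEQ1", "ATCGATCG", "IIIIIIII");
        System.out.print(record);
        System.out.println(isValid(record));
    }
}
```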

Building and Running

Prerequisites

Java JDK 8 or higher
Maven (for building)

Building the Project

mvn clean package

Running the Application

After building, you can run the application using the generated jar in the target directory:

java -jar fastqgen.jar <sequenceLength> <fileSizeInMB>

Example:

java -jar fastqgen.jar 100 10    # Generates 100bp sequences in a 10MB file

Technical Details

Quality scores are generated using Phred-like quality values
Sequences are randomly generated using equal probabilities for A, T, C, G
Each generated file includes a unique timestamp in its name for easy identification
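
As an illustration of the first two points, here is a minimal sketch of uniform base sampling and quality-string generation. The class `RandomRead`, the Phred+33 ASCII offset, and the quality range of 20 to 40 are assumptions chosen for the example; the README does not state which encoding or score range FastqGen uses.

```java
import java.util.Random;

// Hypothetical helper, not part of FastqGen: generates one random read and its qualities.
final class RandomRead {

    private static final char[] BASES = {'A', 'T', 'C', 'G'};
    private static final int PHRED_OFFSET = 33; // assumption: Phred+33 (Sanger) encoding

    /** Draws each base independently with probability 1/4. */
    static String randomSequence(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(BASES[rng.nextInt(BASES.length)]);
        }
        return sb.toString();
    }

    /** Draws one score per base uniformly from [minQ, maxQ] and encodes it as ASCII. */
    static String randomQuality(Random rng, int length, int minQ, int maxQ) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int q = minQ + rng.nextInt(maxQ - minQ + 1);
            sb.append((char) (PHRED_OFFSET + q));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println(randomSequence(rng, 20));
        System.out.println(randomQuality(rng, 20, 20, 40)); // score range is an assumption
    }
}
```

Under Phred+33, a score Q is stored as the character with code Q + 33, so the `I` characters in the example record above would correspond to Q = 40.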

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
