A production-ready Text-to-Speech microservice built around the Soprano TTS engine with asynchronous job processing and dual quota tracking.
✅ Asynchronous Processing — Submit jobs and receive webhook notifications when complete
✅ Dual Quota System — Track usage by both characters and audio seconds
✅ GPU-Accelerated — Fast inference with warm model (stays loaded in memory)
✅ Scalable Architecture — Redis queue + multiple workers support
✅ RESTful API — Clean FastAPI endpoints with automatic documentation
✅ Production-Ready — Error handling, retry logic, logging, health checks
✅ Docker Support — CPU and GPU Docker Compose configurations included
✅ Comprehensive Tests — Full pytest suite with in-memory SQLite fixtures
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ POST /api/v1/tts/jobs
       ▼
┌─────────────────────┐
│   FastAPI Server    │ ← Validates request, checks quota
└──────┬──────────────┘
       │ Enqueue job
       ▼
┌─────────────────────┐
│  Redis Queue (RQ)   │
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│    Worker + GPU     │ ← Warm model, generates audio
│   (Soprano TTS)     │
└──────┬──────────────┘
       │
       ├─► Save WAV to storage/
       └─► Send webhook notification
```
Components:
- FastAPI Server — API endpoints, authentication, quota management
- PostgreSQL — User data, job tracking, quota counters
- Redis + RQ — Job queue for async processing
- Worker (SimpleWorker) — GPU-powered TTS generation with warm model kept in-process
- Storage — Local filesystem for audio files (WAV format)
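The job lifecycle implied by this flow can be sketched as a small state model. This is an illustration, not the service's actual code: the `pending` and `completed` states appear in the API responses below, while the `processing` name and the transition rules are assumptions.

```python
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "pending"        # confirmed by the create-job response
    PROCESSING = "processing"  # assumed intermediate state while the worker runs
    COMPLETED = "completed"    # confirmed by the job-status response
    FAILED = "failed"          # implied by the "completes (or fails)" webhook note

# Assumed legal transitions following the diagram: queue -> worker -> done/failed.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
}

def can_transition(src: JobStatus, dst: JobStatus) -> bool:
    """True if a job may move from src to dst under the assumed rules."""
    return dst in TRANSITIONS.get(src, set())
```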
- Python 3.13+
- PostgreSQL 15+
- Redis 7+
- NVIDIA GPU (optional but recommended)
- CUDA 12.1+ (if using GPU)
- Clone and create a virtual environment:

```bash
cd async-soprano

# Using UV (recommended)
uv venv && source .venv/bin/activate

# Or using plain venv
python -m venv .venv
source .venv/bin/activate
```

- Install dependencies:

```bash
# With UV (recommended — uses uv.lock for deterministic installs)
uv sync

# Or with pip
pip install .

# Include dev/test dependencies
uv sync --group dev
```

- Configure environment (optional):
The application uses Pydantic Settings and will work with sensible defaults. You can override any setting via environment variables or a .env file:
```bash
# All variables are optional — defaults are shown below
POSTGRES_USER=tts_user
POSTGRES_PASSWORD=tts_pass
POSTGRES_DB=tts_db
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
STORAGE_PATH=./storage
MODEL_NAME=ekwek/Soprano-1.1-80M
CHARACTERS_PER_SECOND=16.88
DEFAULT_CHARACTER_QUOTA=100000
DEFAULT_SECONDS_QUOTA=6000.0
GPU_DEVICE=0
LOG_LEVEL=INFO
```

- Initialize database:
```bash
# Create tables
python scripts/init_db.py

# Seed test users and get API keys
python scripts/seed_users.py
```

You'll need two terminals (plus a running PostgreSQL and Redis):
Terminal 1 — API Server:
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Terminal 2 — Worker:

```bash
# With GPU
CUDA_VISIBLE_DEVICES=0 python -m app.worker

# Without GPU (slower)
python -m app.worker
```

The service is now running:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
The project includes full Docker Compose support for both CPU and GPU modes.
```bash
# CPU mode
make build && make up

# GPU mode (requires NVIDIA Container Toolkit)
make build-gpu && make up-gpu

# Initialize DB and seed users
make init-db
make seed
```

See README.docker.md for the full Docker guide.
```bash
curl -X POST "http://localhost:8000/api/v1/tts/jobs" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world, this is a test of the text to speech service.",
    "webhook_url": "http://localhost:5001/webhook"
  }'
```

Response:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "character_count": 63,
  "estimated_audio_seconds": 5.0,
  "message": "Job created and queued for processing"
}
```

```bash
curl "http://localhost:8000/api/v1/tts/jobs/550e8400-e29b-41d4-a716-446655440000" \
  -H "X-API-Key: YOUR_API_KEY"
```

Response:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "text": "Hello world...",
  "audio_file_path": "/storage/550e8400-e29b-41d4-a716-446655440000.wav",
  "character_count": 63,
  "estimated_audio_seconds": 5.0,
  "actual_audio_seconds": 4.92,
  "created_at": "2026-02-06T10:00:00Z",
  "completed_at": "2026-02-06T10:00:05Z",
  "quota": {
    "characters": {
      "used": 63,
      "limit": 100000,
      "remaining": 99937
    },
    "seconds": {
      "used": 5.0,
      "limit": 6000.0,
      "remaining": 5995.0
    }
  }
}
```

```bash
curl -H "X-API-Key: YOUR_API_KEY" \
  "http://localhost:8000/api/v1/tts/storage/550e8400-e29b-41d4-a716-446655440000.wav" \
  -o output.wav
```

When a job completes (or fails), a POST request is sent to the provided webhook_url:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "audio_file_path": "/storage/550e8400-e29b-41d4-a716-446655440000.wav",
  "audio_duration_seconds": 4.92,
  "character_count": 63,
  "completed_at": "2026-02-06T10:00:05Z"
}
```

Webhook delivery includes retry logic (up to 3 attempts with exponential backoff).
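The retry behavior can be sketched as follows. This is a minimal illustration, not the service's code: the `send` callable is injected (the real service posts via httpx), and the function and parameter names are illustrative.

```python
import time

def deliver_webhook(url, payload, send, attempts=3, base_delay=1.0):
    """Attempt delivery up to `attempts` times with exponential backoff.

    `send` is any callable like send(url, payload) that raises on failure,
    e.g. a thin wrapper over httpx.post in the real service.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return send(url, payload)
        except Exception as exc:
            last_exc = exc
            if attempt < attempts - 1:
                # Backoff doubles each round: base, 2*base, 4*base, ...
                time.sleep(base_delay * (2 ** attempt))
    raise last_exc
```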
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/tts/jobs` | Create new TTS job |
| GET | `/api/v1/tts/jobs/{job_id}` | Get job status and details |
| GET | `/api/v1/tts/storage/{job_id}.wav` | Download audio file |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Service information |
| GET | `/health` | Health check (DB, Redis, storage) |
| GET | `/queue/stats` | Queue statistics (pending, active, failed jobs) |
| GET | `/docs` | Interactive API documentation (Swagger UI) |
Authentication: All /api/v1/tts/* endpoints require the X-API-Key header.
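The header check can be sketched as a plain lookup. The in-memory store, names, and the exception are hypothetical; the real service resolves keys against PostgreSQL through a FastAPI dependency (see `app/api/deps.py` in the project structure).

```python
# Hypothetical in-memory key store; the real service looks keys up in PostgreSQL.
API_KEYS = {"demo-key-123": "user-1"}

class AuthError(Exception):
    """Raised when the X-API-Key header is missing or unknown."""

def authenticate(headers: dict) -> str:
    """Return the user id for a valid X-API-Key header, else raise AuthError."""
    key = headers.get("X-API-Key")
    if key is None or key not in API_KEYS:
        raise AuthError("invalid or missing X-API-Key")
    return API_KEYS[key]
```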
The service tracks usage with dual quotas:
Character quota:
- Counted immediately when a job is created
- Numbers in text are expanded to words before counting (e.g. "123" → "one hundred twenty-three")
- Default limit: 100,000 characters

Audio seconds quota:
- Estimated at job creation using a configurable chars-per-second ratio (default: 16.88)
- Adjusted after actual audio generation to reflect the true duration
- Default limit: 6,000 seconds
Both quotas must be sufficient to accept a job. If either is exceeded, the request is rejected with 429 Too Many Requests.
If a job fails, both character and seconds quotas are refunded automatically.
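The reserve/refund logic can be sketched as below using the default limits and ratio. Class and function names are illustrative; the real logic lives in `quota_service.py` and persists counters in PostgreSQL.

```python
from dataclasses import dataclass

CHARACTERS_PER_SECOND = 16.88  # default estimation ratio from the config

@dataclass
class Quota:
    char_used: int = 0
    char_limit: int = 100_000
    sec_used: float = 0.0
    sec_limit: float = 6_000.0

def estimate_seconds(text: str) -> float:
    """Estimated audio duration at job-creation time."""
    return len(text) / CHARACTERS_PER_SECOND

def try_reserve(q: Quota, text: str) -> bool:
    """Reserve both quotas; if either would overflow, reject (HTTP 429 in the API)."""
    chars, secs = len(text), estimate_seconds(text)
    if q.char_used + chars > q.char_limit or q.sec_used + secs > q.sec_limit:
        return False
    q.char_used += chars
    q.sec_used += secs
    return True

def refund(q: Quota, chars: int, secs: float) -> None:
    """On job failure, both quotas are refunded."""
    q.char_used = max(0, q.char_used - chars)
    q.sec_used = max(0.0, q.sec_used - secs)
```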
The project has a comprehensive pytest test suite covering all modules:
```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run with coverage report
pytest --cov=app --cov-report=term-missing

# Run a specific test file
pytest tests/test_api.py
```

Tests use an in-memory SQLite database and mocked Redis/RQ, so no external services are required.
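The in-memory-database idea behind those fixtures can be illustrated with the stdlib sqlite3 module. The real suite drives SQLAlchemy sessions from `conftest.py`; the table and column names here are purely illustrative.

```python
import sqlite3

def make_test_db() -> sqlite3.Connection:
    """Fresh in-memory database per test: fast, isolated, discarded on close."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tts_jobs (job_id TEXT PRIMARY KEY, status TEXT)")
    return conn

# Usage: each test gets its own connection, so state never leaks between tests.
conn = make_test_db()
conn.execute("INSERT INTO tts_jobs VALUES (?, ?)", ("job-1", "pending"))
```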
```bash
make test      # Run tests in Docker
make test-cov  # Run tests with coverage in Docker
```

Run multiple workers on the same GPU:
```bash
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
```

Workers share the GPU and process jobs in parallel from the same queue.
Or spread workers across multiple GPUs:

```bash
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=1 python -m app.worker &
```

All machines connect to the same PostgreSQL and Redis instances via environment variables:

```bash
# Machine 1
POSTGRES_HOST=db-server REDIS_HOST=redis-server python -m app.worker

# Machine 2
POSTGRES_HOST=db-server REDIS_HOST=redis-server python -m app.worker
```

Workers automatically distribute the workload via the shared Redis queue.
With Docker Compose, scale workers declaratively:

```bash
docker compose up --scale worker=3 -d
```

All configuration is managed through environment variables (or a .env file) via Pydantic Settings. Connection URLs are built automatically from their components.
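The URL assembly from component variables can be sketched as below. The function names and URL schemes are assumptions for illustration; the real assembly happens inside the Pydantic Settings class in `config.py`.

```python
import os

def postgres_url() -> str:
    """Build a PostgreSQL URL from its component variables (scheme assumed)."""
    user = os.environ.get("POSTGRES_USER", "tts_user")
    password = os.environ.get("POSTGRES_PASSWORD", "tts_pass")
    host = os.environ.get("POSTGRES_HOST", "localhost")
    port = os.environ.get("POSTGRES_PORT", "5432")
    db = os.environ.get("POSTGRES_DB", "tts_db")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

def redis_url() -> str:
    """Build a Redis URL the same way."""
    host = os.environ.get("REDIS_HOST", "localhost")
    port = os.environ.get("REDIS_PORT", "6379")
    db = os.environ.get("REDIS_DB", "0")
    return f"redis://{host}:{port}/{db}"
```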
| Variable | Default | Description |
|---|---|---|
| `POSTGRES_USER` | `tts_user` | PostgreSQL username |
| `POSTGRES_PASSWORD` | `tts_pass` | PostgreSQL password |
| `POSTGRES_DB` | `tts_db` | PostgreSQL database name |
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `REDIS_HOST` | `localhost` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_DB` | `0` | Redis database number |
| `STORAGE_PATH` | `./storage` | Audio file storage directory |
| `DEFAULT_CHARACTER_QUOTA` | `100000` | Default character quota per user |
| `DEFAULT_SECONDS_QUOTA` | `6000.0` | Default audio seconds quota per user |
| `MODEL_NAME` | `ekwek/Soprano-1.1-80M` | HuggingFace model name |
| `CHARACTERS_PER_SECOND` | `16.88` | Chars/second ratio for duration estimation |
| `GPU_DEVICE` | `0` | CUDA device ID |
| `LOG_LEVEL` | `INFO` | Logging level |
| `API_HOST` | `0.0.0.0` | API bind host |
| `API_PORT` | `8000` | API bind port |
```
async-soprano/
├── app/
│   ├── main.py                 # FastAPI application & lifespan
│   ├── worker.py               # RQ SimpleWorker with warm model
│   ├── config.py               # Pydantic Settings configuration
│   ├── database.py             # SQLAlchemy engine & session
│   ├── models.py               # SQLAlchemy ORM models (User, TTSJob)
│   ├── schemas.py              # Pydantic request/response schemas
│   ├── queue.py                # Redis/RQ queue setup
│   ├── api/
│   │   ├── deps.py             # FastAPI dependencies (auth)
│   │   └── jobs.py             # Job endpoints (create, get, download)
│   ├── services/
│   │   ├── tts_service.py      # Soprano TTS model wrapper
│   │   ├── quota_service.py    # Quota check, reserve, adjust, refund
│   │   ├── webhook_service.py  # Webhook delivery with retries
│   │   └── storage_service.py  # Audio file storage management
│   └── utils/
│       ├── logger.py           # Logging configuration
│       └── exceptions.py       # Custom exception classes
├── scripts/
│   ├── init_db.py              # Initialize database tables
│   ├── seed_users.py           # Create test users with API keys
│   └── quota/                  # Quota analysis & benchmarking tools
│       ├── calculate_duration.py
│       ├── make_graphs.py
│       ├── tts_inputs.csv
│       └── tts_results.csv
├── tests/
│   ├── conftest.py             # Shared fixtures (DB, client, factories)
│   ├── test_api.py             # API endpoint tests
│   ├── test_config.py          # Configuration tests
│   ├── test_database.py        # Database tests
│   ├── test_exceptions.py      # Custom exception tests
│   ├── test_models.py          # ORM model tests
│   ├── test_queue.py           # Queue management tests
│   ├── test_quota_service.py   # Quota logic tests
│   ├── test_schemas.py         # Schema validation tests
│   ├── test_storage_service.py # Storage service tests
│   ├── test_tts_service.py     # TTS service tests
│   ├── test_webhook_service.py # Webhook service tests
│   └── test_worker.py          # Worker processing tests
├── storage/                    # Audio file output directory
├── Dockerfile                  # Multi-stage build (base + test)
├── docker-compose.yml          # CPU mode compose
├── docker-compose.gpu.yml      # GPU mode compose
├── Makefile                    # Development & Docker commands
├── pyproject.toml              # Project metadata & dependencies
├── uv.lock                     # Deterministic dependency lock file
├── check-gpu.sh                # GPU availability checker
├── setup-nvidia-docker.sh      # NVIDIA Docker runtime installer
└── README.docker.md            # Docker deployment guide
```
If you get CUDA out of memory errors:
- Reduce the number of workers per GPU
- Switch to CPU mode: run `python -m app.worker` without setting `CUDA_VISIBLE_DEVICES`
- Check available VRAM: `nvidia-smi`
Check:
- Is Redis running? `redis-cli ping`
- Is the worker running? Check the worker terminal output
- Check queue stats: `curl http://localhost:8000/queue/stats`
Verify your PostgreSQL connection settings:

```bash
psql -U tts_user -h localhost -p 5432 -d tts_db
```

The Soprano model is downloaded automatically on first run. Ensure you have internet access and sufficient disk space; the model is cached by HuggingFace for subsequent runs.
Built with:
- FastAPI — API framework
- SQLAlchemy — ORM & database toolkit
- RQ (Redis Queue) — Job queue
- Soprano TTS — Text-to-Speech engine
- PyTorch — ML framework
- Pydantic — Data validation & settings
- httpx — HTTP client for webhooks
- soundfile — Audio I/O
- num2words — Number-to-words conversion