A production-ready Text-to-Speech microservice built around the Soprano TTS engine with asynchronous job processing and dual quota tracking.
✅ Asynchronous Processing — Submit jobs and receive webhook notifications when complete
✅ Dual Quota System — Track usage by both characters and audio seconds
✅ GPU-Accelerated — Fast inference with warm model (stays loaded in memory)
✅ Scalable Architecture — Redis queue + multiple workers support
✅ RESTful API — Clean FastAPI endpoints with automatic documentation
✅ Production-Ready — Error handling, retry logic, logging, health checks
✅ Docker Support — CPU and GPU Docker Compose configurations included
✅ Comprehensive Tests — Full pytest suite with in-memory SQLite fixtures
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ POST /api/v1/tts/jobs
       ▼
┌─────────────────────┐
│   FastAPI Server    │ ← Validates request, checks quota
└──────┬──────────────┘
       │ Enqueue job
       ▼
┌─────────────────────┐
│  Redis Queue (RQ)   │
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│    Worker + GPU     │ ← Warm model, generates audio
│   (Soprano TTS)     │
└──────┬──────────────┘
       │
       ├─► Save WAV to storage/
       └─► Send webhook notification
```
Components:
- FastAPI Server — API endpoints, authentication, quota management
- PostgreSQL — User data, job tracking, quota counters
- Redis + RQ — Job queue for async processing
- Worker (SimpleWorker) — GPU-powered TTS generation with warm model kept in-process
- Storage — Local filesystem for audio files (WAV format)
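The job lifecycle implied by this flow can be sketched as a small state model. This is an illustration, not the service's actual code: the `pending` and `completed` states appear in the API responses below, while the `processing` name and the transition rules are assumptions.

```python
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "pending"        # confirmed by the create-job response
    PROCESSING = "processing"  # assumed intermediate state while the worker runs
    COMPLETED = "completed"    # confirmed by the job-status response
    FAILED = "failed"          # implied by the "completes (or fails)" webhook note

# Assumed legal transitions following the diagram: queue -> worker -> done/failed.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
}

def can_transition(src: JobStatus, dst: JobStatus) -> bool:
    """True if a job may move from src to dst under the assumed rules."""
    return dst in TRANSITIONS.get(src, set())
```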
- Python 3.13+
- PostgreSQL 15+
- Redis 7+
- NVIDIA GPU (optional but recommended)
- CUDA 12.1+ (if using GPU)
- Clone and create a virtual environment:

```bash
cd async-soprano

# Using UV (recommended)
uv venv && source .venv/bin/activate

# Or using plain venv
python -m venv .venv
source .venv/bin/activate
```

- Install dependencies:

```bash
# With UV (recommended — uses uv.lock for deterministic installs)
uv sync

# Or with pip
pip install .

# Include dev/test dependencies
uv sync --group dev
```

- Configure environment (optional):
The application uses Pydantic Settings and will work with sensible defaults. You can override any setting via environment variables or a .env file:
```bash
# All variables are optional — defaults are shown below
POSTGRES_USER=tts_user
POSTGRES_PASSWORD=tts_pass
POSTGRES_DB=tts_db
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
STORAGE_PATH=./storage
MODEL_NAME=ekwek/Soprano-1.1-80M
CHARACTERS_PER_SECOND=16.88
DEFAULT_CHARACTER_QUOTA=100000
DEFAULT_SECONDS_QUOTA=6000.0
GPU_DEVICE=0
LOG_LEVEL=INFO
```

- Initialize database:
```bash
# Create tables
python scripts/init_db.py

# Seed test users and get API keys
python scripts/seed_users.py
```

You'll need two terminals (plus a running PostgreSQL and Redis):
Terminal 1 — API Server:
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Terminal 2 — Worker:

```bash
# With GPU
CUDA_VISIBLE_DEVICES=0 python -m app.worker

# Without GPU (slower)
python -m app.worker
```

The service is now running:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
The project includes full Docker Compose support for both CPU and GPU modes.
```bash
# CPU mode
make build && make up

# GPU mode (requires NVIDIA Container Toolkit)
make build-gpu && make up-gpu

# Initialize DB and seed users
make init-db
make seed
```

See README.docker.md for the full Docker guide.
```bash
curl -X POST "http://localhost:8000/api/v1/tts/jobs" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world, this is a test of the text to speech service.",
    "webhook_url": "http://localhost:5001/webhook"
  }'
```

Response:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "character_count": 63,
  "estimated_audio_seconds": 5.0,
  "message": "Job created and queued for processing"
}
```

```bash
curl "http://localhost:8000/api/v1/tts/jobs/550e8400-e29b-41d4-a716-446655440000" \
  -H "X-API-Key: YOUR_API_KEY"
```

Response:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "text": "Hello world...",
  "audio_file_path": "/storage/550e8400-e29b-41d4-a716-446655440000.wav",
  "character_count": 63,
  "estimated_audio_seconds": 5.0,
  "actual_audio_seconds": 4.92,
  "created_at": "2026-02-06T10:00:00Z",
  "completed_at": "2026-02-06T10:00:05Z",
  "quota": {
    "characters": {
      "used": 63,
      "limit": 100000,
      "remaining": 99937
    },
    "seconds": {
      "used": 5.0,
      "limit": 6000.0,
      "remaining": 5995.0
    }
  }
}
```

```bash
curl -H "X-API-Key: YOUR_API_KEY" \
  "http://localhost:8000/api/v1/tts/storage/550e8400-e29b-41d4-a716-446655440000.wav" \
  -o output.wav
```

When a job completes (or fails), a POST request is sent to the provided webhook_url:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "audio_file_path": "/storage/550e8400-e29b-41d4-a716-446655440000.wav",
  "audio_duration_seconds": 4.92,
  "character_count": 63,
  "completed_at": "2026-02-06T10:00:05Z"
}
```

Webhook delivery includes retry logic (up to 3 attempts with exponential backoff).
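The retry behavior can be sketched as follows. This is a minimal illustration, not the service's code: the `send` callable is injected (the real service posts via httpx), and the function and parameter names are illustrative.

```python
import time

def deliver_webhook(url, payload, send, attempts=3, base_delay=1.0):
    """Attempt delivery up to `attempts` times with exponential backoff.

    `send` is any callable like send(url, payload) that raises on failure,
    e.g. a thin wrapper over httpx.post in the real service.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return send(url, payload)
        except Exception as exc:
            last_exc = exc
            if attempt < attempts - 1:
                # Backoff doubles each round: base, 2*base, 4*base, ...
                time.sleep(base_delay * (2 ** attempt))
    raise last_exc
```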
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/tts/jobs` | Create new TTS job |
| GET | `/api/v1/tts/jobs/{job_id}` | Get job status and details |
| GET | `/api/v1/tts/storage/{job_id}.wav` | Download audio file |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Service information |
| GET | `/health` | Health check (DB, Redis, storage) |
| GET | `/queue/stats` | Queue statistics (pending, active, failed jobs) |
| GET | `/docs` | Interactive API documentation (Swagger UI) |
Authentication: All /api/v1/tts/* endpoints require the X-API-Key header.
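The header check can be sketched as a plain lookup. The in-memory store, names, and the exception are hypothetical; the real service resolves keys against PostgreSQL through a FastAPI dependency (see `app/api/deps.py` in the project structure).

```python
# Hypothetical in-memory key store; the real service looks keys up in PostgreSQL.
API_KEYS = {"demo-key-123": "user-1"}

class AuthError(Exception):
    """Raised when the X-API-Key header is missing or unknown."""

def authenticate(headers: dict) -> str:
    """Return the user id for a valid X-API-Key header, else raise AuthError."""
    key = headers.get("X-API-Key")
    if key is None or key not in API_KEYS:
        raise AuthError("invalid or missing X-API-Key")
    return API_KEYS[key]
```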
The service tracks usage with dual quotas:
Character quota:
- Counted immediately when a job is created
- Numbers in text are expanded to words before counting (e.g. "123" → "one hundred twenty-three")
- Default limit: 100,000 characters

Audio seconds quota:
- Estimated at job creation using a configurable chars-per-second ratio (default: 16.88)
- Adjusted after actual audio generation to reflect the true duration
- Default limit: 6,000 seconds
Both quotas must be sufficient to accept a job. If either is exceeded, the request is rejected with 429 Too Many Requests.
If a job fails, both character and seconds quotas are refunded automatically.
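The reserve/refund logic can be sketched as below using the default limits and ratio. Class and function names are illustrative; the real logic lives in `quota_service.py` and persists counters in PostgreSQL.

```python
from dataclasses import dataclass

CHARACTERS_PER_SECOND = 16.88  # default estimation ratio from the config

@dataclass
class Quota:
    char_used: int = 0
    char_limit: int = 100_000
    sec_used: float = 0.0
    sec_limit: float = 6_000.0

def estimate_seconds(text: str) -> float:
    """Estimated audio duration at job-creation time."""
    return len(text) / CHARACTERS_PER_SECOND

def try_reserve(q: Quota, text: str) -> bool:
    """Reserve both quotas; if either would overflow, reject (HTTP 429 in the API)."""
    chars, secs = len(text), estimate_seconds(text)
    if q.char_used + chars > q.char_limit or q.sec_used + secs > q.sec_limit:
        return False
    q.char_used += chars
    q.sec_used += secs
    return True

def refund(q: Quota, chars: int, secs: float) -> None:
    """On job failure, both quotas are refunded."""
    q.char_used = max(0, q.char_used - chars)
    q.sec_used = max(0.0, q.sec_used - secs)
```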
The project has a comprehensive pytest test suite covering all modules:
```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run with coverage report
pytest --cov=app --cov-report=term-missing

# Run a specific test file
pytest tests/test_api.py
```

Tests use an in-memory SQLite database and mocked Redis/RQ, so no external services are required.
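The in-memory-database idea behind those fixtures can be illustrated with the stdlib sqlite3 module. The real suite drives SQLAlchemy sessions from `conftest.py`; the table and column names here are purely illustrative.

```python
import sqlite3

def make_test_db() -> sqlite3.Connection:
    """Fresh in-memory database per test: fast, isolated, discarded on close."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tts_jobs (job_id TEXT PRIMARY KEY, status TEXT)")
    return conn

# Usage: each test gets its own connection, so state never leaks between tests.
conn = make_test_db()
conn.execute("INSERT INTO tts_jobs VALUES (?, ?)", ("job-1", "pending"))
```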
```bash
make test      # Run tests in Docker
make test-cov  # Run tests with coverage in Docker
```

Run multiple workers on the same GPU:
```bash
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
```

Workers share the GPU and process jobs in parallel from the same queue.
Or spread workers across multiple GPUs:

```bash
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=1 python -m app.worker &
```

All machines connect to the same PostgreSQL and Redis instances via environment variables:

```bash
# Machine 1
POSTGRES_HOST=db-server REDIS_HOST=redis-server python -m app.worker

# Machine 2
POSTGRES_HOST=db-server REDIS_HOST=redis-server python -m app.worker
```

Workers automatically distribute the workload via the shared Redis queue.
With Docker Compose, scale workers declaratively:

```bash
docker compose up --scale worker=3 -d
```

All configuration is managed through environment variables (or a .env file) via Pydantic Settings. Connection URLs are built automatically from their components.
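The URL assembly from component variables can be sketched as below. The function names and URL schemes are assumptions for illustration; the real assembly happens inside the Pydantic Settings class in `config.py`.

```python
import os

def postgres_url() -> str:
    """Build a PostgreSQL URL from its component variables (scheme assumed)."""
    user = os.environ.get("POSTGRES_USER", "tts_user")
    password = os.environ.get("POSTGRES_PASSWORD", "tts_pass")
    host = os.environ.get("POSTGRES_HOST", "localhost")
    port = os.environ.get("POSTGRES_PORT", "5432")
    db = os.environ.get("POSTGRES_DB", "tts_db")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

def redis_url() -> str:
    """Build a Redis URL the same way."""
    host = os.environ.get("REDIS_HOST", "localhost")
    port = os.environ.get("REDIS_PORT", "6379")
    db = os.environ.get("REDIS_DB", "0")
    return f"redis://{host}:{port}/{db}"
```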
| Variable | Default | Description |
|---|---|---|
| `POSTGRES_USER` | `tts_user` | PostgreSQL username |
| `POSTGRES_PASSWORD` | `tts_pass` | PostgreSQL password |
| `POSTGRES_DB` | `tts_db` | PostgreSQL database name |
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `REDIS_HOST` | `localhost` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `REDIS_DB` | `0` | Redis database number |
| `STORAGE_PATH` | `./storage` | Audio file storage directory |
| `DEFAULT_CHARACTER_QUOTA` | `100000` | Default character quota per user |
| `DEFAULT_SECONDS_QUOTA` | `6000.0` | Default audio seconds quota per user |
| `MODEL_NAME` | `ekwek/Soprano-1.1-80M` | HuggingFace model name |
| `CHARACTERS_PER_SECOND` | `16.88` | Chars/second ratio for duration estimation |
| `GPU_DEVICE` | `0` | CUDA device ID |
| `LOG_LEVEL` | `INFO` | Logging level |
| `API_HOST` | `0.0.0.0` | API bind host |
| `API_PORT` | `8000` | API bind port |
```
async-soprano/
├── app/
│   ├── main.py                 # FastAPI application & lifespan
│   ├── worker.py               # RQ SimpleWorker with warm model
│   ├── config.py               # Pydantic Settings configuration
│   ├── database.py             # SQLAlchemy engine & session
│   ├── models.py               # SQLAlchemy ORM models (User, TTSJob)
│   ├── schemas.py              # Pydantic request/response schemas
│   ├── queue.py                # Redis/RQ queue setup
│   ├── api/
│   │   ├── deps.py             # FastAPI dependencies (auth)
│   │   └── jobs.py             # Job endpoints (create, get, download)
│   ├── services/
│   │   ├── tts_service.py      # Soprano TTS model wrapper
│   │   ├── quota_service.py    # Quota check, reserve, adjust, refund
│   │   ├── webhook_service.py  # Webhook delivery with retries
│   │   └── storage_service.py  # Audio file storage management
│   └── utils/
│       ├── logger.py           # Logging configuration
│       └── exceptions.py       # Custom exception classes
├── scripts/
│   ├── init_db.py              # Initialize database tables
│   ├── seed_users.py           # Create test users with API keys
│   └── quota/                  # Quota analysis & benchmarking tools
│       ├── calculate_duration.py
│       ├── make_graphs.py
│       ├── tts_inputs.csv
│       └── tts_results.csv
├── tests/
│   ├── conftest.py             # Shared fixtures (DB, client, factories)
│   ├── test_api.py             # API endpoint tests
│   ├── test_config.py          # Configuration tests
│   ├── test_database.py        # Database tests
│   ├── test_exceptions.py      # Custom exception tests
│   ├── test_models.py          # ORM model tests
│   ├── test_queue.py           # Queue management tests
│   ├── test_quota_service.py   # Quota logic tests
│   ├── test_schemas.py         # Schema validation tests
│   ├── test_storage_service.py # Storage service tests
│   ├── test_tts_service.py     # TTS service tests
│   ├── test_webhook_service.py # Webhook service tests
│   └── test_worker.py          # Worker processing tests
├── storage/                    # Audio file output directory
├── Dockerfile                  # Multi-stage build (base + test)
├── docker-compose.yml          # CPU mode compose
├── docker-compose.gpu.yml      # GPU mode compose
├── Makefile                    # Development & Docker commands
├── pyproject.toml              # Project metadata & dependencies
├── uv.lock                     # Deterministic dependency lock file
├── check-gpu.sh                # GPU availability checker
├── setup-nvidia-docker.sh      # NVIDIA Docker runtime installer
└── README.docker.md            # Docker deployment guide
```
If you get CUDA out of memory errors:
- Reduce the number of workers per GPU
- Switch to CPU mode: run `python -m app.worker` without setting `CUDA_VISIBLE_DEVICES`
- Check available VRAM: `nvidia-smi`
Check:
- Is Redis running? `redis-cli ping`
- Is the worker running? Check the worker terminal output
- Check queue stats: `curl http://localhost:8000/queue/stats`
Verify your PostgreSQL connection settings:

```bash
psql -U tts_user -h localhost -p 5432 -d tts_db
```

The Soprano model is downloaded automatically on first run. Ensure you have internet access and sufficient disk space; the model is cached by HuggingFace for subsequent runs.
Built with:
- FastAPI — API framework
- SQLAlchemy — ORM & database toolkit
- RQ (Redis Queue) — Job queue
- Soprano TTS — Text-to-Speech engine
- PyTorch — ML framework
- Pydantic — Data validation & settings
- httpx — HTTP client for webhooks
- soundfile — Audio I/O
- num2words — Number-to-words conversion