Async Soprano TTS Service

A production-ready Text-to-Speech microservice built around the Soprano TTS engine with asynchronous job processing and dual quota tracking.

Features

Asynchronous Processing — Submit jobs and receive webhook notifications when complete
Dual Quota System — Track usage by both characters and audio seconds
GPU-Accelerated — Fast inference with warm model (stays loaded in memory)
Scalable Architecture — Redis queue + multiple workers support
RESTful API — Clean FastAPI endpoints with automatic documentation
Production-Ready — Error handling, retry logic, logging, health checks
Docker Support — CPU and GPU Docker Compose configurations included
Comprehensive Tests — Full pytest suite with in-memory SQLite fixtures


Architecture

┌─────────────┐
│   Client    │
└──────┬──────┘
       │ POST /api/v1/tts/jobs
       ▼
┌─────────────────────┐
│   FastAPI Server    │ ← Validates request, checks quota
└──────┬──────────────┘
       │ Enqueue job
       ▼
┌─────────────────────┐
│   Redis Queue (RQ)  │
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│   Worker + GPU      │ ← Warm model, generates audio
│  (Soprano TTS)      │
└──────┬──────────────┘
       │
       ├─► Save WAV to storage/
       └─► Send webhook notification

Components:

  • FastAPI Server — API endpoints, authentication, quota management
  • PostgreSQL — User data, job tracking, quota counters
  • Redis + RQ — Job queue for async processing
  • Worker (SimpleWorker) — GPU-powered TTS generation with warm model kept in-process
  • Storage — Local filesystem for audio files (WAV format)
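The API-side flow (validate, check quota, enqueue) can be sketched in plain Python. This is an illustration only: the function name, the in-memory list standing in for the Redis queue, and the quota argument are hypothetical stand-ins for the actual FastAPI + RQ code.

```python
# Illustrative sketch of the API-side flow: validate, reserve quota, enqueue.
# create_job, the list-based queue, and characters_remaining are hypothetical
# stand-ins for the service's FastAPI endpoint and RQ queue.
import uuid

CHARACTERS_PER_SECOND = 16.88  # default ratio from the configuration table

def create_job(text: str, characters_remaining: int, queue: list) -> dict:
    character_count = len(text)
    if character_count > characters_remaining:
        # the real API rejects with HTTP 429 Too Many Requests
        raise ValueError("character quota exceeded")
    job = {
        "job_id": str(uuid.uuid4()),
        "status": "pending",
        "character_count": character_count,
        "estimated_audio_seconds": character_count / CHARACTERS_PER_SECOND,
    }
    queue.append(job)  # the real service enqueues onto Redis via RQ
    return job
```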

Quick Start

Prerequisites

  • Python 3.13+
  • PostgreSQL 15+
  • Redis 7+
  • NVIDIA GPU (optional but recommended)
  • CUDA 12.1+ (if using GPU)

Installation

  1. Clone and create virtual environment:
cd async-soprano

# Using UV (recommended)
uv venv && source .venv/bin/activate

# Or using plain venv
python -m venv .venv
source .venv/bin/activate
  2. Install dependencies:
# With UV (recommended — uses uv.lock for deterministic installs)
uv sync

# Or with pip
pip install .

# Include dev/test dependencies
uv sync --group dev
  3. Configure environment (optional):

The application uses Pydantic Settings and will work with sensible defaults. You can override any setting via environment variables or a .env file:

# All variables are optional — defaults are shown below
POSTGRES_USER=tts_user
POSTGRES_PASSWORD=tts_pass
POSTGRES_DB=tts_db
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

STORAGE_PATH=./storage
MODEL_NAME=ekwek/Soprano-1.1-80M
CHARACTERS_PER_SECOND=16.88

DEFAULT_CHARACTER_QUOTA=100000
DEFAULT_SECONDS_QUOTA=6000.0

GPU_DEVICE=0
LOG_LEVEL=INFO
  4. Initialize database:
# Create tables
python scripts/init_db.py

# Seed test users and get API keys
python scripts/seed_users.py
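Behind the scenes, the Postgres variables above are assembled into a single connection URL by Pydantic Settings. A stdlib-only sketch of that assembly (the `postgresql://` scheme and helper name are assumptions; the project's actual Settings code may differ):

```python
# Sketch of building the database URL from its component environment variables.
# Defaults match the configuration table; the exact URL scheme is an assumption.
import os

def postgres_url(env=os.environ) -> str:
    user = env.get("POSTGRES_USER", "tts_user")
    password = env.get("POSTGRES_PASSWORD", "tts_pass")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "tts_db")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```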

Running the Service

You'll need two terminals (plus a running PostgreSQL and Redis):

Terminal 1 — API Server:

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 — Worker:

# With GPU
CUDA_VISIBLE_DEVICES=0 python -m app.worker

# Without GPU (slower)
python -m app.worker

The service is now running. The API is available at http://localhost:8000 and the interactive docs at http://localhost:8000/docs.


Docker Deployment

The project includes full Docker Compose support for both CPU and GPU modes.

# CPU mode
make build && make up

# GPU mode (requires NVIDIA Container Toolkit)
make build-gpu && make up-gpu

# Initialize DB and seed users
make init-db
make seed

See README.docker.md for the full Docker guide.


Usage

1. Create a TTS Job

curl -X POST "http://localhost:8000/api/v1/tts/jobs" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world, this is a test of the text to speech service.",
    "webhook_url": "http://localhost:5001/webhook"
  }'

Response:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "character_count": 63,
  "estimated_audio_seconds": 5.0,
  "message": "Job created and queued for processing"
}
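The same request can be issued from Python with only the standard library. The helper below just constructs the request object (send it with urllib.request.urlopen); the base URL and API key are placeholders:

```python
import json
import urllib.request

def build_job_request(base_url: str, api_key: str, text: str, webhook_url: str):
    """Build the POST /api/v1/tts/jobs request; send it with urllib.request.urlopen."""
    body = json.dumps({"text": text, "webhook_url": webhook_url}).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/tts/jobs",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```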

2. Check Job Status

curl "http://localhost:8000/api/v1/tts/jobs/550e8400-e29b-41d4-a716-446655440000" \
  -H "X-API-Key: YOUR_API_KEY"

Response:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "text": "Hello world...",
  "audio_file_path": "/storage/550e8400-e29b-41d4-a716-446655440000.wav",
  "character_count": 63,
  "estimated_audio_seconds": 5.0,
  "actual_audio_seconds": 4.92,
  "created_at": "2026-02-06T10:00:00Z",
  "completed_at": "2026-02-06T10:00:05Z",
  "quota": {
    "characters": {
      "used": 63,
      "limit": 100000,
      "remaining": 99937
    },
    "seconds": {
      "used": 5.0,
      "limit": 6000.0,
      "remaining": 5995.0
    }
  }
}

3. Download Audio

curl -H "X-API-Key: YOUR_API_KEY" \
  "http://localhost:8000/api/v1/tts/storage/550e8400-e29b-41d4-a716-446655440000.wav" \
  -o output.wav

4. Webhook Notification

When a job completes (or fails), a POST request is sent to the provided webhook_url:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "audio_file_path": "/storage/550e8400-e29b-41d4-a716-446655440000.wav",
  "audio_duration_seconds": 4.92,
  "character_count": 63,
  "completed_at": "2026-02-06T10:00:05Z"
}
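A minimal receiver for this payload needs nothing beyond the standard library. The sketch below assumes the service treats any 2xx response as successful delivery; the handler and helper names are illustrative, and the port matches the webhook_url used earlier:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize_webhook(raw: bytes) -> str:
    """Extract the fields a receiver typically cares about."""
    data = json.loads(raw)
    return f"{data['job_id']}: {data['status']}"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_webhook(self.rfile.read(length)))
        self.send_response(200)  # assumed to count as successful delivery
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 5001), WebhookHandler).serve_forever()
```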

Webhook delivery includes retry logic (up to 3 attempts with exponential backoff).
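One common way to produce the growing wait between attempts is doubling from a base delay. The base delay below is an assumption for illustration; the README does not specify the actual schedule:

```python
def backoff_delays(attempts: int = 3, base: float = 1.0) -> list[float]:
    """Delay before each attempt's retry: base, 2*base, 4*base, ... (exponential backoff)."""
    return [base * (2 ** i) for i in range(attempts)]
```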


API Endpoints

Jobs

Method  Endpoint                           Description
POST    /api/v1/tts/jobs                   Create new TTS job
GET     /api/v1/tts/jobs/{job_id}          Get job status and details
GET     /api/v1/tts/storage/{job_id}.wav   Download audio file

System

Method  Endpoint      Description
GET     /             Service information
GET     /health       Health check (DB, Redis, storage)
GET     /queue/stats  Queue statistics (pending, active, failed jobs)
GET     /docs         Interactive API documentation (Swagger UI)

Authentication: All /api/v1/tts/* endpoints require the X-API-Key header.


Quota System

The service tracks usage with dual quotas:

Character Quota

  • Counted immediately when a job is created
  • Numbers in text are expanded to words before counting (e.g. "123" → "one hundred twenty-three")
  • Default limit: 100,000 characters
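The number expansion can be sketched as below. This is a simplified converter for values under 1,000; the service's actual expansion logic may handle a wider range and differ in wording:

```python
import re

_UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
          "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
          "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out 0-999 in words (simplified illustration)."""
    if n < 20:
        return _UNITS[n]
    if n < 100:
        return _TENS[n // 10] + ("-" + _UNITS[n % 10] if n % 10 else "")
    rest = " " + number_to_words(n % 100) if n % 100 else ""
    return _UNITS[n // 100] + " hundred" + rest

def billable_characters(text: str) -> int:
    """Expand digit runs to words, then count the resulting characters."""
    return len(re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text))
```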

Seconds Quota

  • Estimated at job creation using configurable chars-per-second ratio (default: 16.88)
  • Adjusted after actual audio generation to reflect true duration
  • Default limit: 6,000 seconds
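The estimate reserved at creation time is simply character count divided by the ratio; a one-line sketch (the real service may round the result or apply a minimum):

```python
CHARACTERS_PER_SECOND = 16.88  # default ratio from the configuration table

def estimate_seconds(text: str) -> float:
    """Estimated audio duration reserved against the seconds quota at job creation."""
    return len(text) / CHARACTERS_PER_SECOND
```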

Both quotas must be sufficient to accept a job. If either is exceeded, the request is rejected with 429 Too Many Requests.

If a job fails, both character and seconds quotas are refunded automatically.


Testing

The project has a comprehensive pytest test suite covering all modules:

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run with coverage report
pytest --cov=app --cov-report=term-missing

# Run a specific test file
pytest tests/test_api.py

Tests use an in-memory SQLite database and mocked Redis/RQ so no external services are required.
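The in-memory database pattern can be illustrated with the standard library alone. This is a sketch of the idea, not the project's actual conftest.py fixtures, and the table schema is invented for the example:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def memory_db():
    """Fresh throwaway database per test: created in RAM, dropped on close."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, api_key TEXT)")
    try:
        yield conn
    finally:
        conn.close()
```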

Running Tests in Docker

make test          # Run tests in Docker
make test-cov      # Run tests with coverage in Docker

Scaling

Single Machine — Multiple Workers

Run multiple workers on the same GPU:

CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=0 python -m app.worker &

Workers share the GPU and process jobs in parallel from the same queue.

Multiple GPUs

CUDA_VISIBLE_DEVICES=0 python -m app.worker &
CUDA_VISIBLE_DEVICES=1 python -m app.worker &

Multi-Machine

All machines connect to the same PostgreSQL and Redis instances via environment variables:

# Machine 1
POSTGRES_HOST=db-server REDIS_HOST=redis-server python -m app.worker

# Machine 2
POSTGRES_HOST=db-server REDIS_HOST=redis-server python -m app.worker

Workers automatically distribute the workload via the shared Redis queue.

Docker Scaling

docker compose up --scale worker=3 -d

Configuration

All configuration is managed through environment variables (or a .env file) via Pydantic Settings. Connection URLs are built automatically from their components.

Variable                  Default                Description
POSTGRES_USER             tts_user               PostgreSQL username
POSTGRES_PASSWORD         tts_pass               PostgreSQL password
POSTGRES_DB               tts_db                 PostgreSQL database name
POSTGRES_HOST             localhost              PostgreSQL host
POSTGRES_PORT             5432                   PostgreSQL port
REDIS_HOST                localhost              Redis host
REDIS_PORT                6379                   Redis port
REDIS_DB                  0                      Redis database number
STORAGE_PATH              ./storage              Audio file storage directory
DEFAULT_CHARACTER_QUOTA   100000                 Default character quota per user
DEFAULT_SECONDS_QUOTA     6000.0                 Default audio seconds quota per user
MODEL_NAME                ekwek/Soprano-1.1-80M  HuggingFace model name
CHARACTERS_PER_SECOND     16.88                  Chars/second ratio for duration estimation
GPU_DEVICE                0                      CUDA device ID
LOG_LEVEL                 INFO                   Logging level
API_HOST                  0.0.0.0                API bind host
API_PORT                  8000                   API bind port

Project Structure

async-soprano/
├── app/
│   ├── main.py                 # FastAPI application & lifespan
│   ├── worker.py               # RQ SimpleWorker with warm model
│   ├── config.py               # Pydantic Settings configuration
│   ├── database.py             # SQLAlchemy engine & session
│   ├── models.py               # SQLAlchemy ORM models (User, TTSJob)
│   ├── schemas.py              # Pydantic request/response schemas
│   ├── queue.py                # Redis/RQ queue setup
│   ├── api/
│   │   ├── deps.py             # FastAPI dependencies (auth)
│   │   └── jobs.py             # Job endpoints (create, get, download)
│   ├── services/
│   │   ├── tts_service.py      # Soprano TTS model wrapper
│   │   ├── quota_service.py    # Quota check, reserve, adjust, refund
│   │   ├── webhook_service.py  # Webhook delivery with retries
│   │   └── storage_service.py  # Audio file storage management
│   └── utils/
│       ├── logger.py           # Logging configuration
│       └── exceptions.py       # Custom exception classes
├── scripts/
│   ├── init_db.py              # Initialize database tables
│   ├── seed_users.py           # Create test users with API keys
│   └── quota/                  # Quota analysis & benchmarking tools
│       ├── calculate_duration.py
│       ├── make_graphs.py
│       ├── tts_inputs.csv
│       └── tts_results.csv
├── tests/
│   ├── conftest.py             # Shared fixtures (DB, client, factories)
│   ├── test_api.py             # API endpoint tests
│   ├── test_config.py          # Configuration tests
│   ├── test_database.py        # Database tests
│   ├── test_exceptions.py      # Custom exception tests
│   ├── test_models.py          # ORM model tests
│   ├── test_queue.py           # Queue management tests
│   ├── test_quota_service.py   # Quota logic tests
│   ├── test_schemas.py         # Schema validation tests
│   ├── test_storage_service.py # Storage service tests
│   ├── test_tts_service.py     # TTS service tests
│   ├── test_webhook_service.py # Webhook service tests
│   └── test_worker.py          # Worker processing tests
├── storage/                    # Audio file output directory
├── Dockerfile                  # Multi-stage build (base + test)
├── docker-compose.yml          # CPU mode compose
├── docker-compose.gpu.yml      # GPU mode compose
├── Makefile                    # Development & Docker commands
├── pyproject.toml              # Project metadata & dependencies
├── uv.lock                     # Deterministic dependency lock file
├── check-gpu.sh                # GPU availability checker
├── setup-nvidia-docker.sh      # NVIDIA Docker runtime installer
└── README.docker.md            # Docker deployment guide

Troubleshooting

GPU Memory Errors

If you get CUDA out of memory errors:

  • Reduce number of workers per GPU
  • Switch to CPU mode: run python -m app.worker without setting CUDA_VISIBLE_DEVICES
  • Check available VRAM: nvidia-smi

Queue Not Processing

Check:

  1. Is Redis running? redis-cli ping
  2. Is worker running? Check worker terminal output
  3. Check queue stats: curl http://localhost:8000/queue/stats

Database Connection Issues

Verify your PostgreSQL connection settings:

psql -U tts_user -h localhost -p 5432 -d tts_db

Model Download Issues

The Soprano model is downloaded automatically on first run. Ensure you have internet access and sufficient disk space. The model will be cached by HuggingFace for subsequent runs.

