The RAG Template is a production-ready, Kubernetes-native Retrieval-Augmented Generation (RAG) system designed for rapid deployment of AI-powered document question-answering applications. This repository provides a complete reference implementation with a microservices architecture, shared Python libraries, infrastructure-as-code, and CI/CD pipelines.
This document provides a high-level overview of the system architecture, component relationships, and key design patterns.
Sources: README.md1-87 Tiltfile1-736
The repository is organized into four primary directories, each serving distinct purposes:
rag-template/
├── services/ # Deployable microservices (FastAPI applications)
├── libs/ # Shared Python libraries (published to PyPI)
├── infrastructure/ # Kubernetes manifests, Helm charts, Terraform
└── tools/ # Development and automation scripts
| Directory | Purpose | Key Technologies | Deployment Unit |
|---|---|---|---|
| services/rag-backend | RAG query processing, chat endpoints | FastAPI, LangChain, LangGraph | Docker image |
| services/admin-backend | Document management, upload pipeline | FastAPI, boto3, Redis | Docker image |
| services/document-extractor | Content extraction from files | FastAPI, Tesseract, Docling | Docker image |
| services/mcp-server | Model Context Protocol server | FastAPI, MCP SDK | Sidecar container |
| services/frontend | Chat and admin UIs | Vue.js, Nx monorepo | Nginx static |
| libs/rag-core-lib | Shared LLM utilities, retry logic | LangChain, Langfuse | Python package |
| libs/rag-core-api | RAG API layer, retrievers, graph | LangChain, Qdrant | Python package |
| libs/admin-api-lib | Admin API, chunking, summarization | boto3, nltk | Python package |
| libs/extractor-api-lib | Format-specific extractors | Docling, MarkItDown | Python package |
| infrastructure/rag | Helm chart with dependencies | Helm, Kubernetes | Chart archive |
| infrastructure/terraform | Cloud provisioning examples | Terraform, STACKIT provider | N/A |
Sources: README.md87-168 Tiltfile13-736
The RAG Template implements a layered microservices architecture with clear separation of concerns. The system is designed for horizontal scalability, observability, and extensibility.
Sources: Tiltfile206-684 README.md47-68
The library architecture implements a four-tier dependency hierarchy to maximize code reuse and minimize coupling. Libraries are published to PyPI and consumed by services via Poetry dependency declarations.
Sources: libs/rag-core-api/pyproject.toml1-137 libs/admin-api-lib/pyproject.toml1-125 libs/extractor-api-lib/pyproject.toml1-155 libs/rag-core-lib/pyproject.toml1-127
| Component | Framework | Version | Key Dependencies |
|---|---|---|---|
| HTTP Server | FastAPI | ^0.121.2 | Starlette >=0.49.1 |
| LLM Framework | LangChain | ^1.0.8 | langchain-core, langchain-community |
| Graph Framework | LangGraph | ^1.0.3 | langgraph-checkpoint |
| Vector DB Client | qdrant-client | ^1.14.2 | grpcio |
| S3 Client | boto3 | ^1.38.10 | botocore |
| Observability | Langfuse | ^3.10.1 | opentelemetry |
| Dependency Injection | dependency-injector | ^4.46.0 | - |
| Component | Framework | Version | Build Tool |
|---|---|---|---|
| UI Framework | Vue.js | 3.x | Vite |
| Monorepo | Nx | Latest | npm workspaces |
| HTTP Client | Axios | Latest | - |
| Component | Technology | Purpose | Port |
|---|---|---|---|
| Vector Database | Qdrant | Semantic search | 6333 |
| Object Storage | MinIO | Document storage | 9001 |
| Cache/KV Store | KeyDB | Session & metadata | 6379 |
| Observability | Langfuse | LLM tracing | 3000 |
| Web Server | Nginx | Static file serving | 80 |
Sources: services/rag-backend/poetry.lock1-100 libs/rag-core-api/poetry.lock1-100 services/admin-backend/poetry.lock1-100
The RAG Template supports two distinct deployment patterns optimized for different use cases.
| Aspect | Local Development | Production |
|---|---|---|
| Orchestrator | k3d (lightweight K8s) | Full Kubernetes cluster |
| Automation | Tilt (hot reload at :10350) | Helm + CI/CD |
| Infrastructure | In-cluster (bitnami-legacy) | Managed services (STACKIT) |
| Image Source | Local registry (localhost:5000) | GitHub Container Registry |
| TLS | Disabled (*.localhost) | Cert-Manager + Let's Encrypt |
| Ingress | NGINX (local domains) | NGINX (real DNS) |
| Database Images | bitnami-legacy (allowInsecure=true) | bitnami (production-hardened) |
| Storage | In-cluster volumes | Cloud S3 (STACKIT Object Storage) |
| Cache | In-cluster KeyDB | STACKIT Redis (managed) |
Sources: Tiltfile71-684 README.md198-408
Sources: infrastructure/rag/values.yaml1-100 (inferred), README.md412-477
All services use dependency-injector to enable runtime component replacement without code changes. Each service defines a Container class that wires dependencies.
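To make the idea of runtime component replacement concrete, here is a minimal sketch written in plain Python so that it runs without third-party packages. It mimics the provider and override idea only; it is not the dependency-injector API and not the repository's actual container code. All names in it (`Singleton`, `Container`, `VectorRetriever`, `KeywordRetriever`, `ChatService`) are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class Singleton:
    """Hypothetical provider: lazily builds one shared instance and can be overridden."""

    def __init__(self, factory: Callable[..., Any], *deps: Singleton) -> None:
        self._factory = factory
        self._deps = deps
        self._instance: Optional[Any] = None
        self._override: Optional[Singleton] = None

    def override(self, provider: Singleton) -> None:
        # Swap the implementation without touching the code that consumes it.
        self._override = provider

    def __call__(self) -> Any:
        if self._override is not None:
            return self._override()
        if self._instance is None:
            # Resolve dependencies first, then build the component once.
            self._instance = self._factory(*(dep() for dep in self._deps))
        return self._instance


class VectorRetriever:
    def retrieve(self, query: str) -> list[str]:
        return [f"vector hit for {query!r}"]


class KeywordRetriever:
    def retrieve(self, query: str) -> list[str]:
        return [f"keyword hit for {query!r}"]


class ChatService:
    def __init__(self, retriever: Any) -> None:
        self.retriever = retriever

    def answer(self, question: str) -> str:
        return "; ".join(self.retriever.retrieve(question))


class Container:
    """Hypothetical wiring, loosely analogous to a service's container.py."""

    def __init__(self) -> None:
        self.retriever = Singleton(VectorRetriever)
        self.chat = Singleton(ChatService, self.retriever)


container = Container()
# Replace the retriever before the chat service is first resolved.
container.retriever.override(Singleton(KeywordRetriever))
print(container.chat().answer("hi"))
```

In this sketch the override has to happen before the dependent component is first built, because the built instance is cached afterwards.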
Example Container Structure:
- services/rag-backend/container.py: Defines RAG-specific providers
- services/admin-backend/container.py: Defines admin-specific providers

The retry_with_backoff function in rag-core-lib provides exponential backoff with jitter for transient failures.
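As an illustration of exponential backoff with jitter, the following is a minimal standard-library sketch. It is not the rag-core-lib implementation, whose signature is not shown here; the function name `retry_with_backoff_sketch`, its parameters, the default exception types, and the choice of "full jitter" (a uniform random delay up to the exponential cap) are all assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff_sketch(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the listed exceptions with capped, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The cap doubles on every attempt: base, 2*base, 4*base, ...
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: pick a random delay in [0, cap] to spread out retries.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Usage: a hypothetical call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry_with_backoff_sketch(flaky_call, sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; exceptions outside `retry_on` propagate immediately.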
LangfuseTracedRunnable wraps LangChain components to provide end-to-end tracing. All LLM calls, retrievals, and reranking operations are automatically traced to Langfuse at port 3000.
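The wrapping idea can be sketched without any tracing SDK. The example below is a standard-library illustration of a wrapper that records a span around each call; it does not use the Langfuse or LangChain APIs and is not how LangfuseTracedRunnable is implemented. `Span`, `TracedStep`, and the `sink` list are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Span:
    """Hypothetical trace record for one wrapped call."""

    name: str
    input: Any
    output: Any = None
    error: Optional[str] = None
    duration_s: float = 0.0


@dataclass
class TracedStep:
    """Wraps a callable and appends one Span to sink for every invoke()."""

    name: str
    inner: Callable[[Any], Any]
    sink: List[Span] = field(default_factory=list)

    def invoke(self, value: Any) -> Any:
        span = Span(name=self.name, input=value)
        start = time.perf_counter()
        try:
            span.output = self.inner(value)
            return span.output
        except Exception as exc:
            # Record the failure, then let the caller see the original error.
            span.error = repr(exc)
            raise
        finally:
            span.duration_s = time.perf_counter() - start
            self.sink.append(span)


# Usage: trace a stand-in retrieval step into a shared list of spans.
spans: List[Span] = []
retrieve = TracedStep("retrieve", lambda query: [query.upper()], sink=spans)
documents = retrieve.invoke("docs")
```

A real integration would send each span to the tracing backend instead of appending it to a list.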
Shared functionality is packaged as Python libraries published to PyPI. Services consume these libraries rather than duplicating code.
Sources: libs/rag-core-lib/pyproject.toml1-127 libs/rag-core-api/pyproject.toml1-137
The system follows a hierarchical configuration approach:
- .env (in dev)
- values.yaml
- features.mcp.enabled

All configurable parameters are documented in Configuration Reference.
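One way to picture hierarchical configuration is as layered dictionaries merged in order of precedence. The sketch below is an assumption-laden illustration, not the template's loader: the precedence order (defaults, then values.yaml-style overrides, then environment variables), the `RAG__` prefix with `__` as a nesting separator, and the helper names `deep_merge`, `env_to_nested`, and `get_path` are all hypothetical. Only the key `features.mcp.enabled` comes from the text above.

```python
from typing import Any, Dict, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict where override wins, merging nested mappings recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def parse_scalar(raw: str) -> Any:
    """Turn the strings 'true' and 'false' into booleans; keep everything else as text."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return raw


def env_to_nested(env: Mapping[str, str], prefix: str = "RAG__") -> Dict[str, Any]:
    """Map e.g. RAG__FEATURES__MCP__ENABLED=true to {'features': {'mcp': {'enabled': True}}}."""
    nested: Dict[str, Any] = {}
    for name, raw in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated environment variables
        path = name[len(prefix):].lower().split("__")
        node = nested
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = parse_scalar(raw)
    return nested


def get_path(config: Mapping[str, Any], dotted: str) -> Any:
    """Look up a dotted key such as 'features.mcp.enabled'."""
    node: Any = config
    for part in dotted.split("."):
        node = node[part]
    return node


# Usage with made-up values for each layer.
chart_defaults = {"features": {"mcp": {"enabled": False}}, "backend": {"replicas": 1}}
values_overrides = {"backend": {"replicas": 3}}
environment = {"RAG__FEATURES__MCP__ENABLED": "true", "PATH": "/usr/bin"}

config = deep_merge(deep_merge(chart_defaults, values_overrides), env_to_nested(environment))
mcp_enabled = get_path(config, "features.mcp.enabled")
```

Because `deep_merge` builds new dictionaries at every level, the lower layers stay unchanged and can be reused to compute other environments.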
Sources: Tiltfile470-544 README.md206-250