Overview


Purpose and Scope

The RAG Template is a production-ready, Kubernetes-native Retrieval-Augmented Generation (RAG) system designed to enable rapid deployment of AI-powered document question-answering applications. This repository provides a complete reference implementation with microservices architecture, shared Python libraries, infrastructure-as-code, and CI/CD pipelines.

This document provides a high-level overview of the entire system architecture, component relationships, and key design patterns. For detailed information about specific subsystems:

Local development setup: see Getting Started
Architecture deep-dive: see Architecture
Deployment strategies: see Deployment
RAG pipeline internals: see RAG Pipeline
Configuration options: see Configuration Reference

Sources: README.md:1-87, Tiltfile:1-736

Repository Structure

The repository is organized into four top-level directories, each with a distinct purpose:

rag-template/
├── services/          # Deployable services (FastAPI backends, Vue.js frontend)
├── libs/              # Shared Python libraries (published to PyPI)
├── infrastructure/    # Kubernetes manifests, Helm charts, Terraform
└── tools/             # Development and automation scripts

Directory Breakdown

Directory	Purpose	Key Technologies	Deployment Unit
`services/rag-backend`	RAG query processing, chat endpoints	FastAPI, LangChain, LangGraph	Docker image
`services/admin-backend`	Document management, upload pipeline	FastAPI, boto3, Redis	Docker image
`services/document-extractor`	Content extraction from files	FastAPI, Tesseract, Docling	Docker image
`services/mcp-server`	Model Context Protocol server	FastAPI, MCP SDK	Sidecar container
`services/frontend`	Chat and admin UIs	Vue.js, Nx monorepo	Static files served by Nginx
`libs/rag-core-lib`	Shared LLM utilities, retry logic	LangChain, Langfuse	Python package
`libs/rag-core-api`	RAG API layer, retrievers, graph	LangChain, Qdrant	Python package
`libs/admin-api-lib`	Admin API, chunking, summarization	boto3, nltk	Python package
`libs/extractor-api-lib`	Format-specific extractors	Docling, MarkItDown	Python package
`infrastructure/rag`	Helm chart with dependencies	Helm, Kubernetes	Chart archive
`infrastructure/terraform`	Cloud provisioning examples	Terraform, STACKIT provider	N/A

Sources: README.md:87-168, Tiltfile:13-736

System Architecture

The RAG Template implements a layered microservices architecture with clear separation of concerns. The system is designed for horizontal scalability, observability, and extensibility.

High-Level Component Diagram

Sources: Tiltfile:206-684, README.md:47-68

Library Dependency Graph

The library architecture implements a four-tier dependency hierarchy to maximize code reuse and minimize coupling. Libraries are published to PyPI and consumed by services via Poetry dependency declarations.

Sources: libs/rag-core-api/pyproject.toml:1-137, libs/admin-api-lib/pyproject.toml:1-125, libs/extractor-api-lib/pyproject.toml:1-155, libs/rag-core-lib/pyproject.toml:1-127

Technology Stack

Backend Services

Component	Package	Version Constraint	Key Dependencies
HTTP Server	FastAPI	^0.121.2	Starlette >=0.49.1
LLM Framework	LangChain	^1.0.8	langchain-core, langchain-community
Graph Framework	LangGraph	^1.0.3	langgraph-checkpoint
Vector DB Client	qdrant-client	^1.14.2	grpcio
S3 Client	boto3	^1.38.10	botocore
Observability	Langfuse	^3.10.1	opentelemetry
Dependency Injection	dependency-injector	^4.46.0	-

Frontend Applications

Component	Framework	Version	Build Tool
UI Framework	Vue.js	3.x	Vite
Monorepo	Nx	Latest	npm workspaces
HTTP Client	Axios	Latest	-

Infrastructure

Component	Technology	Purpose	Port
Vector Database	Qdrant	Semantic search	6333
Object Storage	MinIO	Document storage	9001
Cache/KV Store	KeyDB	Session & metadata	6379
Observability	Langfuse	LLM tracing	3000
Web Server	Nginx	Static file serving	80

Sources: services/rag-backend/poetry.lock:1-100, libs/rag-core-api/poetry.lock:1-100, services/admin-backend/poetry.lock:1-100

Deployment Models

The RAG Template supports two deployment patterns: local development on a lightweight k3d cluster driven by Tilt, and production on a full Kubernetes cluster deployed with Helm and CI/CD.

Deployment Comparison

Aspect	Local Development	Production
Orchestrator	k3d (lightweight K8s)	Full Kubernetes cluster
Automation	Tilt (hot reload; web UI at localhost:10350)	Helm + CI/CD
Infrastructure	In-cluster (bitnami-legacy)	Managed services (STACKIT)
Image Source	Local registry (localhost:5000)	GitHub Container Registry
TLS	Disabled (*.localhost)	cert-manager + Let's Encrypt
Ingress	NGINX (local domains)	NGINX (real DNS)
Database Images	bitnami-legacy (allowInsecure=true)	bitnami (production-hardened)
Storage	In-cluster volumes	Cloud S3 (STACKIT Object Storage)
Cache	In-cluster KeyDB	STACKIT Redis (managed)

Local Development Architecture

Sources: Tiltfile:71-684, README.md:198-408

Production Deployment Architecture

Sources: infrastructure/rag/values.yaml:1-100 (inferred), README.md:412-477

Key Design Patterns

Dependency Injection

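All backend services use the dependency-injector library so that component implementations can be replaced through configuration, without changing the code that uses them. Each service defines a Container class that declares its providers and wires them into the application.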

Example container layout:

services/rag-backend/container.py: RAG-specific providers
services/admin-backend/container.py: admin-specific providers
Runtime configuration: supplied via environment variables
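The idea behind the container can be illustrated with a minimal, dependency-free sketch. It deliberately does not use dependency-injector's API; the Container, KeywordRetriever, and FakeRetriever names are hypothetical. What it shows is the pattern itself: consumers resolve components through the container, so an implementation can be overridden (for example in tests) without editing the consuming code.

```python
from typing import Callable, Dict, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...


class KeywordRetriever:
    """Hypothetical default implementation."""

    def retrieve(self, query: str) -> list[str]:
        return [f"keyword hit for {query!r}"]


class FakeRetriever:
    """Hypothetical replacement, e.g. for tests."""

    def retrieve(self, query: str) -> list[str]:
        return ["canned document"]


class Container:
    """Hand-rolled container: maps names to factories and caches one instance per name."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {"retriever": KeywordRetriever}
        self._instances: Dict[str, object] = {}

    def override(self, name: str, factory: Callable[[], object]) -> None:
        # Swap the implementation; drop any cached instance so the next get() rebuilds it.
        self._factories[name] = factory
        self._instances.pop(name, None)

    def get(self, name: str) -> object:
        # Lazily build and cache the component (singleton semantics).
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


def answer(container: Container, query: str) -> list[str]:
    # Consumer code depends on the container, never on a concrete class.
    retriever = container.get("retriever")
    return retriever.retrieve(query)
```

Calling `answer(container, "rag")` uses KeywordRetriever until `container.override("retriever", FakeRetriever)` is called; after that the same `answer` function returns the canned result. dependency-injector provides the same capability declaratively, with providers defined as attributes of a container class.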

Retry & Resilience

The retry_with_backoff function in rag-core-lib provides exponential backoff with jitter for transient failures. Key features:

Maximum 5 retries by default
Rate limit detection (HTTP 429)
Jitter: 0.05-0.25 seconds
Used for LLM calls and embedding generation
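The behaviour listed above can be sketched as follows. This is not rag-core-lib's implementation: the function name, the two exception classes, and the base and maximum delays are assumptions. Only the default of 5 retries and the 0.05-0.25 second jitter range come from the description above. The `sleep` parameter is injectable so the delays can be observed without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when a provider answers with HTTP 429."""


class TransientError(Exception):
    """Hypothetical marker for other retryable failures (e.g. timeouts)."""


def retry_with_backoff_sketch(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,  # assumed value
    max_delay: float = 30.0,  # assumed value
    jitter_range: tuple[float, float] = (0.05, 0.25),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except (RateLimitError, TransientError):
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Exponential backoff: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(base_delay * 2**attempt, max_delay)
            # Random jitter de-synchronises clients that failed at the same moment.
            delay += random.uniform(*jitter_range)
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Any exception other than the two retryable types propagates immediately, so programming errors are not masked by retries.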

Observability Integration

LangfuseTracedRunnable wraps LangChain components to provide end-to-end tracing: LLM calls, retrievals, and reranking operations are traced automatically and sent to the Langfuse server (port 3000 in the default setup).
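To illustrate what wrapping a component for tracing involves, here is a dependency-free sketch. It is not LangfuseTracedRunnable and does not use the Langfuse SDK: TracedRunnable, Span, and InMemoryTracer are hypothetical names, and spans are appended to an in-memory list instead of being sent to a Langfuse server. Each wrapped step records its inputs, output, error, and duration under a shared trace ID, which is what allows a backend to stitch retrieval, reranking, and generation into one trace.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    inputs: Any
    output: Any = None
    error: Optional[str] = None
    duration_s: float = 0.0


@dataclass
class InMemoryTracer:
    """Hypothetical stand-in for a tracing backend such as Langfuse."""

    spans: list[Span] = field(default_factory=list)


class TracedRunnable:
    """Wrap any single-argument step and record one span per invocation."""

    def __init__(self, name: str, inner: Callable[[Any], Any], tracer: InMemoryTracer) -> None:
        self.name = name
        self.inner = inner
        self.tracer = tracer

    def invoke(self, inputs: Any, trace_id: Optional[str] = None) -> Any:
        # Reuse the caller's trace ID so spans from one request are grouped together.
        span = Span(name=self.name, trace_id=trace_id or uuid.uuid4().hex, inputs=inputs)
        start = time.perf_counter()
        try:
            span.output = self.inner(inputs)
            return span.output
        except Exception as exc:
            span.error = repr(exc)  # failures are traced too, then re-raised
            raise
        finally:
            span.duration_s = time.perf_counter() - start
            self.tracer.spans.append(span)
```

Because the wrapper exposes the same `invoke` call as the step it wraps, callers do not need to change when tracing is added.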

Library-as-API Pattern

Shared functionality is packaged as Python libraries published to PyPI. Services consume these libraries rather than duplicating code. This enables:

Independent versioning of components
Third-party reuse of libraries
Clear API contracts between layers

Sources: libs/rag-core-lib/pyproject.toml:1-127, libs/rag-core-api/pyproject.toml:1-137

Configuration Philosophy

The system follows a hierarchical configuration approach:

Environment Variables: Runtime secrets and endpoints (loaded from .env in dev)
Helm Values: Deployment-time configuration (values.yaml)
Dependency Containers: Component selection and wiring
Feature Flags: Runtime behavior toggles (e.g., features.mcp.enabled)
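At runtime these layers typically meet in the service's environment: Helm values are rendered into the pod's environment variables, and the service reads them at startup. The sketch below assumes that flow; the variable names VECTOR_DB_URL, LLM_API_KEY, and MCP_ENABLED are hypothetical, and the chart's actual mapping of features.mcp.enabled may differ (it may, for instance, only control whether the MCP sidecar is deployed).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _parse_bool(raw: str) -> bool:
    # Accept common truthy spellings, since rendered values arrive as strings.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    vector_db_url: str
    llm_api_key: str
    mcp_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables (hypothetical names), with dev defaults."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY")
    if not api_key:
        # Secrets get no default: fail fast rather than start misconfigured.
        raise ValueError("LLM_API_KEY must be set")
    return Settings(
        vector_db_url=env.get("VECTOR_DB_URL", "http://localhost:6333"),
        llm_api_key=api_key,
        mcp_enabled=_parse_bool(env.get("MCP_ENABLED", "false")),
    )
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to test, while a `.env` file in development simply populates the same variables.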

All configurable parameters are documented in Configuration Reference.

Sources: Tiltfile:470-544, README.md:206-250

Next Steps

To set up your local environment: Getting Started
To understand the RAG query pipeline: RAG Pipeline
To deploy to production: Deployment
To configure LLM providers: Configuration Reference
