A production-ready FastAPI-based intelligent document processing and chat application that combines ChromaDB vector database with Google's Generative AI for semantic search, document context-aware conversations, and scalable session management.
This project was developed through four major phases to achieve production readiness:
- ✅ Centralized configuration management with `Settings` class
- ✅ Dependency injection architecture
- ✅ Service separation and clean interfaces
- ✅ CORS configuration and security basics
- ✅ Microservice-style architecture with pure service responsibilities
- ✅ Document ingestion pipeline with orchestration
- ✅ File system service for storage management
- ✅ Comprehensive error handling and validation
- ✅ Context-aware chat with file-specific conversations
- ✅ Advanced prompt engineering with document context
- ✅ Router-based endpoint organization
- ✅ Enhanced request/response models
- ✅ Persistent session storage using Redis with optional Supabase backend
- ✅ Stateless application design for horizontal scaling
- ✅ Session lifecycle management and cleanup
- ✅ Admin endpoints for session monitoring
These features are planned; whether and how they are implemented depends on review and future decisions.
- Dynamic Context Switching
- Knowledge Base Architecture for Context Management
- Summarization techniques for different levels of abstraction in text
- UM Graph Overview, or other overview methods for better context management
- User-friendly features such as file management and organization interfaces
- Local LLM Inference endpoint support
- Intelligent Document Processing: Advanced PDF and text processing with semantic chunking
- Context-Aware Chat: File-specific conversations with document context injection
- Semantic Search: Vector-based similarity search using Google Generative AI embeddings
- Persistent Sessions: Scalable session management with Redis and optional Supabase storage
- Admin Dashboard: Session monitoring, cleanup, and management endpoints
- Horizontal Scalability: Stateless design supports multiple app instances
- Graceful Degradation: Fallback mechanisms for service reliability
- Health Monitoring: Comprehensive health checks and status endpoints
- Security: CORS configuration, input validation, and secure secrets management
- Dependency Injection: Clean, testable, and maintainable service architecture
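The idea behind the semantic search feature can be illustrated with a toy vector-similarity sketch. In the real system, embeddings come from Google Generative AI and nearest-neighbor search is handled by ChromaDB; the vectors and function below are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend 3-dimensional "embeddings" for two document chunks and a query
chunk_vectors = {
    "intro.md": [0.9, 0.1, 0.0],
    "methods.md": [0.1, 0.8, 0.3],
}
query_vector = [0.85, 0.15, 0.05]

# The chunk whose embedding is most similar to the query "wins"
best = max(chunk_vectors, key=lambda k: cosine_similarity(query_vector, chunk_vectors[k]))
print(best)  # intro.md
```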
```
┌─────────────────┬─────────────────┬─────────────────┐
│  Presentation   │    Business     │      Data       │
│     Layer       │     Logic       │     Layer       │
├─────────────────┼─────────────────┼─────────────────┤
│ FastAPI Routes  │ Service Layer   │ Vector Database │
│ - Chat Router   │ - ChatService   │ - ChromaDB      │
│ - Admin Routes  │ - IngestionSvc  │ - Supabase      │
│ - Health Checks │ - VectorDBSvc   │ - File System   │
│                 │ - SessionSvc    │                 │
└─────────────────┴─────────────────┴─────────────────┘
```
The presentation layer is reflected in the Reflex UI; the API endpoints can be explored through the /docs endpoint.
```
ChatService
├── VectorDBService (document retrieval)
├── SessionStorageService (persistent state)
└── PromptBuilder (context injection)

IngestionService (orchestrator)
├── DocumentService (processing)
├── VectorDBService (storage)
├── FileSystemService (file ops)
└── MetadataService (tracking)

SessionStorageService
├── SupabaseService (optional persistent storage)
└── Redis (caching and session storage)
```
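The composition above can be sketched with cached provider functions, the pattern typically used with FastAPI-style dependency injection. The classes here are empty stand-ins for the real services, and the provider names mirror (but are not copied from) `src/app/dependencies.py`:

```python
from functools import lru_cache

# Illustrative stubs standing in for the real service classes
class VectorDBService: ...
class SessionStorageService: ...
class PromptBuilder: ...

class ChatService:
    def __init__(self, vector_db, session_storage, prompt_builder):
        self.vector_db = vector_db
        self.session_storage = session_storage
        self.prompt_builder = prompt_builder

@lru_cache  # each provider returns a single shared instance
def get_vector_db_service():
    return VectorDBService()

@lru_cache
def get_session_storage_service():
    return SessionStorageService()

@lru_cache
def get_chat_service():
    # Wire ChatService's dependencies exactly as the tree above shows
    return ChatService(
        vector_db=get_vector_db_service(),
        session_storage=get_session_storage_service(),
        prompt_builder=PromptBuilder(),
    )

# Providers are idempotent: repeated calls share the same wired instance
assert get_chat_service() is get_chat_service()
```

Caching the providers gives singleton-like behavior without global state, which keeps the services swappable in tests.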
- Backend: FastAPI with async/await support
- AI/ML: Google Generative AI (Gemini) for embeddings and chat
- Vector DB: ChromaDB for semantic search and document storage (embedded)
- Session Storage: Redis for caching with optional Supabase PostgreSQL for persistence
- File Processing: PyPDF2, python-docx for document parsing
- Validation: Pydantic for request/response validation
- DI Container: Custom dependency injection system
```
vino-project/
├── .github/                          # GitHub Actions workflows
│   └── workflows/
│       └── ci.yml                    # CI/CD pipeline configuration
├── src/app/                          # FastAPI backend application
│   ├── core/
│   │   ├── config.py                 # Centralized configuration management
│   │   └── exceptions.py             # Custom exception classes
│   ├── dependencies.py               # Dependency injection providers
│   ├── main.py                       # FastAPI application with DI & routers
│   ├── endpoints/                    # API route handlers
│   │   ├── chat.py                   # Chat router with context support
│   │   ├── file_handler.py           # File upload/management endpoints
│   │   └── health.py                 # Health check endpoints
│   ├── services/                     # Business logic layer
│   │   ├── chat_service.py           # Context-aware chat with sessions
│   │   ├── chunking_service.py       # Document chunking logic
│   │   ├── document_service.py       # Document parsing utilities
│   │   ├── extraction_service.py     # Text extraction from files
│   │   ├── file_system_service.py    # File operations & storage
│   │   ├── ingestion_pipeline_service.py  # Document processing pipeline
│   │   ├── session_storage_service.py     # Persistent session management
│   │   ├── supabase_service.py       # Supabase client service
│   │   └── vector_db_service.py      # ChromaDB operations
│   ├── schemas/
│   │   └── models.py                 # Pydantic request/response models
│   └── prompt_engineering/           # AI prompt management
│       ├── builder.py                # Context-aware prompt building
│       ├── templates.py              # Prompt templates
│       └── matrix_definitions.py     # Universal matrix definitions
├── reflex_ui/                        # Reflex frontend application
│   ├── app/                          # Reflex app components
│   │   ├── components/               # UI components
│   │   │   ├── chat_interface.py     # Chat interface component
│   │   │   ├── input_area.py         # Input area component
│   │   │   ├── message_bubble.py     # Message bubble component
│   │   │   ├── navbar.py             # Navigation bar
│   │   │   └── typing_indicator.py   # Typing indicator
│   │   ├── states/                   # State management
│   │   │   ├── chat_state.py         # Chat state logic
│   │   │   └── state.py              # Global state
│   │   └── app.py                    # Main Reflex app
│   ├── assets/                       # Static assets (images, icons)
│   ├── uploaded_files/               # User uploaded files storage
│   ├── requirements.txt              # Reflex dependencies
│   ├── rxconfig.py                   # Reflex configuration
│   └── style.py                      # Styling definitions
├── tests/                            # Comprehensive test suite (WIP)
│   ├── test_app.py                   # Application tests
│   ├── test_phase3.py                # Context & chat tests
│   ├── test_phase4.py                # Session storage tests
│   └── test_phase3_integration.py    # Integration tests
├── database/
│   └── migrations/                   # Database migration scripts
│       └── 001_create_chat_sessions.py
├── data/                             # Application data
│   ├── chroma/                       # ChromaDB storage
│   ├── framework_docs/               # Pre-loaded documentation
│   └── user_uploads/                 # User-uploaded documents
├── docs/                             # Project documentation
│   ├── architecture/                 # System design documents
│   ├── learning/                     # Research and iterations
│   ├── process/                      # Development process docs
│   ├── ci-cd-architecture.md         # CI/CD documentation
│   ├── error-handling-architecture.md  # Error handling guide
│   └── phase3_implementation.md      # Phase 3 implementation details
├── documents/                        # Sample documents for testing
├── docker-compose.yml                # Main Docker services configuration
├── docker-compose.ci.yml             # CI-specific Docker configuration
├── Dockerfile.fastapi                # FastAPI service container
├── Dockerfile.reflex                 # Reflex service container
├── pyproject.toml                    # Project configuration & dependencies
├── requirements.txt                  # Auto-generated by uv (for compatibility)
├── rxconfig.py                       # Global Reflex configuration
├── uv.lock                           # UV package lock file
└── test_phase3_integration.py        # Integration test runner
```
- Python 3.8+
- Google Generative AI API key
- Supabase account (for persistent sessions)
- Docker (optional, for ChromaDB server mode)
1. Clone the repository

   ```shell
   git clone <repository-url>
   cd vino-project
   ```

2. Install uv

   Follow the installation steps depending on your OS. Make sure the ENVIRONMENT VARIABLES are set up correctly for terminal/CLI use.

3. Install dependencies

   ```shell
   uv sync --all-extras
   ```

4. Set up environment variables

   For a comprehensive guide on Supabase setup, see Supabase Project Setup.

   Create a `.env` file in the project root:

   ```env
   # Required: Google AI API Key
   GOOGLE_API_KEY=your_google_api_key_here

   # Required: Supabase Configuration (for persistent sessions)
   SUPABASE_URL=https://your-project.supabase.co
   SUPABASE_ANON_KEY=your_supabase_anon_key

   # Optional: ChromaDB Configuration
   USE_CHROMA_SERVER=false
   CHROMA_SERVER_HOST=localhost
   CHROMA_SERVER_PORT=8001

   # Optional: Chunking Debug Mode
   CHUNKING_DEBUG=false
   ```

5. Set up Supabase Database

   Create the necessary tables in your Supabase project by opening the SQL editor and running the following commands:

   ```sql
   -- Create the largeobject_oid_seq sequence
   CREATE SEQUENCE public.largeobject_oid_seq
     INCREMENT 1
     START 1
     MINVALUE 1
     MAXVALUE 9223372036854775807 -- Max value for bigint
     CACHE 1;

   -- Create the largeobject table
   CREATE TABLE public.largeobject (
     oid bigint NOT NULL DEFAULT nextval('largeobject_oid_seq'::regclass),
     plain_text text NULL,
     CONSTRAINT largeobject_pkey PRIMARY KEY (oid)
   ) TABLESPACE pg_default;

   -- Create the filemetadata_id_seq sequence
   CREATE SEQUENCE public.filemetadata_id_seq
     INCREMENT 1
     START 1
     MINVALUE 1
     MAXVALUE 9223372036854775807 -- Max value for bigint
     CACHE 1;

   -- Create the filemetadata table
   CREATE TABLE public.filemetadata (
     id bigint NOT NULL DEFAULT nextval('filemetadata_id_seq'::regclass),
     file_name text NULL,
     file_size bigint NULL,
     file_type text NULL,
     page_count smallint NULL,
     word_count integer NULL,
     char_count integer NULL,
     keywords text[] NULL,
     source text NULL,
     abstract text NULL,
     large_object_oid bigint NULL,
     CONSTRAINT filemetadata_pkey PRIMARY KEY (id),
     CONSTRAINT fk_large_object FOREIGN KEY (large_object_oid)
       REFERENCES largeobject(oid)
       ON UPDATE CASCADE
       ON DELETE CASCADE
   ) TABLESPACE pg_default;
   ```
Because of the current port configuration, it is important to start FastAPI first and Reflex second; Reflex will dynamically adjust to an available port. The default port of FastAPI is 8000.
```shell
# Start the FastAPI application
cd ./src
uv run fastapi dev
# Access the API at http://localhost:8000
# View API docs at http://localhost:8000/docs
```

```shell
# Start the Reflex Web App
cd ./reflex_ui
uv run reflex run --env dev
```
See the 🐳 Docker Deployment section below for complete Docker setup instructions.
Context-aware chat with optional file-specific conversations.
Request Body:
```json
{
  "message": "What are the key principles in this document?",
  "session_id": "optional-session-id",
  "uploaded_file_context_name": "document.pdf",
  "mode": "chat"
}
```

Response:

```json
{
  "response": "Based on the document context...",
  "session_id": "generated-or-provided-session-id",
  "current_step": 2,
  "context_sources": ["document.pdf"]
}
```

Upload and process documents for semantic search.
Request:
- Multipart form with `file` field
- Optional `collection` parameter
Response:
```json
{
  "message": "File uploaded successfully",
  "filename": "document.pdf",
  "collection": "user_documents"
}
```

List uploaded files and collections.
Semantic search across document collections.
Request Body:
```json
{
  "query": "machine learning concepts",
  "collection": "user_documents",
  "max_results": 5
}
```

Get session information and metadata.
Delete a specific chat session.
Clean up sessions older than specified days.
Request Body:
```json
{
  "days": 30
}
```

Process all documents in configured directories.
Check ChromaDB connection status.
Objectives:
- Establish clean architecture with separation of concerns
- Implement centralized configuration management
- Set up dependency injection for testability and maintainability
Key Changes:
- Centralized Configuration (`src/app/core/config.py`)

  ```python
  class Settings:
      def __init__(self):
          self.PROJECT_NAME = "VINO API"
          self.GOOGLE_API_KEY = SecretStr(os.getenv("GOOGLE_API_KEY"))
          self.SUPABASE_URL = os.getenv("SUPABASE_URL", "")
          # ... all configuration centralized
  ```
- Dependency Injection (`src/app/dependencies.py`)

  ```python
  def get_chat_service() -> ChatService:
      return ChatService(
          vector_db_service=get_vector_db_service(),
          session_storage_service=get_session_storage_service()
      )
  ```
- Service Refactoring
  - All services now accept configuration via dependency injection
  - Removed global state and hardcoded configuration
  - Clean interfaces between services
Benefits:
- Testable services with dependency injection
- Single source of truth for configuration
- Easy environment-specific configuration
- Improved error handling and validation
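The testability benefit can be sketched as follows: because dependencies arrive through the constructor, a test can inject a fake vector store without touching ChromaDB. `FakeVectorDB` and this trimmed-down `ChatService` are illustrative stand-ins, not the project's actual classes:

```python
class FakeVectorDB:
    """Test double that returns canned document chunks."""
    def query_collection(self, query, where=None):
        return ["chunk about dependency injection"]

class ChatService:
    def __init__(self, vector_db_service):
        # The dependency is passed in, so tests can inject a fake
        self.vector_db_service = vector_db_service

    def chat(self, message, uploaded_file_context_name=None):
        context = []
        if uploaded_file_context_name:
            context = self.vector_db_service.query_collection(
                message, where={"source": uploaded_file_context_name}
            )
        return {"response": f"Answer using {len(context)} context chunk(s)"}

# No ChromaDB, no network: the service runs entirely against the fake
service = ChatService(vector_db_service=FakeVectorDB())
result = service.chat("What is DI?", uploaded_file_context_name="doc.pdf")
print(result["response"])  # Answer using 1 context chunk(s)
```

In the real app, FastAPI's `Depends(...)` resolves the same constructor arguments from the providers in `src/app/dependencies.py`.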
Objectives:
- Create pure, single-responsibility services
- Implement document ingestion pipeline
- Separate file operations from business logic
Key Changes:
- Service Purification
  - `VectorDBService`: Only handles vector database operations
  - `SupabaseService`: Pure client for Supabase operations
  - `FileSystemService`: Handles all file operations and storage
- Ingestion Pipeline (`src/app/services/ingestion_service.py`)

  ```python
  class IngestionService:
      def process_documents(self, directory: str, collection: str):
          # Orchestrates: file discovery → processing → chunking → storage
          files = self.file_system_service.discover_files(directory)
          for file in files:
              doc = self.document_service.load_document(file)
              chunks = self.chunking_service.chunk_document(doc)
              self.vector_db_service.store_chunks(chunks, collection)
  ```
- Error Handling & Validation
  - Comprehensive error handling at service boundaries
  - Input validation with Pydantic models
  - Graceful degradation for external service failures
Benefits:
- Clear separation of concerns
- Reusable, composable services
- Robust error handling (WIP)
- Easier testing and maintenance (WIP)
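The graceful-degradation idea above can be sketched as a wrapper that falls back to a secondary implementation when an external service fails. `with_fallback` and the two query functions are hypothetical illustrations, not part of the project's API:

```python
import logging

def with_fallback(primary, fallback, exceptions=(Exception,)):
    """Call primary(); on failure, log a warning and call fallback() instead."""
    def run(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except exceptions as exc:
            logging.warning("primary failed (%s); using fallback", exc)
            return fallback(*args, **kwargs)
    return run

def query_supabase(session_id):
    raise ConnectionError("Supabase unreachable")  # simulate an outage

def query_memory(session_id):
    return {"session_id": session_id, "history": []}  # in-memory fallback

get_session = with_fallback(query_supabase, query_memory,
                            exceptions=(ConnectionError,))
print(get_session("abc")["history"])  # []
```

Catching only the expected exception types keeps genuine bugs visible while the service boundary degrades gracefully.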
Objectives:
- Enable file-specific conversations
- Implement context-aware prompt engineering
- Organize endpoints with FastAPI routers
Key Changes:
- Context-Aware Chat (`src/app/services/chat_service.py`)

  ```python
  def chat(self, message: str, session_id: str, uploaded_file_context_name: str = None):
      if uploaded_file_context_name:
          # Query vector DB with file filter
          file_context = self.vector_db_service.query_collection(
              query=message,
              where={"source": uploaded_file_context_name}
          )
          # Inject context into prompt
          enhanced_prompt = self._build_context_prompt(message, file_context)
  ```
- Advanced Prompt Engineering (`src/app/prompt_engineering/builder.py`)
  - Context injection based on file selection
  - Universal matrix prompt system
  - Dynamic prompt building based on conversation state
- Router Organization (`src/app/endpoints/chat.py`)

  ```python
  @router.post("/v1/chat", response_model=ChatResponse)
  async def chat_endpoint(
      request: ChatRequest,
      chat_service: ChatService = Depends(get_chat_service)
  ):
      return chat_service.chat(
          message=request.message,
          session_id=request.session_id,
          uploaded_file_context_name=request.uploaded_file_context_name
      )
  ```
Benefits:
- File-specific conversations with document context
- Intelligent prompt engineering
- Clean API organization
- Enhanced user experience with contextual responses
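Context injection can be illustrated with a minimal prompt builder: retrieved chunks are formatted into a context block ahead of the user's question. The real templates live in `src/app/prompt_engineering/` and will differ; this function is only a sketch:

```python
def build_context_prompt(message, context_chunks):
    """Prepend numbered document excerpts to the user's question."""
    context_block = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks)
    )
    return (
        "Answer using only the document excerpts below.\n\n"
        f"{context_block}\n\n"
        f"Question: {message}"
    )

prompt = build_context_prompt(
    "What are the key findings?",
    ["Excerpt about results...", "Excerpt about methods..."],
)
print(prompt.splitlines()[0])  # Answer using only the document excerpts below.
```

Numbering the excerpts makes it easy for the model (and the user) to cite which chunk supported an answer.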
Objectives:
- Move session state out of memory for scalability
- Enable horizontal scaling with stateless design
- Implement persistent session storage with Supabase
Key Changes:
- Persistent Session Storage (`src/app/services/session_storage_service.py`)

  ```python
  class SessionStorageService:
      def get_session_data(self, session_id: str) -> Tuple[List[BaseMessage], int, str]:
          # Load from Supabase database
          result = self.supabase_service.client.table("chat_sessions")...

      def update_session_data(self, session_id: str, history, step, planner):
          # Persist to Supabase with fallback to memory
  ```
- Stateless ChatService

  ```python
  class ChatService:
      def _get_session_data(self, session_id: str):
          if self.session_storage_service:
              return self.session_storage_service.get_session_data(session_id)
          # Fallback to memory

      def _update_session_data(self, session_id: str, ...):
          if self.session_storage_service:
              self.session_storage_service.update_session_data(...)
          # Fallback to memory
  ```
- Database Schema (Supabase)

  ```sql
  CREATE TABLE chat_sessions (
      id SERIAL PRIMARY KEY,
      session_id VARCHAR(255) UNIQUE NOT NULL,
      conversation_history JSONB DEFAULT '[]'::jsonb,
      current_step INTEGER DEFAULT 1,
      planner_details TEXT,
      created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
      last_accessed TIMESTAMP WITH TIME ZONE DEFAULT NOW()
  );
  ```
- Admin Management Endpoints
  - Session information retrieval
  - Session deletion and cleanup
  - Automatic cleanup of old sessions
Benefits:
- Horizontal Scalability: Multiple app instances share session data (WIP)
- Persistence: Sessions survive server restarts
- Reliability: Graceful fallback to memory storage (WIP)
- Management: Admin tools for session lifecycle
- Performance: Efficient JSON storage in PostgreSQL
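The JSONB-backed persistence can be illustrated with a serialization round trip. The message shape below is a simplification of the stored conversation history, not the project's exact `BaseMessage` serialization:

```python
import json

history = [
    {"role": "user", "content": "Summarize the document"},
    {"role": "assistant", "content": "The document covers..."},
]

# Serialize for the conversation_history JSONB column
payload = json.dumps(history)

# Deserialize when the session is loaded, possibly on another app instance
restored = json.loads(payload)
assert restored == history  # lossless round trip enables stateless instances
print(restored[0]["role"])  # user
```

Because any instance can rebuild the full history from the database row, no instance needs in-process session affinity.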
Key configuration options in `src/app/core/config.py`:

- `PROJECT_NAME`: Application name (default: "VINO API")
- `VERSION`: API version (default: "1.3.0")
- `GOOGLE_API_KEY`: Required Google Generative AI API key
- `SUPABASE_URL`: Supabase project URL for session storage
- `SUPABASE_ANON_KEY`: Supabase anonymous key

- `CHUNK_SIZE`: Document chunk size (configurable)
- `CHUNK_OVERLAP`: Overlap between chunks (configurable)
- `DOCUMENTS_DIR`: Framework documentation directory
- `USER_UPLOADS_DIR`: User upload directory
- `CHUNKING_DEBUG`: Enable debug mode for chunking

- `FRAMEWORKS_COLLECTION_NAME`: Collection for framework docs
- `USER_DOCUMENTS_COLLECTION_NAME`: Collection for user docs
- `USE_CHROMA_SERVER`: Use ChromaDB server vs local storage
- `CHROMA_SERVER_HOST` / `CHROMA_SERVER_PORT`: Server configuration

- `LLM_MODEL_NAME`: Google AI model (default: "gemini-1.5-pro")
- `LLM_TEMPERATURE`: Model temperature (default: 0)
- `LLM_MAX_RETRIES`: Maximum retry attempts (default: 2)
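A minimal sketch of how such a settings class might read these variables from the environment. The defaults shown here are illustrative, not the project's actual values:

```python
import os

class Settings:
    """Env-driven settings sketch; defaults are examples only."""
    def __init__(self):
        self.PROJECT_NAME = os.getenv("PROJECT_NAME", "VINO API")
        self.CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))
        self.CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "200"))
        self.USE_CHROMA_SERVER = os.getenv("USE_CHROMA_SERVER", "false").lower() == "true"
        self.LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0"))

os.environ["CHUNK_SIZE"] = "500"  # e.g. set via .env or the shell
settings = Settings()
print(settings.CHUNK_SIZE)  # 500
```

Converting each variable to its proper type at construction time means the rest of the app never parses strings.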
```shell
# Upload a document
curl -X POST "http://localhost:8000/v1/upload" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@research_paper.pdf"

# Start a file-specific conversation
curl -X POST "http://localhost:8000/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What are the main findings in this research?",
    "uploaded_file_context_name": "research_paper.pdf",
    "mode": "chat"
  }'
```

```shell
curl -X POST "http://localhost:8000/v1/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning best practices",
    "collection": "user_documents",
    "max_results": 5
  }'
```

```shell
# Get session information
curl -X GET "http://localhost:8000/v1/admin/session/my-session-id"

# Clean up old sessions (admin)
curl -X POST "http://localhost:8000/v1/admin/cleanup_sessions" \
  -H "Content-Type: application/json" \
  -d '{"days": 30}'
```

```shell
# First message in session
curl -X POST "http://localhost:8000/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Summarize the key concepts in this document",
    "session_id": "research-session-1",
    "uploaded_file_context_name": "research_paper.pdf"
  }'

# Follow-up question in same session
curl -X POST "http://localhost:8000/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What are the limitations mentioned?",
    "session_id": "research-session-1",
    "uploaded_file_context_name": "research_paper.pdf"
  }'
```

- Environment Variables
  - Use secure secret management for API keys
  - Configure proper CORS origins for your frontend
  - Set up proper logging levels and monitoring
- Database Setup
  - Redis is required for session storage and caching
  - Optionally use a managed Supabase instance for persistent session storage
  - Set up proper database indexing for performance (if using Supabase)
  - Configure backup and recovery procedures
- Scaling Considerations
  - The application is stateless and supports horizontal scaling
  - Session state is cached in Redis and optionally persisted in Supabase
  - Consider using a load balancer for multiple instances
- Security
  - Implement rate limiting and authentication as needed
  - Use HTTPS in production
  - Validate and sanitize all user inputs
```shell
# Full stack with Docker Compose (all services)
docker-compose --profile all up -d

# Or run services separately
docker-compose up -d redis      # Session storage
docker-compose up -d fastapi    # Backend API with embedded ChromaDB
docker-compose up -d frontend   # Reflex UI frontend
```

The project uses multi-stage Docker builds with dependency separation:
```dockerfile
# FastAPI Backend (Dockerfile.fastapi)
FROM python:3.11-slim
WORKDIR /app

# Install uv for faster dependency management
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Copy dependency files
COPY pyproject.toml uv.lock ./

# Install FastAPI dependencies only
RUN uv sync --frozen --no-dev --group fastapi

# Copy application code
COPY src/ src/
COPY .env .

EXPOSE 8000
CMD ["uv", "run", "--group", "fastapi", "uvicorn", "src.app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

This project uses Docker Compose for easy deployment with separate services:
- FastAPI Backend: API server with embedded ChromaDB for vector storage
- Reflex Frontend: Interactive chat UI
- Redis: Session storage and caching
```shell
# Build and start all services
docker-compose --profile all up -d --build

# View running services
docker-compose ps

# View logs
docker-compose logs -f

# Stop all services
docker-compose down
```

```shell
# Start only development services (FastAPI + ChromaDB + Redis)
docker-compose --profile dev up -d

# Start with automatic rebuilding
docker-compose --profile dev up -d --build
```

```shell
# Start production services with optimizations
docker-compose --profile prod up -d

# Production with specific resource limits
docker-compose --profile prod up -d --build
```

```shell
# Start only Redis (for session storage)
docker-compose up -d redis

# Start backend with embedded ChromaDB
docker-compose up -d fastapi

# Start only frontend
docker-compose up -d frontend
```

Once running, access the application at:
- Reflex UI Frontend: http://localhost:3000
- FastAPI Backend: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Redis: localhost:6379
Note: ChromaDB is embedded within the FastAPI service and is not directly accessible.
```shell
# Run all tests
python -m pytest tests/

# Run specific phase tests
python -m pytest tests/test_phase1.py -v
python -m pytest tests/test_phase2.py -v
python -m pytest tests/test_phase3.py -v
python -m pytest tests/test_phase4.py -v

# Run integration tests
python -m pytest tests/test_phase3_integration.py -v

# Run with coverage
python -m pytest --cov=src tests/
```

The test suite covers:
- ✅ Configuration management and dependency injection
- ✅ Service interactions and error handling
- ✅ Document processing and vector storage
- ✅ Context-aware chat functionality
- ✅ Session storage and persistence
- ✅ Integration scenarios and edge cases
- Session Storage Connection
  - Error: Cannot connect to Supabase
  - Solution: Check `SUPABASE_URL` and `SUPABASE_ANON_KEY` in `.env`
  - Fallback: Application will use Redis and memory storage automatically

- ChromaDB Issues
  - Error: ChromaDB connection failed
  - Solution: Check Docker logs for the FastAPI service (ChromaDB is embedded)
  - Commands: `docker-compose logs fastapi`

- Google AI API Issues
  - Error: Invalid API key or quota exceeded
  - Solution: Verify `GOOGLE_API_KEY` and check quota limits

- File Upload Problems
  - Error: File processing failed
  - Solution: Check file permissions and supported formats
  - Supported: PDF, TXT, DOCX
Enable detailed logging:
```shell
# Set environment variable
export CHUNKING_DEBUG=true

# Or in .env file
CHUNKING_DEBUG=true
```

Monitor system status:

```shell
# Check ChromaDB
curl http://localhost:8000/health/chromadb

# Check API status
curl http://localhost:8000/docs
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run tests (`python -m pytest tests/`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Horizontal Scaling: Multiple FastAPI instances with shared Supabase sessions
- Caching: Consider Redis for frequently accessed data
- Database Optimization: Index optimization for session queries
- Vector Search: ChromaDB performance tuning for large collections
- Session storage performance and connection pooling
- Vector database query performance
- API response times and error rates
- Memory usage and garbage collection
[Add your license information here]
For questions, issues, or contributions:
- Create an issue in the GitHub repository
- Check the documentation in the `docs/` directory
- Review the test files for usage examples
For Supabase table and storage setup instructions, see Supabase Setup Guide. Supabase is used for persistent session storage and document metadata management, and must be configured before uploading documents.
For developers, to permanently add new documents to the Knowledge Bank:
1. Place your `.md`, `.txt`, `.pdf`, or `.docx` file in the `data/kb_new/` directory.

2. Run the following command from the project root:

   ```shell
   uv run python -m file_upload.file_processor --default
   ```

3. The script will process and upload documents to both ChromaDB and Supabase, then move processed files to `data/kb/`.

4. Check the terminal output for upload status and errors.
For detailed requirements, document structure, and troubleshooting, see Document Upload Guide.
Built with passion using FastAPI, ChromaDB, Google Generative AI, and Supabase