A production-ready Graph-Enhanced Retrieval-Augmented Generation (GraphRAG) platform that combines vector similarity search with knowledge graph reasoning to deliver contextual, sourced, and high-quality answers over document collections.
Traditional RAG systems rely solely on embedding similarity to retrieve context. Amber augments this with graph-based reasoning that leverages:
- Chunk similarity relationships - Semantic connections between text segments
- Entity co-occurrence - Shared entities across documents
- Multi-hop traversal - Indirect relationships through graph paths
- Community detection - Clustered semantic neighborhoods
This graph-enhanced approach surfaces contextually relevant information that pure vector search might miss.
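The entity co-occurrence idea can be sketched in a few lines. This is a minimal illustration (not Amber's implementation): chunks that share extracted entities become graph neighbors, so a query that matches one chunk can also surface its entity-linked siblings even when their embeddings are far apart.

```python
# Toy sketch: derive co-occurrence edges from per-chunk entity sets.
from collections import defaultdict

chunks = {
    "c1": {"Neo4j", "Cypher"},
    "c2": {"Neo4j", "HNSW"},
    "c3": {"FastAPI"},
}

def co_occurrence_edges(chunks):
    """Return pairs of chunk ids that share at least one entity."""
    by_entity = defaultdict(set)
    for cid, entities in chunks.items():
        for e in entities:
            by_entity[e].add(cid)
    edges = set()
    for members in by_entity.values():
        for a in members:
            for b in members:
                if a < b:
                    edges.add((a, b))
    return edges

print(co_occurrence_edges(chunks))  # {('c1', 'c2')}: linked via the shared "Neo4j" entity
```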
Frontend Layer
┌─────────────────────────────────────────────────────────────┐
│ Next.js 14 + React + TypeScript + Zustand + Tailwind │
│ Chat Interface | Document View | Graph Visualization │
└─────────────────────────────────────────────────────────────┘
│
HTTP / SSE
│
Backend Layer
┌─────────────────────────────────────────────────────────────┐
│ FastAPI + Python 3.10+ + LangGraph + Pydantic │
│ API Routers | RAG Pipeline | Ingestion | Services │
└─────────────────────────────────────────────────────────────┘
│
Neo4j Bolt Protocol
│
Data Layer
┌─────────────────────────────────────────────────────────────┐
│ Neo4j 5.x Graph Database + Cypher │
│ Documents | Chunks | Entities | Relationships │
└─────────────────────────────────────────────────────────────┘
| Layer | Technologies |
|---|---|
| Frontend | Next.js 14, TypeScript, Zustand, Tailwind CSS, Force-Graph 3D |
| Backend | FastAPI, LangGraph, Pydantic, asyncio, Neo4j driver |
| Database | Neo4j 5.x with HNSW vector indexes |
| AI/ML | OpenAI (GPT-4/3.5, embeddings), Google Gemini (optional), Ollama (optional), FlashRank reranking |
| Document Processing | Multi-format loaders, Docling (optional, PDF/DOCX/PPTX/XLSX/HTML), Marker (optional OCR), HTML heading chunker |
| Feature | Description |
|---|---|
| Hybrid Retrieval | Combined vector search (70%) and entity search (30%) with configurable weights |
| Graph Expansion | Multi-hop traversal (1-3 hops) through similarity and entity relationship edges |
| FlashRank Reranking | Cross-encoder reranking with 10-15% relevance improvement |
| Sentence-Window Retrieval | Fine-grained sentence-level embedding with ±N context expansion for precision |
| Leiden Community Detection | Automatic entity clustering into semantic communities |
| Entity Extraction with Gleaning | Multi-pass extraction with 30-40% recall improvement |
| Quality Scoring | Chunk quality assessment for filtering low-quality content |
| Token-Aware Chunking | HTML heading chunker + Docling hybrid with tunable token budgets |
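The hybrid-retrieval row above can be made concrete with a small score-fusion sketch. The weights mirror the documented 70/30 default; the function name and candidate data are illustrative, not Amber's actual API.

```python
# Blend a vector-similarity score with an entity-match score using
# configurable weights (defaults mirror the documented 70/30 split).
def hybrid_score(vector_score: float, entity_score: float,
                 w_vector: float = 0.7, w_entity: float = 0.3) -> float:
    assert abs(w_vector + w_entity - 1.0) < 1e-9, "weights must sum to 1"
    return w_vector * vector_score + w_entity * entity_score

candidates = {
    "chunk_a": (0.92, 0.10),  # strong vector match, weak entity overlap
    "chunk_b": (0.70, 0.95),  # moderate vector match, strong entity overlap
}
ranked = sorted(candidates, key=lambda c: hybrid_score(*candidates[c]), reverse=True)
print(ranked)  # ['chunk_b', 'chunk_a']: entity overlap promotes chunk_b
```

Note how the entity signal reorders the candidates: a pure vector ranking would have put `chunk_a` first.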
| Feature | Description |
|---|---|
| Query Routing | LLM-based query classification with semantic caching (30-50% latency reduction) |
| Adaptive Routing | Feedback-based weight learning that improves routing by 10-15% after 50+ samples |
| Category-Specific Prompts | 10 pre-configured templates with tailored retrieval strategies |
| Smart Consolidation | Category-aware ranking with semantic deduplication |
| Structured KG Queries | Text-to-Cypher translation (60-80% faster for aggregation queries) |
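The semantic-caching idea behind query routing can be sketched as follows. This is an assumption-laden illustration (toy embeddings, a made-up `SemanticRouteCache` class, an arbitrary 0.95 threshold), not Amber's code: if a new query's embedding is close enough to a previously routed one, the cached decision is reused instead of calling the LLM classifier.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticRouteCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # (embedding, category) pairs

    def lookup(self, emb):
        for cached_emb, category in self.entries:
            if cosine(emb, cached_emb) >= self.threshold:
                return category  # cache hit: skip the LLM classifier
        return None

    def store(self, emb, category):
        self.entries.append((emb, category))

cache = SemanticRouteCache()
cache.store([1.0, 0.0, 0.1], "factual")
print(cache.lookup([0.99, 0.0, 0.12]))  # near-duplicate query -> 'factual'
print(cache.lookup([0.0, 1.0, 0.0]))    # unrelated query -> None
```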
| Feature | Description |
|---|---|
| Chat Tuning | Runtime controls for model selection and retrieval parameters |
| Document Classification | Rule-based and LLM-based automatic labeling |
| Incremental Updates | Content-hash diffing for efficient document updates (only changed chunks reprocessed) |
| Automatic Orphan Cleanup | Startup cleanup of disconnected chunks/entities with configurable grace period |
| Multi-Layer Caching | Persistent embedding/response cache (disk), in-memory entity/retrieval cache |
| Secure API Key Management | SHA-256 hashed API keys with tenant isolation and database-backed authentication |
| External User Integration | API key authentication with minimal chat interface |
| SSE Streaming | Real-time token streaming (20-50ms per token) |
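On the client side, consuming the SSE stream reduces to parsing `data:` frames. The sketch below shows only the framing; the exact payload format and any end-of-stream sentinel (the `[DONE]` marker here) are assumptions, not Amber's documented protocol.

```python
def parse_sse(raw: str):
    """Yield the data payload of each server-sent event."""
    for line in raw.splitlines():
        if line.startswith("data: "):
            yield line[len("data: "):]

stream = "data: Hello\n\ndata:  world\n\ndata: [DONE]\n\n"
tokens = [t for t in parse_sse(stream) if t != "[DONE]"]
print("".join(tokens))  # Hello world
```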
Run the full stack locally:
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# Rebuild after changes
docker compose up -d --build

First Time Setup (Important): Since v2.1.0, admin keys are no longer set via environment variables. You must generate the first admin key manually:
# Generate admin key for the first login
docker compose exec backend python scripts/generate_admin_key.py

Copy the generated key (it starts with sk-...) and use it to log in.
Access the application:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000/docs
- Neo4j Browser: http://localhost:7474
Backend:
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your API keys and credentials
# Start backend
python api/main.py

Generate Admin Key: In a separate terminal, while the backend and Neo4j are running:
python scripts/generate_admin_key.py

Frontend:
cd frontend
npm install
cp .env.local.example .env.local
npm run dev

New Features:
- Docling Integration: Optional Docling library support for state-of-the-art document conversion (PDF, DOCX, PPTX, XLSX, HTML, images) with layout analysis and structure preservation
- LLM Token Usage Metrics: Comprehensive analytics dashboard tracking token usage, costs (USD/EUR), breakdowns by operation/provider/model/conversation, time trends, and efficiency metrics
- Selective Database Clearing: Granular control to clear Knowledge Base or Conversation History independently
- Google Gemini Support: Added Gemini as LLM provider option with full configuration support
- HTML Heading Chunker: New strategy for HTML documents with semantic structure and heading path extraction
- Sentence-Window Retrieval: Fine-grained sentence-level embedding and retrieval with configurable context window (±5 sentences) for improved precision on detailed queries
Infrastructure:
- Token Management Enhancements: Comprehensive token tracking and context management
- New `token_counter.py` utility and enhanced `token_manager.py` with intelligent context splitting
- Standardized `include_usage` parameter across all LLM providers
- Updated model context sizes for 2024-2025 models
Configuration:
- TruLens Toggle Persistence: TruLens state now persists to `config.yaml` and defaults to disabled
UI Improvements:
- Search UI Enhancement: Added fuzzy search to Chat and RAG Tuning panels
- Bottom Panel Padding: Consistent padding across all scrollable panels
Bug Fixes:
- Orphaned Chunks Resolution: Fixed RAG retrieval returning 0 results due to orphaned chunks not connected to documents
- Document Update Progress: Fixed progress bar stuck at 5% during document updates with proper status reporting
- UI Cleanup: Removed redundant "Processing in progress..." banner from Database panel
New Features:
- Automatic Orphan Cleanup: Startup cleanup of orphaned chunks and entities with configurable grace period
  - Configure via `ENABLE_ORPHAN_CLEANUP_ON_STARTUP` (default: true)
  - Grace period via `ORPHAN_CLEANUP_GRACE_PERIOD_MINUTES` (default: 5 minutes)
- Manual Cleanup API: New `POST /api/database/cleanup-orphans` endpoint for on-demand cleanup
- Cleanup UI Button: Added a "Cleanup" button in the Database panel toolbar with a confirmation dialog
Major Security Improvements:
- Static Admin Token Removed: Authentication now strictly enforces database-backed API keys (no more `JOBS_ADMIN_TOKEN`)
- API Key Hashing: Implemented SHA-256 hashing for secure API key storage
- Tenant Isolation: Enforced one active API key per user
- Session Security: Configured `Secure` and `HttpOnly` flags for admin cookies
- Access Control: Fixed broken access control on conversation endpoints
Stability & Reliability:
- Persistent Processing State: Refactored in-memory state to persist in Neo4j (prevents data loss on restart)
- Community Detection: Replaced the Neo4j GDS dependency with `igraph` for better stability
- Settings Synchronization: Full alignment of RAG tuning parameters, LLM overrides, and static matching thresholds
- Memory Leak Fixes: Added bounds to routing cache and adaptive router feedback history
Logic & Correctness:
- Progress Calculation: Fixed "jumping" progress bars with proper stage interpolation
- Metadata Merging: Changed to map merging instead of complete overwrite
- Orphan Detection: Improved algorithm for detecting disconnected components
- Async Improvements: Fixed unsafe `asyncio.run()` nesting
50+ issues addressed from the December 2024 audit. See CHANGELOG.md for complete details.
The RAG pipeline is implemented as a LangGraph StateGraph with the following stages:
Query Analysis Parse query, extract filters, normalize
│
▼
Retrieval Hybrid vector + entity search
│
▼
Graph Reasoning Multi-hop expansion via chunk/entity relationships
│
▼
Reranking Optional FlashRank cross-encoder (if enabled)
│
▼
Generation LLM generates response with source citations
│
▼
Quality Scoring Evaluate response quality (background)
│
▼
Follow-ups Generate suggested questions (background)
Each node is independently testable, configurable, and replaceable. State flows through a typed dictionary, enabling observability and debugging.
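The staged design above can be sketched in plain Python: each node is a function that reads and extends a shared typed state, mirroring how a LangGraph StateGraph threads state between nodes. The node bodies here are stubs for illustration, not Amber's implementation.

```python
from typing import TypedDict, List

class RAGState(TypedDict, total=False):
    query: str
    chunks: List[str]
    answer: str

def query_analysis(state: RAGState) -> RAGState:
    state["query"] = state["query"].strip().lower()
    return state

def retrieval(state: RAGState) -> RAGState:
    state["chunks"] = [f"chunk matching '{state['query']}'"]  # stub retriever
    return state

def generation(state: RAGState) -> RAGState:
    state["answer"] = f"Answer based on {len(state['chunks'])} chunk(s)."
    return state

# Nodes run in sequence; each is independently testable and replaceable.
pipeline = [query_analysis, retrieval, generation]
state: RAGState = {"query": "  What is GraphRAG?  "}
for node in pipeline:
    state = node(state)
print(state["answer"])  # Answer based on 1 chunk(s).
```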
Document ──[HAS_CHUNK]──> Chunk ──[SIMILAR_TO]──> Chunk
│
[CONTAINS_ENTITY]
│ [HAS_SENTENCE]
▼ │
Entity ───────────▼
│ Sentence
[RELATED_TO]
│
▼
Entity
│
[IN_COMMUNITY]
│
▼
Community
| Relationship | Description | Key Properties |
|---|---|---|
| `HAS_CHUNK` | Document to Chunk | `chunk_index` |
| `HAS_SENTENCE` | Chunk to Sentence | `index_in_chunk` |
| `CONTAINS_ENTITY` | Chunk to Entity | `mention_count` |
| `SIMILAR_TO` | Chunk to Chunk | `strength` (0-1 cosine similarity) |
| `RELATED_TO` | Entity to Entity | `strength`, `co_occurrence` |
| `IN_COMMUNITY` | Entity to Community | - |
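Given this schema, multi-hop expansion maps to a single variable-length Cypher pattern. The sketch below only builds the query string; running it requires a live Neo4j session (for example via the official `neo4j` Python driver), and the `$chunk_id` parameter name is an assumption.

```python
def expansion_query(max_hops: int) -> str:
    """Build a Cypher query expanding a chunk along SIMILAR_TO edges."""
    assert 1 <= max_hops <= 3, "documented range is 1-3 hops"
    return (
        "MATCH (c:Chunk {id: $chunk_id})"
        f"-[:SIMILAR_TO*1..{max_hops}]->(n:Chunk) "
        "RETURN DISTINCT n.id AS id"
    )

print(expansion_query(2))
```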
Configuration follows a clear precedence hierarchy:
Chat Tuning > RAG Tuning > Environment Variables > Defaults
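A minimal resolver for this precedence order, with made-up layer contents for illustration: the first layer that defines a key wins.

```python
def resolve(key, chat_tuning, rag_tuning, env, defaults):
    """Return the value for key from the highest-precedence layer defining it."""
    for layer in (chat_tuning, rag_tuning, env, defaults):
        if key in layer:
            return layer[key]
    raise KeyError(key)

defaults = {"top_k": 8, "temperature": 0.2}
env = {"top_k": 10}
rag_tuning = {}
chat_tuning = {"temperature": 0.7}

print(resolve("top_k", chat_tuning, rag_tuning, env, defaults))        # 10 (env beats default)
print(resolve("temperature", chat_tuning, rag_tuning, env, defaults))  # 0.7 (chat tuning wins)
```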
# LLM and Embeddings
OPENAI_API_KEY=your-key
OPENAI_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small
# Database
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
# Feature Toggles
ENABLE_ENTITY_EXTRACTION=true
ENABLE_CLUSTERING=true
FLASHRANK_ENABLED=true
ENABLE_CACHING=true
CACHE_TYPE=disk
ENABLE_QUERY_ROUTING=false
ENABLE_STRUCTURED_KG=true
ENABLE_ADAPTIVE_ROUTING=true
ENABLE_SENTENCE_WINDOW_RETRIEVAL=false
SENTENCE_WINDOW_SIZE=5
SENTENCE_MIN_LENGTH=10

Chat Tuning (/api/chat-tuning) - Affects query execution at runtime:
- Retrieval weights, top_k, expansion depth
- LLM temperature, max tokens, model selection
- Changes take effect immediately
RAG Tuning (/api/rag-tuning) - Affects document ingestion:
- Entity extraction, PDF processing, clustering
- Chunking strategy, embedding model
- Requires reindexing to apply to existing documents
| Operation | Cold | Cached |
|---|---|---|
| Chat Query (end-to-end) | 2-5 seconds | 0.5-1 second |
| Streaming Start | <500ms | <200ms |
| 1-hop Expansion | 50-100ms | - |
| 2-hop Expansion | 200-500ms | - |
| Document Size | Time |
|---|---|
| Small (10 pages) | 5-10 seconds |
| Large (1000 pages) | 2-5 minutes |
| + Entity Extraction | +50% (async) |
- Documents: 10,000+
- Chunks: 1,000,000+
- Entities: 5,000,000+
- Concurrent Users: 50+
| Endpoint | Description |
|---|---|
| `POST /api/chat/query` | Chat query with structured response |
| `POST /api/chat/stream` | SSE streaming tokens |
| `POST /api/chat/follow-ups` | Generate follow-up questions |
| Endpoint | Description |
|---|---|
| `GET /api/documents` | List documents |
| `GET /api/documents/{id}` | Document metadata and analytics |
| `POST /api/database/upload` | Upload and ingest a document |
| `PUT /api/documents/{id}` | Incremental update (only changed chunks reprocessed) |
| `DELETE /api/database/documents/{id}` | Delete a document |
| `POST /api/database/cleanup-orphans` | Manual cleanup of orphaned chunks and entities |
| Endpoint | Description |
|---|---|
| `GET /api/chat-tuning/config/values` | Current retrieval tuning values |
| `GET /api/rag-tuning/config/values` | Current ingestion tuning values |
| `GET /api/database/stats` | Database statistics (includes orphan counts) |
| `GET /api/database/cache-stats` | Cache performance metrics |
Note: Admin endpoints now support pagination via `limit` and `offset` query parameters (v2.1.0).
| Endpoint | Description |
|---|---|
| `POST /api/structured-kg/execute` | Execute a Text-to-Cypher query |
| `GET /api/structured-kg/schema` | Get the graph schema |
For the complete API reference, see the interactive documentation at /docs.
- PDF (with optional Marker OCR)
- DOCX, PPTX, XLSX
- TXT, Markdown
- CSV
- Images (with Marker)
- Format detection and loader selection
- Text extraction with format-specific processing
- Chunking with configurable size and overlap
- Async embedding generation with caching
- LLM-based entity extraction with optional gleaning
- Quality scoring and filtering
- Batch persistence to Neo4j
- Similarity edge calculation
- Optional Leiden community detection
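The chunking step above can be illustrated with a fixed-size chunker with overlap. Amber's real chunkers are token- and structure-aware (HTML headings, Docling layout); this character-based version is only a sketch of the size/overlap mechanics.

```python
def chunk_text(text: str, size: int, overlap: int):
    """Split text into fixed-size chunks where consecutive chunks overlap."""
    assert 0 <= overlap < size, "overlap must be smaller than chunk size"
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

chunks = chunk_text("abcdefghij", size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```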
# All tests
pytest tests/
# By category
pytest tests/unit/ # Fast, isolated
pytest tests/integration/ # Requires Neo4j
pytest tests/e2e/ # Full pipeline
# Specific tests
pytest tests/integration/test_chat_pipeline.py -v
pytest tests/integration/test_full_ingestion_pipeline.py -v
# With parallel execution
pytest tests/ -n auto

# Start Neo4j and run E2E tests
make e2e-local
# Or use Docker Compose
make e2e-dc

amber/
├── api/ # FastAPI routers and services
│ ├── routers/ # REST endpoint definitions
│ └── services/ # Business logic
├── core/ # Shared utilities
│ ├── embeddings.py # Embedding manager
│ ├── graph_db.py # Neo4j operations
│ └── entity_extraction.py
├── rag/ # RAG pipeline
│ ├── graph_rag.py # LangGraph state machine
│ ├── retriever.py # Hybrid retrieval
│ └── nodes/ # Pipeline nodes
├── ingestion/ # Document processing
│ ├── document_processor.py
│ └── loaders/ # Format-specific loaders
├── frontend/ # Next.js application
│ ├── src/app/ # Page routes
│ ├── src/components/ # UI components
│ └── src/lib/api.ts # Backend API client
├── config/ # Configuration files
├── scripts/ # Utility scripts
├── tests/ # Test suite
└── documentation/ # Detailed documentation
Comprehensive documentation is available in the documentation/ directory:
- Getting Started - Architecture, setup, configuration
- Core Concepts - Data model, pipeline, caching, retrieval strategies
- Components - Backend, frontend, ingestion implementation details
- Features - Detailed documentation for 20+ features
- Data Flows - End-to-end traces of key operations
- API Reference - REST endpoint documentation
- Configuration - Complete parameter reference
- Operations - Monitoring and maintenance guides
- Development - Testing and feature development guides
- Scripts - Utility scripts and CLI tools
Docker Compose is recommended for local and demo deployments. For production:
- Deploy services behind a load balancer with HTTPS
- Ensure Neo4j has adequate memory (8GB+ recommended)
- Install the GDS plugin when clustering is enabled
- Use environment variables or secrets management for credentials
- Configure appropriate connection pool sizes
- Fork the repository and create a feature branch
- Run tests and linters locally (`pytest`, `ruff check .`, `black .`)
- Open a pull request with a description and test results
This project is licensed under the MIT License. See LICENSE.md for details.