AI-Memory is a persistent context layer designed to give your agents institutional memory. By bridging LLMs with a high-performance vector database (Qdrant), this framework ensures your agents remember architectural decisions, project rules, and past interactions across every session.
Explore the Docs | Report a Bug | Request a Feature
- Long-Term Persistence: Stop re-explaining your codebase. Agents retrieve past context automatically at session start.
- Structured BMAD Integration: Purpose-built for BMAD workflows and multi-agent "Party Mode."
- Semantic Retrieval: Uses vector embeddings to find relevant memories based on intent, not just keywords.
- Decision Tracking: Automatically captures "lessons learned" and integration rules during the dev cycle.
> This isn't a database you configure. It's institutional memory that forms as you build. Traditional knowledge bases require upfront schema design and manual curation. AI-Memory takes a different approach: let the LLM and human decide what matters, and capture it as it happens.
>
> Your agents don't just execute; they learn.
- Three Specialized Collections: `code-patterns` (HOW), `conventions` (WHAT), `discussions` (WHY)
- 17 Memory Types: Precise categorization for implementation, errors, decisions, Jira issues, and more
- 6 Automatic Triggers: Smart context injection when you need it most
- Intent Detection: Automatically routes queries to the right collection
- Conversation Memory: Turn-by-turn capture with post-compaction context continuity
- Cascading Search: Falls back across collections for comprehensive results
- Monitoring: Prometheus metrics + Grafana dashboards
- Graceful Degradation: Works even when services are temporarily unavailable
- Multi-Project Isolation: `group_id` filtering keeps projects separate
New here? Jump to Quick Start to get running in 5 minutes.
Bring your work context into semantic memory with built-in Jira Cloud support:
- Semantic Search: Search Jira issues and comments by meaning, not just keywords
- Full & Incremental Sync: Initial backfill or fast daily updates via JQL
- ADF Conversion: Atlassian Document Format → plain text for accurate embeddings (see the sketch after this list)
- Rich Filtering: Search by project, issue type, status, priority, or author
- Issue Lookup: Retrieve complete issue context (issue + all comments, chronologically)
- Dedicated Collection: `jira-data` collection keeps Jira content separate from code memory
- Tenant Isolation: `group_id` based on the Jira instance hostname prevents cross-instance leakage
- Two Skills: `/jira-sync` for synchronization, `/search-jira` for semantic search
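Conceptually, the ADF conversion is a walk over Atlassian's JSON document tree that collects the text leaves. A minimal illustrative sketch, assuming simplified node handling (this is not the shipped converter):

```python
def adf_to_text(node: dict) -> str:
    """Extract plain text from an ADF (Atlassian Document Format) node tree."""
    if node.get("type") == "text":
        return node.get("text", "")
    # Block nodes carry children in "content"; join block-level output with newlines.
    parts = [adf_to_text(child) for child in node.get("content", [])]
    sep = "\n" if node.get("type") in {"doc", "bulletList", "orderedList"} else ""
    return sep.join(parts)

# Tiny ADF fragment of the shape returned by the Jira Cloud REST API:
doc = {"type": "doc", "content": [
    {"type": "paragraph", "content": [{"type": "text", "text": "Fix login bug"}]},
]}
print(adf_to_text(doc))  # -> "Fix login bug"
```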
See docs/JIRA-INTEGRATION.md for setup and usage guide.
When you ask "how should I..." or "what's the best way to...", AI-Memory's best-practices-researcher activates:
- Search Local Knowledge - Checks the conventions collection first
- Web Research - Searches 2024-2026 sources if needed
- Save Findings - Stores to `oversight/knowledge/best-practices/BP-XXX.md`
- Database Storage - Adds to Qdrant for future retrieval
- Skill Evaluation - Determines if findings warrant a reusable skill
When research reveals a repeatable process, the skill-creator agent can generate a Claude Code skill:
User: "Research best practices for writing commit messages"
β Best Practices Researcher finds patterns
β Evaluates: "This is a repeatable process with clear steps"
β User confirms: "Yes, create a skill"
β Skill Creator generates .claude/skills/writing-commits/SKILL.md
The Result: Your AI agents continuously discover and codify knowledge into reusable skills.
> Memory is only half the equation. Quality is the other half. AI-Memory gives your agents institutional knowledge. Parzival ensures they use it wisely.
>
> Parzival recommends. You decide.
Works with BMAD Method: enhances BMAD workflows with persistent memory, but also works standalone with any Claude Code project.
```text
Claude Code Session
├── SessionStart Hooks (resume|compact) → Context injection on session resume and post-compaction
├── UserPromptSubmit Hooks → Unified keyword trigger (decisions/best practices/session history)
├── PreToolUse Hooks → Smart triggers (new file/first edit conventions)
├── PostToolUse Hooks → Capture code patterns + error detection
├── PreCompact Hook → Save conversation before compaction
└── Stop Hook → Capture agent responses

Python Core (src/memory/)
├── config.py → Environment configuration
├── storage.py → Qdrant CRUD operations
├── search.py → Semantic search + cascading
├── intent.py → Intent detection + routing
├── triggers.py → Automatic trigger configuration
├── embeddings.py → Jina AI embeddings (768d)
└── deduplication.py → Hash + similarity dedup

Docker Services
├── Qdrant (port 26350)
├── Embedding Service (port 28080)
├── Classifier Worker (LLM reclassification)
├── Streamlit Dashboard (port 28501)
└── Monitoring Stack (--profile monitoring)
    ├── Prometheus (port 29090)
    ├── Pushgateway (port 29091)
    └── Grafana (port 23000)
```
| Collection | Purpose | Example Types |
|---|---|---|
| code-patterns | HOW things are built | implementation, error_fix, refactor |
| conventions | WHAT rules to follow | rule, guideline, naming, structure |
| discussions | WHY things were decided | decision, session, preference |
| jira-data | External work items from Jira Cloud | jira_issue, jira_comment |
Note: The `jira-data` collection is conditional; it is only created when Jira sync is enabled (`JIRA_SYNC_ENABLED=true`).
The memory system automatically retrieves relevant context:
- Error Detection: When a command fails, retrieves past error fixes
- New File Creation: Retrieves naming conventions and structure patterns
- First Edit: Retrieves file-specific patterns on first modification
- Decision Keywords: "Why did we..." triggers decision memory retrieval
- Best Practices Keywords: "How should I..." triggers convention retrieval
- Session History Keywords: "What have we done..." triggers session summary retrieval
The following keywords automatically activate memory retrieval when detected in your prompts:
Decision Keywords (20 patterns) → Searches discussions for past decisions
| Category | Keywords |
|---|---|
| Decision recall | why did we, why do we, what was decided, what did we decide |
| Memory recall | remember when, remember the decision, remember what, remember how, do you remember, recall when, recall the, recall how |
| Session references | last session, previous session, earlier we, before we, previously, last time we, what did we do, where did we leave off |
Session History Keywords (16 patterns) → Searches discussions for session summaries
| Category | Keywords |
|---|---|
| Project status | what have we done, what did we work on, project status, where were we, what's the status |
| Continuation | continue from, pick up where, continue where |
| Remaining work | what's left to do, remaining work, what's next for, what's next on, what's next in the, next steps, todo, tasks remaining |
Best Practices Keywords (27 patterns) → Searches conventions for guidelines
| Category | Keywords |
|---|---|
| Standards | best practice, best practices, coding standard, coding standards, convention, conventions for |
| Patterns | what's the pattern, what is the pattern, naming convention, style guide |
| Guidance | how should i, how do i, what's the right way, what is the right way |
| Research | research the pattern, research best practice, look up, find out about, what do the docs say |
| Recommendations | should i use, what's recommended, what is recommended, recommended approach, preferred approach, preferred way, industry standard, common pattern |
Note: Keywords are case-insensitive. Only structured patterns trigger retrieval to avoid false positives on casual conversation.
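Conceptually, the trigger check is a case-insensitive substring match against these pattern lists, routed to the matching collection. A minimal sketch of the idea, with trimmed pattern lists (the shipped hooks use the full sets above; see src/memory/triggers.py and intent.py):

```python
# Illustrative, trimmed pattern lists keyed by target collection.
TRIGGER_PATTERNS = {
    "discussions": ["why did we", "what was decided", "last session"],
    "conventions": ["best practice", "how should i", "naming convention"],
}

def detect_trigger(prompt: str) -> str | None:
    """Return the collection to search, or None if no structured pattern matches."""
    lowered = prompt.lower()
    for collection, patterns in TRIGGER_PATTERNS.items():
        if any(pattern in lowered for pattern in patterns):
            return collection
    return None

print(detect_trigger("Why did we pick Qdrant?"))  # -> "discussions"
print(detect_trigger("nice weather today"))       # -> None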
The optional LLM Classifier automatically reclassifies captured memories into more precise types:
- Rule-based first: Fast pattern matching (free, <10ms)
- LLM fallback: AI classification when rules don't match
- Provider chain: Primary provider with automatic fallback (see the sketch below)
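The flow can be pictured as a short fallback chain: try cheap rules first, then walk the provider list until one answers. A hedged sketch; the rule table and the `provider.classify` interface are illustrative placeholders, not the shipped API:

```python
import re

# Illustrative rule table: regex -> memory type. The real rules are richer.
RULES = [
    (re.compile(r"\bfix(ed)?\b.*\berror\b", re.I), "error_fix"),
    (re.compile(r"\bdecid(e|ed)\b", re.I), "decision"),
]

def classify(content: str, providers: list) -> str:
    # 1. Rule-based first: fast pattern matching (free, <10ms).
    for pattern, memory_type in RULES:
        if pattern.search(content):
            return memory_type
    # 2. LLM fallback: walk the provider chain until one succeeds.
    for provider in providers:
        try:
            return provider.classify(content)  # hypothetical provider interface
        except Exception:
            continue  # provider down or rate-limited; try the next one
    return "unclassified"  # graceful degradation: never block on classification
```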
Quick Setup:
```bash
# Configure in docker/.env
MEMORY_CLASSIFIER_ENABLED=true
MEMORY_CLASSIFIER_PRIMARY_PROVIDER=ollama   # or: openrouter, claude, openai
MEMORY_CLASSIFIER_FALLBACK_PROVIDERS=openrouter

# For Ollama (free, local)
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=sam860/LFM2:2.6b

# For OpenRouter (free tier available)
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_MODEL=google/gemma-2-9b-it:free
```

See docs/llm-classifier.md for the complete setup guide, provider options, and troubleshooting.
```bash
# Core services (Qdrant + Embedding)
docker compose -f docker/docker-compose.yml up -d

# With monitoring (adds Prometheus, Grafana, Pushgateway)
docker compose -f docker/docker-compose.yml --profile monitoring up -d
```

```bash
# Check Qdrant (port 26350)
curl http://localhost:26350/health

# Check Embedding Service (port 28080)
curl http://localhost:28080/health

# Check Grafana (port 23000) - if monitoring enabled
open http://localhost:23000  # admin/admin
```

```bash
./scripts/install.sh /path/to/your-project

# With convention seeding (recommended)
SEED_BEST_PRACTICES=true ./scripts/install.sh /path/to/your-project
```

Expected Output:
```text
───────────────────────────────────────────────────────────
  AI Memory Module Health Check
───────────────────────────────────────────────────────────
[1/3] Checking Qdrant (localhost:26350)...
  ✓ Qdrant is healthy
[2/3] Checking Embedding Service (localhost:28080)...
  ✓ Embedding service is healthy
[3/3] Checking Monitoring API (localhost:28000)...
  ✓ Monitoring API is healthy
───────────────────────────────────────────────────────────
  All Services Healthy ✓
───────────────────────────────────────────────────────────
```
- Python 3.10+ (3.11+ required for AsyncSDKWrapper)
- Docker 20.10+ (for Qdrant + embedding service)
- Claude Code (target project where memory will be installed)
See INSTALL.md for detailed installation instructions including:
- System requirements with version compatibility
- Step-by-step installation for macOS, Linux, and Windows (WSL2)
- Automated installer and manual installation methods
- Post-installation verification
- Configuration options
- Uninstallation procedures
All services use a 2XXXX port prefix to avoid conflicts:
| Service | External | Internal | Access URL |
|---|---|---|---|
| Qdrant | 26350 | 6333 | localhost:26350 |
| Embedding | 28080 | 8080 | localhost:28080/embed |
| Monitoring API | 28000 | 8000 | localhost:28000/health |
| Streamlit | 28501 | 8501 | localhost:28501 |
| Grafana | 23000 | 3000 | localhost:23000 |
| Prometheus | 29090 | 9090 | localhost:29090 (--profile monitoring) |
| Pushgateway | 29091 | 9091 | localhost:29091 (--profile monitoring) |
| Variable | Default | Description |
|---|---|---|
| `QDRANT_HOST` | `localhost` | Qdrant server hostname |
| `QDRANT_PORT` | `26350` | Qdrant external port |
| `EMBEDDING_HOST` | `localhost` | Embedding service hostname |
| `EMBEDDING_PORT` | `28080` | Embedding service port |
| `AI_MEMORY_INSTALL_DIR` | `~/.ai-memory` | Installation directory |
| `MEMORY_LOG_LEVEL` | `INFO` | Logging level (DEBUG/INFO/WARNING) |
Jira Cloud Integration (Optional):
| Variable | Default | Description |
|---|---|---|
| `JIRA_INSTANCE_URL` | (empty) | Jira Cloud URL (e.g., https://company.atlassian.net) |
| `JIRA_EMAIL` | (empty) | Jira account email for Basic Auth |
| `JIRA_API_TOKEN` | (empty) | API token from id.atlassian.com |
| `JIRA_PROJECTS` | (empty) | Comma-separated project keys (e.g., PROJ,DEV,OPS) |
| `JIRA_SYNC_ENABLED` | `false` | Enable Jira synchronization |
| `JIRA_SYNC_DELAY_MS` | `100` | Delay between API requests (ms) |
See docs/JIRA-INTEGRATION.md for complete Jira setup guide.
Override Example:
```bash
export QDRANT_PORT=16333        # Use custom port
export MEMORY_LOG_LEVEL=DEBUG   # Enable verbose logging
```

Memory capture happens automatically via Claude Code hooks:
- SessionStart (resume/compact only): Injects relevant memories when resuming a session or after context compaction
- PostToolUse: Captures code patterns (Write/Edit/NotebookEdit tools) in the background (<500ms)
- PreCompact: Saves a session summary before context compaction (auto or manual `/compact`)
- Stop: Optional per-response cleanup
No manual intervention required - hooks handle everything.
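The speed budget works because capture hooks do almost nothing inline. A sketch of the pattern a PostToolUse capture script follows, simplified for illustration (the `memory.capture_worker` module name is hypothetical; the real script is .claude/hooks/scripts/post_tool_capture.py):

```python
#!/usr/bin/env python3
"""Illustrative PostToolUse capture hook: stay fast, push work to the background."""
import json
import subprocess
import sys

def main() -> int:
    event = json.load(sys.stdin)  # Claude Code passes the tool event as JSON on stdin
    if event.get("tool_name") not in {"Write", "Edit", "NotebookEdit"}:
        return 0  # nothing to capture for other tools
    try:
        # Fire-and-forget: spawn a detached worker so the hook itself stays
        # well under the 500ms budget (NFR-P1).
        subprocess.Popen(
            [sys.executable, "-m", "memory.capture_worker", json.dumps(event)],
            start_new_session=True,
        )
    except Exception:
        pass  # graceful degradation: never block Claude on memory failures
    return 0

if __name__ == "__main__":
    sys.exit(main())
```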
The "Aha Moment": Claude remembers your previous sessions automatically. Start a new session and Claude will say "Welcome back! Last session we worked on..." without you reminding it.
Use slash commands for manual control:
```bash
# Check system status
/memory-status

# Manually save current session
/save-memory

# Search across all memories
/search-memory <query>

# Jira Cloud Integration (requires JIRA_SYNC_ENABLED=true)
/jira-sync                    # Incremental sync from Jira
/jira-sync --full             # Full sync (all issues and comments)
/search-jira "query"          # Semantic search across Jira content
/search-jira --issue PROJ-42  # Lookup issue + all comments
```

See docs/HOOKS.md for hook documentation, docs/COMMANDS.md for commands, docs/llm-classifier.md for LLM classifier setup, and docs/JIRA-INTEGRATION.md for the Jira integration guide.
The AsyncSDKWrapper provides full async/await support for building custom Agent SDK agents with persistent memory.
Features:
- Full async/await support compatible with Agent SDK
- Rate limiting with token bucket algorithm (Tier 1: 50 RPM, 30K TPM)
- Exponential backoff retry with jitter (3 retries: 1s, 2s, 4s ±20%)
- Automatic conversation capture to discussions collection
- Background storage (fire-and-forget pattern)
- Prometheus metrics integration
Basic Usage:
```python
import asyncio
from src.memory import AsyncSDKWrapper

async def main():
    async with AsyncSDKWrapper(cwd="/path/to/project") as wrapper:
        # Send message with automatic rate limiting and retry
        result = await wrapper.send_message(
            prompt="What is async/await?",
            model="claude-sonnet-4-5-20250929",
            max_tokens=500
        )
        print(f"Response: {result['content']}")
        print(f"Session ID: {result['session_id']}")

asyncio.run(main())
```

Streaming Responses (Buffered):
Note: Current implementation buffers the full response for retry reliability. True progressive streaming planned for future release.
```python
async with AsyncSDKWrapper(cwd="/path/to/project") as wrapper:
    async for chunk in wrapper.send_message_buffered(
        prompt="Explain Python async",
        model="claude-sonnet-4-5-20250929",
        max_tokens=800
    ):
        print(chunk, end='', flush=True)
```

Custom Rate Limits:
```python
async with AsyncSDKWrapper(
    cwd="/path/to/project",
    requests_per_minute=100,   # Tier 2
    tokens_per_minute=100000   # Tier 2
) as wrapper:
    result = await wrapper.send_message("Hello!")
```

Examples:
- `examples/async_sdk_basic.py` - Basic async/await usage, context manager pattern, session ID logging, rate limiting demonstration
- `examples/async_sdk_streaming.py` - Streaming response handling (buffered), progressive chunk processing, retry behavior
- `examples/async_sdk_rate_limiting.py` - Custom rate limit configuration, queue depth/timeout settings, error handling for different API tiers
Configuration:
Set the `ANTHROPIC_API_KEY` environment variable before using AsyncSDKWrapper:

```bash
export ANTHROPIC_API_KEY=sk-ant-api03-...
```

Rate Limiting:
The wrapper implements a token bucket algorithm matching Anthropic's rate limits (a minimal sketch follows the table below):
| Tier | Requests/Min | Tokens/Min |
|---|---|---|
| Free | 5 | 10,000 |
| Tier 1 | 50 (default) | 30,000 (default) |
| Tier 2 | 100 | 100,000 |
| Tier 3+ | 1,000+ | 400,000+ |
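A token bucket refills at a constant rate and admits a request only when enough tokens are available, which is what keeps the wrapper under both limits. A self-contained sketch of the request-per-minute bucket, simplified relative to the shipped implementation:

```python
import asyncio
import time

class TokenBucket:
    """Minimal async token bucket: `capacity` tokens, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.updated = time.monotonic()

    async def acquire(self, amount: float = 1.0) -> None:
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.refill_per_sec)
            self.updated = now
            if self.tokens >= amount:
                self.tokens -= amount
                return
            # Sleep just long enough for the deficit to refill.
            await asyncio.sleep((amount - self.tokens) / self.refill_per_sec)

# Tier 1 defaults: 50 requests/minute.
requests_bucket = TokenBucket(capacity=50, refill_per_sec=50 / 60)
```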
Circuit breaker protections:
- Max queue depth: 100 requests
- Queue timeout: 60 seconds
- Raises `QueueTimeoutError` or `QueueDepthExceededError` if exceeded
Retry Strategy:
Automatic exponential backoff retry (DEC-029):
- Max retries: 3
- Delays: 1s, 2s, 4s (with ±20% jitter)
- Retries on: 429 (rate limit), 529 (overload), network errors
- No retry on: 4xx client errors (except 429), auth failures
- Respects the `retry-after` header when provided (see the backoff sketch below)
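The delay schedule above is plain exponential backoff with jitter. A self-contained sketch of the idea; the `is_retryable` predicate here stands in for the wrapper's real error classification:

```python
import asyncio
import random

async def with_retry(call, max_retries: int = 3):
    """Retry an async call with exponential backoff and ±20% jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise
            base = 2 ** attempt                       # 1s, 2s, 4s
            delay = base * random.uniform(0.8, 1.2)   # ±20% jitter
            await asyncio.sleep(delay)

def is_retryable(exc: Exception) -> bool:
    # Stand-in: the real wrapper retries 429/529 and network errors only,
    # never other 4xx client errors or auth failures.
    status = getattr(exc, "status_code", None)
    return status in (429, 529) or status is None
```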
Memory Capture:
All messages are automatically captured to the discussions collection:
- User messages → `user_message` type
- Agent responses → `agent_response` type
- Background storage (non-blocking)
- Session-based grouping with turn numbers
See src/memory/async_sdk_wrapper.py for complete API documentation.
For complete design rationale, see oversight/specs/tech-debt-035/phase-2-design.md.
Memories are automatically isolated by `group_id` (derived from the project directory):

```python
# Project A: group_id = "project-a"
# Project B: group_id = "project-b"
# Searches only return memories from the current project
```

V2.0 Collection Isolation:
- code-patterns: Implementation patterns (per-project isolation)
- conventions: Coding standards and rules (shared across projects by default)
- discussions: Decisions, sessions, conversations (per-project isolation)
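Deriving the `group_id` from the project directory and attaching it as a payload filter is what keeps searches scoped. A hedged sketch of the idea; the exact slug rules live in src/memory/config.py, and the commented filter uses the standard qdrant-client models for illustration:

```python
import re
from pathlib import Path

def derive_group_id(project_dir: str) -> str:
    """Slugify the project directory name, e.g. /home/me/Project A -> "project-a"."""
    name = Path(project_dir).resolve().name.lower()
    return re.sub(r"[^a-z0-9]+", "-", name).strip("-")

# Every search then carries a payload filter so only this project's
# memories come back, e.g. with qdrant-client:
#   models.Filter(must=[models.FieldCondition(
#       key="group_id", match=models.MatchValue(value=derive_group_id(".")))])
print(derive_group_id("/home/me/Project A"))  # -> "project-a"
```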
See TROUBLESHOOTING.md for comprehensive troubleshooting, including:
- Services won't start
- Health check failures
- Memories not captured
- Search not working
- Performance problems
- Data persistence issues
If hooks are misbehaving (e.g., after a failed install or upgrade), use the recovery script to scan and repair all project configurations:
```bash
# Dry-run: shows what would change (safe, no modifications)
python scripts/recover_hook_guards.py

# Apply fixes across all discovered projects
python scripts/recover_hook_guards.py --apply

# Scan only: list all discovered project settings.json files
python scripts/recover_hook_guards.py --scan
```

The recovery script automatically discovers projects via:
- The `~/.ai-memory/installed_projects.json` manifest (primary)
- Sibling directories of `AI_MEMORY_INSTALL_DIR` (fallback)
- Common project paths (additional fallback)

It fixes unguarded hook commands (BUG-066), broad SessionStart matchers (BUG-078), and other known configuration issues. Always run the dry-run first to review changes.
```bash
# Check all services
docker compose -f docker/docker-compose.yml ps

# Check logs
docker compose -f docker/docker-compose.yml logs

# Check health
python scripts/health-check.py

# Check ports
lsof -i :26350  # Qdrant
lsof -i :28080  # Embedding
lsof -i :28000  # Monitoring API
```

Symptom: `docker compose up -d` fails or services exit immediately

Solution:
- Check port availability:
  ```bash
  lsof -i :26350  # Qdrant
  lsof -i :28080  # Embedding
  ```
- Check Docker logs:
  ```bash
  docker compose -f docker/docker-compose.yml logs
  ```
- Ensure the Docker daemon is running:
  ```bash
  docker ps  # Should not error
  ```
Symptom: `python scripts/health-check.py` shows unhealthy services

Solution:
- Check service status:
  ```bash
  docker compose -f docker/docker-compose.yml ps
  ```
- Verify ports are accessible:
  ```bash
  curl http://localhost:26350/health  # Qdrant
  curl http://localhost:28080/health  # Embedding
  ```
- Check logs for errors:
  ```bash
  docker compose -f docker/docker-compose.yml logs qdrant
  docker compose -f docker/docker-compose.yml logs embedding
  ```
Symptom: The PostToolUse hook doesn't store memories

Solution:
- Check the hook configuration in `.claude/settings.json`:
  ```json
  {
    "hooks": {
      "PostToolUse": [{
        "matcher": "Write|Edit",
        "hooks": [{"type": "command", "command": ".claude/hooks/scripts/post_tool_capture.py"}]
      }]
    }
  }
  ```
- Verify the hook script is executable:
  ```bash
  ls -la .claude/hooks/scripts/post_tool_capture.py
  chmod +x .claude/hooks/scripts/post_tool_capture.py
  ```
- Check the hook logs (if logging is enabled):
  ```bash
  cat ~/.ai-memory/logs/hook.log
  ```
For more detailed troubleshooting, see TROUBLESHOOTING.md.
Access Grafana at http://localhost:23000 (admin/admin):
| Dashboard | Purpose |
|---|---|
| NFR Performance Overview | All 6 NFR metrics with SLO compliance |
| Hook Activity | Hook execution rates, latency heatmaps |
| Memory Operations | Captures, retrievals, deduplication |
| System Health | Service status, error rates |
| NFR | Metric | Target |
|---|---|---|
| NFR-P1 | Hook execution | <500ms |
| NFR-P2 | Batch embedding | <2s |
| NFR-P3 | Session injection | <3s |
| NFR-P4 | Dedup check | <100ms |
| NFR-P5 | Retrieval query | <500ms |
| NFR-P6 | Real-time embedding | <500ms |
| Service | Port |
|---|---|
| Grafana | 23000 |
| Prometheus | 29090 |
| Pushgateway | 29091 |
All metrics use the `aimemory_` prefix (BP-045 compliant):

| Metric | Description |
|---|---|
| `aimemory_hook_duration_seconds` | Hook execution time (NFR-P1) |
| `aimemory_captures_total` | Total memory capture attempts |
| `aimemory_retrievals_total` | Total retrieval operations |
| `aimemory_trigger_fires_total` | Automatic trigger activations |
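Because hooks are short-lived processes, metrics flow through the Pushgateway rather than being scraped directly. A minimal sketch using the standard prometheus_client library; the metric name comes from the table above, while the `job` label is illustrative:

```python
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

registry = CollectorRegistry()
captures = Counter(
    "aimemory_captures_total",
    "Total memory capture attempts",
    registry=registry,
)
captures.inc()

# Push to the local Pushgateway (port 29091 per the port table above);
# Prometheus then scrapes the gateway on its normal schedule.
push_to_gateway("localhost:29091", job="aimemory_hooks", registry=registry)
```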
See docs/MONITORING.md for complete monitoring guide and docs/prometheus-queries.md for query examples.
Protect your AI memories with built-in backup and restore scripts.
```bash
# Setup (one-time)
cd /path/to/ai-memory
python3 -m venv .venv && source .venv/bin/activate
pip install httpx

# Get your Qdrant API key
cat ~/.ai-memory/docker/.env | grep QDRANT_API_KEY
export QDRANT_API_KEY="your-key-here"

# Run backup
python scripts/backup_qdrant.py
```

Backups are stored in the `backups/` directory in timestamped folders containing:
- Collection snapshots (discussions, conventions, code-patterns)
- Configuration files
- Verification manifest
```bash
python scripts/restore_qdrant.py backups/2026-02-03_143052
```

See docs/BACKUP-RESTORE.md for complete instructions, including troubleshooting.
Coming soon: Backup and restore scripts will be updated in the next version to support the `jira-data` collection, including Jira database backup and reinstall.
- Hook overhead: <500ms (PostToolUse forks to background)
- Embedding generation: <2s (pre-warmed Docker service)
- SessionStart context injection: <3s
- Deduplication check: <100ms (two-stage check; sketched below)
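The dedup budget holds because the first stage is a plain content hash; only misses fall through to a vector-similarity check. A minimal sketch of the two-stage idea, with an illustrative normalization and threshold (the real logic lives in src/memory/deduplication.py):

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable hash over whitespace-normalized content for exact-duplicate detection."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

def is_duplicate(text: str, seen_hashes: set[str], similarity: float | None = None) -> bool:
    # Stage 1: exact-hash lookup, O(1) and well under the 100ms budget.
    if content_hash(text) in seen_hashes:
        return True
    # Stage 2: fall back to vector similarity against near neighbors
    # (the score would come from a Qdrant search over the same collection).
    return similarity is not None and similarity > 0.95  # illustrative threshold
```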
- Enable the monitoring profile for production use:
  ```bash
  docker compose -f docker/docker-compose.yml --profile monitoring up -d
  ```
- Adjust the batch size for large projects:
  ```bash
  export MEMORY_BATCH_SIZE=100  # Default: 50
  ```
- Increase the cache TTL for stable projects:
  ```bash
  export MEMORY_CACHE_TTL=3600  # Default: 1800 seconds
  ```
```bash
# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_storage.py -v

# Run integration tests only
pytest tests/integration/ -v
```

```text
ai-memory/
├── src/memory/            # Core Python modules
├── .claude/
│   ├── hooks/scripts/     # Hook implementations
│   └── skills/            # Skill definitions
├── docker/                # Docker Compose and service configs
├── scripts/               # Installation and management scripts
├── tests/                 # pytest test suite
└── docs/                  # Additional documentation
```
- Python (PEP 8 Strict): Files `snake_case.py`, functions `snake_case()`, classes `PascalCase`, constants `UPPER_SNAKE`
- Qdrant Payload Fields: Always `snake_case` (`content_hash`, `group_id`, `source_hook`)
- Structured Logging: Use `logger.info("event", extra={"key": "value"})`, never f-strings (illustrated below)
- Hook Exit Codes: `0` (success), `1` (non-blocking error), `2` (blocking error - rare)
- Graceful Degradation: All components must fail silently; Claude works without memory
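For instance, the structured-logging rule means events are logged as a constant event name plus key/value context, never interpolated strings. A quick illustration of the convention:

```python
import logging

logger = logging.getLogger("memory.storage")

# Good: constant event name, context attached as structured fields.
logger.info("memory_stored", extra={"collection": "code-patterns", "group_id": "project-a"})

# Bad: f-string interpolation buries the fields inside the message text.
# logger.info(f"stored memory in code-patterns for project-a")
```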
See CONTRIBUTING.md for complete development setup and coding standards.
We welcome contributions! To contribute:
- Fork the repository and create a feature branch
- Follow coding conventions (see Development section above)
- Write tests for all new functionality
- Ensure all tests pass: `pytest tests/`
- Update documentation if adding features
- Submit a pull request with a clear description
See CONTRIBUTING.md for detailed development setup and pull request process.
MIT License - see LICENSE for details.
This documentation follows WCAG 2.2 Level AA accessibility standards (ISO/IEC 40500:2025):
- ✅ Proper heading hierarchy (h1 → h2 → h3)
- ✅ Descriptive link text (no "click here")
- ✅ Code blocks with language identifiers
- ✅ Tables with headers for screen readers
- ✅ Consistent bullet style (hyphens)
- ✅ ASCII art diagrams for universal compatibility
For accessibility concerns or suggestions, please open an issue.
Documentation Best Practices Applied (2026):
This README follows current best practices for technical documentation:
- Documentation as Code (Technical Documentation Best Practices)
- Markdown standards with consistent formatting (Markdown Best Practices)
- Essential sections per README standards (Make a README)
- Quick value communication (README Best Practices - Tilburg Science Hub)
- WCAG 2.2 accessibility compliance (W3C WCAG 2.2 as ISO Standard)
