
🐉 DragonMemory

Neural Embedding Compression System for RAG Applications

DragonMemory is a production-ready RAG system that uses a custom neural architecture (Dragon v7) to compress semantic context. It reduces embedding sequence length by a factor of 16 (16:1 pooling) while maintaining high semantic retrieval accuracy.

This allows for efficient handling of long contexts by representing text chunks as compact latent vectors rather than raw tokens.
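
In shape terms, the compressor maps each chunk's (128, 384) matrix of token embeddings to an (8, 384) matrix of latents. As a toy stand-in for intuition only (plain mean-pooling over 16-token windows; the real Dragon v7 compressor is learned, see Architecture below):

Python

import torch

tokens = torch.randn(1, 128, 384)                 # one chunk: 128 token embeddings, d=384
latents = tokens.view(1, 8, 16, 384).mean(dim=2)  # naive 16:1 pooling stand-in
print(latents.shape)                              # torch.Size([1, 8, 384])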

✨ Features

  • Latent Vector Pooling: Compresses 128 input tokens into 8 resonant vectors (16:1 sequence reduction).
  • High Semantic Fidelity: Maintains >90% cosine similarity in reconstruction tasks.
  • RAG Integration: Seamless integration with retrieval-augmented generation using local LLMs.
  • Streamlit GUI: User-friendly interface for document processing, chat, and management.
  • Multi-format Support: PDF, DOCX, TXT, MD document processing.
  • Audio Transcription: OpenAI Whisper integration for audio-to-text processing.
  • Persistent Memory: Save and load vector knowledge bases efficiently.

🚀 Quick Start

Installation

# Clone the repository
git clone <repository-url>
cd DragonMemory

# Install dependencies
pip install -r requirements.txt

Model Weights

Place dragon_pro_1_16.pth in the models/ directory.

Configuration

  1. Copy .env.example to .env:

     cp .env.example .env

  2. Edit .env with your settings:

     OLLAMA_BASE_URL=http://localhost:11434
     # Optional: Add OPENAI_API_KEY if using GPT models

  3. Make sure Ollama is running locally with your preferred model (llama3, mistral, etc.).
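
For reference, these variables can be read at runtime with python-dotenv; a minimal sketch (the application's actual loading code may differ):

Python

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
ollama_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
openai_key = os.getenv("OPENAI_API_KEY")  # optional; only needed for GPT models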

Running the Application
Bash

streamlit run gui_app.py

📁 Project Structure
Plaintext

DragonMemory/
├── assets/                  # Images and assets
├── src/                     # Main source code
│   ├── __init__.py
│   ├── memory_v3_model.py   # Dragon v7 Architecture (PyTorch)
│   ├── resonant_rag.py      # RAG logic & Vector Store
│   └── resonant_agent.py    # Agent logic & Multi-backend support
├── models/                  # Model weights
│   └── dragon_pro_1_16.pth  # Pre-trained Dragon compressor
├── gui_app.py               # Main Streamlit application
├── requirements.txt         # Python dependencies
├── .gitignore               # Git ignore rules
├── .env.example             # Environment template
├── LICENSE                  # AGPLv3
└── README.md                # Documentation

🧠 Architecture

Dragon v7: Neural Compressor

The core of the system is a custom PyTorch model designed to compress embedding sequences.

  • Mechanism: Multi-phase resonant pointer with neighbor mixing.
  • Input: 128 Token Embeddings (d=384).
  • Output: 8 Latent Vectors (d=384).
  • Goal: Shrink the "needle-in-a-haystack" search space for RAG systems by condensing each chunk's information into fewer, denser vectors.

See src/memory_v3_model.py for architecture details.
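
For intuition about the interface shapes only, a 16:1 compressor can be sketched as learned-query attention pooling. This is a simplified stand-in, not the actual resonant-pointer mechanism:

Python

import torch
import torch.nn as nn

class PoolingCompressor(nn.Module):
    """Illustrative 16:1 sequence compressor: (B, 128, 384) -> (B, 8, 384).

    Stand-in only: Dragon v7 uses a multi-phase resonant pointer with
    neighbor mixing, not plain attention pooling.
    """

    def __init__(self, dim: int = 384, out_len: int = 8):
        super().__init__()
        # One learned query per output slot attends over the input sequence.
        self.queries = nn.Parameter(torch.randn(out_len, dim) * dim ** -0.5)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)  # (B, 8, 384)
        latents, _ = self.attn(q, x, x)                          # pool 128 -> 8
        return latents

# PoolingCompressor()(torch.randn(2, 128, 384)).shape -> torch.Size([2, 8, 384])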

📖 Usage

Adding Documents

  1. Navigate to the Documents tab.
  2. Upload PDF, DOCX, TXT, or MD files.
  3. Click Process Documents.
  4. Documents are chunked, encoded, and compressed into the vector store (sketched below).
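
Conceptually, step 4 looks like the following; every name here is illustrative rather than the actual API in src/resonant_rag.py:

Python

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # an example d=384 encoder

def ingest(text: str, compressor, store, chunk_tokens: int = 128):
    # 1. Chunk: whitespace split as a stand-in for a tokenizer-aware chunker.
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_tokens])
              for i in range(0, len(words), chunk_tokens)]
    for chunk in chunks:
        # 2. Encode token-level embeddings: (n_tokens, 384);
        #    pad/truncate to 128 in practice.
        tokens = encoder.encode(chunk, output_value="token_embeddings")
        # 3. Compress the token sequence to 8 latent vectors: (8, 384).
        latents = compressor(tokens.unsqueeze(0)).squeeze(0)
        # 4. Keep latents alongside the raw chunk for answer generation later.
        store.append((latents, chunk))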

Chat Interface

  1. Navigate to the Chat tab.
  2. Ask questions about your documents.
  3. The system retrieves relevant context using the compressed vectors (see the sketch after this list).
  4. Responses are generated using your selected LLM (Ollama or OpenAI).
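
Retrieval (step 3) can be pictured as max-pooled cosine similarity between the query's latents and each stored chunk's 8 latents; a simplified sketch with illustrative names:

Python

import torch.nn.functional as F

def retrieve(query_latents, store, k: int = 3):
    """Score each chunk by the best cosine match between any query latent
    and any of the chunk's 8 latents (max-sim pooling), return top-k."""
    q = F.normalize(query_latents, dim=-1)    # (8, 384)
    scored = []
    for latents, chunk in store:
        d = F.normalize(latents, dim=-1)      # (8, 384)
        scored.append(((q @ d.T).max().item(), chunk))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]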

Audio Processing

  1. Navigate to the Audio tab.
  2. Upload MP3, WAV, or M4A files.
  3. Click Start Transcription (uses Whisper; sketched below).
  4. Transcribed text can be directly saved to the RAG memory.
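
Step 3 relies on the open-source whisper package; the general pattern looks like this (model size "base" is an example, the GUI may use another):

Python

import whisper

model = whisper.load_model("base")        # tiny / base / small / medium / large
result = model.transcribe("meeting.m4a")  # accepts mp3, wav, m4a, ...
print(result["text"])                     # transcript, ready for RAG ingestion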

🔧 Configuration

Model Selection

In the sidebar, select your inference backend:

  • Local (Ollama): llama3, mistral, gemma (Free, private, requires Ollama installed).
  • Cloud (OpenAI): gpt-4o (Requires API Key).
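
A minimal sketch of how such a dual-backend dispatch can work (the real logic lives in src/resonant_agent.py; endpoint and model names here are examples):

Python

import os
import requests

def generate(prompt: str, backend: str = "ollama", model: str = "llama3") -> str:
    """Dispatch between a local Ollama server and the OpenAI API."""
    if backend == "ollama":
        r = requests.post(
            f"{os.getenv('OLLAMA_BASE_URL', 'http://localhost:11434')}/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
        )
        return r.json()["response"]
    from openai import OpenAI  # needs OPENAI_API_KEY in the environment
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content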

Memory Management

  • Save DB: Persist the knowledge base to disk (memory.dragon format).
  • Load DB: Restore a previously saved knowledge base.
  • Clear Conversation: Reset the current chat history.
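
The .dragon format is project-specific; as a generic sketch of the save/load pattern (assuming the store holds tensors and strings, which may not match the real on-disk layout):

Python

import torch

def save_db(store, path: str = "memory.dragon"):
    torch.save(store, path)    # tensors + chunk text in a single file

def load_db(path: str = "memory.dragon"):
    return torch.load(path)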

📉 Benchmarks & Verification

Included in this repository is a reproduction script (eval_dragon_benchmark.py) to verify the compression and retrieval performance on a controlled dataset.

To run the benchmark:

python eval_dragon_benchmark.py --dataset-dir benchmarks/toy_rag
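
For reference, the reported hit@k and MRR@k metrics are standard; a generic implementation (not necessarily the script's exact code):

Python

def hit_at_k(ranked_ids, gold_id, k: int) -> float:
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def mrr_at_k(ranked_ids, gold_id, k: int) -> float:
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == gold_id:
            return 1.0 / rank
    return 0.0

# Each metric is averaged over all questions to produce the table below.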

Internal Benchmark Results

The following results demonstrate the sequence compression capability. While the stored vector dimension grows (384 → 3072, i.e. 8 concatenated 384-d latents per document) to capture semantic nuance, the sequence length is drastically reduced (128 → 8), allowing for massive context windows in LLM processing.

================= RESULTS =================
Number of questions: 6
Baseline dim: 384 (1 vector per doc)
Dragon dim:   3072 (8 vectors per doc)
Sequence compression: 128 tokens -> 8 vectors (16x reduction)
--------------------------------------------
BASELINE (Standard RAG):
  hit@1 = 1.000
  hit@3 = 1.000
  mrr@3 = 1.000
DRAGON (Compressed RAG):
  hit@1 = 1.000
  hit@3 = 1.000
  mrr@3 = 1.000
=============================================

Storage Efficiency Analysis

When storing full context (e.g., for reranking or long-context input), Dragon offers significant savings over storing raw token embeddings:

  • Raw token embeddings (128 tokens): ~0.56 MB
  • Dragon latents (8 vectors): ~0.03 MB
  • Effective compression: 16.0x

Note: Tests performed on the internal toy_rag dataset for logic verification.
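
Note that the 16.0x figure follows from vector counts alone (128 vs 8 at equal dimension); absolute sizes depend on dimension and dtype. A quick calculator, assuming fp32 storage:

Python

def embedding_mb(n_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Size in MB of n_vectors float vectors of width dim (fp32 default)."""
    return n_vectors * dim * bytes_per_value / (1024 ** 2)

# The ratio is dimension-independent:
# embedding_mb(128, d) / embedding_mb(8, d) == 16.0 for any d.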

📊 Performance Metrics

Based on internal validation on technical documentation datasets:

  • Sequence Reduction: 16x (128 tokens → 8 vectors).
  • Reconstruction Accuracy: ~90.4% (Cosine Similarity).
  • Retrieval Recall: >85% @ k=3.
  • Inference Speed: <10ms per query encoding on GPU.

🤝 Contributing

Contributions are welcome! Please read the LICENSE file for details.

📄 License

DragonMemory is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

In summary:

  • ✅ Free to use for personal and commercial purposes.
  • 🔁 If you modify the source and provide it as a service, you must open-source your changes.

🙏 Acknowledgments

  • Sentence Transformers for the base embedding models.
  • Ollama for enabling local LLM inference.
  • Streamlit for the rapid GUI development.
  • OpenAI Whisper for robust audio transcription.

DragonMemory: Efficient Contextual Memory for AI Agents.
