Open
Labels
enhancement (New feature or request)
Description
Overview
Enhance the Retrieval Augmented Generation (RAG) system by implementing a true hybrid search pipeline that combines vector similarity (semantic) search with keyword-based (full-text) search. This follows current best practice for RAG reliability and accuracy (Reference 1, Reference 2).
Why?
- Increases recall and relevance of retrieved matches
- Catches both semantic similarity and explicit franchise/studio/keyword matches
- Reduces user corrections on edge cases
Implementation Plan
- Backend (Node/Express)
  - Update `RAGRetriever.hybridSearch()` in `server/src/services/ragRetriever.js` to fuse pgvector results with full-text search results from `classification_history`.
  - Use Reciprocal Rank Fusion (RRF) or a weighted average, as in top RAG systems (Reference 5).

        // Example hybrid fusion tweak (Node.js excerpt)
        // Inside hybridSearch()
        let results;
        if (fusionMethod === 'rrf') {
          results = this.calculateRRF(semanticMatches, textMatches, rrfK);
        } else {
          results = this.legacyHybridCombine(semanticMatches, textMatches, limit);
        }

  - Tune the weights between vector and keyword matches. Consider testing different fusion algorithms.
  - Update tests in `server/src/__tests__/ragRetriever.rrf.test.js` to cover all fusion paths.
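For discussion, here is a minimal sketch of what `calculateRRF` might look like. It is not the existing implementation. It assumes each match is an object with a unique `id` field and that both input lists are already sorted best-first. The `weights` parameter is a hypothetical addition that would let the vector and keyword lists be tuned against each other.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over lists of weight / (k + rank),
// where rank is the 1-based position of d in that list.
// Assumptions: every match has a unique `id`; lists are sorted best-first.
function calculateRRF(semanticMatches, textMatches, rrfK = 60, weights = [1, 1]) {
  const fused = new Map();

  [semanticMatches, textMatches].forEach((matches, listIndex) => {
    matches.forEach((match, position) => {
      // Reuse the entry if the other list already contributed this id.
      const entry = fused.get(match.id) ?? { ...match, rrfScore: 0 };
      entry.rrfScore += weights[listIndex] / (rrfK + position + 1);
      fused.set(match.id, entry);
    });
  });

  // Highest fused score first.
  return [...fused.values()].sort((a, b) => b.rrfScore - a.rrfScore);
}

// Usage with made-up rows: 'b' appears in both lists, so it ranks first.
const semanticDemo = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const textDemo = [{ id: 'b' }, { id: 'd' }];
const fusedDemo = calculateRRF(semanticDemo, textDemo);
console.log(fusedDemo.map((r) => r.id)); // [ 'b', 'a', 'd', 'c' ]
```

RRF only looks at rank positions, so the cosine distances from pgvector and the `ts_rank` values from full-text search never need to be normalized onto a common scale. A weighted average of raw scores would need that normalization step first.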
- Database
  - Ensure `classification_history` has a full-text search index (e.g., on the title, overview, and genres columns).
  - Example SQL:

        CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_history_fts
          ON classification_history
          USING gin(to_tsvector('english', title || ' ' || overview || ' ' || genres));
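The keyword side of the search could then query that index. The sketch below builds a parameterized query object in the `{ text, values }` shape that node-postgres accepts. The helper name `buildTextSearchQuery`, the `id` column, and the default limit are assumptions for illustration. The `to_tsvector` expression is copied from the example index, since PostgreSQL can only use an expression index when the query repeats the same expression.

```javascript
// Hypothetical helper: builds a full-text query against classification_history.
// FTS_DOCUMENT must stay identical to the expression in the index definition.
// Caveat: in PostgreSQL, `||` yields NULL if any operand is NULL, so nullable
// columns may need coalesce(col, '') in BOTH the index and this expression.
const FTS_DOCUMENT =
  "to_tsvector('english', title || ' ' || overview || ' ' || genres)";

function buildTextSearchQuery(searchText, limit = 20) {
  return {
    text: `
      SELECT id, title, ts_rank(${FTS_DOCUMENT}, query) AS text_rank
      FROM classification_history, plainto_tsquery('english', $1) AS query
      WHERE ${FTS_DOCUMENT} @@ query
      ORDER BY text_rank DESC
      LIMIT $2`,
    values: [searchText, limit],
  };
}

// Usage (no database needed to inspect the query object).
const ftsQuery = buildTextSearchQuery('marvel studios', 5);
console.log(ftsQuery.values); // [ 'marvel studios', 5 ]
// With node-postgres: const { rows } = await pool.query(ftsQuery);
```

The rows come back ordered by `text_rank`, which is the best-first ordering that rank-based fusion expects from the keyword list.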
- Validation
  - Benchmark accuracy and recall against pure vector and pure text retrieval.
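One possible shape for that benchmark is recall@k averaged over a small set of labeled queries. Everything in the sketch below is hypothetical: the function names, the labeled-query format, and the stub retrievers, which stand in for the real (asynchronous) vector, text, and hybrid search calls.

```javascript
// recall@k: fraction of the known-relevant ids found in the top k results.
function recallAtK(retrievedIds, relevantIds, k) {
  if (relevantIds.length === 0) return 0;
  const topK = new Set(retrievedIds.slice(0, k));
  const hits = relevantIds.filter((id) => topK.has(id)).length;
  return hits / relevantIds.length;
}

// Averages recall@k per retriever over all labeled queries.
// `retrievers` maps a name to a function: query -> ranked array of ids.
function compareRetrievers(retrievers, labeledQueries, k) {
  const report = {};
  for (const [name, retrieve] of Object.entries(retrievers)) {
    const total = labeledQueries.reduce(
      (sum, { query, relevantIds }) =>
        sum + recallAtK(retrieve(query), relevantIds, k),
      0
    );
    report[name] = total / labeledQueries.length;
  }
  return report;
}

// Usage with stub retrievers and one made-up labeled query.
const labeledDemo = [{ query: 'q1', relevantIds: ['a', 'b'] }];
const stubRetrievers = {
  vector: () => ['a', 'x', 'y'],
  text: () => ['b', 'z', 'a'],
  hybrid: () => ['a', 'b', 'x'],
};
console.log(compareRetrievers(stubRetrievers, labeledDemo, 2));
// { vector: 0.5, text: 0.5, hybrid: 1 }
```

Running the same labeled set through each fusion method and weight setting would give a like-for-like basis for the tuning step in the backend plan.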
References