A high-performance semantic code search server written in Go that lets you find code with natural language queries. Built on the Model Context Protocol (MCP), it combines AST parsing, vector embeddings, and real-time file monitoring to index a project and keep that index current as files change.
- 🔍 Semantic Search: Search your codebase using natural language queries powered by OpenAI embeddings
- 🌳 AST-Based Parsing: Extracts functions, classes, and methods using Tree-sitter for accurate code structure understanding
- ⚡ Real-Time Monitoring: Automatic re-indexing on file changes using fsnotify with concurrent processing
- 🎯 Targeted Retrieval: Reduces token usage by retrieving only relevant code segments
- 🔧 Git Integration: Respects .gitignore patterns to exclude unnecessary files
- 🗄️ Vector Storage: Fast similarity search using chromem-go for efficient vector database operations
- 🌐 Multi-Language Support: Supports Go, Python, TypeScript, JavaScript, and Markdown
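To make the AST-based extraction concrete, here is a minimal sketch of pulling functions and methods out of a source file. It is not this project's parser: as a stand-in for Tree-sitter it uses Go's standard `go/parser`, so it handles Go files only. The `Entity` type and `ExtractEntities` function are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Entity is a hypothetical record for one extracted code unit.
type Entity struct {
	Kind string // "function" or "method"
	Name string
	Line int // 1-based line where the declaration starts
}

// ExtractEntities parses Go source text and lists its top-level
// functions and methods in declaration order.
func ExtractEntities(filename, src string) ([]Entity, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	var out []Entity
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars, and consts
		}
		kind := "function"
		if fn.Recv != nil {
			kind = "method" // a receiver list marks a method
		}
		out = append(out, Entity{
			Kind: kind,
			Name: fn.Name.Name,
			Line: fset.Position(fn.Pos()).Line,
		})
	}
	return out, nil
}

func main() {
	src := "package demo\n\n" +
		"func Login(user string) bool { return user != \"\" }\n\n" +
		"type Store struct{}\n\n" +
		"func (s *Store) Connect() error { return nil }\n"
	entities, err := ExtractEntities("demo.go", src)
	if err != nil {
		panic(err)
	}
	for _, e := range entities {
		fmt.Printf("%s %s (line %d)\n", e.Kind, e.Name, e.Line)
	}
}
```

Each extracted entity, rather than the whole file, would then be the unit that gets embedded and returned by a search, which is what keeps retrieved context small.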
- File Discovery: Scans your project directory while respecting .gitignore rules
- AST Parsing: Uses Tree-sitter to parse source files and extract code entities (functions, classes, methods)
- Embedding Generation: Creates semantic embeddings using OpenAI's API
- Vector Storage: Stores embeddings in chromem-go for fast similarity search
- Real-Time Updates: Monitors file changes and automatically re-indexes modified files
- Semantic Search: Queries return the most relevant code segments based on semantic similarity
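The final step, ranking code segments by semantic similarity, can be sketched with plain cosine similarity. This is an illustration under stated assumptions, not the chromem-go implementation: the `Chunk` type, the `Cosine` and `TopK` functions, and the three-dimensional toy vectors are all hypothetical, and real embeddings from an API have far more dimensions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a hypothetical indexed code segment with its embedding.
type Chunk struct {
	ID        string
	Embedding []float64
}

// Cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ or either vector has zero magnitude.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopK returns the IDs of the k chunks most similar to query, best first.
func TopK(query []float64, chunks []Chunk, k int) []string {
	type scored struct {
		id    string
		score float64
	}
	scores := make([]scored, 0, len(chunks))
	for _, c := range chunks {
		scores = append(scores, scored{id: c.ID, score: Cosine(query, c.Embedding)})
	}
	// Stable sort keeps insertion order for equal scores.
	sort.SliceStable(scores, func(i, j int) bool { return scores[i].score > scores[j].score })
	if k < 0 {
		k = 0
	}
	if k > len(scores) {
		k = len(scores)
	}
	ids := make([]string, 0, k)
	for _, s := range scores[:k] {
		ids = append(ids, s.id)
	}
	return ids
}

func main() {
	chunks := []Chunk{
		{ID: "auth.go:Login", Embedding: []float64{0.9, 0.1, 0.0}},
		{ID: "db.go:Connect", Embedding: []float64{0.1, 0.9, 0.1}},
		{ID: "http.go:Recover", Embedding: []float64{0.0, 0.2, 0.9}},
	}
	query := []float64{0.8, 0.2, 0.0} // stand-in for an embedded "user login" query
	fmt.Println(TopK(query, chunks, 2))
}
```

A linear scan like this is fine for a sketch; a vector store exists so that the same ranking stays fast as the number of indexed segments grows.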
- Go 1.21 or higher
- OpenAI API key
- Git (for gitignore integration)
# Clone the repository
git clone https://github.com/yourusername/code-search-mcp.git
cd code-search-mcp
# Install dependencies
go mod download
# Build the server
go build -o code-search-mcp
Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY=your-api-key-here
./code-search-mcp --path /path/to/your/project
--path Path to the project directory to index (required)
--port Server port (default: 8080)
--watch Enable file watching for real-time updates (default: true)
--languages Comma-separated list of languages to index (default: all supported)
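As a rough sketch of how these options could be parsed, the following uses Go's standard `flag` package, which accepts both `-path` and `--path`. The `Config` struct and `ParseArgs` function are hypothetical names, and the server's actual flag handling may differ; only the option names and defaults come from the list above.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// Config mirrors the documented command-line options.
type Config struct {
	Path      string
	Port      int
	Watch     bool
	Languages []string // empty means "all supported"
}

// ParseArgs parses args (without the program name) into a Config.
func ParseArgs(args []string) (Config, error) {
	var cfg Config
	var langs string
	fs := flag.NewFlagSet("code-search-mcp", flag.ContinueOnError)
	fs.StringVar(&cfg.Path, "path", "", "path to the project directory to index (required)")
	fs.IntVar(&cfg.Port, "port", 8080, "server port")
	fs.BoolVar(&cfg.Watch, "watch", true, "enable file watching for real-time updates")
	fs.StringVar(&langs, "languages", "", "comma-separated languages to index (default: all)")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	if cfg.Path == "" {
		return Config{}, errors.New("--path is required")
	}
	// Normalize "go, Python" into ["go", "python"], dropping empty entries.
	for _, l := range strings.Split(langs, ",") {
		if l = strings.ToLower(strings.TrimSpace(l)); l != "" {
			cfg.Languages = append(cfg.Languages, l)
		}
	}
	return cfg, nil
}

func main() {
	// A real main would pass os.Args[1:]; fixed arguments keep the demo self-contained.
	cfg, err := ParseArgs([]string{"--path", ".", "--languages", "go,python"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Note that with the `flag` package a boolean that defaults to true has to be switched off as `--watch=false`; a bare `--watch false` would treat `false` as a positional argument.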
Once the server is running, you can search your codebase:
# Find authentication logic
curl -X POST http://localhost:8080/search \
-d '{"query": "user authentication and login"}'
# Find database connection code
curl -X POST http://localhost:8080/search \
-d '{"query": "database connection setup"}'
# Find error handling patterns
curl -X POST http://localhost:8080/search \
-d '{"query": "error handling middleware"}'
- Go (.go)
- Python (.py)
- TypeScript (.ts, .tsx)
- JavaScript (.js, .jsx)
- Markdown (.md)
Additional language support can be added by extending the Tree-sitter grammar integration.
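One simple way to route files to the right grammar is a lookup from file extension to language. The sketch below assumes exactly the extensions listed above; the `languageByExt` table and `DetectLanguage` function are hypothetical names, and matching extensions case-insensitively is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// languageByExt maps the documented file extensions to language names.
var languageByExt = map[string]string{
	".go":  "go",
	".py":  "python",
	".ts":  "typescript",
	".tsx": "typescript",
	".js":  "javascript",
	".jsx": "javascript",
	".md":  "markdown",
}

// DetectLanguage returns the language for path and whether it is supported.
// Extensions are matched case-insensitively (an assumption of this sketch).
func DetectLanguage(path string) (string, bool) {
	lang, ok := languageByExt[strings.ToLower(filepath.Ext(path))]
	return lang, ok
}

func main() {
	for _, p := range []string{"cmd/server/main.go", "web/App.tsx", "README.md", "logo.png"} {
		lang, ok := DetectLanguage(p)
		fmt.Printf("%s -> %q (supported: %t)\n", p, lang, ok)
	}
}
```

With a table like this, supporting another language would mean adding its extensions here alongside the corresponding Tree-sitter grammar.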
┌─────────────────┐
│ File Watcher │
│ (fsnotify) │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐
│ AST Parser │────▶│ Code Extractor │
│ (Tree-sitter) │ │ │
└─────────────────┘ └────────┬─────────┘
│
▼
┌──────────────────┐
│ OpenAI Embeddings│
│ API │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Vector Database │
│ (chromem-go) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Search Engine │
│ (Similarity) │
└──────────────────┘
- Concurrent Processing: File monitoring and indexing run in parallel
- Incremental Updates: Only changed files are re-indexed
- Efficient Storage: Vector database optimized for similarity search
- Token Optimization: Returns only relevant code segments, reducing context size
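Incremental updates need some way to decide that a file really changed, since file system events can fire without the content differing. One possible approach, sketched here as an assumption rather than as this project's method, is to keep a SHA-256 hash per path and re-index only when it differs. `ChangeTracker` and its methods are hypothetical names; the mutex makes the tracker safe to call from concurrent indexing goroutines.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// ChangeTracker remembers a content hash per file path.
type ChangeTracker struct {
	mu     sync.Mutex
	hashes map[string][sha256.Size]byte
}

// NewChangeTracker returns an empty tracker.
func NewChangeTracker() *ChangeTracker {
	return &ChangeTracker{hashes: make(map[string][sha256.Size]byte)}
}

// NeedsReindex reports whether content differs from what was last recorded
// for path. When it does, the new hash is recorded.
func (t *ChangeTracker) NeedsReindex(path string, content []byte) bool {
	sum := sha256.Sum256(content)
	t.mu.Lock()
	defer t.mu.Unlock()
	if old, ok := t.hashes[path]; ok && old == sum {
		return false
	}
	t.hashes[path] = sum
	return true
}

// Forget drops the record for a deleted file so a later file at the same
// path is treated as new.
func (t *ChangeTracker) Forget(path string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.hashes, path)
}

func main() {
	tracker := NewChangeTracker()
	fmt.Println(tracker.NeedsReindex("auth.go", []byte("package auth")))      // first sighting
	fmt.Println(tracker.NeedsReindex("auth.go", []byte("package auth")))      // same content
	fmt.Println(tracker.NeedsReindex("auth.go", []byte("package auth // v2"))) // edited
}
```

Skipping unchanged files this way also avoids paying for embedding requests that would produce the same vectors.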
go test ./...
go build -o code-search-mcp ./cmd/server
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Tree-sitter for powerful AST parsing
- chromem-go for efficient vector storage
- OpenAI for embedding generation
- fsnotify for file system monitoring
- MCP for the protocol specification
- Add support for more programming languages
- Implement caching layer for frequently accessed embeddings
- Add web UI for interactive search
- Support for local embedding models (Ollama, etc.)
- Multi-repository indexing
- Advanced filtering (by file type, date, author)
- Export search results to various formats
If you encounter any issues or have questions, please open an issue on GitHub.