yingkitw/codesearch

CodeSearch

Fast, intelligent code search and analysis for 48+ languages.

Find what you need in seconds: functions, classes, duplicates, dead code, complexity issues.

Rust License: Apache-2.0

What Can You Search & Detect?

🔍 Search Capabilities

What        Example                                 Use Case
Functions   codesearch "fn authenticate" -e rs      Find where authentication logic lives
Classes     codesearch "class User" -e py           Locate data models
TODO/FIXME  codesearch "TODO|FIXME" .               Track technical debt
Imports     codesearch "^import" -e js              Understand dependencies
Patterns    codesearch "async.*await" --fuzzy       Find async code (handles typos)
Exact Text  codesearch "deprecated_function"        Find all usages before refactoring

Supports: Regex, fuzzy matching, case-insensitive, multi-language (48+ languages)

🔎 Code Quality Detection

What           Command                 Value
Dead Code      codesearch deadcode     Find unused functions, variables, imports, empty functions
Duplicates     codesearch duplicates   Identify copy-paste code (Type-1/2/3 clones)
Complexity     codesearch complexity   Spot overly complex functions (cyclomatic/cognitive)
Circular Deps  codesearch circular     Detect circular dependencies
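
To make the circular dependency check concrete, here is a minimal sketch of one common approach: a depth-first search that reports a cycle when it meets a module that is still on the current path. The README does not say how `codesearch circular` is implemented, so the function name `find_cycle` and the map-of-names input are assumptions made for illustration only.

```rust
use std::collections::HashMap;

/// DFS state of a module (hypothetical helper type for this sketch).
enum Mark {
    Visiting, // on the current DFS path
    Done,     // fully explored, known to be cycle-free
}

/// Returns one dependency cycle as a path of module names, if any exists.
/// `deps` maps a module name to the modules it depends on.
fn find_cycle<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    fn visit<'a>(
        node: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        marks: &mut HashMap<&'a str, Mark>,
        stack: &mut Vec<&'a str>,
    ) -> Option<Vec<String>> {
        match marks.get(node) {
            Some(Mark::Done) => return None,
            Some(Mark::Visiting) => {
                // Back edge: the cycle is the part of the path starting at `node`.
                let start = stack.iter().position(|n| *n == node).unwrap();
                let mut cycle: Vec<String> =
                    stack[start..].iter().map(|s| s.to_string()).collect();
                cycle.push(node.to_string());
                return Some(cycle);
            }
            None => {}
        }
        marks.insert(node, Mark::Visiting);
        stack.push(node);
        if let Some(children) = deps.get(node) {
            for &child in children {
                if let Some(cycle) = visit(child, deps, marks, stack) {
                    return Some(cycle);
                }
            }
        }
        stack.pop();
        marks.insert(node, Mark::Done);
        None
    }

    // Sort the roots so the reported cycle does not depend on hash order.
    let mut roots: Vec<&str> = deps.keys().copied().collect();
    roots.sort();
    let mut marks = HashMap::new();
    let mut stack = Vec::new();
    roots
        .into_iter()
        .find_map(|root| visit(root, deps, &mut marks, &mut stack))
}
```

For a map where `a` depends on `b`, `b` on `c`, and `c` on `a`, this returns the path `a, b, c, a`; for an acyclic map it returns `None`.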

💡 Real-World Use Cases

Before Refactoring:

# Find all usages of old function
codesearch "oldAuthMethod" .
# Result: Found in 12 files, 23 occurrences

Code Review:

# Check for technical debt
codesearch deadcode ./src
# Result: 5 unused functions, 12 TODO comments

Understanding Codebase:

# Find all authentication-related code
codesearch "auth" -e rs,py --rank
# Result: Ranked by relevance, with line numbers

Quality Check:

# Find duplicated code
codesearch duplicates --min-lines 5
# Result: 8 duplicate blocks (90%+ similar)

🚀 Quick Start

# Simple search: codesearch <query> [path]
codesearch "function"           # Search current directory
codesearch "TODO" ./src         # Search specific path
codesearch "class" ./src -e py  # Filter by extension

# Fuzzy search (handles typos)
codesearch "usrmngr" . --fuzzy

# Interactive mode
codesearch interactive

# Analysis commands
codesearch analyze              # Codebase metrics
codesearch complexity           # Complexity scores
codesearch metrics              # Comprehensive metrics (all-in-one)
codesearch design-metrics       # Coupling & cohesion
codesearch duplicates           # Find similar code
codesearch deadcode             # Find unused code

# Advanced features
codesearch index                # Build incremental index
codesearch watch                # Watch for file changes

# Graph analysis (6 types)
codesearch ast file.rs          # Abstract Syntax Tree
codesearch cfg file.rs          # Control Flow Graph
codesearch dfg file.rs          # Data Flow Graph
codesearch callgraph .          # Call Graph
codesearch depgraph .           # Dependency Graph
codesearch pdg file.rs          # Program Dependency Graph
codesearch graph-all file.rs    # All graphs

# Other advanced features
codesearch git-history "TODO"   # Search git history
codesearch remote --github "pattern" # Search GitHub

Why CodeSearch?

Fast & Precise

  • Parallel processing using Rust and rayon
  • Exact line numbers with precise matching
  • Smart caching for repeated searches
  • Typical search: 3-50ms for codebases < 1000 files

Language-Aware

  • 48+ languages supported
  • Understands functions, classes, imports
  • Syntax-specific patterns

Quality Focused

  • Detect dead code before it ships
  • Find duplicates to improve DRY
  • Measure complexity to guide refactoring
  • Analyze design quality (coupling, cohesion, instability)
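
As a rough illustration of the instability metric mentioned above, the sketch below uses the standard definition I = Ce / (Ca + Ce), where Ce counts outgoing dependencies and Ca counts incoming ones. Whether `codesearch design-metrics` computes it exactly this way is not stated in this README; the function name `instability` and the input shape are assumptions.

```rust
use std::collections::HashMap;

/// Instability I = Ce / (Ca + Ce) for `module`.
/// Returns None when the module has no couplings at all (the ratio is undefined).
/// `deps` maps a module name to the modules it depends on.
fn instability(deps: &HashMap<&str, Vec<&str>>, module: &str) -> Option<f64> {
    // Efferent coupling (Ce): how many modules `module` depends on.
    let ce = deps.get(module).map_or(0, |targets| targets.len());
    // Afferent coupling (Ca): how many other modules depend on `module`.
    let ca = deps
        .iter()
        .filter(|(name, targets)| **name != module && targets.iter().any(|t| *t == module))
        .count();
    if ca + ce == 0 {
        None
    } else {
        Some(ce as f64 / (ca + ce) as f64)
    }
}
```

A value near 1.0 means the module mostly depends on others and is cheap to change; a value near 0.0 means many modules depend on it, so changes to it ripple outward.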

Developer Friendly

  • Interactive REPL mode
  • Export to CSV/Markdown
  • MCP server for AI agents

Advanced Capabilities

  • Incremental indexing for large codebases
  • Real-time file watching
  • Git history search
  • Remote repository search (GitHub/GitLab)

Graph Analysis (6 Types)

  • Abstract Syntax Tree (AST) - code structure
  • Control Flow Graph (CFG) - execution paths
  • Data Flow Graph (DFG) - variable dependencies
  • Call Graph - function relationships
  • Dependency Graph - module dependencies
  • Program Dependency Graph (PDG) - combined analysis

High Code Quality

  • ✅ 100% test pass rate (173 unit + 36 integration tests)
  • ✅ Zero clippy warnings (as of Jan 2026)
  • ✅ Modular architecture (19 focused modules)
  • ✅ DRY, KISS, and SoC principles throughout
  • ✅ Thread-safe parallel processing
  • ✅ Comprehensive error handling

Installation

git clone https://github.com/yingkitw/codesearch.git
cd codesearch
cargo build --release

# Optional: MCP server for AI agents
cargo build --release --features mcp

Common Options

# Filter by file type
codesearch "pattern" -e rs,py,js

# Exclude directories
codesearch "pattern" -x target,node_modules

# Case-insensitive
codesearch "pattern" -i

# Fuzzy matching (handles typos)
codesearch "patern" --fuzzy

# Rank by relevance
codesearch "pattern" --rank

# Export results
codesearch "pattern" --export csv

📖 Usage Examples

Search Patterns

# codesearch <query> [path] [options]
codesearch "TODO"                       # Search current directory
codesearch "class" ./src                # Search specific folder
codesearch "error" . -e py,js,ts        # Filter by extensions

# Regex patterns
codesearch "fn\\s+\\w+" ./src -e rs     # Rust functions
codesearch "import.*from" . -e ts       # TypeScript imports

# Fuzzy search (handles typos)
codesearch "authetication" . --fuzzy    # Finds "authentication"
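
One way to see why a typo such as "authetication" can still match "authentication" is edit distance. The sketch below computes Levenshtein distance and accepts candidates within a small number of edits. This README does not document which algorithm `--fuzzy` actually uses, so treat `edit_distance`, `fuzzy_matches`, and the `max_edits` threshold as hypothetical names chosen for illustration.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode scalar values.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr.push(substitute.min(delete).min(insert));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical matching rule: accept a candidate within `max_edits` of the query.
fn fuzzy_matches(query: &str, candidate: &str, max_edits: usize) -> bool {
    edit_distance(query, candidate) <= max_edits
}
```

Here "authetication" is one insertion away from "authentication", so `fuzzy_matches("authetication", "authentication", 1)` holds.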

Code Analysis

# Codebase overview
codesearch analyze
# Output: Files, lines, languages, function count, class count

# Complexity analysis
codesearch complexity --threshold 15 --sort
# Output: Files ranked by cyclomatic/cognitive complexity

# Dead code detection (enhanced with 6+ detection types)
codesearch deadcode -e rs,py,js
# Output: Unused variables, unreachable code, empty functions, 
#         TODO/FIXME markers, commented code, unused imports

# Duplicate detection
codesearch duplicates --similarity 0.8
# Output: Similar code blocks that violate DRY
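
To give a feel for what a similarity threshold such as 0.8 can mean, here is a small sketch that scores two code blocks with Jaccard similarity over their trimmed, non-empty lines. The actual metric behind `--similarity` is not described here, so this is an assumption for illustration; `line_set` and `similarity` are hypothetical names.

```rust
use std::collections::HashSet;

/// Normalizes a code block into a set of trimmed, non-empty lines.
fn line_set(block: &str) -> HashSet<&str> {
    block
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two blocks, in the range 0.0..=1.0.
fn similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (line_set(a), line_set(b));
    let union = a.union(&b).count();
    if union == 0 {
        return 1.0; // two empty blocks are trivially identical
    }
    a.intersection(&b).count() as f64 / union as f64
}
```

Two blocks that differ only in indentation score 1.0, while blocks sharing two of four distinct lines score 0.5 and would fall below a 0.8 threshold.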

Interactive Mode

codesearch interactive

Commands:

  • Type any pattern to search
  • /f - Toggle fuzzy mode
  • /i - Toggle case insensitivity
  • analyze - Codebase metrics
  • complexity - Complexity analysis
  • deadcode - Dead code detection
  • duplicates - Find duplicates
  • help - All commands

MCP Server (AI Integration)

# Start MCP server
cargo run --features mcp -- mcp-server

# Agents can call:
# - search_code(query, path, extensions, fuzzy, regex)
# - list_files(path, extensions, exclude)
# - analyze_codebase(path, extensions)

📊 Output Examples

Search Results

🔍 Search Results for "fn main"
──────────────────────────────

📁 ./src/main.rs (1 match)
  358: fn main() -> Result<(), Box<dyn std::error::Error>> {

📊 Statistics:
  Files searched: 12
  Matches found: 1
  Time: 0.003s

Dead Code Detection

🔍 Dead Code Detection
──────────────────────────────

⚠️  Found 12 potential dead code items:

📄 src/example.rs
   [var] L  10: variable 'unused_var' - Variable declared but never used
   [!]   L  25: unreachable - Code after return statement is unreachable
   [∅]   L  42: empty_helper - Empty function with no implementation
   [?]   L  58: // TODO: implement this - TODO marker - incomplete implementation
   [imp] L  72: import 'HashMap' - Imported but never used

📊 Summary:
   • variable: 3
   • unreachable: 2
   • empty: 2
   • todo: 3
   • import: 2

Complexity Analysis

📊 Code Complexity Analysis
──────────────────────────────

📁 Files by Complexity (highest first):

  src/search.rs
    Cyclomatic: 45  Cognitive: 38  Lines: 645

  src/analysis.rs
    Cyclomatic: 28  Cognitive: 22  Lines: 378
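
For readers unfamiliar with the cyclomatic number, the sketch below shows a deliberately crude approximation: one plus the count of branching keywords and short-circuit operators. It scans raw text, so it does not skip comments or string literals and would miscount Rust closures written with `||`. How codesearch itself measures complexity is not specified in this README; `approx_cyclomatic` and its keyword list are assumptions.

```rust
/// Rough cyclomatic complexity: 1 + the number of branching constructs in `source`.
fn approx_cyclomatic(source: &str) -> usize {
    // Keywords treated as decision points in this sketch.
    const BRANCH_KEYWORDS: [&str; 4] = ["if", "while", "for", "case"];
    // Split into identifier-like tokens and count the ones that are branch keywords.
    let keyword_hits = source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|token| BRANCH_KEYWORDS.contains(token))
        .count();
    // Each short-circuit operator adds one more path through the code.
    let operator_hits = source.matches("&&").count() + source.matches("||").count();
    1 + keyword_hits + operator_hits
}
```

A straight-line function scores 1; a function with one `if`, one `&&`, and one `for` scores 4.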

Code Quality & Architecture

Maintainability

  • Modular Design: 19 focused modules following single responsibility principle
  • Clean Code: Average module size ~200 LOC, functions < 100 LOC
  • Design Patterns: Strategy, Observer, Facade patterns for extensibility
  • Best Practices: DRY, KISS, SoC principles consistently applied

Test Coverage

  • 173 Unit Tests: Core functionality thoroughly tested
  • 36 Integration Tests: End-to-end CLI command verification
  • 23 MCP Tests: AI agent integration validated
  • Edge Cases: Empty files, unicode, large files, special characters

Performance

  • Parallel Processing: Auto-scales to available CPU cores with rayon
  • Smart Caching: 70-90% cache hit rate for repeated searches
  • Memory Efficient: Streaming file reading, < 100MB for 10K files
  • Optimized: Regex compilation moved outside loops, fast hashing with ahash
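
The parallel processing idea can be sketched with the standard library alone: split the inputs into chunks, search each chunk on its own thread, then merge the hits. The project itself uses rayon, as noted above; this sketch swaps in `std::thread::scope` only to stay self-contained, searches in-memory strings rather than files, and uses the hypothetical name `parallel_search`.

```rust
use std::thread;

/// Searches in-memory documents in parallel.
/// Each document is (name, text); each hit is (name, 1-based line number, line text).
fn parallel_search<'a>(
    docs: &[(&'a str, &'a str)],
    needle: &str,
    workers: usize,
) -> Vec<(&'a str, usize, &'a str)> {
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are created.
    let chunk_size = ((docs.len() + workers - 1) / workers).max(1);
    let mut hits = thread::scope(|scope| {
        // Spawn one scoped thread per chunk; scoped threads may borrow `docs` and `needle`.
        let handles: Vec<_> = docs
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    let mut found = Vec::new();
                    for &(name, text) in chunk {
                        for (index, line) in text.lines().enumerate() {
                            if line.contains(needle) {
                                found.push((name, index + 1, line));
                            }
                        }
                    }
                    found
                })
            })
            .collect();
        // Wait for every worker and merge the per-chunk results.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    });
    // Sort so the output order does not depend on how the work was chunked.
    hits.sort();
    hits
}
```

Because the results are sorted after merging, the output is the same for any worker count, which keeps line-number reporting stable across machines with different core counts.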

Future Improvements

See TODO.md for planned enhancements:

  • Trait abstractions for better testability
  • Property-based testing with proptest
  • Performance profiling and optimization
  • Enhanced caching with LRU eviction
  • Workspace crate structure for modularity

Supported Languages

48+ languages including: Rust, Python, JavaScript, TypeScript, Go, Java, C/C++, Ruby, PHP, Swift, Kotlin, and more.

See ARCHITECTURE.md for technical details and design principles.

License

Apache-2.0 License


Built with Rust • Fast • Precise • 48+ Languages
