🎯 AI Recruitr - Smart Resume Matcher

An AI-powered recruitment system that finds best-fit candidates through semantic matching, built on a FAISS vector database and Google Gemini.


✨ Features

  • 🎯 Semantic Resume Matching - Goes beyond keyword matching using AI embeddings
  • 🚀 Fast Vector Search - Lightning-fast similarity search with FAISS
  • 🤖 AI-Powered Explanations - Gemini AI generates detailed match explanations
  • 📄 Multi-Format Support - Process PDF, DOCX, and TXT resume files
  • 💡 Clean Architecture - Modular microservices design with FastAPI + Streamlit
  • 📊 Analytics Dashboard - Comprehensive insights and matching analytics
  • ⚡ Real-Time Processing - Instant resume processing and matching
  • 🔍 Advanced Filtering - Filter by skills, experience, location, and more

πŸ—οΈ Architecture

ai_recruitr/
β”œβ”€β”€ backend/              # FastAPI microservices
β”‚   β”œβ”€β”€ services/         # Core business logic
β”‚   β”œβ”€β”€ api/             # REST API endpoints
β”‚   └── models/          # Pydantic schemas
β”œβ”€β”€ frontend/            # Streamlit UI
β”‚   β”œβ”€β”€ pages/           # UI pages
β”‚   └── components/      # Reusable components
β”œβ”€β”€ config/              # Configuration
β”œβ”€β”€ data/                # Data storage
└── utils/               # Utilities

πŸ› οΈ Tech Stack

Component Technology
Backend FastAPI + Python 3.9+
Frontend Streamlit
Embeddings mxbai-embed-large-v1 (Hugging Face)
Vector DB FAISS
LLM Google Gemini
Resume Parsing PyMuPDF, python-docx
Data Processing Pandas, NumPy

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • Git
  • Google Gemini API key

1. Clone Repository

git clone https://github.com/yourusername/ai-recruitr.git
cd ai-recruitr

2. Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Set Up Environment Variables

Create a .env file in the project root:

cp .env.example .env

Edit .env and add your API keys:

# Required: Google Gemini API Key
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Customize settings
API_HOST=localhost
API_PORT=8000
STREAMLIT_HOST=localhost
STREAMLIT_PORT=8501
LOG_LEVEL=INFO
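These variables can be read at startup with the standard library's `os.environ`; a minimal sketch with the documented defaults (the project's actual loader lives in `config/settings.py`, so treat the function and key names here as illustrative):

```python
import os

def load_settings() -> dict:
    """Read AI Recruitr settings from the environment, falling back to the
    documented defaults. Fails fast if the required Gemini key is missing."""
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required")
    return {
        "gemini_api_key": api_key,
        "api_host": os.environ.get("API_HOST", "localhost"),
        "api_port": int(os.environ.get("API_PORT", "8000")),
        "streamlit_host": os.environ.get("STREAMLIT_HOST", "localhost"),
        "streamlit_port": int(os.environ.get("STREAMLIT_PORT", "8501")),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }
```

Failing fast on a missing key surfaces configuration problems at startup rather than on the first Gemini call.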

5. Get Your Gemini API Key

  1. Go to Google AI Studio (https://aistudio.google.com)
  2. Create a new API key
  3. Copy and paste it into your .env file

6. Run the Application

Option A: Run Both Services (Recommended)

# Terminal 1: Start FastAPI backend
python -m backend.main

# Terminal 2: Start Streamlit frontend
streamlit run frontend/app.py

Option B: Using Scripts (Windows)

# Start backend
start_backend.bat

# Start frontend  
start_frontend.bat

Option C: Using Scripts (macOS/Linux)

# Start backend
./start_backend.sh

# Start frontend
./start_frontend.sh

7. Access the Application

  • Streamlit frontend: http://localhost:8501
  • FastAPI backend: http://localhost:8000
  • Interactive API docs (served by FastAPI): http://localhost:8000/docs

📖 Usage Guide

1. Upload Resumes

  1. Navigate to "📄 Upload Resumes" page
  2. Drag and drop PDF/DOCX resume files
  3. Click "🚀 Process All Files"
  4. Wait for processing to complete

2. Match Job Descriptions

  1. Go to "🔍 Job Matching" page
  2. Fill in the job description form:
    • Job title
    • Detailed job description
    • Required skills
    • Experience level
  3. Click "🔍 Find Matching Resumes"
  4. Review the matching results

3. Analyze Results

  1. Visit "📊 Results & Analytics" page
  2. View current matching results
  3. Explore analytics and insights
  4. Export data in JSON/CSV format

🔧 Configuration

Environment Variables

| Variable             | Description             | Default         |
|----------------------|-------------------------|-----------------|
| GEMINI_API_KEY       | Google Gemini API key   | Required        |
| API_HOST             | FastAPI host            | localhost       |
| API_PORT             | FastAPI port            | 8000            |
| STREAMLIT_HOST       | Streamlit host          | localhost       |
| STREAMLIT_PORT       | Streamlit port          | 8501            |
| LOG_LEVEL            | Logging level           | INFO            |
| MAX_FILE_SIZE        | Max upload size (bytes) | 10485760 (10MB) |
| TOP_K_MATCHES        | Default max matches     | 10              |
| SIMILARITY_THRESHOLD | Default threshold       | 0.7             |

Model Configuration

The system uses:

  • Embedding Model: mixedbread-ai/mxbai-embed-large-v1
  • LLM: gemini-pro
  • Vector Dimension: 1024
  • Max Sequence Length: 512 tokens
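On L2-normalized embeddings, inner-product search is equivalent to cosine similarity, which is what the matcher's FAISS lookup computes. A numpy-only sketch of that top-k step (function name and the toy dimensions are illustrative, not the project's API):

```python
import numpy as np

def top_k_matches(query: np.ndarray, resume_vecs: np.ndarray,
                  k: int = 10, threshold: float = 0.7):
    """Return (index, score) pairs for the k most similar resume vectors.
    Both sides are L2-normalized first, so the inner product equals cosine
    similarity -- the same scheme an inner-product FAISS index relies on."""
    q = query / np.linalg.norm(query)
    r = resume_vecs / np.linalg.norm(resume_vecs, axis=1, keepdims=True)
    scores = r @ q                         # one cosine score per resume
    order = np.argsort(scores)[::-1][:k]   # best-first, capped at k
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]
```

In production the 1024-dimensional vectors live in a persisted FAISS index; the thresholding shown here mirrors the SIMILARITY_THRESHOLD setting.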

🧪 API Documentation

Upload Resume

curl -X POST "http://localhost:8000/api/v1/upload-resume" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@resume.pdf"

Match Job Description

curl -X POST "http://localhost:8000/api/v1/match-job" \
     -H "Content-Type: application/json" \
     -d '{
       "job_description": {
         "title": "Senior Python Developer",
         "description": "We are looking for...",
         "skills_required": ["Python", "Django", "PostgreSQL"]
       },
       "top_k": 10,
       "similarity_threshold": 0.7
     }'
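The same request can be issued from Python; a stdlib-only sketch mirroring the curl call above (endpoint path and payload fields are taken from it, while the helper names are hypothetical):

```python
import json
import urllib.request

def build_match_request(title: str, description: str, skills: list[str],
                        top_k: int = 10, threshold: float = 0.7) -> dict:
    """Assemble the JSON body expected by POST /api/v1/match-job."""
    return {
        "job_description": {
            "title": title,
            "description": description,
            "skills_required": skills,
        },
        "top_k": top_k,
        "similarity_threshold": threshold,
    }

def match_job(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the backend and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/match-job",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    body = build_match_request("Senior Python Developer", "We are looking for...",
                               ["Python", "Django", "PostgreSQL"])
    print(match_job(body))  # requires the backend to be running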
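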

Get Resume Count

curl "http://localhost:8000/api/v1/resumes/count"

πŸ“ Project Structure

ai_recruitr/
├── 📁 backend/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application
│   ├── 📁 api/
│   │   ├── __init__.py
│   │   └── routes.py           # API endpoints
│   ├── 📁 models/
│   │   ├── __init__.py
│   │   └── schemas.py          # Pydantic models
│   └── 📁 services/
│       ├── __init__.py
│       ├── embedding_service.py # mxbai embeddings
│       ├── faiss_service.py    # Vector database
│       ├── gemini_service.py   # Gemini LLM
│       └── resume_parser.py    # Resume processing
├── 📁 frontend/
│   ├── __init__.py
│   ├── app.py                  # Streamlit main app
│   ├── 📁 pages/
│   │   ├── __init__.py
│   │   ├── upload_resume.py    # Upload interface
│   │   ├── job_matching.py     # Matching interface
│   │   └── results.py          # Analytics dashboard
│   └── 📁 components/
│       ├── __init__.py
│       └── ui_components.py    # Reusable UI components
├── 📁 config/
│   ├── __init__.py
│   └── settings.py             # Configuration
├── 📁 data/
│   ├── 📁 resumes/             # Uploaded resumes
│   ├── 📁 faiss_index/         # FAISS index files
│   └── 📁 processed/           # Processed data
├── 📁 utils/
│   ├── __init__.py
│   └── helpers.py              # Utility functions
├── requirements.txt            # Python dependencies
├── .env.example                # Environment template
├── .gitignore                  # Git ignore rules
└── README.md                   # This file

🚨 Troubleshooting

Common Issues

1. "GEMINI_API_KEY is required" Error

Problem: Missing or invalid Gemini API key.

Solution:

# Check your .env file
cat .env

# Ensure GEMINI_API_KEY is set
echo $GEMINI_API_KEY

2. FAISS Installation Issues

Problem: FAISS installation fails on some systems.

Solution:

# Try installing CPU version specifically
pip install faiss-cpu==1.7.4

# On macOS with Apple Silicon:
conda install -c pytorch faiss-cpu

3. Resume Text Extraction Fails

Problem: PDF text extraction returns empty content.

Solution:

  • Ensure PDFs are text-based, not scanned images
  • Try converting PDFs to text format first
  • Check file permissions
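A quick way to tell a scanned PDF from a text-based one is to check how much text extraction actually recovers per page; a small heuristic sketch (feed it the per-page strings from PyMuPDF's `page.get_text()`; the character threshold is an arbitrary illustrative choice):

```python
def looks_scanned(page_texts: list[str], min_chars_per_page: int = 25) -> bool:
    """Heuristic: if extraction yields almost no text on every page, the PDF
    is probably a scanned image and needs OCR instead of text extraction."""
    if not page_texts:
        return True
    return all(len(t.strip()) < min_chars_per_page for t in page_texts)
```

If this flags a resume, run it through an OCR tool before uploading rather than relying on plain text extraction.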

4. Streamlit Connection Error

Problem: Frontend can't connect to FastAPI backend.

Solution:

# Check if backend is running
curl http://localhost:8000/health

# Verify ports in .env file
grep -E "(API_PORT|STREAMLIT_PORT)" .env

5. Slow Embedding Generation

Problem: Embedding generation takes too long.

Solution:

  • Check if you have GPU available
  • Reduce batch size in processing
  • Consider using smaller embedding model for testing
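Per-text overhead dominates when resumes are embedded one at a time; batching texts before encoding amortizes it. A sketch of the batching step (the batch size and the commented `model.encode` call are illustrative assumptions -- sentence-transformers models do accept a list of texts per call):

```python
from typing import Iterator

def batched(texts: list[str], batch_size: int = 32) -> Iterator[list[str]]:
    """Yield fixed-size batches so the embedding model processes many texts
    per forward pass instead of one at a time."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Usage (illustrative):
# for batch in batched(resume_texts, 32):
#     vectors.extend(model.encode(batch))
```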

Debug Mode

Enable debug logging:

# Set in .env
LOG_LEVEL=DEBUG

# Or run with debug
python -m backend.main --log-level DEBUG

🔒 Security Considerations

Production Deployment

  • Change default ports
  • Set up proper CORS origins
  • Use environment-specific API keys
  • Enable HTTPS
  • Implement rate limiting
  • Add authentication
  • Secure file uploads
  • Monitor API usage

Data Privacy

  • Implement data retention policies
  • Add resume deletion functionality
  • Encrypt sensitive data
  • Audit API access
  • Comply with GDPR/privacy laws

🚀 Advanced Features

Scaling

  • Database: Replace FAISS with Pinecone/Weaviate for production
  • Caching: Add Redis for embedding caching
  • Queue: Use Celery for async processing
  • Load Balancing: Deploy with multiple API instances

Enhancements

  • Multi-language Support: Add language detection
  • Resume Scoring: Implement comprehensive scoring
  • Bias Detection: Add fairness checking
  • Integration: Connect with LinkedIn, ATS systems
  • Real-time Updates: WebSocket for live updates

🤝 Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black .
isort .

# Lint code
flake8 .

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

📞 Support


Made with ❤️ for smarter recruiting

