An AI-powered recruitment system that finds best-fit candidates through semantic matching, built on a FAISS vector database and Gemini AI.
- **Semantic Resume Matching** - Goes beyond keyword matching using AI embeddings
- **Fast Vector Search** - Lightning-fast similarity search with FAISS
- **AI-Powered Explanations** - Gemini AI generates detailed match explanations
- **Multi-Format Support** - Process PDF, DOCX, and TXT resume files
- **Clean Architecture** - Modular microservices design with FastAPI + Streamlit
- **Analytics Dashboard** - Comprehensive insights and matching analytics
- **Real-Time Processing** - Instant resume processing and matching
- **Advanced Filtering** - Filter by skills, experience, location, and more
```
ai_recruitr/
├── backend/          # FastAPI microservices
│   ├── services/     # Core business logic
│   ├── api/          # REST API endpoints
│   └── models/       # Pydantic schemas
├── frontend/         # Streamlit UI
│   ├── pages/        # UI pages
│   └── components/   # Reusable components
├── config/           # Configuration
├── data/             # Data storage
└── utils/            # Utilities
```
| Component | Technology |
|---|---|
| Backend | FastAPI + Python 3.9+ |
| Frontend | Streamlit |
| Embeddings | mxbai-embed-large-v1 (Hugging Face) |
| Vector DB | FAISS |
| LLM | Google Gemini |
| Resume Parsing | PyMuPDF, python-docx |
| Data Processing | Pandas, NumPy |
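As a rough illustration of the parsing layer, here is a minimal multi-format text extractor built on PyMuPDF and python-docx. This is a sketch only; `extract_text` is a hypothetical helper, not the repo's actual `resume_parser.py` API:

```python
# Illustrative sketch of multi-format resume text extraction.
# extract_text is a hypothetical helper, not the repo's actual API.
from pathlib import Path

import fitz  # PyMuPDF
from docx import Document  # python-docx


def extract_text(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    raise ValueError(f"Unsupported format: {suffix}")
```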
You'll need:

- Python 3.9 or higher
- Git
- A Google Gemini API key
```bash
git clone https://github.com/yourusername/ai-recruitr.git
cd ai-recruitr
```

Create and activate a virtual environment:

```bash
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Edit `.env` and add your API keys:
```env
# Required: Google Gemini API Key
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Customize settings
API_HOST=localhost
API_PORT=8000
STREAMLIT_HOST=localhost
STREAMLIT_PORT=8501
LOG_LEVEL=INFO
```

To get a Gemini API key:

- Go to Google AI Studio
- Create a new API key
- Copy and paste it into your `.env` file
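Before starting the app, you can sanity-check the key with a short script using the `google-generativeai` package (a hypothetical smoke test, not part of the repo):

```python
# Hypothetical smoke test: confirm the Gemini key in your environment works.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Reply with OK if you can read this.")
print(response.text)  # expect something like "OK"
```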
Run the backend and frontend in separate terminals:

```bash
# Terminal 1: Start FastAPI backend
python -m backend.main

# Terminal 2: Start Streamlit frontend
streamlit run frontend/app.py
```

Or use the helper scripts. On Windows:

```bash
# Start backend
start_backend.bat

# Start frontend
start_frontend.bat
```

On macOS/Linux:

```bash
# Start backend
./start_backend.sh

# Start frontend
./start_frontend.sh
```

Then open:

- Streamlit UI: http://localhost:8501
- FastAPI Docs: http://localhost:8000/docs
- API Health: http://localhost:8000/health
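If you script the startup, it helps to wait for the backend before opening the UI. A minimal readiness check (hypothetical, not shipped with the repo):

```python
# Poll the backend health endpoint until it responds, then point at the UI.
import time

import requests

for _ in range(30):
    try:
        if requests.get("http://localhost:8000/health", timeout=2).ok:
            print("Backend is up; open http://localhost:8501")
            break
    except requests.exceptions.RequestException:
        pass
    time.sleep(1)
else:
    print("Backend did not come up within ~30 seconds")
```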
Upload resumes:

- Navigate to the "Upload Resumes" page
- Drag and drop PDF/DOCX resume files
- Click "Process All Files"
- Wait for processing to complete
Match a job:

- Go to the "Job Matching" page
- Fill in the job description form:
  - Job title
  - Detailed job description
  - Required skills
  - Experience level
- Click "Find Matching Resumes"
- Review the matching results
Review results and analytics:

- Visit the "Results & Analytics" page
- View current matching results
- Explore analytics and insights
- Export data in JSON/CSV format
| Variable | Description | Default |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `API_HOST` | FastAPI host | localhost |
| `API_PORT` | FastAPI port | 8000 |
| `STREAMLIT_HOST` | Streamlit host | localhost |
| `STREAMLIT_PORT` | Streamlit port | 8501 |
| `LOG_LEVEL` | Logging level | INFO |
| `MAX_FILE_SIZE` | Max upload size (bytes) | 10485760 (10 MB) |
| `TOP_K_MATCHES` | Default max matches | 10 |
| `SIMILARITY_THRESHOLD` | Default similarity threshold | 0.7 |
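At startup these variables are read into the app's settings. A minimal sketch of how `config/settings.py` might load them using only the standard library (field names here are assumptions, not the repo's actual code):

```python
# Sketch: load the environment variables documented above.
# The real config/settings.py may differ.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str = os.environ.get("GEMINI_API_KEY", "")  # required in practice
    api_host: str = os.getenv("API_HOST", "localhost")
    api_port: int = int(os.getenv("API_PORT", "8000"))
    streamlit_host: str = os.getenv("STREAMLIT_HOST", "localhost")
    streamlit_port: int = int(os.getenv("STREAMLIT_PORT", "8501"))
    log_level: str = os.getenv("LOG_LEVEL", "INFO")
    max_file_size: int = int(os.getenv("MAX_FILE_SIZE", str(10 * 1024 * 1024)))
    top_k_matches: int = int(os.getenv("TOP_K_MATCHES", "10"))
    similarity_threshold: float = float(os.getenv("SIMILARITY_THRESHOLD", "0.7"))


settings = Settings()
```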
The system uses:

- Embedding Model: `mixedbread-ai/mxbai-embed-large-v1`
- LLM: `gemini-pro`
- Vector Dimension: 1024
- Max Sequence Length: 512 tokens
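To show how these pieces fit together, here is a sketch of the embed-and-search loop using the public `sentence-transformers` and `faiss` APIs. The repo's `embedding_service.py` and `faiss_service.py` may be structured differently:

```python
# Sketch: embed texts with mxbai-embed-large-v1 and search them with FAISS.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

resumes = [
    "Senior Python developer, 8 years of Django and PostgreSQL...",
    "Data analyst with SQL and Tableau experience...",
]
vecs = model.encode(resumes, normalize_embeddings=True)  # shape (n, 1024)

# Inner product on unit vectors == cosine similarity.
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(np.asarray(vecs, dtype="float32"))

query = model.encode(["Python backend engineer"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), k=2)
print(ids[0], scores[0])  # best-matching resume indices and similarities
```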
Upload a resume:

```bash
curl -X POST "http://localhost:8000/api/v1/upload-resume" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@resume.pdf"
```

Match a job:

```bash
curl -X POST "http://localhost:8000/api/v1/match-job" \
  -H "Content-Type: application/json" \
  -d '{
    "job_description": {
      "title": "Senior Python Developer",
      "description": "We are looking for...",
      "skills_required": ["Python", "Django", "PostgreSQL"]
    },
    "top_k": 10,
    "similarity_threshold": 0.7
  }'
```

Count indexed resumes:

```bash
curl "http://localhost:8000/api/v1/resumes/count"
```
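The same calls can be made from Python with `requests` (a sketch against the endpoints shown above; response shapes depend on the API's schemas):

```python
# Sketch: exercising the three endpoints above from Python.
import requests

BASE = "http://localhost:8000/api/v1"

# Upload a resume
with open("resume.pdf", "rb") as f:
    print(requests.post(f"{BASE}/upload-resume", files={"file": f}).json())

# Match a job description
payload = {
    "job_description": {
        "title": "Senior Python Developer",
        "description": "We are looking for...",
        "skills_required": ["Python", "Django", "PostgreSQL"],
    },
    "top_k": 10,
    "similarity_threshold": 0.7,
}
print(requests.post(f"{BASE}/match-job", json=payload).json())

# Count indexed resumes
print(requests.get(f"{BASE}/resumes/count").json())
```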
The full project layout:

```
ai_recruitr/
├── backend/
│   ├── __init__.py
│   ├── main.py                  # FastAPI application
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes.py            # API endpoints
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py           # Pydantic models
│   └── services/
│       ├── __init__.py
│       ├── embedding_service.py # mxbai embeddings
│       ├── faiss_service.py     # Vector database
│       ├── gemini_service.py    # Gemini LLM
│       └── resume_parser.py     # Resume processing
├── frontend/
│   ├── __init__.py
│   ├── app.py                   # Streamlit main app
│   ├── pages/
│   │   ├── __init__.py
│   │   ├── upload_resume.py     # Upload interface
│   │   ├── job_matching.py      # Matching interface
│   │   └── results.py           # Analytics dashboard
│   └── components/
│       ├── __init__.py
│       └── ui_components.py     # Reusable UI components
├── config/
│   ├── __init__.py
│   └── settings.py              # Configuration
├── data/
│   ├── resumes/                 # Uploaded resumes
│   ├── faiss_index/             # FAISS index files
│   └── processed/               # Processed data
├── utils/
│   ├── __init__.py
│   └── helpers.py               # Utility functions
├── requirements.txt             # Python dependencies
├── .env.example                 # Environment template
├── .gitignore                   # Git ignore rules
└── README.md                    # This file
```
Common issues and fixes:

Problem: Missing or invalid Gemini API key.

Solution:

```bash
# Check your .env file
cat .env

# Ensure GEMINI_API_KEY is set
echo $GEMINI_API_KEY
```

Problem: FAISS installation fails on some systems.

Solution:

```bash
# Try installing CPU version specifically
pip install faiss-cpu==1.7.4

# On macOS with Apple Silicon:
conda install -c pytorch faiss-cpu
```
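After installing, a quick smoke test confirms FAISS imports and searches correctly (hypothetical snippet, not part of the repo):

```python
# FAISS smoke test: build a tiny index and run one search.
import faiss
import numpy as np

xs = np.random.rand(10, 1024).astype("float32")
index = faiss.IndexFlatIP(1024)
index.add(xs)
scores, ids = index.search(xs[:1], k=3)
print("FAISS OK:", ids[0])  # the first result should be vector 0 itself
```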
Problem: PDF text extraction returns empty content.

Solution:

- Ensure PDFs are text-based, not scanned images (a quick probe is sketched below)
- Try converting PDFs to text format first
- Check file permissions
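To check whether a PDF actually has an extractable text layer, a quick PyMuPDF probe (hypothetical snippet):

```python
# Probe a PDF for extractable text with PyMuPDF.
import fitz  # PyMuPDF

with fitz.open("resume.pdf") as doc:
    text = "".join(page.get_text() for page in doc)

if text.strip():
    print(f"Text layer found ({len(text)} characters)")
else:
    print("No text layer; the PDF is likely scanned images (OCR needed)")
```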
Problem: Frontend can't connect to FastAPI backend.

Solution:

```bash
# Check if backend is running
curl http://localhost:8000/health

# Verify ports in .env file
grep -E "(API_PORT|STREAMLIT_PORT)" .env
```

Problem: Embedding generation takes too long.
Solution:

- Check whether a GPU is available
- Reduce the batch size during processing
- Consider a smaller embedding model for testing
Enable debug logging:

```bash
# Set in .env
LOG_LEVEL=DEBUG

# Or run with debug
python -m backend.main --log-level DEBUG
```

When deploying to production:

- Change default ports
- Set up proper CORS origins
- Use environment-specific API keys
- Enable HTTPS
- Implement rate limiting
- Add authentication
- Secure file uploads
- Monitor API usage
- Implement data retention policies
- Add resume deletion functionality
- Encrypt sensitive data
- Audit API access
- Comply with GDPR/privacy laws
- Database: Replace FAISS with Pinecone/Weaviate for production
- Caching: Add Redis for embedding caching
- Queue: Use Celery for async processing
- Load Balancing: Deploy with multiple API instances
- Multi-language Support: Add language detection
- Resume Scoring: Implement comprehensive scoring
- Bias Detection: Add fairness checking
- Integration: Connect with LinkedIn, ATS systems
- Real-time Updates: WebSocket for live updates
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black .
isort .

# Lint code
flake8 .
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for mxbai embeddings
- Google for Gemini LLM
- Facebook Research for FAISS
- FastAPI team
- Streamlit team
- Email: support@ai-recruitr.com
- Discord: AI Recruitr Community
- Issues: GitHub Issues
- Documentation: Full Docs
Made with ❤️ for smarter recruiting