A comprehensive AI-powered hiring system that processes resumes and job data, generates embeddings, and provides intelligent matching capabilities using AWS services.
This project consists of multiple components working together to create an intelligent resume-job matching system:
- Resume Data Generation: Generate synthetic resume data for testing
- AWS Lambda Functions: Process data and generate embeddings
- OpenSearch Integration: Store and search vector embeddings
- DynamoDB Storage: Store processed data and matching results
- Similarity Matching: Calculate embedding similarity between resumes and jobs
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Resume Data    │     │    Job Data     │     │   AWS Lambda    │
│   Generation    │     │   Processing    │     │    Functions    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    DynamoDB     │     │   OpenSearch    │     │   Similarity    │
│     Tables      │     │    Indices      │     │    Matching     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
**Resume Data Generation** (`generate_resumes.py`)
- Generates three types of resume data (DS, MLE, PM)
- Creates 30 random samples for each type
- Uses the OpenAI API for content generation
- Maintains the original schema structure
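The generation flow loads a sample resume as a schema reference and asks the OpenAI API for new resumes with the same structure. The sketch below illustrates that loop; the model name, prompt wording, and helper function are assumptions for illustration, not the script's exact code.

```python
import json
import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_resume(sample_path: str, out_path: str) -> None:
    """Generate one synthetic resume that follows the sample's schema (illustrative sketch)."""
    with open(sample_path) as f:
        sample = json.load(f)

    # Ask the model to keep the same JSON structure but invent new, realistic content.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=[
            {"role": "system", "content": "You generate realistic synthetic resumes as JSON."},
            {"role": "user", "content": (
                "Create a new resume with exactly the same JSON schema as this sample, "
                "but with different content:\n" + json.dumps(sample)
            )},
        ],
        response_format={"type": "json_object"},
    )

    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w") as f:
        f.write(response.choices[0].message.content)

# Example: 30 data-scientist resumes, mirroring the file naming used in this project.
for i in range(1, 31):
    generate_resume("data/resume/ds_sample1.json", f"data/generated_resumes/ds_generated_{i}.json")
```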
**Resume Lambda** (`resume-opensearch-indexing`)
- Processes resume data from DynamoDB
- Generates embeddings using Bedrock
- Indexes the embeddings to OpenSearch
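The indexing Lambdas generate embeddings through Amazon Bedrock. The snippet below is a minimal sketch using a Titan embedding model via `boto3`; the model ID and chunking details are assumptions and may differ from the deployed functions.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="ap-southeast-1")

def embed_text(text: str) -> list[float]:
    """Return an embedding vector for one text chunk (assumes a Titan embedding model)."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # assumption: the project may use a different model
        body=json.dumps({"inputText": text}),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]

vector = embed_text("5 years of experience building ML pipelines on AWS")
print(len(vector))  # Titan v2 returns 1024 dimensions by default
```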
**Jobs Lambda** (`jobs-opensearch-indexing`)
- Processes job data from DynamoDB
- Generates embeddings using Bedrock
- Indexes the embeddings to OpenSearch
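Once a chunk is embedded, indexing it into OpenSearch is a single document write. The sketch below assumes an OpenSearch Serverless collection reached through `opensearch-py` with SigV4 auth; the endpoint placeholder and document field names are illustrative.

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

region = "ap-southeast-1"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, "aoss")  # use "es" for a managed OpenSearch domain

client = OpenSearch(
    hosts=[{"host": "your-collection-endpoint.ap-southeast-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Index one job chunk with its embedding (field names are illustrative).
client.index(
    index="haire-vector-db-jobs-chunks-embeddings",
    body={
        "job_id": "job-123",
        "chunk_text": "Looking for an ML engineer with AWS experience",
        "embedding": [0.01, -0.03, 0.12],  # full-length vector in practice
    },
)
```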
**Resume-Jobs Embedding Matching Lambda** (`resume-jobs-embedding-matching`)
- Calculates similarities between resume and job chunks
- Updates DynamoDB with the matching results
- Provides comprehensive similarity scores
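The matching step reduces to cosine similarity between resume-chunk and job-chunk vectors, with scores written back to DynamoDB. Below is a minimal sketch; the key schema and attribute names of `resume-jobs-similarity` are assumptions, not the Lambda's exact item format.

```python
import math
from decimal import Decimal

import boto3

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Example vectors; in practice these come from the OpenSearch indices above.
resume_chunk_embedding = [0.12, -0.05, 0.33]
job_chunk_embedding = [0.10, -0.02, 0.30]

score = cosine_similarity(resume_chunk_embedding, job_chunk_embedding)

dynamodb = boto3.resource("dynamodb", region_name="ap-southeast-1")
table = dynamodb.Table("resume-jobs-similarity")

# Key and attribute names are illustrative; check the table definition for the real schema.
table.put_item(Item={
    "resume_id": "resume-42",
    "job_id": "job-123",
    "similarity": Decimal(str(score)),  # DynamoDB requires Decimal, not float
})
```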
**Utilities** (`src/`)
- Various utility scripts for data analysis
- OpenSearch integration helpers
- Similarity calculation utilities
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up the OpenAI API key:
  ```bash
  export OPENAI_API_KEY='your-api-key-here'
  ```
- Configure AWS credentials:
  ```bash
  aws configure
  ```
- Ensure sample resume files exist in `data/resume/`:
  - `ds_sample1.json`
  - `mle_sample1.json`
  - `pm_sample1.json`
- Run the generation script:
  ```bash
  python generate_resumes.py
  ```
- Generated data will be saved in `data/generated_resumes/`:
  - `ds_generated_1.json` to `ds_generated_30.json`
  - `mle_generated_1.json` to `mle_generated_30.json`
  - `pm_generated_1.json` to `pm_generated_30.json`
- Deploy all Lambda functions:
  ```bash
  cd aws-lambda
  chmod +x deploy_all.sh
  ./deploy_all.sh
  ```
- Or deploy individually:
  ```bash
  # Resume Lambda
  cd aws-lambda/resume-lambda && ./deploy.sh

  # Jobs Lambda
  cd aws-lambda/jobs-lambda && ./deploy.sh

  # Resume-Jobs Embedding Matching Lambda
  cd aws-lambda/resume-jobs-embedding-matching-lambda && ./deploy.sh
  ```
- Execute the matching script:
  ```bash
  python src/resume_jobs_matching.py
  ```
- Check results in DynamoDB:
  - `resume-jobs-similarity`: Raw similarity scores
  - `embedding-filtered-resume-test`: Enhanced resume data
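To spot-check the output without the AWS console, a small `boto3` scan is enough; the attribute names you see will depend on how the matching Lambda writes its items.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="ap-southeast-1")
table = dynamodb.Table("resume-jobs-similarity")

# Scan a handful of items to confirm the matching run produced output.
response = table.scan(Limit=5)
for item in response["Items"]:
    print(item)  # e.g. resume/job identifiers plus a similarity score
```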
- DynamoDB Tables:
  - `benson-haire-parsed_resume`: Source resume data
  - `haire-jobs`: Source job data
  - `resume-jobs-similarity`: Similarity results
  - `embedding-filtered-resume-test`: Enhanced resumes
- OpenSearch Indices:
  - `haire-vector-db-resume-chunks-embeddings`: Resume embeddings
  - `haire-vector-db-jobs-chunks-embeddings`: Job embeddings
- AWS Lambda Functions:
  - `resume-opensearch-indexing`: Process resume data
  - `jobs-opensearch-indexing`: Process job data
  - `resume-jobs-embedding-matching`: Calculate similarities
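Each function can also be invoked manually, which is useful for re-running a single stage. The sketch below assumes the handlers accept an empty event; adjust the payload to whatever each function expects.

```python
import json
import boto3

lambda_client = boto3.client("lambda", region_name="ap-southeast-1")

for function_name in (
    "resume-opensearch-indexing",
    "jobs-opensearch-indexing",
    "resume-jobs-embedding-matching",
):
    # Synchronous invoke; the empty payload is an assumption and may need adjusting.
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({}),
    )
    print(function_name, response["StatusCode"])
```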
Environment variables:
- `OPENAI_API_KEY`: OpenAI API key for content generation
- `AWS_DEFAULT_REGION`: AWS region (default: `ap-southeast-1`)
```
cloud_gen_hiring/
├── aws-lambda/                                # AWS Lambda functions
│   ├── resume-lambda/                         # Resume processing Lambda
│   ├── jobs-lambda/                           # Job processing Lambda
│   ├── resume-jobs-embedding-matching-lambda/ # Similarity matching Lambda
│   ├── deploy_all.sh                          # Deployment script
│   └── README.md                              # Lambda documentation
├── src/                                       # Source code and utilities
│   ├── resume_jobs_matching.py                # Similarity matching script
│   ├── analyze_resumes.py                     # Resume analysis utilities
│   └── ...                                    # Other utility scripts
├── data/                                      # Data files
│   ├── resume/                                # Sample resume files
│   ├── generated_resume/                      # Generated resume data
│   └── ...                                    # Other data files
├── generate_resumes.py                        # Resume generation script
├── requirements.txt                           # Python dependencies
└── README.md                                  # This file
```
- Each Lambda function has an independent CloudWatch Log Group
- Check the logs for execution status and errors (a log-retrieval sketch follows the list of common issues below)
- Timeout Errors: Increase Lambda timeout or memory
- Permission Errors: Check IAM role permissions
- OpenSearch Errors: Verify collection capacity and permissions
- API Key Errors: Ensure OpenAI API key is properly configured
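To pull recent log events programmatically while debugging the issues above, the CloudWatch Logs API can be queried with `boto3`; the log group name below assumes the default `/aws/lambda/<function-name>` naming.

```python
import time
import boto3

logs = boto3.client("logs", region_name="ap-southeast-1")

# Default Lambda log group naming convention: /aws/lambda/<function-name>
response = logs.filter_log_events(
    logGroupName="/aws/lambda/resume-jobs-embedding-matching",
    startTime=int((time.time() - 3600) * 1000),  # last hour, in milliseconds
    limit=50,
)
for event in response["events"]:
    print(event["message"].rstrip())
```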
- API Keys: Store sensitive keys as environment variables
- IAM Permissions: Follow principle of least privilege
- Data Privacy: Ensure no personal information is exposed
- VPC Configuration: Consider placing Lambda in VPC for enhanced security
- Memory Configuration: Adjust based on data volume
- Timeout Settings: Optimize for processing requirements
- Batch Processing: Consider processing multiple records per invocation (see the batch-write sketch below)
- Caching: Implement caching for frequently accessed data
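For the batch-processing point above, DynamoDB writes can be grouped with `boto3`'s `batch_writer`, which buffers puts into `BatchWriteItem` calls and retries unprocessed items; the item attributes shown are illustrative.

```python
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb", region_name="ap-southeast-1")
table = dynamodb.Table("resume-jobs-similarity")

# batch_writer groups puts into 25-item BatchWriteItem requests behind the scenes.
with table.batch_writer() as batch:
    for i, score in enumerate([0.81, 0.64, 0.42]):  # illustrative scores
        batch.put_item(Item={
            "resume_id": f"resume-{i}",
            "job_id": "job-123",
            "similarity": Decimal(str(score)),
        })
```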
- Follow the existing code structure
- Add appropriate documentation
- Test thoroughly before deployment
- Update README files for any new features
This project is for internal use only. Please ensure compliance with all applicable data protection and privacy regulations.