CVEE is an AI-powered job matching system that leverages embeddings to find the most relevant job opportunities based on uploaded CVs. It integrates data from the France Travail API, processes job descriptions using sentence transformers, and stores embeddings in a PostgreSQL database with pgvector for efficient similarity searches.
End-to-end CV parsing and real-time vector matching
- Job Data Ingestion: Fetches job listings from the France Travail API and stores them in a structured format.
- Embedding Generation: Uses the BAAI/bge-small-en model to create 384-dimensional embeddings for job descriptions and CVs.
- Vector Search: Performs cosine similarity searches on embeddings to match CVs with jobs.
- Web Interface: A Streamlit-based UI for uploading PDFs and viewing top matching jobs.
- Scalable Deployment: Kubernetes manifests for containerized deployment with PostgreSQL, API, and UI services.
- Data Pipeline: Automated workflows using CronJobs for data ingestion and synchronization.
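The matching step described above comes down to ranking job vectors by cosine similarity to the CV vector. The following is a minimal sketch of that idea in pure Python, not the project's actual code: the function names `cosine_similarity` and `top_matches`, the job ids, and the toy 3-dimensional vectors are all illustrative assumptions (real embeddings from BAAI/bge-small-en have 384 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as "no similarity"
    return dot / (norm_a * norm_b)


def top_matches(cv_vector, job_vectors, k=3):
    """Rank jobs by similarity to the CV vector and keep the best k."""
    scored = [
        (job_id, cosine_similarity(cv_vector, vec))
        for job_id, vec in job_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real 384-dimensional embeddings (hypothetical data).
jobs = {
    "job-a": [1.0, 0.0, 0.0],
    "job-b": [0.0, 1.0, 0.0],
    "job-c": [0.7, 0.7, 0.0],
}
cv = [1.0, 0.2, 0.0]
ranking = top_matches(cv, jobs, k=2)
print([job_id for job_id, _ in ranking])
```

In the deployed system this ranking is delegated to pgvector inside PostgreSQL, so a loop like this is only useful for understanding or for small offline checks.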
The system is built on a modular microservices architecture:
| Component | Technology | Description |
|---|---|---|
| API | FastAPI | Handles CV text extraction, embedding generation, and search queries. |
| UI | Streamlit | Frontend for user interaction and result display. |
| Database | PostgreSQL + pgvector | Stores structured job data and high-dimensional vectors. |
| Processing | Jupyter/Databricks | Notebooks for Silver/Gold data layers (ingestion & embedding). |
| Storage | AWS S3 | Remote storage for raw and processed datasets. |
| Orchestration | Kubernetes | Manages container lifecycles and service scaling. |
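To illustrate how the API and database components might interact, here is a hedged sketch of a pgvector similarity query. The table name `jobs` and the columns `id`, `title`, and `embedding` are assumptions, not the project's real schema. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives cosine similarity. The `%s` placeholders follow the psycopg parameter style.

```python
# Hypothetical schema: a "jobs" table with an "embedding" column of type vector(384).
SEARCH_SQL = """
SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
FROM jobs
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search_params(embedding, limit=5):
    """Build the parameter tuple for SEARCH_SQL (the vector is passed twice)."""
    literal = to_vector_literal(embedding)
    return (literal, literal, limit)


params = search_params([0.1, 0.25, 2], limit=3)
print(params)
```

With a psycopg cursor, the query would be run as `cursor.execute(SEARCH_SQL, params)`. Ordering by the raw distance rather than by the computed `similarity` alias keeps the expression in a form that a pgvector index can serve.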
Data Pipeline and System Architecture
- Python 3.11+
- Docker
- Kubernetes cluster (e.g., Minikube for local testing)
- AWS account with S3 bucket
- Databricks account
- France Travail API credentials
- Clone the repository: `git clone https://github.com/timotheeCloup/CVEE.git`, then `cd CVEE`
- Install dependencies: `pip install -r requirements.txt`
- Environment variables: set up environment variables in a `.env` file (see the example in `sync_s3_to_postgres.py`).
- Apply the namespace and secrets: `kubectl apply -f k8s/namespace.yaml`, then `kubectl apply -f k8s/secrets.yaml`
- Deploy services: `kubectl apply -f k8s/`
- Access the UI: open `http://<node-ip>:30081` in your browser.
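Before deploying, it can help to confirm that the `.env` file from the environment-variable step is complete. The sketch below parses simple `KEY=VALUE` lines and reports missing keys. The variable names in `REQUIRED` are purely hypothetical placeholders; the real names are the ones used in `sync_s3_to_postgres.py`.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical variable names, for illustration only.
REQUIRED = ["POSTGRES_HOST", "POSTGRES_PASSWORD", "AWS_S3_BUCKET"]

sample = 'POSTGRES_HOST=localhost\n# local settings\nAWS_S3_BUCKET="my-bucket"\n'
env = parse_env(sample)
print(missing_keys(env, REQUIRED))
```

In practice the text would come from reading the `.env` file, for example `parse_env(open(".env").read())`.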
- Upload a PDF CV via the Streamlit UI.
- The API extracts text, generates embeddings, and queries the database for top matches.
- View results with direct links to job postings on France Travail.
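The three usage steps above can be sketched as one small pipeline function. This is an illustrative outline, not the API's actual implementation: `match_cv` and the stub functions are hypothetical names. The real service would plug in a PDF text extractor, the sentence-transformer model, and the pgvector query in their place.

```python
def match_cv(pdf_bytes, extract_text, embed, search, top_k=5):
    """Run the extract -> embed -> search flow with injected components."""
    text = extract_text(pdf_bytes).strip()
    if not text:
        raise ValueError("no text could be extracted from the CV")
    vector = embed(text)
    return search(vector, top_k)


# Stub components so the flow can be exercised without a model or a database.
def fake_extract(data):
    return data.decode("utf-8")


def fake_embed(text):
    return [float(len(text)), 1.0]


def fake_search(vector, k):
    return [("job-1", 0.9), ("job-2", 0.8)][:k]


results = match_cv(
    b"Data engineer, Python, SQL",
    fake_extract,
    fake_embed,
    fake_search,
    top_k=1,
)
print(results)
```

Passing the three stages in as functions keeps each one replaceable and makes the flow easy to test in isolation.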
Contributions are welcome! Please open issues or submit pull requests to help improve the portability and features of CVEE.