This project supports strategic workforce planning (SWP) by using AI to classify resumes and extract skills. It analyzes employee resumes to predict relevant job titles, extracts skills with Named Entity Recognition (NER), and recommends skills to develop. It also offers predictive analytics that suggest future skill needs based on current trends, supporting proactive skill gap planning.
- AI-Powered Resume Analysis: Utilizes a pre-trained spaCy model for skill extraction. Classifies job titles using Random Forest, XGBoost, and spaCy models, with spaCy offering superior performance.
- Skill Gap Identification: Compares extracted skills to role-specific requirements and provides personalized skill recommendations for development.
- Dashboard for HR Managers: Enables data-driven insights for proactive workforce planning and strategic decision-making.
- Python 3.8+: Ensure Python is installed.
- Python Libraries: Install required libraries, including PyPDF2, pdfplumber, NLTK, spaCy, and scikit-learn.
- Optional Software: Jupyter Notebook, PyCharm, or VS Code for development.
- Clone the Repository: `git clone https://github.com/CoderSoham/Job-Recommendation-using-ML.git`, then `cd Job-Recommendation-using-ML`
- Install Dependencies: `pip install PyPDF2 pdfplumber nltk spacy scikit-learn`
- Download spaCy Model: `python -m spacy download en_core_web_sm`
- Prepare Dataset: Organize PDF resumes in folders, each representing a job category.
- Data Link: https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset
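A dataset organized this way can be turned into labeled examples by treating each folder name as the job-category label. The following is a minimal sketch under that assumption; the helper name `list_labeled_resumes` and the `<root>/<CATEGORY>/<resume>.pdf` layout are hypothetical and not taken from the repository.

```python
from pathlib import Path
from typing import List, Tuple


def list_labeled_resumes(root: str) -> List[Tuple[str, str]]:
    """Return (pdf_path, category) pairs, using each folder name as the label.

    Hypothetical helper: assumes a layout of <root>/<CATEGORY>/<resume>.pdf.
    """
    pairs: List[Tuple[str, str]] = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # skip stray files at the top level
        for item in sorted(category_dir.iterdir()):
            # Compare the suffix case-insensitively so ".PDF" files are kept.
            if item.is_file() and item.suffix.lower() == ".pdf":
                pairs.append((str(item), category_dir.name))
    return pairs
```

Each returned pair can then be passed to the text-extraction step, with the category serving as the training label.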
- Data Preprocessing: Extract resume text with `extract_text_from_pdf`, clean it with `clean_text`, and identify skills with `extract_user_skills`. The same pipeline identifies skill gaps with `identify_skill_gap`, recommends roles based on skill matches with `recommend_roles`, and fetches job listings with `fetch_jobs_from_adzuna`.
- Training the Models: Run `my_train_model.py` to train the Random Forest, XGBoost, and spaCy models on labeled resumes and evaluate their performance. The script loads a pre-trained spaCy NER model (`skill_ner_model`) to detect skills in resumes, and reads role-specific skills from `role_skills.json`. This file must be correctly formatted to avoid errors.
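Because a malformed `role_skills.json` leads to errors, it can help to validate the file before training. The repository does not document the file's schema, so the sketch below assumes a JSON object that maps each role name to a list of skill strings. That schema and the helper name `load_role_skills` are assumptions for illustration only.

```python
import json
from typing import Dict, List


def load_role_skills(path: str) -> Dict[str, List[str]]:
    """Load a role-to-skills mapping and fail early with a clear message.

    Assumed schema (not confirmed by the project): {"Role": ["skill", ...]}.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping roles to skill lists")
    for role, skills in data.items():
        is_string_list = isinstance(skills, list) and all(
            isinstance(s, str) for s in skills
        )
        if not is_string_list:
            raise ValueError(f"skills for role {role!r} must be a list of strings")
    return data
```

If the real file uses a different structure, only the checks inside the function would need to change.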
- Testing: Run `my_test.py` to load a sample PDF resume, process it, and extract its skills. The script also accepts a `target_role` input and matches the extracted skills against that role's requirements.
- Output: Displays the extracted, required, and missing skills, and recommends suitable roles based on skill overlap. If listings are available, it also retrieves relevant job postings from Adzuna for the target role.
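The comparison behind this output can be illustrated with plain set operations. The sketch below is an assumption about how such matching might work, not the repository's `identify_skill_gap` or `recommend_roles` code: the helper names `skill_gap_sketch` and `rank_roles_by_overlap`, the role-to-skills mapping, and the sample data are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize(skills: Iterable[str]) -> Set[str]:
    """Lower-case and strip skills so comparisons ignore case and spacing."""
    return {s.strip().lower() for s in skills if s.strip()}


def skill_gap_sketch(
    user_skills: Iterable[str], required: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (matched, missing) skills for a single role."""
    have, need = normalize(user_skills), normalize(required)
    return have & need, need - have


def rank_roles_by_overlap(
    user_skills: Iterable[str], role_skills: Dict[str, List[str]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank roles by the fraction of their required skills already covered."""
    have = normalize(user_skills)
    scores: List[Tuple[str, float]] = []
    for role, required in role_skills.items():
        need = normalize(required)
        if not need:
            continue  # a role with no listed skills cannot be scored
        scores.append((role, len(have & need) / len(need)))
    # Highest coverage first; ties are broken alphabetically for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]


# Hypothetical sample data, for illustration only.
ROLE_SKILLS = {
    "Data Scientist": ["Python", "SQL", "Machine Learning", "Statistics"],
    "Web Developer": ["HTML", "CSS", "JavaScript", "SQL"],
}
extracted = ["python", "SQL ", "statistics"]

matched, missing = skill_gap_sketch(extracted, ROLE_SKILLS["Data Scientist"])
print("Matched:", sorted(matched))
print("Missing:", sorted(missing))
print("Ranked roles:", rank_roles_by_overlap(extracted, ROLE_SKILLS))
```

Scoring each role by the share of its required skills already covered is one simple choice; the project may weight or match skills differently.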