Building Data-Driven Occupation Taxonomies: A Bottom-Up Multi-Stage Approach via Semantic Clustering and Multi-Agent Collaboration
Requirements:
- Python 3.10
- For Arabic text cleaning, also install pyarabic and camel-tools (https://github.com/CAMeL-Lab/camel_tools)
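To confirm the requirements above before running anything, a quick check like the following may help. It is a minimal sketch that assumes the import names `pyarabic` and `camel_tools` for the two Arabic-cleaning packages. The function name `check_environment` is hypothetical and not part of this repository.

```python
import importlib.util
import sys

# Import names assumed for the optional Arabic-cleaning packages:
# pyarabic -> "pyarabic", camel-tools -> "camel_tools".
OPTIONAL_ARABIC_PACKAGES = ("pyarabic", "camel_tools")


def check_environment(packages=OPTIONAL_ARABIC_PACKAGES):
    """Report whether the interpreter is Python 3.10 and which packages are importable."""
    report = {"python_3_10": sys.version_info[:2] == (3, 10)}
    for name in packages:
        # find_spec returns None when the top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

The packages are only reported as missing rather than treated as errors, because they are needed only for Arabic text cleaning.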
Installation:
```bash
conda create --name climb python=3.10
conda activate climb
pip install -r requirements.txt
```
- Rename .env.example to .env and fill in the API keys.
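The script below offers a way to check that no API key was left blank after renaming the file. It is a hypothetical helper (`find_unset_keys` is not part of the repository) and assumes .env uses simple one-line `KEY=VALUE` entries. It does not assume any particular key names, because those come from .env.example.

```python
from pathlib import Path


def find_unset_keys(env_path=".env"):
    """Return the names of KEY=VALUE entries in a .env file whose value is empty.

    Assumes simple one-line KEY=VALUE entries; comments (#) and blank lines are skipped.
    """
    unset = []
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat quoted empty strings ("" or '') as unset as well.
        if value.strip().strip("\"'") == "":
            unset.append(key.strip())
    return unset
```

For example, `find_unset_keys()` run from the repository root returns an empty list once every key has a value.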
Unzip data.zip and place the extracted files in the data folder.
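If `unzip` is not available, the standard-library `zipfile` module can do the same step. This is a minimal sketch; `extract_data` is a hypothetical name, and because the internal layout of data.zip is not documented here, the code assumes the archive either holds the files directly or wraps them in a top-level `data/` folder.

```python
import zipfile
from pathlib import Path


def extract_data(archive="data.zip", target="data"):
    """Extract the archive so its files end up inside `target`; return the extraction root."""
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        # If the archive already wraps everything in a top-level "data/" folder,
        # extract into the parent to avoid ending up with data/data/.
        wrapped = bool(names) and all(n.startswith(f"{target_dir.name}/") for n in names)
        dest = target_dir.parent if wrapped else target_dir
        zf.extractall(dest)
    return dest
```

Run `extract_data()` from the repository root, then check that the files sit directly under data/.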
Run the notebooks in order:
- src/0.palestine.ipynb
- src/1.botswana.ipynb
- src/2.usa.ipynb
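The notebooks can also be run headlessly in sequence. The sketch below uses `jupyter nbconvert --to notebook --execute --inplace`. It assumes Jupyter/nbconvert is installed in the climb environment, which may not be covered by requirements.txt. The helper names `notebook_commands` and `run_notebooks` are hypothetical.

```python
import subprocess

# Notebook order follows the numeric prefixes in the file names.
NOTEBOOKS = ["src/0.palestine.ipynb", "src/1.botswana.ipynb", "src/2.usa.ipynb"]


def notebook_commands(notebooks=NOTEBOOKS):
    """Build one nbconvert command per notebook; outputs are saved back into each file."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_notebooks(notebooks=NOTEBOOKS, dry_run=False):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in notebook_commands(notebooks):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing notebook.
            subprocess.run(cmd, check=True)
```

Call `run_notebooks()` from the repository root. `run_notebooks(dry_run=True)` only prints the commands. Because `--inplace` overwrites each notebook with its executed version, commit or back up the notebooks first if you want to keep the originals.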
Then run src/taxonomy_evaluation.py to evaluate the taxonomies.
To run the same-occupation classification:
- Prepare the data.
- Run src/same_occupation_job_pair_sampling.py.
- Run src/same_occupation_classification.py.
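The two scripts can be chained with a small driver. This is a sketch under stated assumptions: both scripts are launched from the repository root and need no command-line arguments, and the classification step presumably depends on the sampled job pairs. `step_commands` and `run_steps` are hypothetical names.

```python
import subprocess
import sys

# The two steps listed above, in order. Assumes no command-line arguments are
# needed and that the scripts are launched from the repository root.
STEPS = [
    "src/same_occupation_job_pair_sampling.py",
    "src/same_occupation_classification.py",
]


def step_commands(scripts=STEPS):
    """Build one command per script using the current interpreter (e.g. the climb env)."""
    return [[sys.executable, script] for script in scripts]


def run_steps(scripts=STEPS, dry_run=False):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in step_commands(scripts):
        print(" ".join(cmd))
        if not dry_run:
            # Stop at the first failing step; classification presumably
            # consumes the pairs produced by the sampling step.
            subprocess.run(cmd, check=True)
```

Using `sys.executable` ensures the scripts run under the same interpreter (and therefore the same conda environment) as the driver.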