Personal projects using NLP techniques.
Project: Pitchfork Rating Prediction
This project requires Python 3.10 and R 4.3.
To run the Python code in this project, first install its dependencies by executing the following command from the project root:
pip install -r requirements.txt
This project also includes a custom utilities library, myutilpy, which must be installed in your environment. To install it, run the following command from the project root:
pip install -e myutilpy
This project contains a few Jupyter notebooks written in R. To execute them, your R environment must have the dependencies listed in requirements_R.txt installed. Each listed dependency can be installed manually.
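If installing each R dependency by hand is tedious, the install commands can be generated instead. The following is only a sketch: it assumes requirements_R.txt lists one package name per line, optionally followed by a version specifier and with `#` comments, which may not match the actual file. The helper name r_install_commands is hypothetical and not part of this project.

```python
import re
from pathlib import Path


def r_install_commands(requirements_text):
    """Turn requirement lines into R install.packages() calls.

    Assumes one package per line; version specifiers are ignored.
    """
    commands = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep only the package name, e.g. "dplyr==1.1.0" -> "dplyr".
        name = re.split(r"[=<>~! ]", line, maxsplit=1)[0]
        commands.append(f'install.packages("{name}")')
    return commands


if __name__ == "__main__":
    # Assumed location: the project root, next to requirements.txt.
    path = Path("requirements_R.txt")
    if path.exists():
        print("\n".join(r_install_commands(path.read_text())))
```

The printed lines can be pasted into an R session. Because version specifiers are dropped, check the installed versions against requirements_R.txt if exact versions matter.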
Directory: notebooks_pitchfork_ratings
This sequence of notebooks utilizes a Pitchfork reviews dataset of approximately 20K album reviews (mattismegevand/pitchfork). The notebooks cover the following steps:
- Data preprocessing (01_initial_data_prep). Loading, cleaning, and preprocessing of data.
- Exploratory data analysis (02_data_explore). Visualization and summary statistics of the processed dataset.
- Model fitting and prediction (03_rating_pred). Model fitting and saving of model parameters. Also, collection and save-out of performance metrics and test set predictions.
- Results analysis (04_fit_analysis). Post-fit investigation of model performance on test data.
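As an illustration of the kind of performance metrics that could be collected for a rating prediction task, here is a minimal sketch that computes mean absolute error and root mean squared error. Which metrics the notebooks actually use is not stated here, so the choice of MAE and RMSE is an assumption, and the function name regression_metrics and the ratings in the usage example are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE and RMSE for paired true and predicted ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


if __name__ == "__main__":
    # Made-up ratings for illustration only, not taken from the dataset.
    true_ratings = [7.0, 8.0, 6.0]
    predicted_ratings = [7.5, 7.0, 6.5]
    print(regression_metrics(true_ratings, predicted_ratings))
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a rough sense of whether errors are concentrated in a few reviews.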