This repository contains the solution to a binary classification problem for the GST Hackathon. The project includes data preprocessing, exploratory data analysis (EDA), multiple machine learning models, and evaluation metrics used for binary classification. The notebook contains a step-by-step explanation of the process and methodology followed during the project.
The project trains and compares several models on the GST Hackathon dataset: Logistic Regression, Random Forest, boosting models (XGBoost, LightGBM, CatBoost), and deep learning models (MLP and TabNet).
- Data Preprocessing:
  - Handling missing values.
  - Feature scaling and imputation.
- Exploratory Data Analysis (EDA):
  - Class distribution visualization.
  - Correlation matrix.
  - Feature distribution plots.
- Machine Learning Models:
  - Logistic Regression.
  - Random Forest (with feature selection and PCA).
  - Boosting models: XGBoost, LightGBM, CatBoost.
  - Voting classifiers (hard and soft).
  - Deep Learning models (MLP and TabNet).
- Model Evaluation Metrics:
  - Accuracy
  - Precision
  - Recall
  - F1 Score
  - AUC-ROC
  - Confusion Matrix
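
The steps listed above can be sketched end to end with scikit-learn. This is a minimal illustration rather than the notebook's actual code: the synthetic data from `make_classification`, the median imputation strategy, the two base models in the soft-voting ensemble, and all hyperparameters are assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the hackathon data (assumption: the real dataset is not used here).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Blank out roughly 5% of the values so the imputation step has something to do.
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="soft",
)

# The imputer and scaler are fit on the training split only, which avoids leakage.
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), ensemble)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
cm = confusion_matrix(y_test, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(cm)
```

Switching `voting="soft"` to `voting="hard"` gives majority voting on predicted labels instead; in that case `predict_proba` is not available, so AUC-ROC would need scores from another source.
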
To get started with this project, you will need to install the required packages.
- Python Version: 3.8+
- Libraries: Listed in the `requirements.txt` file.

To install the required dependencies, use the following command: `pip install -r requirements.txt`

- Clone the repository: `git clone <repository-url>`
- Install the required packages: `pip install -r requirements.txt`
- Run the Jupyter notebook: `jupyter notebook GST_Hackathon.ipynb`
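
Before opening the notebook, a quick check of the environment can save a failed first cell. The sketch below is illustrative only: `check_environment` is a hypothetical helper, and the package list is an assumption based on the models named in this README, not a copy of `requirements.txt`.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 8)
# Assumed import names for the libraries mentioned in this README.
PACKAGES = ["sklearn", "xgboost", "lightgbm", "catboost"]


def check_environment(packages=PACKAGES, min_python=MIN_PYTHON):
    """Return (python_ok, missing) without importing the packages themselves."""
    python_ok = sys.version_info >= min_python
    missing = [name for name in packages if find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any package reported as missing should be installed by the `pip install -r requirements.txt` step, provided it is listed in that file.
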
- XGBoost: Achieved an accuracy of 97.63% with an AUC-ROC of 0.9940.
- LightGBM: Achieved an accuracy of 97.66% with an AUC-ROC of 0.9941.
- CatBoost: Achieved an accuracy of 97.53% with an AUC-ROC of 0.9938.
- TabNet: Achieved an accuracy of 97.46% with an AUC-ROC of 0.9924.
- MLP: Performed worse than the boosting models.
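
The results above report both accuracy and AUC-ROC, which measure different things: accuracy depends on a decision threshold, while AUC-ROC depends only on how the scores rank positives against negatives. The pure-Python sketch below illustrates the difference. The labels and scores are toy values made up for this example (they are not project results), and the 0.5 threshold is an assumption.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only.
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.10, 0.40, 0.60, 0.45, 0.80, 0.90]

print(f"accuracy = {accuracy(y_true, y_score):.3f}")  # 4 of 6 correct -> 0.667
print(f"AUC-ROC  = {roc_auc(y_true, y_score):.3f}")   # 8 of 9 pairs ranked correctly -> 0.889
```

Here two samples fall on the wrong side of the threshold, yet only one positive/negative pair is ranked in the wrong order, so AUC-ROC comes out higher than accuracy.
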
├── GST_Hackathon.ipynb # Jupyter Notebook with the entire analysis and models
├── README.md # Project overview and instructions
├── requirements.txt # List of dependencies
Siddharth (GitHub handle: @siddharth7113)
This project is licensed under the MIT License.