GST Hackathon Project

This repository contains a solution to the GST Hackathon binary classification problem. It covers data preprocessing, exploratory data analysis (EDA), several machine learning models, and the metrics used to evaluate them. The notebook explains the process and methodology step by step.

Project Overview

This project involves developing machine learning models to classify data from the GST Hackathon dataset. Multiple models, including Logistic Regression, Random Forest, Boosting models (XGBoost, LightGBM, CatBoost), and Deep Learning models (MLP and TabNet), were implemented.

Key Steps:

Data Preprocessing:
- Handling missing values.
- Feature scaling and imputation.
Exploratory Data Analysis (EDA):
- Class distribution visualization.
- Correlation matrix.
- Feature distribution plots.
Machine Learning Models:
- Logistic Regression.
- Random Forest (with feature selection and PCA).
- Boosting models: XGBoost, LightGBM, CatBoost.
- Voting classifiers (hard and soft).
- Deep Learning models (MLP and TabNet).
Model Evaluation Metrics:
- Accuracy
- Precision
- Recall
- F1 Score
- AUC-ROC
- Confusion Matrix
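
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration and not the notebook's actual code. It makes several assumptions: the data is synthetic (`make_classification`), median imputation is used, and only Logistic Regression and Random Forest go into the soft-voting ensemble. The names `ensemble`, `model`, and `metrics` are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the hackathon data (assumption: numeric features only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Blank out about 5% of the values so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)

# Inside a pipeline, imputation and scaling are fitted on the training split only.
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), ensemble)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc_roc": roc_auc_score(y_test, y_score),
}
cm = confusion_matrix(y_test, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(cm)
```

Putting the imputer and scaler inside the pipeline keeps test-set statistics out of the fitted preprocessing, which is one common way to avoid leakage. Swapping `voting="soft"` for `voting="hard"` switches to majority voting on predicted labels.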

Getting Started

To get started with this project, you will need to install the required packages.

Prerequisites

Python Version: 3.8+
Libraries: Listed in the requirements.txt file.

Installation

To install the required dependencies, you can use the following command:

pip install -r requirements.txt

How to Run the Notebook

Clone the repository:
```
git clone <repository-url>
```
Install the required packages:
```
pip install -r requirements.txt
```
Run the Jupyter notebook:
```
jupyter notebook GST_Hackathon.ipynb
```

Results

XGBoost: Achieved an accuracy of 97.63% with an AUC-ROC of 0.9940.
LightGBM: Achieved an accuracy of 97.66% with an AUC-ROC of 0.9941.
CatBoost: Achieved an accuracy of 97.53% with an AUC-ROC of 0.9938.
TabNet: Achieved an accuracy of 97.46% with an AUC-ROC of 0.9924.
MLP: Underperformed the boosting models.
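
For readers unfamiliar with the two figures reported above, here is a small pure-Python sketch of how accuracy and AUC-ROC can be computed from labels and predicted scores. The toy labels, scores, and the 0.5 threshold are invented for illustration and are unrelated to the hackathon data. The helper names `accuracy` and `auc_roc` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def auc_roc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [int(s >= 0.5) for s in y_score]  # classify with a 0.5 threshold

print(accuracy(y_true, y_pred))
print(auc_roc(y_true, y_score))
```

On this toy input, accuracy is 5/6 (one positive falls below the threshold) and AUC-ROC is 8/9 (eight of the nine positive-negative pairs are ranked correctly). Accuracy depends on the chosen threshold, while AUC-ROC depends only on how the scores rank the examples.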

Directory Structure

├── GST_Hackathon.ipynb     # Jupyter Notebook with the entire analysis and models
├── README.md               # Project overview and instructions
└── requirements.txt        # List of dependencies

Contributors

Siddharth (GitHub handle: @siddharth7113)

License

This project is licensed under the MIT License.
