GST Hackathon Project

This repository contains a solution to the binary classification problem posed in the GST Hackathon. The project covers data preprocessing, exploratory data analysis (EDA), several machine learning models, and the evaluation metrics used to compare them. The notebook walks through the methodology step by step.

Project Overview

This project develops machine learning models to classify records in the GST Hackathon dataset. Multiple models were implemented: Logistic Regression, Random Forest, boosting models (XGBoost, LightGBM, CatBoost), and deep learning models (an MLP and TabNet).

Key Steps (each step is illustrated with a short sketch after this list):

  1. Data Preprocessing:
     • Handling missing values via imputation.
     • Feature scaling.
  2. Exploratory Data Analysis (EDA):
     • Class distribution visualization.
     • Correlation matrix.
     • Feature distribution plots.
  3. Machine Learning Models:
     • Logistic Regression.
     • Random Forest (with feature selection and PCA).
     • Boosting models: XGBoost, LightGBM, CatBoost.
     • Voting classifiers (hard and soft).
     • Deep learning models (MLP and TabNet).
  4. Model Evaluation Metrics:
     • Accuracy
     • Precision
     • Recall
     • F1 Score
     • AUC-ROC
     • Confusion Matrix
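A minimal preprocessing sketch for step 1. The file name train.csv and the column name target are placeholders, not the actual dataset schema:

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("train.csv")        # hypothetical file name
X = df.drop(columns=["target"])      # hypothetical label column
y = df["target"]

# Fill missing values with the column median, then standardize each feature.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)
X_scaled = StandardScaler().fit_transform(X_imputed)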
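Step 2 can be reproduced with standard pandas/seaborn calls; the plots below are illustrative, not the exact figures from the notebook:

import matplotlib.pyplot as plt
import seaborn as sns

# Class distribution: bar plot of label counts.
y.value_counts().plot(kind="bar", title="Class distribution")
plt.show()

# Correlation matrix over the numeric features.
sns.heatmap(X.corr(numeric_only=True), cmap="coolwarm")
plt.show()

# Distribution of an individual feature (first column as an example).
X.iloc[:, 0].hist(bins=50)
plt.show()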
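For step 3, a soft-voting ensemble over the three boosting models looks roughly like this, using default hyperparameters; the notebook's actual tuning may differ:

from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42
)

ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier()),
        ("lgbm", LGBMClassifier()),
        ("cat", CatBoostClassifier(verbose=0)),
    ],
    voting="soft",  # average class probabilities; "hard" takes a majority vote
)
ensemble.fit(X_train, y_train)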
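The step 4 metrics all come from sklearn.metrics; continuing from the ensemble above:

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score, confusion_matrix,
)

y_pred = ensemble.predict(X_test)
y_prob = ensemble.predict_proba(X_test)[:, 1]  # positive-class probability

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 Score :", f1_score(y_test, y_pred))
print("AUC-ROC  :", roc_auc_score(y_test, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))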

Getting Started

To get started with this project, install the required packages.

Prerequisites

  1. Python Version: 3.8+
  2. Libraries: Listed in the requirements.txt file.

Installation

Install the required dependencies with:

pip install -r requirements.txt

How to Run the Notebook

  1. Clone the repository:
    git clone <repository-url>
  2. Install the required packages:
    pip install -r requirements.txt
  3. Run the Jupyter notebook:
    jupyter notebook GST_Hackathon.ipynb

Results

  • XGBoost: Achieved an accuracy of 97.63% with an AUC-ROC of 0.9940.
  • LightGBM: Achieved an accuracy of 97.66% with an AUC-ROC of 0.9941.
  • CatBoost: Achieved an accuracy of 97.53% with an AUC-ROC of 0.9938.
  • TabNet: Achieved an accuracy of 97.46% with an AUC-ROC of 0.9924.
  • MLP: The deep learning baseline underperformed the boosting models.

Directory Structure

├── GST_Hackathon.ipynb     # Jupyter Notebook with the entire analysis and models
├── README.md               # Project overview and instructions
├── requirements.txt        # List of dependencies

Contributors

Siddharth (GitHub handle: @siddharth7113)

License

This project is licensed under the MIT License.
