🌌 Pulsar Candidate Classification from Scratch

This repository documents an academic project on classifying pulsar candidates from astronomical survey data. The goal is to distinguish true pulsars from spurious signals caused by radio-frequency interference (RFI) and noise.

🌟 Project Goal & Methodology

The objective is to accurately perform binary classification on highly imbalanced astronomical data. The methodology includes:

Z-Normalization: All eight input features are standardized to handle differing scales, means, and variances.
Comparative Modeling: Several generative (Gaussian, GMM) and discriminative (logistic regression, SVM) classifiers were implemented and compared.
Cost-Sensitive Evaluation: The models are evaluated primarily using the Minimum Detection Cost Function (minDCF) across various effective prior probabilities (π), recognizing the high cost associated with missing a true, rare pulsar (a false negative).
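The Z-normalization step can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes samples are stored as rows of a NumPy array (the project may use a different layout), and the function names are hypothetical. The mean and standard deviation are estimated on the training split only and then reused unchanged on held-out data.

```python
import numpy as np

def fit_znorm(X_train):
    """Per-feature mean and standard deviation, estimated on training data only."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def apply_znorm(X, mu, sigma):
    """Standardize each feature (column) to zero mean and unit variance."""
    return (X - mu) / sigma

# Synthetic stand-in for the 8 HTRU2 features (rows = candidates), NOT real data.
rng = np.random.default_rng(0)
scales = np.array([25.0, 7.0, 1.0, 6.0, 30.0, 20.0, 4.0, 100.0])
X_train = rng.normal(size=(1000, 8)) * scales + 10.0 * scales
X_test = rng.normal(size=(200, 8)) * scales + 10.0 * scales

mu, sigma = fit_znorm(X_train)
Z_train = apply_znorm(X_train, mu, sigma)
Z_test = apply_znorm(X_test, mu, sigma)  # reuse the training statistics
```

Reusing the training statistics on the test split avoids leaking information from the evaluation data into preprocessing.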

📊 Dataset Overview

The project uses the HTRU2 dataset, which contains statistical features derived from pulsar candidates collected during the High Time Resolution Universe Survey.

| Feature group | Description |
|---|---|
| Profile statistics (4) | Mean, standard deviation, excess kurtosis, and skewness of the integrated pulse profile. |
| DM-SNR curve statistics (4) | Mean, standard deviation, excess kurtosis, and skewness of the DM-SNR curve. |

The dataset is highly imbalanced, with 16,259 spurious examples (RFI/noise) and 1,639 real pulsar examples.
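From these counts, the empirical fraction of real pulsars is roughly 9%, which is close to the π = 0.1 prior used as the rare-pulsar operating point in the results:

$$
\pi_{\text{emp}} = \frac{1639}{1639 + 16259} = \frac{1639}{17898} \approx 0.092
$$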

🧠 Implemented Classification Models

The following models were developed and analyzed for their performance, particularly focusing on their efficacy in low-prior conditions (π=0.1):

Multi-Variate Gaussian (MVG) Classifiers: Tested several covariance structures, including the Tied Full Covariance model, in which both classes share a single covariance matrix.
Linear Logistic Regression (LR): Implemented with different regularization strengths (λ) to control model complexity and overfitting.
Support Vector Machines (SVM): Used a linear kernel with class balancing and tuned the hyperparameter C to handle the skewed class distribution.
Gaussian Mixture Models (GMM): Explored models with complex structures (e.g., Full Covariance with 8 components) to capture non-linear class boundaries.
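The Tied Full Covariance MVG model can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's implementation: samples are rows, labels are 0 (spurious) and 1 (pulsar), and `fit_tied_mvg` / `llr_tied_mvg` are hypothetical names. Because both classes share one covariance matrix Σ, the log-likelihood ratio reduces to a linear function wᵀx + b, so this generative model produces a linear decision boundary.

```python
import numpy as np

def fit_tied_mvg(X, y):
    """Class means plus one pooled within-class covariance shared by both classes.
    X: (n, d) samples as rows; y: labels in {0, 1} (1 = pulsar)."""
    means = {c: X[y == c].mean(axis=0) for c in (0, 1)}
    S = np.zeros((X.shape[1], X.shape[1]))
    for c in (0, 1):
        D = X[y == c] - means[c]  # centre each class on its own mean
        S += D.T @ D
    return means, S / X.shape[0]  # average over all samples

def llr_tied_mvg(X, means, S):
    """log N(x | mu1, S) - log N(x | mu0, S); the shared S makes this linear in x."""
    P = np.linalg.inv(S)
    w = P @ (means[1] - means[0])
    b = -0.5 * (means[1] @ P @ means[1] - means[0] @ P @ means[0])
    return X @ w + b

# Synthetic, imbalanced 2-D toy data (NOT the HTRU2 features).
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=900)
X1 = rng.multivariate_normal([2.5, 2.5], cov, size=100)
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])

means, S = fit_tied_mvg(X, y)
scores = llr_tied_mvg(X, means, S)
# Bayes decision for prior pi: predict "pulsar" when LLR > log((1 - pi) / pi).
pi = 0.1
pred = (scores > np.log((1 - pi) / pi)).astype(int)
```

The same scores can be fed to a minDCF routine; the Bayes threshold above is only the optimal choice when the model's likelihoods are well calibrated.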

📈 Key Results and Findings

The analysis successfully identified the most effective classifiers for minimizing the detection cost in a rare-event search scenario.

| Model | minDCF (π=0.5) | minDCF (π=0.1) | minDCF (π=0.9) |
|---|---|---|---|
| MVG Tied Full Cov | 0.109 | 0.207 | 0.590 |
| Linear LR (λ=0) | 0.107 | 0.198 | 0.542 |
| Linear SVM (C=0, balanced) | 0.104 | 0.197 | 0.530 |
| GMM Full Cov, 8 components | 0.105 | 0.197 | 0.535 |

Overall Best Performance: The Linear SVM with C=0 (balanced) achieved the lowest minDCF at every tested prior, with the 8-component Full Covariance GMM a close second (tied at π=0.1). At π=0.5 and π=0.1 the three best models are within 0.003 of each other, while at π=0.9 the differences widen and the MVG Tied model falls noticeably behind (0.590 versus 0.530 for the SVM).
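The minDCF values in the table above come from sweeping a decision threshold over classifier scores. A minimal sketch of that computation follows, assuming higher scores mean "pulsar" (label 1) and unit error costs by default, so that π acts as the effective prior. The normalized cost at threshold t is [π·C_fn·P_fn(t) + (1−π)·C_fp·P_fp(t)] / min(π·C_fn, (1−π)·C_fp), and minDCF is its minimum over all t. The function name is hypothetical and this is not the repository's code.

```python
import numpy as np

def min_dcf(scores, labels, pi, c_fn=1.0, c_fp=1.0):
    """Minimum normalized DCF over all decision thresholds.
    Assumes higher scores mean 'pulsar' (label 1); label 0 = spurious."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    # Every observed score is a candidate threshold, plus the two trivial extremes.
    thresholds = np.concatenate(([-np.inf], np.sort(scores), [np.inf]))
    best = np.inf
    for t in thresholds:
        pred = scores > t
        p_fn = np.sum(~pred & (labels == 1)) / n_pos  # missed pulsars
        p_fp = np.sum(pred & (labels == 0)) / n_neg   # false alarms
        best = min(best, pi * c_fn * p_fn + (1 - pi) * c_fp * p_fp)
    # Normalize by the cost of the best trivial (always-accept / always-reject) system.
    return best / min(pi * c_fn, (1 - pi) * c_fp)

print(min_dcf([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], pi=0.5))  # 0.5
```

The loop is O(n²), which is acceptable for a sketch; for the full 17,898-candidate set, a version based on sorted scores and cumulative counts would be faster.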

🛠️ Technology Stack & Requirements

Language: Python 3.x
Core Libraries: numpy, pandas, matplotlib

📄 Full Analysis

A complete explanation of the mathematical models, parameter choices, experimental procedures, and conclusions is available in the project's presentation:
pulsar.pdf

🤝 Contributing

This is a completed academic project; however, feedback on the methodology or analysis is welcome.
Project by Gabriele Cassetta
