Bioinformatics grad student @ Northeastern University · Building computational tools to bridge biology and data science.
I work at the intersection of genomics, machine learning, and reproducible pipelines, from analyzing sequencing data to building interactive tools that make complex biological results accessible. I am currently a Teaching Assistant for Computational Biology at Northeastern.
EGFR Bioactivity Prediction: Ensemble ML pipeline (Random Forest, XGBoost, Neural Net) reaching a ROC-AUC of 0.945 on 20K+ compounds. Deployed as a Streamlit app with batch prediction for non-technical users.
COVID-19 snRNA-seq Lung Atlas: Integrated 81K nuclei across 27 lung samples using Scanpy and scvi-tools. Identified immune cell subpopulations and 150+ dysregulated genes linked to inflammatory pathways. Fully reproducible with Docker and SLURM.
β-Cell scRNA-seq in Type 2 Diabetes: End-to-end Nextflow pipeline (QC → STAR → featureCounts → DESeq2) that profiles β-cell transcriptomes and reveals stress-response signatures in T2D.
GWAS Interactive Dashboard: R Shiny dashboard that lets interdisciplinary teams explore GWAS results across 282 varieties through Manhattan plots, QQ plots, and population structure analysis.