503Project

Cervical Cancer Risk Prediction

This project is a part of the ADS-503 course in the Applied Data Science Program at the University of San Diego.

Project Status

Active

Installation

This project was completed in RStudio using .Rmd and .R files. To reproduce or run this project:

Clone this repository (tanyaort/503Project) from GitHub

Open the R project or .Rmd file in RStudio

Install the required packages (e.g., tidyverse, caret, corrplot, pROC)

Run the script section by section to explore the data, preprocess, split, and train models
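The package step above can be sketched in R as follows. This is a minimal sketch, assuming the four packages named in the steps are the ones needed; the .Rmd files may load others. `missing_packages` and `install_missing` are hypothetical helper names, not functions from this repository.

```r
# Packages named in the installation steps; the full list used by the
# project's .Rmd files may be longer (assumption).
required <- c("tidyverse", "caret", "corrplot", "pROC")

# Return the subset of `pkgs` that is not installed in the current library.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing (hypothetical helper, not part of the repo).
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}
```

Calling `install_missing(required)` once in the RStudio console before knitting should leave already-installed packages untouched.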

Project Intro/Objective

The primary objective of this project is to develop a model that predicts cervical cancer risk from behavioral, demographic, and clinical risk factors. The aim is to support early identification and preventive care by flagging individuals who would benefit from timely screening.

We use the Cervical Cancer Risk Factors dataset from the UCI Machine Learning Repository, whose features include age, number of sexual partners, contraceptive use, STD history, and smoking.

Partner(s)/Contributor(s)

Tanya Ortega
Cynthia Portales-Loebell
Lei Lin

Each member contributed to data cleaning, modeling, evaluation, and documentation. Final responsibilities include presenting the results and submitting a technical report and summary slide deck.

Methods Used

Data Cleaning & Wrangling
Exploratory Data Analysis (EDA)
Predictive Modeling
Logistic Regression
Decision Tree
Random Forest
XGBoost
Model Evaluation (AUC, Accuracy, Recall, etc.)
Data Visualization
Data Splitting (Train/Test)

Technologies

R
RStudio
Tidyverse
ggplot2
caret
pROC

Project Description

We are working with the Cervical Cancer (Risk Factors) Data Set, which contains 858 records and 36 variables. The dataset includes a mix of binary, categorical, and numerical predictors tied to cervical cancer risk.

Dataset Source:

UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/383/cervical+cancer+risk+factors
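Once the CSV is downloaded, loading it might look like the sketch below. It assumes the raw file marks missing values with `?`, which should be checked against the downloaded copy. The inline rows and column names are invented stand-ins so the example runs without the real file.

```r
# Inline stand-in for a few rows of the downloaded CSV (values invented;
# column names are assumptions about the real header).
csv_text <- "Age,Number of sexual partners,Smokes,Biopsy
18,4,0,0
15,1,?,0
34,?,1,1
52,5,1,0"

# Treat "?" as NA so numeric columns are parsed as numbers, not text.
# check.names = TRUE turns spaces in headers into dots.
raw <- read.csv(text = csv_text, na.strings = "?", check.names = TRUE)

# Share of missing values per column, sorted from most to least missing.
missing_share <- sort(colMeans(is.na(raw)), decreasing = TRUE)
print(missing_share)
```

For the real data, `text = csv_text` would be replaced by the path to the downloaded file.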

Target Variable:

biopsy (1 = positive diagnosis, 0 = negative)

Analysis Focus:

Perform EDA to understand distributions and variable importance
Handle missing values and impute where appropriate
Train classification models to predict positive cervical cancer diagnosis
Compare model performance using common evaluation metrics
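The focus points above can be sketched end to end in base R. This is an illustration under stated assumptions, not the project's actual pipeline: the data are synthetic, the column names and effect sizes are invented, median imputation and an 80/20 stratified split are assumed choices, and plain `glm` stands in for the caret-based models. `stratified_split` and `auc` are hypothetical helpers.

```r
set.seed(503)

# Synthetic stand-in for the real data (all values and names invented).
n <- 400
df <- data.frame(
  age          = round(runif(n, 15, 60)),
  num_partners = rpois(n, 2) + 1,
  smokes       = rbinom(n, 1, 0.15)
)
df$biopsy <- rbinom(n, 1, plogis(-3 + 0.03 * df$age + 0.8 * df$smokes))
df$num_partners[sample(n, 30)] <- NA  # inject missing values

# Stratified split: sample a fraction p of row indices within each class.
stratified_split <- function(y, p = 0.8) {
  groups <- split(seq_along(y), y)
  picked <- lapply(groups, function(i) {
    i[sample.int(length(i), floor(p * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_split(df$biopsy)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Median imputation, using the training median for both sets so that no
# information leaks from the test set.
med <- median(train$num_partners, na.rm = TRUE)
train$num_partners[is.na(train$num_partners)] <- med
test$num_partners[is.na(test$num_partners)]   <- med

# Logistic regression as one example classifier.
fit  <- glm(biopsy ~ age + num_partners + smokes,
            data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")

# AUC via the rank-sum (Mann-Whitney) formula; ties get average ranks.
auc <- function(y, score) {
  r <- rank(score)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Accuracy and recall at a 0.5 probability threshold.
pred     <- as.integer(prob >= 0.5)
accuracy <- mean(pred == test$biopsy)
recall   <- sum(pred == 1 & test$biopsy == 1) / sum(test$biopsy == 1)
test_auc <- auc(test$biopsy, prob)
```

With an imbalanced target, accuracy alone can look good while recall stays low, which is one reason to compare models on AUC and recall as well.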

License

MIT License (project code). The dataset is distributed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/legalcode

Acknowledgments

Special thanks to Professor An Tran for guidance and support throughout ADS-503, and to our classmates for their collaboration. Dataset courtesy of the UCI Machine Learning Repository, originally compiled by Hospital Universitario de Caracas.
