This project implements a deep learning approach to estimate blood pressure (Systolic Blood Pressure - SBP and Diastolic Blood Pressure - DBP) from Photoplethysmography (PPG) signals. The workflow is divided into two main stages, each implemented in a dedicated Jupyter notebook.
- Project Structure
- Pipeline Overview
- Stage 1: Data Preparation & Preprocessing
- Stage 2: Model Training, Evaluation & Analysis
- Data Organization
- File Structure
- How to Reproduce
- Notes
```
├── segment_to_cycle_loader.ipynb               # Stage 1: Data preprocessing
├── three_channel_ppg_peak2peak_subjects.ipynb  # Stage 2: Model training & analysis
├── data/
│   ├── dataframes/                             # Processed DataFrames from Stage 1
│   └── preprocessing_ready_set/                # Pre-processed datasets for direct use
├── models/                                     # Saved model checkpoints
├── results/                                    # Evaluation outputs
└── README.md
```
- Stage 1: Data Preparation & Preprocessing → `segment_to_cycle_loader.ipynb`
- Stage 2: Model Training, Evaluation & Analysis → `three_channel_ppg_peak2peak_subjects.ipynb`
Notebook: `segment_to_cycle_loader.ipynb`
- Input: Segmented PPG/ABP data from `.pkl` files, organized by subject and segment
- Function: `load_segments_from_directory`
- Operations:
  - Apply length filters (min/max duration)
  - Optimize datatypes for memory efficiency
  - Merge segments into dictionary: `segments_by_subject_merged`
- Purpose: Remove noise and baseline drift from PPG signals
- Method: Bandpass filter application
- Function: `apply_filter_to_segments`
- Output: Filtered PPG signals replacing original data
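The filter itself is defined inside the notebook; as a rough sketch of this step, here is a zero-phase Butterworth bandpass (the 0.5–8 Hz passband and 125 Hz sampling rate below are illustrative assumptions, not values taken from the notebook):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_ppg(signal, fs=125.0, low=0.5, high=8.0, order=4):
    """Zero-phase Butterworth bandpass: removes baseline drift (< low Hz)
    and high-frequency noise (> high Hz) without phase distortion."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, signal)

# Demo: a 1.2 Hz pulse-like wave contaminated with linear baseline drift
t = np.arange(0, 10, 1 / 125.0)
raw = np.sin(2 * np.pi * 1.2 * t) + 0.5 * t
filtered = bandpass_ppg(raw)
```

The drift term is largely removed because it lies below the low cutoff.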
- Tool: NeuroKit2 library
- Features: Heart Rate Variability (HRV) metrics from PPG signals
- Storage: Extended segments with `data` (DataFrame) and `info` (metadata)
- Metric: Mean `PPG_Quality` score
- Threshold: 0.92 (configurable)
- Action: Remove segments failing quality check
- Criteria: Physiological plausibility of RR intervals
- Range: 0.4s ≤ RR ≤ 1.5s
- Validity: At least 80% valid intervals
- Output: `cleaned_segments_by_subject`
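The RR-interval plausibility rule above can be sketched as a small helper (function name and the 125 Hz sampling rate are illustrative; the thresholds are the ones stated above):

```python
import numpy as np

def rr_plausibility(peak_indices, fs=125.0, rr_min=0.4, rr_max=1.5, min_valid_frac=0.8):
    """Return True if at least min_valid_frac of RR intervals lie in [rr_min, rr_max] seconds."""
    rr = np.diff(np.asarray(peak_indices)) / fs  # inter-peak intervals in seconds
    if rr.size == 0:
        return False
    valid = (rr >= rr_min) & (rr <= rr_max)
    return bool(valid.mean() >= min_valid_frac)

# Peaks 100 samples apart at 125 Hz -> 0.8 s intervals (plausible)
peaks_good = np.arange(0, 1000, 100)
# Peaks 10 samples apart -> 0.08 s intervals (implausible)
peaks_bad = np.array([0, 10, 20, 30, 40])
```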
- Purpose: Identify valley (bottom) indices in PPG waveform between peaks
- Storage: Indices stored in the segment's `info` dictionary
- Features: Plot ABP and PPG signals with marked peaks and bottoms
- Purpose: Visual inspection and quality assessment
- Function: `extract_beats_with_raw_and_norm`
- Process:
- Extract individual beats (peak-to-peak windows)
- Resample PPG windows to fixed length (120 samples)
- Extract SBP (max) and DBP (min) from corresponding ABP window
- Store raw ABP waveform (optional)
- Output Columns: `ppg_norm_120`, `ppg_raw_120`, `sbp`, `dbp`, `segment_id`, `abp_raw`
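A minimal sketch of a single-beat extraction, assuming linear resampling and min-max beat normalization (the notebook's `extract_beats_with_raw_and_norm` may differ in detail):

```python
import numpy as np

def extract_beat(ppg, abp, start, end, target_len=120):
    """Resample one peak-to-peak PPG window to target_len samples and
    read SBP (max) / DBP (min) from the aligned ABP window."""
    ppg_win = np.asarray(ppg[start:end], dtype=float)
    abp_win = np.asarray(abp[start:end], dtype=float)
    # Linear resampling of the beat to a fixed length
    x_old = np.linspace(0.0, 1.0, ppg_win.size)
    x_new = np.linspace(0.0, 1.0, target_len)
    ppg_raw_120 = np.interp(x_new, x_old, ppg_win)
    # Min-max normalization (one plausible choice for ppg_norm_120)
    rng = ppg_raw_120.max() - ppg_raw_120.min()
    ppg_norm_120 = (ppg_raw_120 - ppg_raw_120.min()) / rng if rng > 0 else ppg_raw_120 * 0.0
    return ppg_norm_120, ppg_raw_120, abp_win.max(), abp_win.min()

# Demo on synthetic waveforms
ppg = np.sin(np.linspace(0, np.pi, 200))
abp = 100.0 + 20.0 * np.sin(np.linspace(0, np.pi, 200))
beat_norm, beat_raw, sbp, dbp = extract_beat(ppg, abp, 0, 150)
```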
- Format: Pickle file in the `data/dataframes/` directory
- Naming: Encodes number of subjects, segments, and rows
Notebook: `three_channel_ppg_peak2peak_subjects.ipynb`
- Load processed DataFrame from Stage 1
- Support for concatenating multiple DataFrames
- Method: Remove rows with mean ABP outside specified confidence interval
- Purpose: Reduce impact of outliers on model training
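One way to implement this filter is a normal-theory confidence interval over a per-row mean ABP; the sketch below uses `(sbp + dbp) / 2` as a simple proxy for mean ABP, which is an assumption rather than the notebook's exact formula:

```python
import numpy as np
import pandas as pd
from scipy import stats

def filter_abp_outliers(df, ci=0.99):
    """Keep rows whose mean ABP proxy lies inside a normal-theory CI of the column."""
    mean_abp = (df["sbp"] + df["dbp"]) / 2.0
    lo, hi = stats.norm.interval(ci, loc=mean_abp.mean(), scale=mean_abp.std())
    return df[(mean_abp >= lo) & (mean_abp <= hi)]

# Demo: 100 normal rows plus one extreme outlier
df = pd.DataFrame({"sbp": [120] * 100 + [300], "dbp": [80] * 101})
kept = filter_abp_outliers(df)
```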
- Target: Fixed number of windows per subject (e.g., 1000–1001)
- Purpose: Ensure balanced representation across subjects
- Categories: Normal, Elevated, Stage 1, Stage 2, etc.
- Method: Custom classification rules
- Analysis: Class balance visualization and statistics
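The classification rules are custom to the notebook; one plausible rule set, loosely following the 2017 ACC/AHA thresholds (the exact cutoffs here are assumptions):

```python
def bp_category(sbp, dbp):
    """Classify a (SBP, DBP) pair into a blood pressure category.
    Thresholds loosely follow the 2017 ACC/AHA guideline."""
    if sbp >= 140 or dbp >= 90:
        return "Stage 2"
    if sbp >= 130 or dbp >= 80:
        return "Stage 1"
    if sbp >= 120:
        return "Elevated"
    return "Normal"
```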
- Strategy: Subject-wise splitting to prevent data leakage
- Splits: Train, Validation, Test sets
- Post-processing: Trim splits to match target class distribution
- Structure:
  - `ppg_train`, `ppg_val`, `ppg_test`: Raw PPG windows
  - `abp_train`, `abp_val`, `abp_test`: [SBP, DBP] pairs
- Cleaning: Remove NaN values
- Randomization: Shuffle with fixed seeds while maintaining alignment
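Subject-wise splitting can be sketched as below; the `subject_col` name and split fractions are hypothetical (the notebook derives subject identity from its own columns, e.g. `segment_id`):

```python
import numpy as np
import pandas as pd

def subject_wise_split(df, subject_col="subject_id", val_frac=0.15, test_frac=0.15, seed=42):
    """Assign whole subjects to train/val/test so no subject's beats leak across splits."""
    rng = np.random.default_rng(seed)
    subjects = df[subject_col].unique()
    rng.shuffle(subjects)
    n = len(subjects)
    n_test = max(1, int(n * test_frac))
    n_val = max(1, int(n * val_frac))
    test_s = set(subjects[:n_test])
    val_s = set(subjects[n_test:n_test + n_val])
    test = df[df[subject_col].isin(test_s)]
    val = df[df[subject_col].isin(val_s)]
    train = df[~df[subject_col].isin(test_s | val_s)]
    return train, val, test

# Demo: 10 subjects with 5 beat windows each
df = pd.DataFrame({"subject_id": np.repeat(np.arange(10), 5), "y": 0.0})
train, val, test = subject_wise_split(df)
```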
- Channels:
- PPG: Original signal
- VPG: First derivative (Velocity PPG)
- APG: Second derivative (Acceleration PPG)
- Output Shape: `(N, 3, 120)`
- Method: Z-score normalization
- Scope: Each channel normalized independently
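The three-channel construction and per-channel z-score normalization can be sketched together (derivatives via `np.gradient` are an assumption; the notebook may use a different difference scheme):

```python
import numpy as np

def to_three_channel(ppg_windows):
    """Stack PPG with its first (VPG) and second (APG) derivatives:
    (N, 120) -> (N, 3, 120)."""
    ppg = np.asarray(ppg_windows, dtype=np.float32)
    vpg = np.gradient(ppg, axis=1)   # velocity PPG
    apg = np.gradient(vpg, axis=1)   # acceleration PPG
    return np.stack([ppg, vpg, apg], axis=1)

def zscore_per_channel(x, eps=1e-8):
    """Z-score each of the 3 channels independently across all windows and samples."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    std = x.std(axis=(0, 2), keepdims=True)
    return (x - mean) / (std + eps)

# Demo: 4 identical sine-shaped windows of 120 samples
windows = np.sin(np.linspace(0, 2 * np.pi, 120))[None, :].repeat(4, axis=0)
x = zscore_per_channel(to_three_channel(windows))
```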
- Dataset: Custom `PPGABPDataset` class
- DataLoader: Efficient batching for training and evaluation
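A minimal version of such a dataset wrapper (the real `PPGABPDataset` in the notebook may carry extra fields):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class PPGABPDataset(Dataset):
    """Pairs 3-channel PPG windows with [SBP, DBP] regression targets."""

    def __init__(self, ppg, abp):
        self.ppg = torch.as_tensor(ppg, dtype=torch.float32)  # (N, 3, 120)
        self.abp = torch.as_tensor(abp, dtype=torch.float32)  # (N, 2)

    def __len__(self):
        return len(self.ppg)

    def __getitem__(self, idx):
        return self.ppg[idx], self.abp[idx]

# Demo with dummy arrays
ds = PPGABPDataset(np.zeros((8, 3, 120)), np.zeros((8, 2)))
loader = DataLoader(ds, batch_size=4, shuffle=True)
```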
`PPGtoABPRegressor`:
```
├── Input: 3-channel PPG tensor (3, 120)
├── Conv1D layers with BatchNorm and ReLU
├── Dropout for regularization
├── Flatten and Linear layers
└── Output: 2 values (SBP, DBP)
```
- Optimizer: Adam
- Loss Function: MAE (L1 Loss)
- Monitoring: Training and validation loss tracking
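A sketch matching the architecture outline above; the channel counts, kernel sizes, dropout rate, and learning rate are assumptions, not the notebook's exact hyperparameters:

```python
import torch
import torch.nn as nn

class PPGtoABPRegressor(nn.Module):
    """Conv1D feature extractor over (3, 120) PPG windows -> (SBP, DBP) regression head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=5, padding=2), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Dropout(0.3),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64 * 120, 2))

    def forward(self, x):  # x: (batch, 3, 120)
        return self.head(self.features(x))

model = PPGtoABPRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.L1Loss()  # MAE, as used in the notebook
```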
- Visualizations:
- MAE distribution histograms
- Bland–Altman plots (MAP, SBP, DBP)
- Scatter plots (predicted vs. true)
- Calibration: Optional global linear calibration for bias correction
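Global linear calibration can be sketched as a per-output least-squares fit on a held-out set (function names here are illustrative, not the notebook's):

```python
import numpy as np

def fit_linear_calibration(pred, true):
    """Fit true ≈ a * pred + b independently for each output column (SBP, DBP)."""
    coefs = []
    for j in range(pred.shape[1]):
        a, b = np.polyfit(pred[:, j], true[:, j], deg=1)
        coefs.append((a, b))
    return coefs

def apply_calibration(pred, coefs):
    """Apply the fitted per-column linear correction to new predictions."""
    out = pred.copy()
    for j, (a, b) in enumerate(coefs):
        out[:, j] = a * pred[:, j] + b
    return out

# Demo: predictions with a systematic scale/offset bias
true = np.column_stack([np.linspace(90, 160, 20), np.linspace(60, 100, 20)])
pred = 0.9 * true + 5.0
calibrated = apply_calibration(pred, fit_linear_calibration(pred, true))
```

In practice the calibration would be fitted on the validation set and applied to the test set.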
- Models: Saved to the `models/` directory
- Results: Predictions and evaluations saved to `results/`
Contains processed DataFrames generated by Stage 1 preprocessing:
- Format: `.pkl` files with an encoded naming convention
- Naming Pattern: `df_subject_{N}_segment_{M}_row_{R}_peak_by_peak_120_sampled.pkl`
  - `N`: Number of unique subjects
  - `M`: Number of segments processed
  - `R`: Total number of beat windows/rows
- Example: `df_subject_5_segment_5_row_8721_peak_by_peak_120_sampled.pkl`
Contains pre-processed datasets ready for immediate use in Stage 2 (the validation results reported for this project were produced from this dataset):
- Purpose: Skip time-intensive Stage 1 preprocessing
- Dataset (not included in this export): `df_137_sampled_peak_by_peak_rows_fixed_1000.pkl`
  - 137 subjects with balanced sampling
  - Fixed 1000 beat windows per subject
  - Please contact the author to obtain this dataset
- Usage: Load directly in the Stage 2 notebook for model training
```
├── segment_to_cycle_loader.ipynb               # Stage 1: Data preprocessing
├── three_channel_ppg_peak2peak_subjects.ipynb  # Stage 2: Model training & analysis
├── data/
│   ├── dataframes/                             # Generated by Stage 1
│   │   └── df_subject_5_segment_5_row_8721_peak_by_peak_120_sampled.pkl
│   └── preprocessing_ready_set/                # Pre-processed datasets
│       └── about.txt                           # Dataset information
│
├── models/                                     # PyTorch model checkpoints
├── results/                                    # Evaluation outputs and predictions
└── README.md
```
The segmentation and preprocessing pipeline was developed against local subject dataframes under the `saved_subjects_30/31/32` folders (not included in this export). Because the full data processing is very time- and resource-intensive, the first data-preparation stage is demonstrated with toy segments chosen from `saved_subjects_32`. You can start training/validation quickly by using the pre-processed dataset in `data/dataframes/`. However, rather than working with only the 5 segmented subjects, we recommend running the complete `segment_to_cycle_loader.ipynb` pipeline, which produces a DataFrame covering 19 subjects for future use. This gives a complete picture of how the DataFrames are produced.
- Python 3.8+
- CUDA-capable GPU (recommended)
- Required packages (see `requirements.txt`)
- Load Pre-processed Dataset

  ```python
  import pandas as pd

  df = pd.read_pickle('data/dataframes/df_subject_5_segment_5_row_8721_peak_by_peak_120_sampled.pkl')
  ```

- Run Stage 2 - Model Training & Analysis

  ```bash
  jupyter notebook three_channel_ppg_peak2peak_subjects.ipynb
  ```
- Prepare Raw Data
  - Place raw segment `.pkl` files in the appropriate directories
- Run Stage 1 - Data Preprocessing

  ```bash
  jupyter notebook segment_to_cycle_loader.ipynb
  ```

  - Follow the notebook cells sequentially
  - Outputs the processed DataFrame to `data/dataframes/`
- Run Stage 2 - Model Training & Analysis

  ```bash
  jupyter notebook three_channel_ppg_peak2peak_subjects.ipynb
  ```

  - Load the processed data from Stage 1
  - Train the model and generate evaluations
- Review Outputs
  - Check `models/` for saved model checkpoints
  - Review `results/` for predictions and analysis
- All processing steps maintain subject-wise separation to prevent data leakage
- Pipeline is modular - parameters can be adjusted for different datasets
- Reproducibility ensured through fixed random seeds
- Memory optimization techniques used for large datasets
- Quality thresholds can be adjusted based on dataset characteristics
- Window lengths and model architecture are configurable
- BP categorization rules can be customized for different clinical standards
- Use CUDA-enabled GPU for faster training
- Adjust batch sizes based on available memory
- Consider data augmentation for smaller datasets
- Recommended: Use the pre-processed data from `preprocessing_ready_set/` for faster iteration, then retrain the full model architecture to reproduce the results reported in the results section
- The pre-processed datasets in `preprocessing_ready_set/` have been validated; detailed results and validation metrics are given in the evaluation section of `three_channel_ppg_peak2peak_subjects.ipynb`
- Custom processed data should be validated against known benchmarks