An intelligent workout tracking system that uses computer vision and machine learning to automatically detect and count exercise repetitions in video, in real time.
WorkoutTracker combines MediaPipe pose detection with a Temporal Convolutional Network (TCN) to perform two main tasks:
- Exercise Classification: Identify the type of exercise (push-ups, squats, pull-ups, dips, no-exercise)
- Repetition Segmentation: Detect and count individual repetitions within the exercise
- Raw Videos → MediaPipe pose extraction → Joint angle features
- Manual Labels → CSV files with repetition start markers
- Gaussian Augmentation → Smooth temporal labels around rep markers
- Multitask Dataset → Combined features + classification + segmentation labels
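A minimal sketch of the pose-to-feature step above, assuming MediaPipe's `Pose` solution and an angle computed from three landmarks. The specific 25 joint angles are defined by the dataset builders under `src/core/`; the left-elbow angle below is just one illustrative example.

```python
import cv2
import numpy as np
import mediapipe as mp

def joint_angle(a, b, c):
    """Angle at point b (degrees) formed by the segments b->a and b->c."""
    a, b, c = np.array(a), np.array(b), np.array(c)
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

pose = mp.solutions.pose.Pose()
landmarks = mp.solutions.pose.PoseLandmark
cap = cv2.VideoCapture("data/raw/push-ups/video1.mp4")
angles_per_frame = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        continue  # no person detected in this frame
    lm = results.pose_landmarks.landmark
    left_elbow = joint_angle(
        (lm[landmarks.LEFT_SHOULDER].x, lm[landmarks.LEFT_SHOULDER].y),
        (lm[landmarks.LEFT_ELBOW].x, lm[landmarks.LEFT_ELBOW].y),
        (lm[landmarks.LEFT_WRIST].x, lm[landmarks.LEFT_WRIST].y),
    )
    angles_per_frame.append([left_elbow])  # the real pipeline stores 25 angles per frame
cap.release()
```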
- TCN Backbone: 8-layer temporal convolutional network with residual connections
- Multi-Head Attention: Captures global temporal dependencies
- Dual Outputs: Classification head + segmentation head
- Balanced Training: Focal loss with class balancing for better recall
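A rough sketch of the kind of residual block such a TCN backbone stacks, assuming dilated causal 1D convolutions; the filter count, kernel size, and dropout here are placeholder values, not the definition in `src/training/model.py`.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tcn_block(x, filters, dilation, dropout=0.25):
    """One residual block: two dilated causal Conv1D layers plus a skip connection."""
    skip = x
    for _ in range(2):
        x = layers.Conv1D(filters, kernel_size=3, padding="causal",
                          dilation_rate=dilation, activation="relu")(x)
        x = layers.Dropout(dropout)(x)
    if skip.shape[-1] != filters:               # 1x1 conv to match channels for the add
        skip = layers.Conv1D(filters, kernel_size=1, padding="same")(skip)
    return layers.Add()([skip, x])
```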
```
WorkoutTracker/
├── data/
│   ├── raw/                      # Original exercise videos
│   │   ├── push-ups/             # Push-up videos
│   │   ├── squats/               # Squat videos
│   │   ├── pull-ups/             # Pull-up videos
│   │   ├── dips/                 # Dip videos
│   │   └── no-exercise/          # Non-exercise videos
│   ├── labels/                   # Manual CSV labels
│   │   ├── push-ups/             # Push-up labels
│   │   ├── squats/               # Squat labels
│   │   ├── pull-ups/             # Pull-up labels
│   │   ├── dips/                 # Dip labels
│   │   └── no_exercise/          # No-exercise labels
│   └── processed/                # Generated datasets
│       └── multitask_dataset.npz
├── models/                       # Trained models
│   └── main/                     # Current best model
│       ├── main.keras            # Model weights
│       └── training_history.npy
├── src/
│   ├── core/                     # Dataset building
│   │   ├── dataset_builder.py
│   │   └── improved_dataset_builder.py
│   ├── training/                 # Model training
│   │   ├── trainer.py
│   │   ├── model.py
│   │   └── balanced_generator.py
│   ├── demo/                     # Demo applications
│   │   ├── demo.py
│   │   └── live/
│   └── utils/                    # Utilities
│       ├── video_labeler.py
│       └── csv_format_converter.py
├── demo_output/                  # Demo results
└── requirements.txt
```
```bash
# Clone the repository
git clone <repository-url>
cd WorkoutTracker

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Place your exercise videos in the appropriate directories:
- `data/raw/push-ups/` - Push-up videos
- `data/raw/squats/` - Squat videos
- `data/raw/pull-ups/` - Pull-up videos
- `data/raw/dips/` - Dip videos
- `data/raw/no-exercise/` - Non-exercise videos
Use the video labeling tool to create CSV files with repetition markers:
```bash
python src/utils/video_labeler.py data/raw/push-ups/video1.mp4
```

This creates `data/labels/push-ups/video1.csv` with frame-by-frame labels.
Create the dataset from your labeled videos:
```bash
python build_dataset.py
```

This generates `data/processed/multitask_dataset.npz` with:
- Features: (N, 30, 25) - 30-frame windows of 25 joint angles
- Classification labels: (N,) - Exercise type (0-4)
- Segmentation labels: (N, 30) - Per-frame repetition probability
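A quick way to sanity-check the generated archive. The array key names below are assumptions; `data.files` prints the actual keys stored by the dataset builder.

```python
import numpy as np

data = np.load("data/processed/multitask_dataset.npz")
print(data.files)                 # actual array names in the archive
X = data["features"]              # expected shape (N, 30, 25)
y_cls = data["class_labels"]      # expected shape (N,)
y_seg = data["seg_labels"]        # expected shape (N, 30)
print(X.shape, y_cls.shape, y_seg.shape)
```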
Train the model:
```bash
python train.py
```

This creates `models/main/main.keras` with the trained model.
Analyze a video and generate results:
```bash
python src/demo/demo.py --video data/test_videos/test0.mp4 --output demo_output
```

Real-time exercise detection from webcam:

```bash
cd src/demo/live
./start_live_demo.sh
```

The `multitask_dataset.npz` file is created by:
- Feature Extraction: MediaPipe pose detection → 25 joint angles per frame
- Temporal Windowing: 30-frame sliding windows (1 second at 30 FPS)
- Label Augmentation: Gaussian smoothing around repetition markers
- Class Balancing: Includes no-exercise samples for better generalization
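The temporal windowing step could be sketched like this, assuming a per-video sequence of joint angles and per-frame labels; the stride is an illustrative assumption.

```python
import numpy as np

def make_windows(angles, frame_labels, window=30, stride=1):
    """Slice a (T, 25) angle sequence and its (T,) per-frame labels into
    overlapping 30-frame windows (1 second at 30 FPS)."""
    X, y = [], []
    for start in range(0, len(angles) - window + 1, stride):
        X.append(angles[start:start + window])
        y.append(frame_labels[start:start + window])
    return np.stack(X), np.stack(y)
```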
Instead of binary 0/1 labels, we use Gaussian augmentation:
- Center (rep start): Probability = 1.0
- ±4 frames: Probability = 0.5
- ±12 frames: Probability ≈ 0.1
- Creates smooth temporal patterns for better training
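A minimal sketch of this kind of label smoothing; the kernel width `sigma` is an assumption, and the exact falloff values listed above depend on the width the real dataset builder uses.

```python
import numpy as np

def smooth_labels(num_frames, rep_starts, sigma=4.0):
    """Turn hard rep-start markers into a smooth per-frame target by placing a
    Gaussian bump (peak 1.0) at every marked frame."""
    t = np.arange(num_frames, dtype=np.float32)
    target = np.zeros(num_frames, dtype=np.float32)
    for start in rep_starts:
        target = np.maximum(target, np.exp(-0.5 * ((t - start) / sigma) ** 2))
    return target
```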
- Total Sequences: ~46,700
- Window Size: 30 frames (1 second)
- Features: 25 joint angles per frame
- Classes: 5 (push-ups, squats, pull-ups, dips, no-exercise)
- Positive Samples: ~9.9% (repetition frames)
- Input: (batch_size, 30, 25) - 30 frames × 25 joint angles
- TCN Backbone: 8 residual blocks with dilated convolutions
- Attention: Multi-head attention for global temporal dependencies
- Outputs:
  - Classification: 5 classes (softmax)
  - Segmentation: 30 probabilities (sigmoid)
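Putting the pieces together, a dual-head model of this shape might be assembled roughly as follows, reusing the `tcn_block` sketch from earlier; the filter counts, attention heads, and exact head layouts are assumptions, and the real definition lives in `src/training/model.py`.

```python
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(30, 25))               # 30 frames x 25 joint angles
x = inputs
for i in range(8):                                  # TCN backbone: 8 residual blocks
    x = tcn_block(x, filters=64, dilation=2 ** min(i, 4))

attn = layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)
x = layers.Add()([x, attn])                         # global temporal context

# Classification head: one exercise label per 30-frame window.
cls = layers.GlobalAveragePooling1D()(x)
cls = layers.Dense(5, activation="softmax", name="classification")(cls)

# Segmentation head: one repetition probability per frame.
seg = layers.Conv1D(1, kernel_size=1, activation="sigmoid")(x)
seg = layers.Reshape((30,), name="segmentation")(seg)

model = Model(inputs, [cls, seg])
```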
- Optimizer: Adam (lr=5e-4)
- Loss: Focal Loss (γ=1.0, α=0.5) + Binary Crossentropy
- Balanced Sampling: 20% positive, 80% negative samples
- Augmentation: Gaussian label smoothing
- Regularization: Dropout (0.25), Early stopping (patience=20)
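A hedged sketch of how this setup could be wired up in Keras. Which head receives the focal loss, the epoch count, and the generator names are assumptions (a balanced generator along the lines of `src/training/balanced_generator.py`); the authoritative configuration is in `src/training/trainer.py`.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
    loss={
        "classification": "sparse_categorical_crossentropy",
        "segmentation": tf.keras.losses.BinaryFocalCrossentropy(
            apply_class_balancing=True, alpha=0.5, gamma=1.0),
    },
)

early_stop = tf.keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)
model.fit(train_gen, validation_data=val_gen, epochs=200, callbacks=[early_stop])
```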
Analyzes pre-recorded videos and generates:
- Annotated video with detected repetitions
- Analysis plot showing exercise classification and repetition detection
- Repetition count and confidence scores
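One simple way to turn the per-frame segmentation output into a repetition count is peak detection over the probability signal. The sketch below only illustrates that idea (the threshold and minimum peak distance are assumptions), not necessarily what `demo.py` implements.

```python
import numpy as np
from scipy.signal import find_peaks

def count_reps(frame_probs, threshold=0.5, min_gap_frames=15):
    """Count repetitions by finding peaks in the per-frame repetition probabilities."""
    peaks, _ = find_peaks(np.asarray(frame_probs), height=threshold,
                          distance=min_gap_frames)
    return len(peaks), peaks
```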
Usage:
```bash
python src/demo/demo.py --video path/to/video.mp4 --output output_directory
```

Real-time exercise detection from webcam:
- Live pose detection and skeleton overlay
- Real-time repetition counting
- Exercise type classification
Usage:
```bash
cd src/demo/live
./start_live_demo.sh
```
## 📈 Performance
### Current Model Metrics
- **Classification Accuracy**: ~99.9%
- **Segmentation Precision**: ~97%
- **Segmentation Recall**: ~38%
- **AUC**: ~0.93
### Model Comparison
- **Robust Model**: Better generalization, fewer false positives
- **Gaussian Filtered**: Improved temporal consistency
- **Improved Recall**: Better detection of repetitions
## 🔧 Configuration
### Dataset Building
Edit `build_dataset.py` to modify:
- Window size (default: 30 frames)
- Gaussian augmentation parameters
- No-exercise ratio
### Model Training
Edit `src/training/trainer.py` to modify:
- Model architecture (filters, layers, dropout)
- Loss function parameters
- Training hyperparameters
### Demo Settings
Edit `src/demo/demo.py` to modify:
- Model path
- Output format
- Visualization settings
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.