The objective of the project is to detect humans in a given YouTube video (e.g. the Dior - Eau de Parfum commercial) by drawing bounding boxes around them on each frame.
The retained solution uses the ImageAI library and specifically its video detection class. This library enables quick usage of several pre-trained Deep Learning models for object detection such as RetinaNet, which is found to perform best for this task (notably better than YOLOv3 and its lightweight variant tiny-YOLOv3). Note that such pre-trained models are released by ImageAI at https://github.com/OlafenwaMoses/ImageAI/releases.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. Running it will generate a copy of the "Dior - Eau de Parfum" video in which detected humans are annotated on each frame.
There are two options available to get started:
- Recommended: use Anaconda 3 and follow Installing with conda
- Alternatively, install Python 3.7.6 and pip in a virtual environment and follow Installing with pip
In both cases, the installation steps must be run from the root of a local copy of this repository:
git clone https://github.com/pauldmk/human_detection_video.git
cd human_detection_video
Create an environment with all requirements:
conda env create -f requirement.yml
Activate this environment:
conda activate video_detection_aive
Activate the previously created virtual environment and install requirements using pip:
pip install -r requirements.txt
The code is run with a single command:
python src/video_detection.py
It performs the following steps:
- Downloads the pre-trained model for object detection (RetinaNet with a ResNet50 backbone by default).
- Downloads a local copy of the video.
- Performs object detection, using the GPU if the machine has a CUDA-enabled GPU available (otherwise it runs on the CPU).
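The steps above can be sketched with ImageAI's VideoObjectDetection class. This is only an illustrative sketch, not the exact contents of src/video_detection.py: the file paths, video names, and the frame rate below are assumptions.

```python
# Sketch of the detection pipeline, assuming ImageAI is installed and the
# RetinaNet weights and the input video have already been downloaded locally.
# MODEL_PATH, INPUT_VIDEO and OUTPUT_VIDEO are hypothetical names.
MODEL_PATH = "models/resnet50_coco.h5"
INPUT_VIDEO = "data/dior_eau_de_parfum.mp4"
OUTPUT_VIDEO = "data/dior_eau_de_parfum_detected"  # ImageAI appends the extension

def annotate_humans():
    # Import deferred so the module can be loaded without ImageAI installed.
    from imageai.Detection import VideoObjectDetection

    detector = VideoObjectDetection()
    detector.setModelTypeAsRetinaNet()
    detector.setModelPath(MODEL_PATH)
    detector.loadModel()

    # Restrict detection to the "person" class so only humans are annotated.
    people_only = detector.CustomObjects(person=True)
    detector.detectCustomObjectsFromVideo(
        custom_objects=people_only,
        input_file_path=INPUT_VIDEO,
        output_file_path=OUTPUT_VIDEO,
        frames_per_second=25,               # assumed frame rate
        minimum_percentage_probability=60,  # detection threshold, in percent
    )

if __name__ == "__main__":
    annotate_humans()
```

Restricting to the `person` class avoids drawing boxes around the other COCO classes (cars, bags, etc.) that the pre-trained model can also detect.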
RetinaNet with a ResNet50 backbone is found to perform best, and annotates the full video in about an hour on a basic CPU.
Several detection thresholds were tried. A threshold of 60% gives visually satisfying results, but the best value is highly use-case dependent. Even with this manually tuned threshold, some frames feature false positives, as well as false negatives under challenging conditions (unusual human posture, hidden body parts, distant shots) that a lower detection threshold would catch. On the upside, the annotation is of high overall quality, and even outperforms my human eye on some frames (e.g. at 0:17, a blurred person in the background).
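The precision/recall trade-off behind the threshold choice can be illustrated with a toy filter over hypothetical per-frame detections (the labels and confidence values below are made up for illustration):

```python
# Hypothetical detections for one frame: (label, confidence in percent).
detections = [
    ("person", 92.0),
    ("person", 61.5),
    ("person", 43.0),   # e.g. a distant, partially hidden person
    ("car", 70.0),
]

def keep_people(dets, min_probability):
    """Keep only 'person' detections at or above the given threshold."""
    return [(label, conf) for label, conf in dets
            if label == "person" and conf >= min_probability]

# Stricter threshold: the hard case at 43% is missed (false negative).
print(keep_people(detections, 60))  # → [('person', 92.0), ('person', 61.5)]
# Looser threshold: it is caught, at the cost of more false positives overall.
print(keep_people(detections, 40))
```

Lowering `min_probability` recovers difficult detections but lets more spurious boxes through, which is why the 60% value was tuned visually rather than fixed in advance.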