
About

I am a Research Associate at the University of Bristol and a member of the MaVi research group. My current research is on 4D video understanding, aiming to develop systems that can perceive and reason about dynamic 3D scenes over time.

Previously, as a PhD researcher in Computer Vision at the University of Bristol, supervised by Prof. Dima Damen, I focused on leveraging multimodal data for egocentric video understanding. This included topics such as audio-visual deep learning, action recognition/detection, predicting object interactions using eye gaze and 3D annotations, and long-term 3D multi-object tracking. During this time, I was also a PhD intern with the Visual Representation Learning team at Naver Labs Europe.

Prior to my PhD, I earned a First Class Honours MEng in Computer Science from the University of Bristol, where my dissertation on "Video GANs for Human-Object Interactions" was highly graded. Alongside research, I've gained teaching experience across multiple undergraduate modules, contributing to both coursework design and lab-based support.

I've worked across a range of projects involving large-scale multimodal datasets and model development, contributing to research outputs such as HD-EPIC, EPIC-Sounds, TIM, and OSNOM (see Research). My experience spans dataset construction, multimodal model design, and open-source codebase development.

My technical strengths lie in deep learning, computer vision, and multimodal modelling, with extensive experience in Python (PyTorch) and capabilities with C++ and JavaScript.

Email: jacob.chalk@bristol.ac.uk


News


Research

Current list of all research projects:

HD-EPIC: A Highly-Detailed Egocentric Video Dataset
Toby Perrett*, Ahmad Darkhalil*, Saptarshi Sinha*, Omar Emara*, Sam Pollard*, Kranti Parida*, Kaiting Liu*, Prajwal Gatti*, Siddhant Bansal*, Kevin Flanagan*, Jacob Chalk*, Zhifan Zhu*, Rhodri Guerrier*, Fahd Abdelazim*, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, Dima Damen
*: Equal Contribution
Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Webpage] [arXiv] [Code]
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen
International Conference on 3D Vision (3DV), 2025
[Webpage] [arXiv] [Code]
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk*, Jaesung Huh*, Evangelos Kazakos, Andrew Zisserman, Dima Damen
*: Equal Contribution
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Webpage] [arXiv] [Code]
EPIC-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh*, Jacob Chalk*, Evangelos Kazakos, Dima Damen, Andrew Zisserman
*: Equal Contribution
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
[Webpage] [arXiv] [Code]


Teaching


Miscellaneous

Presentations
Conference Reviewer
Journal Reviewer
Honours and Awards