Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism

Mohaiminul Islam, Gedas Bertasius

March 2022

Abstract

This report describes our submission (TarHeels) for the Ego4D Object State Change Classification Challenge. We use a transformer-based video recognition model and leverage the Divided Space-Time Attention mechanism for classifying object state change in egocentric videos. Our submission achieves the second-best performance in the challenge. Furthermore, we perform an ablation study to show that identifying object state change in egocentric videos requires temporal modeling ability. Lastly, we present several positive and negative examples to visualize our model’s predictions. The code is publicly available.
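For readers unfamiliar with the mechanism named above, here is a minimal sketch of divided space-time attention in plain Python. It is an illustration under simplifying assumptions, not the submission's implementation. It uses a single head, omits the learned query/key/value projections, layer normalization, and MLP, and all function names (`attend`, `divided_space_time_attention`) are hypothetical. The idea is that each patch first attends over time (the same spatial location in every frame) and then over space (all patches in its own frame), with a residual connection after each step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Assumption: no learned projections, so queries, keys, and values
    are the raw feature vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [x + y for x, y in zip(a, b)]


def divided_space_time_attention(video):
    """video[t][s] is the feature vector of patch s in frame t."""
    num_frames, num_patches = len(video), len(video[0])

    # Temporal step: each patch attends to the same spatial location
    # across all frames.
    after_time = [[None] * num_patches for _ in range(num_frames)]
    for s in range(num_patches):
        column = [video[t][s] for t in range(num_frames)]
        for t in range(num_frames):
            after_time[t][s] = add(video[t][s], attend(video[t][s], column, column))

    # Spatial step: each patch attends to every patch in its own frame.
    out = [[None] * num_patches for _ in range(num_frames)]
    for t in range(num_frames):
        row = after_time[t]
        for s in range(num_patches):
            out[t][s] = add(row[s], attend(row[s], row, row))
    return out


# Toy input: 2 frames, 2 patches per frame, 2-dimensional features.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
features = divided_space_time_attention(video)
```

In this sketch each patch is compared with T + S others (T frames, S patches per frame) rather than the T × S comparisons of joint space-time attention, while the temporal step still lets information flow between frames.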

Type

Report

Publication

In Ego4D Workshop, CVPR 2022

Mohaiminul Islam

PhD Student, UNC Chapel Hill; Research Scientist Intern, Meta AI

My research interests include computer vision, video understanding, and multi-modal deep learning.