Mohaiminul Islam

PhD Student, UNC Chapel Hill
Research Scientist Intern, Meta AI

Biography

I am a final-year PhD student in the Department of Computer Science at UNC Chapel Hill, where I have the privilege of working with Professor Gedas Bertasius. My research focuses on computer vision, video understanding, and multimodal deep learning, with an emphasis on developing efficient models for long-range video analysis. My work has appeared at ECCV'22, CVPR'23, ECCV'24, EMNLP'24, and CVPR'25.

I have interned twice at FAIR, Meta AI and once at Comcast AI, where I worked on multimodal large language models, video agents, and efficient models for long-range video understanding. Before starting my PhD program at UNC, I gained valuable industry experience as a Software Engineer at Samsung R&D Institute.

Download my resumé.

Interests
  • Computer Vision
  • Video Understanding
  • Natural Language Processing
  • Multimodal Large Language Models
  • Efficient Video Models
Education
  • PhD in Computer Science, 2021-Present

    UNC Chapel Hill

  • MSc in Computer Science, 2021-2023

    UNC Chapel Hill

  • BSc in Computer Science and Engineering, 2014-2018

    Bangladesh University of Engineering and Technology

Experience

FAIR, Meta AI
Research Scientist Intern
May 2024 – Aug 2024, New York
Advisors - Lorenzo Torresani, Tushar Nagarajan, Huiyu Wang
Topic - Multimodal Large Language Models, Efficient Long Video Understanding
Publication - BIMBA (CVPR 2025)
 
FAIR, Meta AI
Research Scientist Intern
May 2023 – Aug 2023, Menlo Park, California
Advisors - Xitong Yang, Tushar Nagarajan, Huiyu Wang, Kris Kitani
Topic - Video Agents, Procedural Learning
Publication - VidAssist (ECCV 2024, Oral)
 
Comcast AI
Machine Learning Intern
May 2022 – Aug 2022, Virtual
Advisors - Mahmudul Hasan, Tony Braskich
Topic - Scene Detection, Efficient Long-Range Video Models
Publication - TranS4mer (CVPR 2023)
 
Lecturer
Apr 2019 – Dec 2020, Bangladesh
 
Software Engineer
Nov 2018 – Mar 2019, Bangladesh

Recent Publications

(2024). Video ReCap: Recursive Captioning of Hour-Long Videos. In CVPR 2024.

(2024). A Simple LLM Framework for Long-Range Video Question-Answering. In arXiv 2024.

(2024). RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos. In arXiv 2024.

(2023). Efficient Movie Scene Detection using State-Space Transformers. In CVPR 2023.

(2022). Long Movie Clip Classification with State-Space Video Models. In ECCV 2022.

(2022). COVID-DenseNet: A Deep Learning Architecture to Detect COVID-19 from Chest Radiology Images. In ICDSA 2022.

Contact

  • mmiemon@cs.unc.edu
  • Raleigh, North Carolina