Mohaiminul Islam

PhD Student, UNC Chapel Hill
Research Scientist Intern, Meta AI

Biography

I'm on the job market, looking for an industry Research Scientist position. Feel free to connect with me via email!

I am a final-year Ph.D. student in the Department of Computer Science at UNC Chapel Hill, advised by Professor Gedas Bertasius. My research focuses on computer vision, video understanding, and multimodal deep learning, with a particular emphasis on efficient vision-language models, multimodal large language models (MLLMs), and long-range video analysis. My work has been published in top-tier conferences, including ECCV 2022, CVPR 2023, ECCV 2024, EMNLP 2024, and CVPR 2025.

I have completed two research internships at FAIR, Meta AI and one at Comcast AI, where I worked on multimodal large language models, video agents, and efficient models for long-range video understanding. Prior to my Ph.D., I gained valuable industry experience as a Software Engineer at Samsung R&D Institute.

Download my resumé.

Interests
  • Computer Vision
  • Video Understanding
  • Natural Language Processing
  • Multimodal Large Language Models
  • Efficient Video Models
Education
  • PhD in Computer Science, 2021-Present

    UNC Chapel Hill

  • MSc in Computer Science, 2021-2023

    UNC Chapel Hill

  • BSc in Computer Science and Engineering, 2014-2018

    Bangladesh University of Engineering and Technology

Experience

FAIR, Meta AI
Research Scientist Intern
May 2024 – Aug 2024 New York
Advisors - Lorenzo Torresani, Tushar Nagarajan, Huiyu Wang
Topic - Multimodal Large Language Models, Efficient Long Video Understanding
Publication - BIMBA (CVPR 2025)

FAIR, Meta AI
Research Scientist Intern
May 2023 – Aug 2023 Menlo Park, California
Advisors - Xitong Yang, Tushar Nagarajan, Huiyu Wang, Kris Kitani
Topic - Video Agents, Procedural Learning
Publication - VidAssist (ECCV 2024, Oral)

Comcast AI
Machine Learning Intern
May 2022 – Aug 2022 Virtual
Advisors - Mahmudul Hasan, Tony Braskich
Topic - Scene Detection, Efficient Long-Range Video Models
Publication - TranS4mer (CVPR 2023)

Lecturer
Apr 2019 – Dec 2020 Bangladesh

Software Engineer
Nov 2018 – Mar 2019 Bangladesh

Recent Publications

(2025). BIMBA: Selective-Scan Compression for Long-Range Video Question Answering. In CVPR 2025.

(2025). ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. In CVPR 2025.

(2024). Video ReCap: Recursive Captioning of Hour-Long Videos. In CVPR 2024.

(2024). Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos. In ECCV 2024 (Oral).

(2024). RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos. In ECCV 2024.

(2024). A Simple LLM Framework for Long-Range Video Question-Answering. In EMNLP 2024.

(2023). Efficient Movie Scene Detection using State-Space Transformers. In CVPR 2023.

(2022). Long Movie Clip Classification with State-Space Video Models. In ECCV 2022.

(2022). COVID-DenseNet: A Deep Learning Architecture to Detect COVID-19 from Chest Radiology Images. In ICDSA 2022.

Contact

  • mmiemon [at] cs [dot] unc [dot] edu
  • Raleigh, North Carolina