Mohaiminul
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Gedas Bertasius, Lorenzo Torresani
ArXiv | Website | Code | HuggingFace | Demo
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Mohaiminul Islam, Jindong Gu, Thomas Seidl, Gedas Bertasius
ArXiv | Code
Video ReCap: Recursive Captioning of Hour-Long Videos
Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius
ArXiv | Website | Code | Dataset | HuggingFace
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Mohaiminul Islam, et al.
ArXiv | Website | Blog | Video
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Fu-Jen Chu, Kris Kitani, Gedas Bertasius, Xitong Yang
ArXiv | Website | Video
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan, Mohaiminul Islam, Thomas Seidl, Gedas Bertasius
ArXiv | Code
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang, Taixi Lu, Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius
ArXiv | Code
Efficient Movie Scene Detection using State-Space Transformers
Mohaiminul Islam, Mahmudul Hasan, Kishan Shamsundar Athrey, Tony Braskich, Gedas Bertasius
ArXiv | Code
Long Movie Clip Classification with State-Space Video Models
Mohaiminul Islam, Gedas Bertasius
ArXiv | Code
COVID-DenseNet: A Deep Learning Architecture to Detect COVID-19 from Chest Radiology Images
Mohaiminul Islam, Tanveer Hannan, Laboni Sarker, Zakaria Ahmed
ArXiv | Code