Publications

(2025). BIMBA: Selective-Scan Compression for Long-Range Video Question Answering. In CVPR 2025.

(2025). ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. In CVPR 2025.

(2024). Video ReCap: Recursive Captioning of Hour-Long Videos. In CVPR 2024.

(2024). Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos. In ECCV 2024.

(2024). RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos. In ECCV 2024.

(2024). A Simple LLM Framework for Long-Range Video Question-Answering. In EMNLP 2024.

(2023). Efficient Movie Scene Detection using State-Space Transformers. In CVPR 2023.

(2022). Long Movie Clip Classification with State-Space Video Models. In ECCV 2022.

(2022). COVID-DenseNet: A Deep Learning Architecture to Detect COVID-19 from Chest Radiology Images. In ICDSA 2022.
