BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Video ReCap: Recursive Captioning of Hour-Long Videos
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
A Simple LLM Framework for Long-Range Video Question-Answering
Efficient Movie Scene Detection using State-Space Transformers