Featured Projects
MTA: Multimodal Task Alignment
MTA is a multimodal task alignment framework that improves both bird's-eye-view (BEV) perception and captioning. It enforces alignment through multimodal contextual learning and cross-modal prompting mechanisms. arXiv.
PaPr: Patch Pruning for Faster Inference
PaPr is a background patch pruning method that works with Vision Transformers (ViTs) to speed up inference by more than 2x. It is training-free and can be plugged into existing token pruning methods for further efficiency gains. ECCV 2024.
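
To make the idea of patch pruning concrete, here is a minimal, generic sketch of dropping low-scoring patches before they reach a ViT. It is not PaPr's actual algorithm or API: the function name `prune_patches`, the `keep_ratio` parameter, and the random `scores` array (a stand-in for whatever per-patch saliency signal a real method would compute) are all assumptions made for illustration.

```python
import numpy as np

def prune_patches(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring patches (hypothetical helper, not PaPr's API).

    tokens: (num_patches, dim) patch embeddings.
    scores: (num_patches,) per-patch saliency, higher means more likely foreground.
    """
    num_patches = tokens.shape[0]
    num_keep = max(1, int(round(num_patches * keep_ratio)))
    # Indices of the num_keep largest scores, restored to original spatial order.
    keep_idx = np.sort(np.argsort(scores)[-num_keep:])
    return tokens[keep_idx], keep_idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 768))  # e.g. 14x14 patches with 768-dim embeddings
scores = rng.random(196)              # assumed stand-in for a saliency map
kept, idx = prune_patches(tokens, scores, keep_ratio=0.5)
```

With `keep_ratio=0.5`, the transformer would process half as many tokens. Because nothing here is learned, a step like this needs no retraining, which is the general sense in which a pruning step can be training-free.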