Vaibhav Agrawal

I am a computer vision researcher and an incoming pre-doctoral researcher at CISPA Helmholtz Center, advised by Adam Kortylewski. I am interested in predictive vision models (e.g., generative models and JEPA). Currently, my research focuses on introducing control mechanisms into diffusion models and leveraging them for perceptual tasks. My work in these areas has been presented as first-author papers at top-tier CV conferences.

Previously, I spent beautiful years at CVIT, IIIT Hyderabad, where I was fortunate to be advised by Ravi Kiran S and co-advised by Venkatesh Babu R (from IISc Bengaluru).

When I am not working, I am usually listening to music or playing the piano. I love receiving emails and messages, so feel free to reach out if you have any questions or suggestions about research.

Updates

Publications

* denotes equal contribution, † denotes equal advising.

PartConcepts: A Unified Mechanism for Fine-Grained Part-Level Localization and Generation
Under peer review, selected for presentation at CVPR 2026 GCV workshop

Fine-tuning a T2I diffusion model's text encoder yields strong attention localization of part-level concepts (e.g., 'left-front leg'), which leads to accurate part-level instance segmentation and generative control.

SeeThrough3D: Occlusion Aware 3D-Control in Text-to-Image Generation
CVPR 2026, also selected for presentation at CVPR 2026 GCV and MUSI workshops, and as a CVPR 2026 demonstration

A 3D bounding-box layout consisting of translucent boxes effectively models occluding scene regions; this representation is used to spatially condition a T2I model, enabling occlusion-aware 3D control.

Compass Control: Multi-Object Orientation Control for Text-to-Image Generation
CVPR 2025

A continuous textual token is learnt to model generalized object orientation; multiple such tokens can be combined, using attention-based disentanglement, to enable disentangled multi-object control.

LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts
Vaibhav Agrawal, Niharika Vadlamudi, Muhammad Waseem, Amal Joseph, Sreenya Chitluri, Ravi Kiran Sarvadevabhatla
ICPR 2024

A line-based parametrization, used in place of pixel-wise segmentation, enables highly accurate text-line prediction. Further, a context-adaptive patching scheme enables generalization to arbitrary documents.

Towards Global Localization Using Multi-Modal Object-Instance Re-Identification
Aneesh Chavan, Vaibhav Agrawal*, Vineeth Bhat*, Sarthak Chittawar*, Siddharth Srivastava, Chetan Arora, K Madhava Krishna
Advances in Robotics 2025

Multi-modal (visual + depth) object-instance re-identification improves SLAM performance by effectively leveraging visual cues.