Vaibhav Agrawal

I am a computer vision researcher. I am currently a master's student at CVIT (Center for Visual Information Technology), IIIT-Hyderabad, where I am fortunate to be advised by Ravi Kiran S (CVIT) and co-advised by Venkatesh Babu Radhakrishnan (Vision and AI Lab, IISc Bangalore). I am interested in computer vision and generative models; recently, I have focused on controlling generative models. I am also interested in the broader applications of generative models to perceptual tasks in computer vision.

When I am not working, I am usually listening to music or playing the piano. I love receiving emails and messages, so feel free to reach out if you have any questions or suggestions about research.

Research

I am broadly interested in computer vision. Recently, I have been working on controlling generative models, and I am also interested in applying generative models to a wider range of perceptual tasks in computer vision.

* denotes equal contribution.

Compass Control: Multi-Object Orientation Control for Text-to-Image Generation

Rishubh Parihar*, Vaibhav Agrawal*, Sachidanand VS, Venkatesh Babu Radhakrishnan

CVPR 2025

project page arXiv

A method for multi-object orientation control in text-to-image (T2I) generation. We learn an encoder that maps an input 3D pose to a special token that controls the corresponding object, and we show that spatially constraining the attention maps of these tokens enables disentangled control over multiple objects.

LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts

Vaibhav Agrawal, Niharika Vadlamudi, Muhammad Waseem, Amal Joseph, Sreenya Chitluri, Ravi Kiran Sarvadevabhatla

ICPR 2024

project page paper

Treating text-line detection as a segmentation problem discards useful inductive priors and hampers out-of-distribution (OOD) generalization. We instead parametrize text lines as geometric structures and train a DETR-style network to predict these parameters directly.

Towards Global Localization Using Multi-Modal Object-Instance Re-Identification

Aneesh Chavan, Vaibhav Agrawal*, Vineeth Bhat*, Sarthak Chittawar*, Siddharth Srivastava, Chetan Arora, K Madhava Krishna

Advances in Robotics 2025

paper

We train a multi-modal (depth + RGB) network for object-instance re-identification. Modality dropout during training makes the model robust to the failure of either modality. We demonstrate the model's use in a downstream SLAM pipeline.
