Sahil Shah

I'm a Master's student at The University of Texas at Austin studying Computer Engineering and Robotics. I'm currently advised by Sandeep Chinchali at the Swarm Lab, where I work on video understanding. I have also interned at NVIDIA, Tesla, and AWS, working on machine learning and computer architecture.

Email  /  CV  /  Scholar  /  GitHub  /  LinkedIn

Research

I'm interested in developing systems that integrate vision-language models with formal logic to improve the reliability and interpretability of video-based reasoning. My research focuses on constructing pipelines to reason about temporal event sequences for video understanding, generation, and agents by combining deep learning with automata-theoretic methods.

NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning
Sahil Shah, S. P. Sharan*, Harsh Goel*, Minkyu Choi, Mustafa Munir, Manvik Pasula, Radu Marculescu, Sandeep Chinchali
AAAI, 2026
arXiv

Training-free pipeline that identifies logical event sequences in video, boosting VQA accuracy by over 10% on causal and multi-step reasoning tasks.

A Challenge to Build Neuro-Symbolic Video Agents
Sahil Shah, Harsh Goel, Sai Shankar Narasimhan, Minkyu Choi, S. P. Sharan, Oguzhan Akcin, Sandeep Chinchali
NeuS, 2025
paper / arXiv / code

Combining neuro-symbolic reasoning with video perception enables agents that can interpret, predict, and act on temporal events rather than merely recognize them.

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback
Minkyu Choi*, S. P. Sharan*, Harsh Goel, Sahil Shah, Sandeep Chinchali
Submitted to ICLR, 2026
arXiv

Neuro-symbolic feedback enables zero-shot refinement of generated videos, boosting temporal and semantic alignment by nearly 40% without retraining.

Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
S. P. Sharan*, Minkyu Choi*, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali
CVPR, 2025
project page / paper / arXiv / code

Formally verifying a video against a temporal logic specification yields a text-to-video metric that aligns 5× better with human judgment than existing scores.

COFFEE: a High-Performance Approach to Convex Optimization for Thermodynamic Equilibrium Computations
Fu-Yao Yu*, Sahil Shah*, Yash Mittal, Paul Bessler, Aamir Mohsin, Jeffrey Geng, Arnav Vats, David Soloveichik
SIEDS, 2025   (Best Paper)
project page / paper / code

A trust-region convex solver for molecular equilibrium that runs 2× faster and is 10⁷× more accurate than prior tools, scaling to large biochemistry datasets.

Real-Time Privacy Preservation for Robot Visual Perception
Minkyu Choi*, Yunhao Yang*, Neel P Bhatt*, Kushagra Gupta, Sahil Shah, Aditya Rai, David Fridovich-Keil, Ufuk Topcu, Sandeep Chinchali
TMLR, 2025
arXiv

Blurring objects based on logical specifications enables real-time video privacy with 95%+ compliance, provable guarantees, and seamless robot deployment.

Towards Neuro-Symbolic Video Understanding
Minkyu Choi, Harsh Goel*, Mohammad Omama*, Yunhao Yang, Sahil Shah, Sandeep Chinchali
ECCV, 2024   (Oral Presentation)
project page / paper / arXiv / code

Decoupling perception and temporal reasoning with temporal logic (TL)-based state machines boosts long-range event identification by up to 15% on self-driving datasets.


Design based on Jon Barron's website.