Sahil Shah

I'm a Master's student at The University of Texas at Austin studying Computer Engineering and Robotics. I am advised by Sandeep Chinchali at Swarm Lab, where I work on video understanding. I have also interned at NVIDIA, Tesla, and AWS, working on machine learning and computer architecture.

Email / CV / Scholar / GitHub / LinkedIn

Research

I'm interested in developing systems that integrate vision-language models with formal logic to improve the reliability and interpretability of video-based reasoning. My research focuses on constructing pipelines to reason about temporal event sequences for video understanding, generation, and agents by combining deep learning with automata-theoretic methods.

NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning
Sahil Shah, S P Sharan, Harsh Goel, Minkyu Choi, Mustafa Munir, Manvik Pasula, Radu Marculescu, Sandeep Chinchali
AAAI, 2026
arXiv
Training-free pipeline that identifies logical event sequences in video, boosting VQA accuracy by over 10% on causal and multi-step reasoning tasks.

A Challenge to Build Neuro-Symbolic Video Agents
Sahil Shah, Harsh Goel, Sai Shankar Narasimhan, Minkyu Choi, S. P. Sharan, Oguzhan Akcin, Sandeep Chinchali
NeuS, 2025
paper / arXiv / code
Combining neuro-symbolic reasoning with video perception enables agents that can interpret, predict, and act on temporal events, not just recognize them.

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback
Minkyu Choi, S. P. Sharan, Harsh Goel, Sahil Shah, Sandeep Chinchali
Submitted to ICLR, 2026
arXiv
Neuro-symbolic feedback enables zero-shot refinement of generated videos, boosting temporal and semantic alignment by nearly 40% without retraining.

Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
S. P. Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali
CVPR, 2025
project page / paper / arXiv / code
Formally verifying a video against a temporal logic specification yields a text-to-video metric that aligns 5× better with human judgment than existing scores.

COFFEE: a High-Performance Approach to Convex Optimization for Thermodynamic Equilibrium Computations
Fu-Yao Yu, Sahil Shah*, Yash Mittal, Paul Bessler, Aamir Mohsin, Jeffrey Geng, Arnav Vats, David Soloveichik
SIEDS, 2025 (Best Paper)
project page / paper / code
A trust-region convex solver for molecular equilibrium runs 2× faster and 10⁷× more accurately than prior tools, scaling to large biochemistry datasets.

Real-Time Privacy Preservation for Robot Visual Perception
Minkyu Choi, Yunhao Yang, Neel P Bhatt, Kushagra Gupta, Sahil Shah, Aditya Rai, David Fridovich-Keil, Ufuk Topcu, Sandeep Chinchali
TMLR, 2025
arXiv
Blurring objects based on logical specifications enables real-time video privacy with 95%+ compliance, provable guarantees, and seamless robot deployment.

Towards Neuro-Symbolic Video Understanding
Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali
ECCV, 2024 (Oral Presentation)
project page / paper / arXiv / code
Decoupling perception and temporal reasoning with TL-based state machines boosts long-range event identification by up to 15% on self-driving datasets.

Design based on Jon Barron's website.