
Technology | Computer Vision | YC W26 | Valuation: Undisclosed

Last Updated: March 24, 2026

Overshoot builds ultra-low-latency AI infrastructure that lets developers run vision-language models on live video streams through a simple API, achieving sub-200ms inference for real-time applications in robotics, gaming, security, and sports.
Overshoot has publicly announced support for multiple vision-language models (Qwen3-VL, InternVL3), an MIT-licensed TypeScript/JavaScript SDK, structured JSON output schemas, and both clip-mode and frame-mode processing. Its documentation details stream leasing, keepalive mechanisms, and multi-stream concurrency for enterprise-grade reliability and scalability.
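To make these concepts concrete, here is a minimal TypeScript sketch of what a structured output schema, the clip/frame mode switch, and a lease-keepalive loop could look like. Every name and shape below is an assumption for illustration, not Overshoot's published API.

```typescript
// Illustrative only: hypothetical shapes, not Overshoot's actual SDK types.

// A structured output schema constrains the model to emit typed JSON
// rather than free-form text.
interface OutputSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
}

// Frame mode analyzes individual frames; clip mode analyzes short video windows.
type ProcessingMode = "frame" | "clip";

// Hypothetical request shape for analyzing a live stream.
interface StreamAnalysisRequest {
  streamUrl: string;    // live video source, e.g. a WebRTC endpoint
  mode: ProcessingMode;
  prompt: string;       // "prompt-as-program": the prompt defines the task
  schema: OutputSchema; // model output must conform to this schema
}

// Sketch of a stream lease with keepalive: the client renews its lease on an
// interval so the server can reclaim streams from crashed or idle clients.
function holdLease(renew: () => Promise<void>, intervalMs = 5_000): () => void {
  const timer = setInterval(() => {
    renew().catch((err) => console.error("lease renewal failed", err));
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to release
}
```

The lease pattern matters for multi-stream concurrency: if a client holding dozens of streams dies, expired leases let the server free that capacity automatically.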
GitHub activity and the SDK's architecture suggest active development of additional model integrations and agentic vision workflows. The model-agnostic API design signals plans to onboard new frontier VLMs quickly, and investment in a LiveKit/WebRTC transport layer suggests future edge inference or hybrid cloud-edge deployment. The prompt-as-program paradigm and runtime prompt updates point toward a low-code/no-code visual AI builder product.
Real-time security and anomaly detection on live video feeds using vision-language models with sub-200ms latency.
Instead of a human guard staring at 50 screens and missing things, an AI watches every camera simultaneously and instantly flags anything unusual in plain English.
It's like replacing a sleepy security guard with an eagle-eyed AI that never blinks, never takes a coffee break, and can read a "No Trespassing" sign in 47 languages.
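To ground the security scenario, here is a hedged sketch of the kind of structured event such a pipeline might emit. The AnomalyEvent shape and its field names are illustrative assumptions, not a documented schema.

```typescript
// Hypothetical anomaly event: structured JSON output lets downstream systems
// route alerts without parsing the model's free-form prose.
interface AnomalyEvent {
  cameraId: string;
  timestamp: string; // ISO 8601
  severity: "low" | "medium" | "high";
  description: string; // plain-English summary from the VLM
  boundingBox?: [x: number, y: number, width: number, height: number];
}

// A consumer can act on typed fields directly, e.g. paging only on high severity.
function routeEvent(event: AnomalyEvent): void {
  if (event.severity === "high") {
    console.log(`ALERT ${event.cameraId}: ${event.description}`);
  }
}
```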
AI-powered real-time sports and fitness form analysis delivering instant structured feedback to athletes and coaches via live video.
A virtual coach watches you exercise through your camera and instantly tells you if your squat form is off—no personal trainer required.
It's like having an Olympic coach who watches your every move through your phone, except this one never yells and always has time for you.
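For the coaching use case, a per-rep feedback record is one plausible structured output. This RepFeedback shape is an assumption sketched for illustration, not a published schema.

```typescript
// Hypothetical per-rep feedback a form-analysis prompt might target.
interface RepFeedback {
  exercise: "squat" | "deadlift" | "pushup";
  repNumber: number;
  formScore: number; // 0-100, model-assigned
  issues: string[];  // e.g., ["knees caving inward", "insufficient depth"]
  cue: string;       // one actionable coaching cue for the athlete
}

// Aggregate a set of reps into a one-line summary for the athlete or coach.
function summarizeSet(reps: RepFeedback[]): string {
  const avg = reps.reduce((sum, r) => sum + r.formScore, 0) / Math.max(reps.length, 1);
  const flagged = reps.filter((r) => r.issues.length > 0).length;
  return `Average form score ${avg.toFixed(0)}; ${flagged} rep(s) flagged.`;
}
```

Because feedback arrives as typed JSON rather than prose, the same records can drive an on-screen overlay, a progress chart, or a coach's dashboard.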
Automated real-time visual data extraction and structured labeling from live video streams to power downstream ML pipelines and analytics.
Instead of paying hundreds of people to watch videos and label objects by hand, an AI instantly converts what it sees into clean, organized data ready for analysis.
It's like hiring a thousand interns who can perfectly label every object in a video at superhuman speed—except they never misspell anything or accidentally label a cat as a dog.
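For the labeling use case, the sketch below shows one plausible per-frame label record and how it could be serialized for a training or analytics pipeline. The FrameLabel shape and example labels are assumptions, not a documented format.

```typescript
// Hypothetical per-frame label record emitted by a live extraction pipeline.
interface FrameLabel {
  frameTimestampMs: number;
  objects: Array<{
    label: string;      // e.g., "forklift", "pallet"
    confidence: number; // 0-1
    box: { x: number; y: number; w: number; h: number };
  }>;
}

// Serialize to newline-delimited JSON, a common format for ML data pipelines.
function toNDJSON(labels: FrameLabel[]): string {
  return labels.map((label) => JSON.stringify(label)).join("\n");
}
```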
The founders built real-time, high-throughput inference systems at Uber and Meta, and they previously shipped a computer vision startup that was acquired by Intel. That combination of low-latency distributed-systems experience and production vision AI is rare among infrastructure teams.