Real-time AI threat detection in live surveillance for law enforcement and security.
Combines real-time deep-learning video threat detection, multimodal vision-language search for investigations, and RL-based alert triage.

Public Safety Intelligence | YC W26

Last Updated: March 19, 2026

Protent builds a real-time AI-powered video intelligence platform that uses deep learning, multimodal vision-language models, and reinforcement learning to detect and escalate threats in live surveillance feeds for law enforcement and large-scale security operations.
Protent has publicly announced pilot deployments with police departments in Atlanta, Chicago, and St. Louis, focusing on real-time threat detection in live video feeds. Their YC W26 Demo Day profile highlights proactive surveillance intelligence, natural language video querying, and early escalation pattern detection as core platform capabilities. No public SDK, API, or developer documentation has been released.
Behind the scenes, the team's lean size (2-3 technical founders, no public job postings) signals a heads-down, pilot-driven development phase tightly coupled to law enforcement feedback loops. Founder Srihan Balaji's prior work deploying RL-optimized video intelligence at Lockheed Martin suggests proprietary reinforcement learning techniques for alert prioritization are being refined internally. The absence of patent filings or open-source contributions indicates IP is being kept close, and minimal GitHub and conference activity points to a stealth posture. There are strong indicators of near-term expansion to adjacent verticals (stadiums, transit, corporate campuses) and likely development of integration APIs for existing VMS/PSIM platforms to accelerate enterprise adoption.
Real-time detection of weapons, fights, and medical emergencies in live surveillance video feeds using deep learning object detection and activity recognition models.
AI watches thousands of security cameras simultaneously and instantly alerts officers when it spots a weapon, a fight breaking out, or someone collapsing.
Protent's core use case applies deep convolutional neural networks and transformer-based architectures to perform continuous, frame-by-frame analysis of live video feeds from surveillance cameras deployed across urban environments. The system is trained to recognize a taxonomy of threat indicators—including visible weapons (firearms, knives), physical altercations, crowd surges, and medical emergencies such as collapses or seizures. Unlike traditional motion-detection systems that generate overwhelming false positives, Protent's models are trained on diverse, real-world footage to distinguish genuine threats from benign activity (e.g., someone opening an umbrella vs. brandishing a weapon). When a high-confidence detection occurs, the platform generates a prioritized alert with a visual snapshot, confidence score, camera location, and recommended response protocol, delivered directly to dispatch or command center dashboards. This is currently deployed in pilot programs with police departments in Atlanta, Chicago, and St. Louis, where it augments human monitoring of camera networks that would otherwise require dozens of analysts to watch in real time.
It's like giving every security camera its own tireless, eagle-eyed guard who never blinks, never takes a bathroom break, and immediately radios for backup the moment something looks wrong.
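Protent has published no SDK or API, so the following is only a hypothetical sketch of the detection-to-alert flow described above: raw per-frame detections are filtered by a confidence threshold, weighted by threat class, and emitted as a priority-ordered alert queue. The class names, threat taxonomy, weights, and camera IDs are all invented for illustration.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical threat-class weights; Protent's real taxonomy is not public.
THREAT_WEIGHTS = {"firearm": 1.0, "knife": 0.9, "collapse": 0.8, "fight": 0.7}

@dataclass(order=True)
class Alert:
    priority: float                      # negative weighted score (min-heap order)
    label: str = field(compare=False)
    camera_id: str = field(compare=False)
    confidence: float = field(compare=False)

def triage(detections, threshold=0.6):
    """Turn raw per-frame detections into a priority-ordered alert list."""
    heap = []
    for label, confidence, camera_id in detections:
        if label not in THREAT_WEIGHTS or confidence < threshold:
            continue  # benign object or low-confidence detection: no alert
        heapq.heappush(heap, Alert(-THREAT_WEIGHTS[label] * confidence,
                                   label, camera_id, confidence))
    return [heapq.heappop(heap) for _ in range(len(heap))]

alerts = triage([
    ("firearm", 0.92, "cam-14"),
    ("umbrella", 0.99, "cam-2"),   # high confidence but not a threat class
    ("fight", 0.75, "cam-7"),
    ("knife", 0.40, "cam-9"),      # below the confidence threshold
])
print([(a.label, a.camera_id) for a in alerts])
```

The key property this sketch captures is the one the paragraph above emphasizes: a confidently detected umbrella never becomes an alert, while a detected firearm outranks a detected fight regardless of arrival order.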
Natural language querying of live and archived surveillance video using multimodal vision-language models, enabling operators to search footage by describing events in plain English.
Instead of scrubbing through days of security footage, an officer can just type "person in red jacket near the east entrance at 2pm" and the system finds it instantly.
Protent leverages multimodal foundation models—combining computer vision encoders with large language model decoders—to enable natural language search over both live and archived video feeds. Security operators and investigators can issue free-text queries such as "show me all instances of an unattended bag near Gate 12 in the last 4 hours" or "find the person in a blue hoodie who entered through the south door," and the system retrieves relevant video clips ranked by semantic relevance. This capability transforms the investigative workflow from painstaking manual scrubbing (which can take hours or days across multi-camera networks) into a near-instantaneous retrieval process. The underlying architecture likely uses contrastive vision-language pretraining (similar to CLIP or SigLIP) fine-tuned on surveillance-domain data, combined with temporal indexing that maps semantic embeddings to specific timestamps and camera locations. This is a significant differentiator over legacy video management systems that rely solely on metadata tags or basic motion detection for search, and it positions Protent as a next-generation investigative tool for law enforcement and security teams managing large camera networks.
It's like having a Google search bar for your security cameras—except instead of searching the internet, you're searching everything every camera has ever seen, and it actually understands what you're asking.
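The retrieval mechanism described above can be sketched with a toy example: a text query and each archived frame are mapped into a shared embedding space, and frames are ranked by cosine similarity against a temporal index of (camera, timestamp, embedding) entries. In a real CLIP-style system the embeddings come from a pretrained vision-language model; here they are hand-made vectors, and the camera IDs and timestamps are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Temporal index: each archived frame stored as (camera, timestamp, embedding).
index = [
    ("cam-3", "14:02", [0.9, 0.1, 0.0]),  # frame semantically close to the query
    ("cam-3", "09:15", [0.0, 1.0, 0.2]),
    ("cam-7", "14:05", [0.8, 0.2, 0.1]),
]

def search(query_embedding, top_k=2):
    """Rank archived frames by semantic similarity to the embedded text query."""
    scored = [(cosine(query_embedding, emb), cam, ts) for cam, ts, emb in index]
    return sorted(scored, reverse=True)[:top_k]

# Pretend a query like "person in red jacket near the east entrance" embeds
# along the first axis of this toy space.
hits = search([1.0, 0.0, 0.0])
print([(cam, ts) for _, cam, ts in hits])
```

Because results carry both a camera ID and a timestamp, an investigator can jump straight to the matching clip instead of scrubbing footage, which is the workflow change the paragraph above describes.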
Reinforcement learning-based alert prioritization system that dynamically ranks and filters threat detections to minimize false positives and surface the highest-risk incidents first for human review.
AI learns from every alert an officer acts on or dismisses, getting smarter over time about which threats are real and which are just someone waving a selfie stick.
Drawing directly from founder Srihan Balaji's experience deploying reinforcement learning-optimized video intelligence at Lockheed Martin, Protent applies RL techniques to solve one of the most persistent problems in automated surveillance: alert fatigue. Raw detection models, no matter how accurate, generate a high volume of alerts across large camera networks—many of which are low-priority or false positives. Protent's RL-based prioritization engine acts as an intelligent triage layer that sits between the detection models and the human operator. The RL agent learns a policy for ranking alerts based on multiple signals: detection confidence scores, contextual factors (time of day, location risk profile, crowd density), historical incident patterns, and critically, operator feedback (which alerts were acted upon, escalated, or dismissed). Over time, the system adapts its prioritization policy to each deployment environment, learning that a detected object in a school zone at 3pm warrants higher priority than the same detection in a warehouse at 2am. This closed-loop learning approach means the system continuously improves with use, reducing cognitive load on operators and ensuring that genuinely critical incidents are surfaced immediately. This RL layer is a key technical differentiator rooted in the founding team's defense-sector experience and is difficult for competitors relying on static rule-based filtering to replicate.
It's like training a really smart assistant who learns that when your boss emails you at 11pm it's urgent, but when the office newsletter lands at 9am it can wait—except the stakes are a lot higher than inbox management.
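Protent's RL policy is proprietary and unpublished, so the closed feedback loop described above is illustrated here with a much simpler stand-in: a bandit-style prioritizer that nudges per-feature weights whenever an operator acts on or dismisses an alert. Every feature name and constant is an illustrative assumption, not Protent's method.

```python
import math

class AlertPrioritizer:
    """Toy feedback-driven prioritizer: logistic scores over context features."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}  # one learned weight per threat/context feature

    def score(self, features):
        """Estimated probability that an operator will act on this alert."""
        z = sum(self.weights.get(f, 0.0) for f in features)
        return 1.0 / (1.0 + math.exp(-z))

    def feedback(self, features, acted_on):
        """Operator acted (True) or dismissed (False); nudge weights accordingly."""
        error = (1.0 if acted_on else 0.0) - self.score(features)
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + self.lr * error

# Simulated feedback loop: identical detections, different contexts.
p = AlertPrioritizer()
for _ in range(50):
    p.feedback(("object", "school_zone", "afternoon"), acted_on=True)
    p.feedback(("object", "warehouse", "night"), acted_on=False)

school = p.score(("object", "school_zone", "afternoon"))
warehouse = p.score(("object", "warehouse", "night"))
print(round(school, 2), round(warehouse, 2))
```

After training, the same detection scores high in the school-zone context and low in the warehouse context, mirroring the adaptive triage behavior described above; a production system would of course use far richer state and a genuine RL objective.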
Protent combines rare production experience deploying reinforcement learning for video intelligence in classified defense environments (Lockheed Martin) with cutting-edge UC Berkeley systems research, giving the team a unique ability to build ML models that work reliably on messy, real-world surveillance feeds, not just academic benchmarks.