Autonomous AI coworker for video editors that searches, labels, and assembles footage.
It combines semantic video search and indexing, agentic sequence assembly from natural language, and automated media logging with frame-level understanding.

Technology | Video Editing | YC W26

Last Updated: March 20, 2026

Builds an AI-powered desktop application that acts as an autonomous coworker for professional video editors, using semantic video indexing, natural language commands, and agentic sequencing to automate searching, labeling, organizing, and assembling footage into native Adobe Premiere Pro project files, all processed on-device for speed and privacy.
Wideframe has publicly launched a generally available Mac app (Apple Silicon) with semantic video indexing, natural language search, agentic timeline sequencing, and native .prproj export. Founder Daniel Pearson actively solicits user feedback on Product Hunt and social channels, signaling rapid iterative feature development. Public messaging emphasizes automating ~75% of editing prep work and expanding adoption among brands and agencies producing high-volume video content.
GitHub and tech stack signals (PyTorch, Hugging Face Transformers, SpeechBrain, MLflow) suggest active investment in fine-tuning on-device vision-language models for frame-accurate understanding. Product Hunt feedback and early user reviews hint at upcoming collaboration features (shared projects, team asset management) and broader NLE integrations beyond Premiere Pro (e.g., DaVinci Resolve, Final Cut Pro). Hiring signals and YC batch networking suggest potential hires in applied ML and video understanding. The "no synthetic content" positioning implies a deliberate strategy to build trust with professional editors wary of generative AI hallucinations, likely a wedge into enterprise post-production workflows.
AI-powered semantic indexing of raw footage enabling editors to search and retrieve clips using plain English descriptions instead of filenames or manual tags.
Instead of scrubbing through hours of footage to find the right shot, editors just type what they're looking for and the AI instantly finds it.
Wideframe's semantic video indexing pipeline ingests raw footage and runs frame-level deep learning analysis to extract visual, auditory, and contextual meaning from every clip. Custom vision-language models—built on PyTorch and fine-tuned with Hugging Face Transformers—generate dense semantic embeddings for each frame and scene, which are stored in a local vector index on the editor's machine. When an editor types a natural language query (e.g., "close-up of woman laughing near a window with warm light"), the NLP layer parses the intent and performs similarity search against the indexed embeddings to surface the most relevant clips ranked by semantic match. Speech content is transcribed on-device using models akin to OpenAI Whisper via SpeechBrain, enabling search across dialogue and narration as well. Because all processing runs locally on Apple Silicon, editors get sub-second search results with zero cloud latency and full data privacy—critical for agencies handling pre-release brand content or confidential footage. This eliminates the traditional workflow of manually logging, tagging, and bin-organizing footage, which typically consumes 30–50% of an editor's total project time.
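As a rough illustration of the indexing-and-retrieval pattern described above, the sketch below embeds sampled frames with an off-the-shelf CLIP checkpoint and ranks them against a plain-English query by cosine similarity. Wideframe's production models are custom fine-tunes, so the model ID, the `index_frames`/`search` helpers, and the in-memory tensor index are all stand-ins, not the company's actual implementation.

```python
# Minimal sketch of frame indexing + natural language search, assuming a
# public CLIP checkpoint as a stand-in for Wideframe's custom models.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # illustrative, not Wideframe's model
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

@torch.no_grad()
def index_frames(frames: list[Image.Image]) -> torch.Tensor:
    """Embed sampled frames into L2-normalized vectors (a toy 'local vector index')."""
    inputs = processor(images=frames, return_tensors="pt")
    emb = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(emb, dim=-1)

@torch.no_grad()
def search(query: str, index: torch.Tensor, k: int = 5) -> list[tuple[int, float]]:
    """Rank indexed frames against a text query by cosine similarity."""
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    q = torch.nn.functional.normalize(model.get_text_features(**text_inputs), dim=-1)
    scores = index @ q.squeeze(0)                # one similarity score per frame
    top = scores.topk(min(k, scores.numel()))
    return list(zip(top.indices.tolist(), top.values.tolist()))

# Usage (frames would come from decoded video, e.g. one frame per second):
# index = index_frames(frames)
# hits = search("close-up of woman laughing near a window with warm light", index)
```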
It's like having a photographic-memory intern who watched every second of your footage and can instantly pull the exact clip you're thinking of when you describe it out loud.
AI agent that interprets high-level editorial intent from natural language instructions and autonomously assembles rough-cut timelines exported as native Adobe Premiere Pro project files.
You describe the video you want to make in plain English, and the AI builds a rough cut on your timeline using only your real footage—ready to open in Premiere Pro.
Wideframe's agentic sequencing engine represents the most technically ambitious layer of the product. When an editor provides a natural language brief (e.g., "Build a 60-second product launch video: open with the drone shot of the office, cut to CEO interview highlights about the new feature, intercut with screen recordings, and close with the logo animation"), a multi-step AI agent decomposes the instruction into discrete editorial tasks. The NLP layer—powered by transformer-based language models—parses narrative structure, pacing cues, and shot-type preferences. The agent then queries the semantic video index to retrieve candidate clips for each segment, ranks them by relevance and visual quality, and applies learned editing heuristics (shot variety, pacing rhythm, continuity) to assemble a coherent sequence. The output is serialized into a native .prproj XML file that preserves clip references, in/out points, track assignments, and sequence metadata, so editors can open the rough cut directly in Adobe Premiere Pro and refine it without any format conversion or re-linking. This agentic loop—intent parsing, retrieval, ranking, sequencing, and native export—runs entirely on-device, leveraging Apple Silicon's unified memory architecture for fast inference across vision, language, and sequencing models simultaneously. The system is designed to never hallucinate or generate synthetic footage; every frame in the output timeline corresponds to actual source material, maintaining editorial integrity.
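The agentic loop described above can be sketched in miniature as plan, retrieve, sequence, export. Everything below is illustrative: `plan_segments` stubs the LLM planner, the `search` callable stands in for the semantic index, and because the real .prproj schema is Adobe's proprietary gzipped XML, the export step writes a simplified generic timeline rather than an actual Premiere project.

```python
# Simplified sketch of the agentic sequencing loop. The data model, planner
# stub, and XML layout are assumptions; the real .prproj format is proprietary.
import gzip
from dataclasses import dataclass
from typing import Callable
from xml.etree.ElementTree import Element, SubElement, tostring

@dataclass
class Segment:
    query: str         # semantic-search query derived from the brief
    duration_s: float  # target duration parsed from pacing cues

@dataclass
class Clip:
    path: str          # reference to real source media (never synthetic)
    in_s: float
    out_s: float

def plan_segments(brief: str) -> list[Segment]:
    """Stub for the LLM planner that decomposes an editorial brief into tasks."""
    # A real planner would parse narrative structure and pacing from `brief`.
    return [Segment("drone shot of the office exterior", 5.0),
            Segment("CEO interview highlights about the new feature", 20.0)]

def assemble(brief: str, search: Callable[[str, float], Clip]) -> list[Clip]:
    """Agentic loop: plan segments, then retrieve the best clip for each one."""
    return [search(seg.query, seg.duration_s) for seg in plan_segments(brief)]

def export_timeline(clips: list[Clip], path: str) -> None:
    """Write clip references and in/out points as gzipped XML (schema illustrative)."""
    root = Element("sequence", name="rough_cut")
    for c in clips:
        SubElement(root, "clipitem", file=c.path,
                   inPoint=str(c.in_s), outPoint=str(c.out_s))
    with gzip.open(path, "wb") as f:
        f.write(tostring(root))
```

The key design property the sketch preserves is that every `Clip` is a reference into existing source media, so the output timeline can only contain real footage.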
It's like dictating your edit to a brilliant assistant editor who knows every frame of your footage, cuts a solid rough draft in minutes, and hands you a Premiere project file ready for your finishing touches.
AI-driven automated footage preparation that labels, categorizes, and organizes raw media assets into structured bins based on visual content, audio analysis, and contextual metadata—eliminating manual logging workflows.
The AI watches all your raw footage, automatically labels and sorts every clip into organized bins so your team never wastes time manually logging or loses track of a shot again.
For high-volume content teams and agencies producing dozens or hundreds of videos per week, the footage ingestion and organization phase is one of the most error-prone and time-consuming bottlenecks. Wideframe's automated prep pipeline addresses this by running multi-modal analysis on every imported clip: computer vision models detect and classify scene types (interview, b-roll, product shot, aerial, screen recording), identify faces and recurring subjects, and assess technical quality (exposure, focus, stability). Audio analysis models transcribe speech, detect music vs. dialogue vs. ambient sound, and flag usable vs. unusable audio segments. All extracted metadata is used to automatically generate descriptive labels, assign clips to semantically organized bins, and surface potential issues (e.g., corrupted files, duplicate takes, out-of-focus footage) before an editor ever touches the project. This intelligent asset organization layer runs as a background process on-device immediately after footage import, meaning editors sit down to a fully prepped, searchable, and logically structured project rather than a chaotic dump of memory cards. For operations managers overseeing content pipelines, this dramatically reduces the risk of missed footage, mislabeled assets, and duplicated effort across team members—turning what was previously a multi-hour manual task into an automated, reliable, and auditable process.
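A minimal sketch of the bin-assignment step might look like the following, pairing zero-shot CLIP classification of a representative frame with a crude Laplacian-variance focus check. The bin taxonomy, threshold, and `prep_clip` helper are assumptions for illustration; Wideframe's classifiers are custom multi-modal models, not this off-the-shelf pipeline.

```python
# Illustrative bin assignment via zero-shot scene classification plus a
# simple sharpness heuristic. Bins and threshold are assumed for the example.
import numpy as np
from PIL import Image
from transformers import pipeline

BINS = ["interview", "b-roll", "product shot", "aerial", "screen recording"]

scene_classifier = pipeline("zero-shot-image-classification",
                            model="openai/clip-vit-base-patch32")

def classify_scene(frame: Image.Image) -> str:
    """Assign a representative frame to the most likely bin."""
    results = scene_classifier(frame, candidate_labels=BINS)
    return results[0]["label"]  # pipeline output is sorted by score

def is_sharp(frame: Image.Image, threshold: float = 60.0) -> bool:
    """Crude focus check: variance of a second-difference (Laplacian-like) filter."""
    gray = np.asarray(frame.convert("L"), dtype=np.float32)
    lap = np.diff(gray, n=2, axis=0)[:, :-2] + np.diff(gray, n=2, axis=1)[:-2, :]
    return float(lap.var()) > threshold

def prep_clip(representative_frame: Image.Image) -> dict:
    """Produce a label and quality flag the way a background prep pass might."""
    return {"bin": classify_scene(representative_frame),
            "flag_out_of_focus": not is_sharp(representative_frame)}
```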
It's like having a meticulous librarian who instantly catalogs every book the moment it arrives, shelves it in exactly the right section, and flags any damaged copies—before you even walk through the door.
Wideframe combines on-device, frame-accurate video understanding with a natural language agentic interface that outputs native Premiere Pro files, letting professional editors keep their existing tools and workflows while offloading roughly 75% of tedious prep work to AI, a trust-building approach that generative-AI-first competitors cannot easily replicate.