How Is Cardboard Using AI?

Lets teams edit professional video through natural language commands in the browser, no NLE needed.

Using multimodal LLMs for natural language video editing, automated highlight extraction from footage, and semantic search across video libraries.

Company Overview

AI-powered, browser-based video editor that uses multimodal LLMs and agentic AI to let users edit professional video through natural language commands, automated clip selection, and real-time collaboration. Runs natively in the browser via WebGPU and WebCodecs.

Product Roadmap & Public Announcements

Browser-based professional editing with natural language commands, automated captioning, cloud collaboration, and export to Premiere Pro/DaVinci Resolve. Tiered pricing (Creator $60/mo, Pro $150/mo, Teams custom). Highest upvoted Hacker News launch in YC W26.

Signals & Private Analysis

Pricing tiers suggest enterprise/agency sales motion. 'Early access to new models' on Pro tier hints at proprietary model development. Lean engineering team focused on core AI R&D. Likely plugin/agent extensibility architecture.


Machine Learning Use Cases

Natural Language Video Editing
For: Product Differentiation (Product)
AI interprets plain-English editing commands and autonomously executes complex timeline operations on video projects.

Layman's Explanation

Instead of clicking through menus and dragging clips on a timeline, you just tell the editor what you want in plain English and it does it for you.

Use Case Details

Cardboard's semantic natural language editing system uses multimodal large language models to parse user intent from free-text prompts and map those instructions to precise timeline operations—cuts, transitions, reordering, audio adjustments, and effects. The system ingests both the text command and the underlying video/audio content, enabling context-aware edits (e.g., "remove all pauses longer than 2 seconds" or "add a zoom-in every time the speaker says 'important'"). This goes beyond simple keyword matching; the model understands temporal, spatial, and semantic relationships within the footage, allowing it to execute multi-step editing workflows from a single prompt. The result is a dramatically lower barrier to entry for professional-quality editing and a significant speed advantage for experienced editors handling repetitive tasks.
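To make the intent-to-operation mapping concrete, here is a minimal sketch of one such command, "remove all pauses longer than 2 seconds", applied to a timeline. The `Segment` type and the cut-and-restitch logic are illustrative assumptions, not Cardboard's actual internals; in the real system a multimodal LLM would produce the operation from the free-text prompt and the footage itself.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds on the timeline
    end: float
    is_speech: bool

def remove_long_pauses(timeline: list[Segment], max_pause: float = 2.0) -> list[Segment]:
    """Drop non-speech gaps longer than max_pause, then re-anchor the
    remaining segments so the edited timeline stays contiguous."""
    kept = [s for s in timeline if s.is_speech or (s.end - s.start) <= max_pause]
    out, cursor = [], 0.0
    for s in kept:
        dur = s.end - s.start
        out.append(Segment(cursor, cursor + dur, s.is_speech))
        cursor += dur
    return out

timeline = [
    Segment(0.0, 4.0, True),     # speech
    Segment(4.0, 7.5, False),    # 3.5 s pause -> cut
    Segment(7.5, 10.0, True),    # speech
    Segment(10.0, 11.0, False),  # 1.0 s pause -> kept
    Segment(11.0, 15.0, True),   # speech
]
edited = remove_long_pauses(timeline)
```

A single prompt can chain several such operations (cuts, transitions, audio adjustments) into one multi-step edit, which is what distinguishes this approach from keyword macros.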

Analogy

It's like having a film editor sitting next to you who instantly understands "make it punchier" without you ever touching the Avid.

Automated Highlight Extraction
For: Cost Reduction (Operations)

AI analyzes raw footage to automatically identify and extract the best moments, generate highlight reels, and sync edits to music.

Layman's Explanation

The AI watches all your raw footage, picks out the best moments, and assembles a polished highlight reel before you've even finished your coffee.

Use Case Details

Cardboard's automated clip selection engine leverages multimodal LLMs and computer vision to analyze hours of raw video footage—detecting speaker energy, facial expressions, audience engagement cues, audio peaks, and semantic content relevance—to surface the most compelling moments automatically. The system scores each segment on multiple dimensions (visual quality, audio clarity, emotional intensity, topical relevance) and assembles candidate highlight reels ranked by predicted viewer engagement. It can also sync selected clips to uploaded music tracks by aligning cuts to beat patterns and energy curves. For marketing teams and creators producing high volumes of content (podcasts, webinars, live streams), this eliminates the most time-consuming phase of post-production: reviewing and selecting footage. The output is a draft timeline that users can refine, rather than a blank canvas they must build from scratch.
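The multi-dimensional scoring described above can be sketched as a weighted sum over per-segment features. The feature names match the dimensions listed in the text, but the weights, data, and ranking function are hypothetical; the production system would derive scores from model predictions, not hand-entered values.

```python
# Illustrative weights over the scoring dimensions named in the text.
WEIGHTS = {
    "visual_quality": 0.20,
    "audio_clarity": 0.20,
    "emotional_intensity": 0.35,
    "topical_relevance": 0.25,
}

def score(features: dict) -> float:
    """Weighted sum of per-dimension scores in [0, 1]."""
    return sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)

def top_highlights(segments: list[dict], k: int = 3) -> list[dict]:
    """Rank candidate segments by predicted engagement and keep the top k."""
    return sorted(segments, key=lambda s: score(s["features"]), reverse=True)[:k]

segments = [
    {"id": "intro",   "features": {"visual_quality": 0.9, "audio_clarity": 0.8,
                                   "emotional_intensity": 0.2, "topical_relevance": 0.3}},
    {"id": "demo",    "features": {"visual_quality": 0.7, "audio_clarity": 0.9,
                                   "emotional_intensity": 0.8, "topical_relevance": 0.9}},
    {"id": "q_and_a", "features": {"visual_quality": 0.5, "audio_clarity": 0.6,
                                   "emotional_intensity": 0.6, "topical_relevance": 0.7}},
]
best = top_highlights(segments, k=2)
```

Beat-synced assembly would then run as a second pass over `best`, aligning cut points to the music track's beat grid.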

Analogy

It's like hiring an intern who actually watched all 47 hours of your conference footage and somehow picked the only five minutes worth posting.

Semantic Video Search
For: Decision Quality (Data)

AI enables users to search across all uploaded footage by describing what happened in the video, not by filenames or timestamps.

Layman's Explanation

Instead of scrubbing through hours of footage looking for "the part where she holds up the product," you just type that and the AI finds it instantly.

Use Case Details

Cardboard's content-based semantic search system indexes all uploaded video assets using multimodal embeddings—combining visual scene understanding, object/person detection, speech transcription, and on-screen text recognition into a unified vector representation. When a user types a natural language query like "the moment the CEO mentions Q3 revenue" or "close-up of the red sneakers," the system performs a semantic similarity search across the entire indexed library and returns timestamped results ranked by relevance. This fundamentally changes how editors and teams interact with large media libraries: instead of relying on manual tagging, folder structures, or memory, they can treat their footage like a searchable knowledge base. For organizations producing hundreds of hours of content monthly, this capability transforms asset management from a bottleneck into a competitive advantage, enabling rapid repurposing and content discovery.
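The retrieval step boils down to nearest-neighbor search over those multimodal embeddings. The sketch below uses tiny hand-made 3-dimensional vectors and plain cosine similarity to show the mechanic; a real index would hold high-dimensional model embeddings in a vector database, and the query vector would come from embedding the user's text with the same model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy index: timestamped clips with (hand-made, illustrative) embeddings.
index = [
    {"ts": "00:02:10", "label": "CEO discusses Q3 revenue", "vec": [0.9, 0.1, 0.0]},
    {"ts": "00:14:33", "label": "close-up of red sneakers",  "vec": [0.0, 0.2, 0.9]},
    {"ts": "00:31:05", "label": "audience applause",         "vec": [0.2, 0.8, 0.1]},
]

def search(query_vec: list[float], index: list[dict], k: int = 1) -> list[dict]:
    """Return the k entries most similar to the query embedding."""
    return sorted(index, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)[:k]

# Pretend this vector is the embedding of "the moment the CEO mentions Q3 revenue".
hits = search([0.85, 0.15, 0.05], index)
```

Because results carry timestamps, a hit drops the editor directly onto the relevant frame rather than a whole file, which is what makes large libraries navigable without manual tagging.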

Analogy

It's like Google Search, but instead of searching the internet, you're searching your own chaotic mountain of unedited footage—and it actually works.

Key Technical Team Members

  • Saksham Aggarwal, Co-Founder & CEO
  • Ishan Sharma, Co-Founder & CTO

Saksham and Ishan have known each other for 15 years. Saksham built AI products at Iterate AI and published at ACL. Ishan spent 4.5 years building high-performance web apps at HackerRank and built Hotspoter (5M+ downloads) at age 14. Together they have the depth to build the editor core that competitors avoid.


Funding History

  • 2024: Saksham Aggarwal and Ishan Sharma co-found Cardboard
  • 2024 Aug: $2.12M Seed from 3 investors
  • 2026: Y Combinator W26 batch ($500K)
  • 2026: ~$2.62M+ raised to date


Competitors

  • Professional NLEs: Adobe Premiere Pro, DaVinci Resolve, Final Cut Pro
  • AI-Native Editors: Descript, Runway, Kapwing, Opus Clip
  • Browser-Based: Clipchamp (Microsoft), Canva Video, WeVideo
  • Emerging AI: Captions, Vizard.ai