SDK that gives any app a TikTok-style video feed with adaptive streaming and AI captioning.
Built on predictive video pre-fetching for smooth playback, multilingual video intelligence for captioning, and real-time content moderation.

Video Infrastructure | YC W26

Last Updated: March 19, 2026

Provides an ML-powered short-form video SDK that enables consumer apps to embed TikTok-style vertical video feeds with adaptive streaming, AI captioning, auto moderation, and built-in analytics.
Shortkit has publicly launched its core SDK supporting iOS, Android, and Web with feed-aware playback, adaptive bitrate streaming, next-gen codec support, AI-powered captioning in 50+ languages, automated content moderation, in-feed analytics and surveys, native ad integrations, and AI tools to convert long-form video into short-form clips. Documentation signals continued investment in developer experience, rapid integration, and content ingestion from external CMS platforms.
GitHub and job board activity remains minimal, suggesting the two-person team is heads-down on core infrastructure rather than scaling headcount. The absence of public customer logos or case studies implies they are in an early design-partner or pilot phase, likely working closely with a handful of YC-network consumer apps. Conference and community signals point toward deeper ML investment in recommendation-driven feed personalization and real-time adaptive quality optimization. The combination of a YC launch with no disclosed external round suggests a fundraise is imminent, likely a seed round in mid-2026 targeting $3.6M to fund engineering hires and go-to-market.
ML models dynamically pre-fetch and pre-buffer upcoming videos based on user swipe patterns and real-time network conditions, delivering instant playback with minimal data waste.
The app figures out which video you'll swipe to next and secretly loads it before you get there, so it plays instantly.
Shortkit's feed-aware player uses ML models trained on aggregated user interaction signals—swipe velocity, watch-through rates, session depth, and time-of-day patterns—to predict which video a user is most likely to view next. These predictions feed a priority queue that orchestrates pre-fetching over the CDN, weighted by real-time network quality estimates (bandwidth, latency, jitter) derived from ongoing connection telemetry. The system also selects the optimal codec and bitrate ladder rung per device, factoring in hardware decoder capabilities and battery state. This multi-signal approach means the player can start rendering the next video's first frame before the user even initiates the swipe, while avoiding wasteful downloads of videos the user is unlikely to reach. The result is a streaming experience that rivals first-party platforms like TikTok and YouTube Shorts, delivered as a drop-in SDK.
It's like a waiter who already knows your next order and has it plated before you even wave them over.
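The prediction-weighted pre-fetch queue described above can be sketched as follows. This is a minimal illustration, not Shortkit's implementation: the function names, the bandwidth tiers, and the idea of capping queue depth by connection quality are all assumptions for the sake of the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class PrefetchTask:
    priority: float                       # negated view probability (min-heap trick)
    video_id: str = field(compare=False)  # payload, excluded from ordering


def plan_prefetch(candidates, view_probs, bandwidth_mbps, max_tasks=3):
    """Rank upcoming videos by predicted view probability, then cap the
    queue depth based on current bandwidth so slow connections waste
    less data on videos the user may never reach. Tiers are illustrative."""
    if bandwidth_mbps < 2:
        budget = 1
    elif bandwidth_mbps < 8:
        budget = 2
    else:
        budget = max_tasks
    heap = [PrefetchTask(-view_probs[v], v) for v in candidates]
    heapq.heapify(heap)
    return [heapq.heappop(heap).video_id for _ in range(min(budget, len(heap)))]
```

A real player would refresh this plan on every swipe and on every network-quality telemetry update, cancelling in-flight downloads that fall out of the budget.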
Automated AI captioning in 50+ languages and intelligent long-form-to-short-form video conversion reduce creator effort and expand global reach for every app using the SDK.
The SDK automatically writes subtitles in dozens of languages and chops long videos into the best short clips—so creators don't have to.
Shortkit's content transformation pipeline ingests raw video uploads and runs them through a multi-stage ML workflow. First, an automatic speech recognition (ASR) model transcribes audio and generates time-aligned captions, supporting 50+ languages with punctuation and speaker diarization. The captions are then available for on-screen rendering, accessibility compliance, and search indexing. Second, for long-form source material, a separate model analyzes visual saliency, audio energy, transcript sentiment, and scene-change detection to identify the highest-engagement segments and automatically extract short-form clips with proper framing and transitions. Thumbnail generation uses a frame-scoring model that evaluates facial presence, composition, and color vibrancy to select the most click-worthy still. Together, these capabilities let any app team ship a polished, globally accessible short-form video experience without building or maintaining their own ML pipelines.
It's like having a multilingual editor who watches every video, writes perfect subtitles, and picks the highlight reel—all before your coffee gets cold.
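The clip-extraction step above (scoring windows of a long video on visual saliency, audio energy, and transcript sentiment) can be sketched as a sliding-window ranker. The feature weights, window length, and per-second feature representation are assumptions for illustration, not Shortkit's actual model.

```python
def score_segments(features, window=15, stride=5):
    """Slide a fixed-length window over per-second feature scores and
    return candidate clips ranked by a weighted engagement score.
    `features` maps feature name -> list of per-second scores in [0, 1].
    Weights are illustrative assumptions."""
    weights = {"visual_saliency": 0.4, "audio_energy": 0.3, "sentiment": 0.3}
    n = len(features["visual_saliency"])
    clips = []
    for start in range(0, n - window + 1, stride):
        score = sum(
            w * sum(features[name][start:start + window]) / window
            for name, w in weights.items()
        )
        clips.append((round(score, 3), start, start + window))
    # Highest-scoring windows first; (start, end) are second offsets.
    return sorted(clips, reverse=True)
```

In production, a system like the one described would also deduplicate overlapping windows and snap clip boundaries to scene changes before rendering the final short-form cut.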
ML-based auto-moderation scans every uploaded video for policy violations across visual, audio, and text signals, keeping feeds safe without manual review queues.
Every video uploaded through the SDK gets automatically screened for harmful content so app teams don't need a room full of human moderators.
Shortkit's built-in auto-moderation system processes each video upload through a cascade of ML classifiers before it enters the public feed. A vision model scans sampled frames for NSFW content, graphic violence, hate symbols, and other policy categories, while an audio classifier detects harmful speech, copyrighted music, and synthetic voice manipulation. OCR and text-classification models analyze on-screen text and overlays for policy violations. Each signal produces a confidence-weighted risk score that is aggregated into a final moderation decision: auto-approve, auto-reject, or escalate to a human review queue exposed via webhook. Thresholds are configurable per customer, allowing apps targeting children to enforce stricter standards than general-audience platforms. The system continuously improves through a feedback loop where human review outcomes are used to fine-tune classifiers, reducing false positives over time. This turnkey moderation layer is a critical differentiator, as building equivalent in-house systems typically requires dedicated trust-and-safety engineering teams that most startups cannot afford.
It's like a bouncer at the door of every video feed who never sleeps, never blinks, and gets smarter every shift.
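The confidence-weighted aggregation and three-way decision described above can be sketched like this. The threshold defaults and signal names are hypothetical; the source only says thresholds are configurable per customer.

```python
def moderate(signals, approve_below=0.2, reject_above=0.8):
    """Aggregate per-classifier risk scores into one decision.
    `signals` maps classifier name -> (risk_score, model_confidence),
    both in [0, 1]. Threshold defaults are illustrative; per the design,
    stricter values would suit child-focused apps."""
    total_weight = sum(conf for _, conf in signals.values())
    risk = sum(score * conf for score, conf in signals.values()) / total_weight
    if risk < approve_below:
        return "auto-approve", risk
    if risk > reject_above:
        return "auto-reject", risk
    return "escalate", risk  # e.g. pushed to a human review queue via webhook
```

Human decisions on escalated items would then flow back as labeled examples for the fine-tuning loop the section describes.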
Michael Seleman built the infrastructure behind YouTube Shorts, giving Shortkit first-hand knowledge of the exact technical challenges (buffering, codec selection, feed ranking, global delivery) that every competitor must reverse-engineer from the outside.