Asimov builds the training data supply chain for humanoid robots from real-world human movement, combining crowdsourced egocentric video with 3D pose annotation, motion foundation model pre-training on diverse human activity, and automated ML quality scoring.

Technology | Robotics | YC W26

Last Updated: March 19, 2026

Asimov builds the data infrastructure for humanoid robotics. Workers wear a phone on a lightweight headband while performing daily tasks (cooking, cleaning, organizing), capturing egocentric video that provides the diverse, real-world human movement data robots need to learn from. The company positions itself as the supply chain for robotics training data.
Asimov has launched data collection with lightweight headband hardware, a mobile recording app, and global partnerships with businesses, and is actively hiring Data Collection Specialists worldwide at $5-30/hr. Per their YC launch, they are already supplying data to major robotics players. Partners can cover worker salaries by having employees wear collection kits during normal work.
Active global hiring for data collectors indicates rapid geographic scaling. The infrastructure-first approach (owning the data supply chain before building models) mirrors Scale AI's early playbook. There are strong indicators of upcoming partnerships with humanoid manufacturers (Figure, Agility, etc.). The founders are former roommates who built a prior startup together, suggesting strong co-founder chemistry.
Crowdsourced Egocentric Video Collection & 3D Pose Annotation for Robot Imitation Learning
Workers wear lightweight headbands while doing their normal jobs, and the video gets turned into 3D movement recipes that teach robots how to fold towels, wash dishes, and navigate cluttered rooms.
Asimov deploys proprietary wearable headband devices to workers at partner businesses—cleaning companies, hotels, restaurants, and warehouses—capturing egocentric (first-person) video of authentic human tasks in natural, uncontrolled environments. The raw video is processed through an ML-driven annotation pipeline that extracts 3D human pose estimation, depth maps, object segmentation, and semantic labels. These richly annotated datasets are then formatted for imitation learning frameworks, enabling humanoid robots to learn complex, multi-step manipulation and navigation behaviors by observing diverse human demonstrations. Unlike lab-collected or teleoperation data, this approach captures the full variability of real-world conditions—lighting changes, clutter, unexpected obstacles—producing training data that generalizes far better to deployment environments. The vertically integrated pipeline from hardware to annotation to delivery ensures quality control and rapid iteration.
It's like hiring thousands of cooking show hosts to wear GoPros while making dinner, then turning all that footage into a cookbook that robots can actually follow.
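To make the annotation step concrete, here is a minimal sketch of the per-frame 3D pose extraction stage, using OpenCV and the open-source MediaPipe pose estimator as stand-ins for Asimov's proprietary models; the function name and output schema are illustrative assumptions, not their actual pipeline.

```python
# Sketch: extract a per-frame 3D pose track from a recorded video.
# MediaPipe is a stand-in for Asimov's proprietary annotation models.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_pose_track(video_path: str) -> list[dict]:
    """Return per-frame 3D joint positions (world coordinates) plus visibility."""
    track = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False, model_complexity=2) as pose:
        frame_idx = 0
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes to BGR.
            results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
            if results.pose_world_landmarks:
                lms = results.pose_world_landmarks.landmark
                track.append({
                    "frame": frame_idx,
                    "joints": [(lm.x, lm.y, lm.z) for lm in lms],
                    "visibility": [lm.visibility for lm in lms],
                })
            frame_idx += 1
    cap.release()
    return track
```

In the full pipeline described above, a track like this would be fused with depth maps, object segmentation, and semantic labels before being exported in an imitation learning format.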
Motion Foundation Model Pre-training from Diverse Human Activity Data
Asimov is building a massive library of how humans move through everyday life so they can train one giant AI brain that lets any robot learn new tasks almost instantly.
Asimov's long-term technical play is to leverage its uniquely diverse, large-scale human movement dataset to pre-train motion foundation models—large neural networks that learn generalizable representations of human activity, physics, and object interaction. By training on data spanning hundreds of environments, thousands of workers, and dozens of task categories (cleaning, cooking, organizing, carrying, navigating), these models capture transferable priors about how humans interact with the physical world. Downstream, humanoid robot manufacturers can fine-tune these foundation models on their specific hardware with minimal additional data, dramatically reducing the time and cost to deploy robots in new domains. This mirrors the language model paradigm where large-scale pre-training on diverse text enables rapid adaptation to specific tasks. Asimov's competitive moat deepens with every hour of data collected, as the foundation model improves with scale and diversity—creating a data flywheel that is extremely difficult for competitors to replicate without equivalent real-world collection infrastructure.
Think of it as teaching a robot the "common sense" of physical movement the same way GPT learned common sense about language—by reading (or in this case, watching) everything humans do.
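As a rough illustration of what this pre-training paradigm looks like in code, the sketch below trains a small causal transformer to predict the next pose in a sequence of 3D joint positions; the architecture, hyperparameters, and objective are assumptions for illustration, not Asimov's actual model.

```python
# Sketch: next-pose prediction as a motion pre-training objective (PyTorch).
import torch
import torch.nn as nn

class MotionTransformer(nn.Module):
    def __init__(self, n_joints: int = 33, d_model: int = 256, n_layers: int = 6):
        super().__init__()
        self.embed = nn.Linear(n_joints * 3, d_model)          # flatten (x, y, z) per joint
        self.pos = nn.Parameter(torch.zeros(1, 512, d_model))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_joints * 3)           # predict the next pose

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (batch, time, n_joints * 3)
        t = poses.shape[1]
        h = self.embed(poses) + self.pos[:, :t]
        mask = nn.Transformer.generate_square_subsequent_mask(t)  # causal: no peeking ahead
        return self.head(self.encoder(h, mask=mask))

model = MotionTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loader = [torch.randn(8, 64, 33 * 3) for _ in range(10)]  # placeholder for real pose sequences
for batch in loader:
    pred = model(batch[:, :-1])                  # predict pose t+1 from poses up to t
    loss = nn.functional.mse_loss(pred, batch[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
```

A manufacturer fine-tuning such a model would swap the placeholder loader for its own robot trajectories and continue training the same weights, which is the adaptation step the paragraph above describes.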
Automated Data Quality Scoring and Anomaly Detection Across Distributed Collection Network
An AI watchdog automatically checks every video workers upload, instantly flagging blurry footage, blocked cameras, or missing movements so nothing unusable makes it into the final dataset.
As Asimov scales its crowdsourced data collection across hundreds of distributed sites and thousands of workers, maintaining consistent data quality becomes a critical operational challenge. To solve this, Asimov deploys ML-based automated quality scoring models that evaluate each uploaded recording in near real-time across multiple dimensions: video clarity and stability (blur detection, occlusion detection), completeness of captured task sequences (activity recognition to verify expected task was performed), 3D pose estimation confidence scores (flagging frames where pose extraction is unreliable), and environmental consistency checks (lighting, camera angle). Anomalous or low-quality recordings are automatically triaged—either flagged for human review, sent back to the collector with specific corrective instructions, or discarded. This system dramatically reduces the cost of manual QA review, ensures downstream ML models are trained on clean data, and creates a feedback loop that improves collector behavior over time. As the network scales globally, this automated quality gate becomes essential infrastructure that prevents garbage-in-garbage-out problems from undermining the entire data product.
It's like having a strict but fair film editor who watches every take in real time and immediately yells "cut!" if someone left the lens cap on.
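A minimal sketch of such a quality gate appears below, covering three of the dimensions described above (sharpness via Laplacian variance, exposure, and pose confidence); the thresholds, names, and accept/review/reject triage rule are illustrative assumptions, not Asimov's system.

```python
# Sketch: score one uploaded recording and triage it automatically.
import cv2
import numpy as np

BLUR_THRESHOLD = 100.0     # Laplacian variance below this => likely blurry
DARK_THRESHOLD = 40.0      # mean gray value below this => underexposed
MIN_POSE_CONFIDENCE = 0.5  # mean joint visibility below this => unreliable pose

def score_recording(video_path: str, pose_track: list[dict]) -> dict:
    """Return per-check results and an accept/review/reject verdict for one upload."""
    cap = cv2.VideoCapture(video_path)
    blur_scores, brightness = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blur_scores.append(cv2.Laplacian(gray, cv2.CV_64F).var())  # sharpness proxy
        brightness.append(gray.mean())
    cap.release()

    pose_conf = (np.mean([np.mean(f["visibility"]) for f in pose_track])
                 if pose_track else 0.0)
    checks = {
        "sharp": bool(np.median(blur_scores) > BLUR_THRESHOLD),
        "well_lit": bool(np.median(brightness) > DARK_THRESHOLD),
        "pose_reliable": pose_conf > MIN_POSE_CONFIDENCE,
    }
    failed = [name for name, passed in checks.items() if not passed]
    # One failed check goes to human review; more than one is auto-rejected.
    verdict = "accept" if not failed else ("review" if len(failed) == 1 else "reject")
    return {"checks": checks, "failed": failed, "verdict": verdict}
```

A real gate would add the activity recognition and camera angle checks described above and route "review" verdicts back to collectors with corrective instructions.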
Anshul brings Scale AI data pipeline experience and Lyem brings Air Force ML deployment experience. They have been building together since college, where they were roommates, and previously co-founded a startup that reached six figures in revenue. Direct labor partnerships with businesses worldwide let them capture authentic movement in diverse environments that simulation-only approaches cannot replicate.