How Is Human Archive Using AI?

Captures 8,000 hours/day of multimodal human activity data to train the next generation of robots.

Through multimodal robotics data curation with synchronized video, depth, and tactile sensors; world-model training pipelines; and tactile sensing data annotation.

Company Overview

Builds and operates large-scale, multimodal data collection infrastructure, capturing synchronized video, depth, tactile, and motion data from real-world human activities, to provide high-fidelity training datasets for robotics and embodied AI companies.

Product Roadmap & Public Announcements

Human Archive has publicly announced scaling multimodal data capture operations across diverse real-world environments (homes, restaurants, hotels, retail, construction, horticulture, industrial settings) in India, with capacity of up to 8,000 hours of annotated data per day. They have disclosed their sensor suite (egocentric RGB, stereo depth, tactile, IMU, wrist cam) and emphasized building the largest multimodal manual labor dataset for robotics foundation models.

Signals & Private Analysis

Job postings for Hardware Engineer Associates in India signal continued investment in custom capture hardware. A 50,000+ contributor network and 30-person ops team in India suggest rapid operational scaling. The absence of public ML/AI research or open-source activity hints at a data-first, infrastructure-first strategy rather than model development. Likewise, the absence of senior ML hires or AI research roles suggests they are positioning as a data supplier, not a model builder, potentially making them an acquisition target for larger robotics or AI labs seeking proprietary training data.


Machine Learning Use Cases

Multimodal Robotics Data Curation
For: Product Differentiation (Engineering)

Providing large-scale, multimodal training datasets to robotics companies building foundation models for dexterous manipulation and physical interaction.

Layman's Explanation

They film thousands of people doing everyday physical tasks with special sensor rigs so robot brains can learn how humans actually move and touch things.

Use Case Details

Human Archive deploys custom multi-sensor hardware rigs—capturing synchronized egocentric RGB video, stereo depth, tactile feedback, IMU, and wrist-cam data—across diverse real-world environments such as homes, restaurants, hotels, retail stores, and construction sites. Their 50,000+ contributor network generates up to 8,000 hours of richly annotated data per day, which is then processed, anonymized, and structured for direct ingestion by robotics companies training foundation models for dexterous manipulation, object interaction, and navigation. This addresses a critical bottleneck: most robotics labs lack the operational infrastructure to collect diverse, high-fidelity, real-world sensorimotor data at scale, and instead rely on simulation or small-scale lab demonstrations. Human Archive's datasets enable these teams to train more generalizable models that transfer better to real-world deployment, dramatically accelerating the path from research prototype to production robot.
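The synchronization step described above can be sketched as a simple time-alignment pass over raw sensor samples. The record schema and field names below are illustrative assumptions, not Human Archive's actual data format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MultimodalSample:
    """One capture record (hypothetical schema, for illustration only)."""
    timestamp_us: int    # capture time in microseconds
    rgb_frame: bytes     # egocentric RGB frame (encoded)
    depth_frame: bytes   # stereo depth frame (encoded)
    tactile: List[float] # per-taxel pressure readings
    imu: List[float]     # accelerometer + gyroscope, 6 values
    wrist_cam: bytes     # wrist-mounted camera frame

def align_streams(samples, tolerance_us=5_000):
    """Group consecutive samples whose gap to the previous sample
    falls within a sync tolerance, yielding time-aligned bundles."""
    groups = []
    for s in sorted(samples, key=lambda x: x.timestamp_us):
        if groups and s.timestamp_us - groups[-1][-1].timestamp_us <= tolerance_us:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups
```

In practice, production pipelines would use hardware timestamps and per-stream clock correction, but the grouping idea is the same.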

Analogy

It's like hiring 50,000 people to wear GoPros and smart gloves while doing chores so robots can binge-watch humanity's how-to playlist.

World Model Training Data
For: Decision Quality (Data)

Supplying annotated, environment-diverse datasets to AI labs developing world models and scene understanding for embodied agents.

Layman's Explanation

They capture what dozens of different real-world spaces actually look and feel like so AI agents can understand kitchens, warehouses, and gardens without ever visiting one.

Use Case Details

World models—internal representations that allow AI agents to predict the consequences of actions in physical environments—require vast quantities of diverse, real-world sensory data to generalize beyond narrow lab settings. Human Archive addresses this by collecting multimodal data (video, depth, tactile, motion) across a uniquely broad range of environments: homes, restaurants, hotels, retail, transportation, construction, horticulture, and industrial sites. This environmental diversity is critical because most existing robotics datasets are collected in a handful of controlled lab environments, leading to world models that fail when deployed in novel settings. By providing structured, annotated data from dozens of real-world environment types, Human Archive enables AI labs to train world models with significantly better generalization, reducing the sim-to-real gap and accelerating the development of embodied agents that can operate reliably in the messy, unpredictable real world.
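To make the "predict the consequences of actions" framing concrete, world-model training typically consumes (observation, action, next observation) tuples derived from logged trajectories. This is a minimal sketch of that preprocessing step, under the assumption that trajectories arrive as ordered (observation, action) pairs; it is not Human Archive's actual pipeline:

```python
def transition_tuples(trajectory):
    """Turn a logged trajectory [(obs, action), ...] into
    (obs, action, next_obs) tuples — the supervised targets
    a world model trains on to predict action consequences."""
    return [
        (trajectory[i][0], trajectory[i][1], trajectory[i + 1][0])
        for i in range(len(trajectory) - 1)
    ]
```

Environmental diversity matters here because the model only learns dynamics for the (obs, action) regions its tuples cover.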

Analogy

It's like giving an AI a passport full of stamps instead of just a postcard from one lab—suddenly it knows what a construction site smells like, not just a kitchen.

Robotics Model Benchmarking
For: Operational Efficiency (Product)

Enabling robotics startups to benchmark and evaluate model performance across diverse manipulation tasks and environments using standardized, real-world datasets.

Layman's Explanation

They give robot makers a standardized test using real-world footage so everyone can fairly compare whose robot is actually smarter.

Use Case Details

One of the biggest challenges in robotics AI is the lack of standardized, real-world benchmarks for evaluating model performance across diverse tasks and environments. Most teams evaluate on their own internal data, making it nearly impossible to compare results or track progress across the field. Human Archive's large-scale, multimodal, and environment-diverse datasets are uniquely positioned to serve as a shared evaluation benchmark for the robotics community. By providing consistently captured, richly annotated data spanning dozens of task types (e.g., object manipulation, tool use, navigation) and environment categories, Human Archive can offer robotics startups and research labs a common ground for model evaluation. This reduces duplicated effort, accelerates iteration cycles, and provides a credible, third-party standard for measuring progress in embodied AI—similar to the role ImageNet played for computer vision.
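A shared benchmark ultimately reduces to aggregating per-episode outcomes into comparable per-task scores. The sketch below assumes results arrive as (task, environment, success) tuples; that layout is an illustrative assumption, not a published Human Archive format:

```python
from collections import defaultdict

def benchmark_report(results):
    """Aggregate per-episode success flags into per-task success rates.

    `results` is a list of (task, environment, success) tuples.
    Returns {task: fraction of successful episodes}.
    """
    by_task = defaultdict(list)
    for task, env, success in results:
        by_task[task].append(1.0 if success else 0.0)
    return {task: sum(v) / len(v) for task, v in by_task.items()}
```

A real benchmark would also stratify by environment and report confidence intervals, but the point is that every team scores against the same episodes.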

Analogy

It's the SAT for robots—finally, everyone takes the same test instead of grading themselves on their own homework.

Tactile Sensing Data
For: Product Differentiation (Engineering)

Providing tactile and force-feedback datasets to companies developing dexterous robotic hands and haptic systems.

Layman's Explanation

They record what it actually feels like to pick up, squeeze, and handle hundreds of different objects so robotic hands can finally learn to not crush the tomato.

Use Case Details

Dexterous manipulation—the ability of a robotic hand to grasp, move, and manipulate objects with human-like precision—is one of the hardest unsolved problems in robotics. A key bottleneck is the scarcity of large-scale, real-world tactile and force-feedback data. Human Archive's custom sensor rigs include tactile sensors that capture the forces and pressures exerted during human manipulation of objects across a wide range of tasks and environments. This data, synchronized with video, depth, and motion streams, provides a uniquely rich training signal for companies developing dexterous robotic hands and haptic feedback systems. By offering datasets that cover hundreds of object types (from fragile produce to heavy tools) and dozens of manipulation tasks (grasping, pouring, assembling, cutting), Human Archive enables these companies to train models that generalize across object properties and task demands—dramatically improving grasp success rates and reducing damage to objects and environments.
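One simple way such tactile traces become training labels is thresholding grasp pressure against an object's fragility limit (the "don't crush the tomato" problem). This is a deliberately simplified illustration; real annotations would be far richer than a single scalar threshold, and the function name is hypothetical:

```python
def grasp_label(pressure_trace_kpa, fragility_limit_kpa):
    """Label a grasp episode from its tactile pressure trace.

    Returns "crushing" if any reading exceeds the object's
    fragility limit, else "safe" — a toy labeling rule for
    training force-aware manipulation policies.
    """
    if max(pressure_trace_kpa) > fragility_limit_kpa:
        return "crushing"
    return "safe"
```

Synchronizing these labels with the video and motion streams is what lets a model associate visual object cues with appropriate grip force.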

Analogy

It's like teaching a robot the difference between shaking hands and crushing a grape—by letting it study thousands of hours of humans doing both.

Key Technical Team Members

  • Rushil Agarwal, Co-founder
  • Samay Maini, Co-founder
  • Shloke Patel, Co-founder
  • Raj Patel, Co-founder

Human Archive's unfair advantage is its massive, operationalized contributor network of 50,000+ real-world data collectors paired with custom multi-sensor hardware rigs deployed across dozens of environment types, creating a proprietary, hard-to-replicate dataset moat for embodied AI that no competitor currently matches in scale, diversity, or annotation depth.


Funding History

  • 2025 | Founded by Rushil Agarwal, Samay Maini, Shloke Patel, and Raj Patel.
  • 2025 | $500K Pre-Seed from Y Combinator (W25 batch).
  • 2025 | Operational launch of data collection infrastructure in India with a 30-person ops team.
  • 2025–2026 | Scaled to a 50,000+ contributor network and 8,000 hours/day capture capacity.
  • 2026 | No additional funding rounds publicly announced.


Competitors

  • Scale AI (general data labeling/annotation at scale)
  • Labelbox (data labeling platform)
  • Everyday Robots / Google DeepMind (internal robotics data collection)
  • Open X-Embodiment (open-source robotics dataset consortium)
  • Covariant (proprietary robotics data pipelines)
  • Skild AI (foundation models for robotics with proprietary data)
  • Physical Intelligence (π) (robotics foundation models)
  • Bridge Data / RT-2 datasets (academic/open-source robotics datasets)