
Technology | AI Simulation & Agent Training | YC W26 | Valuation: Undisclosed

Last Updated: March 24, 2026

Polymath builds automated, simulated worlds for training and evaluating AI agents on complex, long-horizon, multi-tool digital workflows such as software engineering, enterprise operations, and knowledge work.
The company has publicly released Horizon-SWE, a benchmark simulating an end-to-end software company in which agents must plan, code, test, deploy, and monitor software. It has also announced automated, human-in-the-loop environment generation and integrations with real-world digital tools (Slack, GitHub, Linear, email, browsers). Its public roadmap centers on expanding benchmark coverage across enterprise digital workflows and scaling cloud-native simulation infrastructure.
Job postings for Build & DevOps Engineers, Networking Engineers, Data Pipeline Engineers, and Mechanical Designers suggest investment in robotics-adjacent simulation and distributed cloud infrastructure beyond its current digital-workflow focus. GitHub activity and hiring for Founding Research Engineers point toward differentiable simulation and sim-to-real transfer research. Sales hiring (Account Executives) signals an imminent enterprise go-to-market push.
Automated procedural world generation that creates diverse, realistic digital environments for training AI agents without manual scenario authoring.
Instead of humans hand-building every test scenario for AI agents, Polymath's system automatically generates thousands of realistic digital workplaces—complete with tools, data, and tasks—so agents can learn faster and cheaper.
It's like having an AI dungeon master who instantly conjures up thousands of unique office buildings—each with different teams, tools, and problems—so your AI intern can practice working at all of them before its first real day on the job.
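Polymath has not published its generation pipeline, so the following is only a minimal sketch of what seed-driven procedural environment generation could look like; every name here (WorkplaceSpec, generate_workplace, the tool and task lists) is hypothetical.

```python
import random
from dataclasses import dataclass, field

# Hypothetical sketch: Polymath's actual generator is not public. This
# only illustrates expanding integer seeds into diverse workplace specs.

TOOLS = ["github", "slack", "linear", "email", "browser"]
TASK_TEMPLATES = [
    "Triage bug report {i} and open a fix PR",
    "Summarize thread {i} and file follow-up tickets",
    "Deploy service {i} and verify monitoring alerts",
]

@dataclass
class WorkplaceSpec:
    seed: int
    tools: list[str] = field(default_factory=list)
    tasks: list[str] = field(default_factory=list)

def generate_workplace(seed: int) -> WorkplaceSpec:
    """Deterministically expand one seed into one simulated workplace."""
    rng = random.Random(seed)
    tools = rng.sample(TOOLS, k=rng.randint(2, len(TOOLS)))
    tasks = [
        rng.choice(TASK_TEMPLATES).format(i=rng.randint(100, 999))
        for _ in range(rng.randint(3, 8))
    ]
    return WorkplaceSpec(seed=seed, tools=tools, tasks=tasks)

if __name__ == "__main__":
    # Thousands of distinct environments from nothing but integer seeds.
    for spec in (generate_workplace(s) for s in range(3)):
        print(spec)
```

Seeding is one plausible design choice here: every generated environment stays reproducible for evaluation while the space of variations remains effectively unbounded.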
Rigorous, automated evaluation of AI agents on long-horizon, multi-tool software engineering tasks using the Horizon-SWE benchmark with ML-powered verifiers.
Polymath built an automated grading system that tests whether AI coding agents can actually do a software engineer's full job—from reading a ticket to shipping and monitoring code—not just answer trivia questions about programming.
It's like building a full-scale simulated restaurant kitchen where you test whether a robot chef can handle an entire dinner service—not just see if it can chop an onion.
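Horizon-SWE's verifier interface is likewise unpublished, and the real system reportedly uses ML-powered verifiers; the sketch below substitutes simple programmatic milestone checks to illustrate the core idea of grading a long-horizon trajectory rather than a single end state. All names (Trajectory, grade, the individual verifiers) are assumptions.

```python
from typing import Callable

# Hypothetical sketch: Horizon-SWE's actual verifiers are ML-powered and
# not public; simple rule checks stand in for them here.

Trajectory = list[dict]  # e.g. [{"action": "run_tests", "ok": True}, ...]
Verifier = Callable[[Trajectory], bool]

def planned_before_coding(traj: Trajectory) -> bool:
    actions = [step["action"] for step in traj]
    return ("write_plan" in actions and "edit_code" in actions
            and actions.index("write_plan") < actions.index("edit_code"))

def tests_passed(traj: Trajectory) -> bool:
    return any(s["action"] == "run_tests" and s["ok"] for s in traj)

def deployed_and_monitored(traj: Trajectory) -> bool:
    completed = {s["action"] for s in traj if s["ok"]}
    return {"deploy", "check_alerts"} <= completed

VERIFIERS: list[Verifier] = [planned_before_coding, tests_passed,
                             deployed_and_monitored]

def grade(traj: Trajectory) -> float:
    """Fraction of workflow milestones the agent verifiably completed."""
    return sum(v(traj) for v in VERIFIERS) / len(VERIFIERS)

if __name__ == "__main__":
    traj = [
        {"action": "write_plan", "ok": True},
        {"action": "edit_code", "ok": True},
        {"action": "run_tests", "ok": True},
        {"action": "deploy", "ok": True},
    ]
    print(f"score: {grade(traj):.2f}")  # 0.67: deployed but never monitored
```

Scoring per milestone gives partial credit across the whole workflow, which is what separates this style of evaluation from single-assertion coding benchmarks.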
Synthetic interaction data generation from agent-environment sessions, producing high-quality training datasets for improving AI agent capabilities via reinforcement learning.
Every time an AI agent practices in Polymath's simulated worlds, it automatically creates labeled training data showing what worked and what didn't—like a flight simulator that writes its own textbook after every session.
It's like a driving school where every student's practice session automatically writes a new chapter in the driver's ed manual—complete with highlighted mistakes and gold-star moments—so the next class learns even faster.
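Nothing about Polymath's data pipeline is public either; this sketch only illustrates the general pattern of filtering graded sessions by reward and serializing them as RL-ready records. The to_jsonl function and the record schema are hypothetical, and the reward values are assumed to come from verifier scores like those in the grading sketch above.

```python
import json
from typing import Iterable

# Hypothetical sketch: Polymath's pipeline is not public. Pattern shown:
# keep only sessions worth learning from, then emit JSONL records pairing
# the full action log with a scalar reward for RL fine-tuning.

def to_jsonl(sessions: Iterable[dict], min_reward: float = 0.5) -> list[str]:
    """Filter graded sessions and serialize them as training records."""
    records = []
    for s in sessions:
        if s["reward"] >= min_reward:
            records.append(json.dumps({
                "session_id": s["id"],
                "steps": s["trajectory"],  # full log, mistakes included
                "reward": s["reward"],     # verifier score as RL reward
            }))
    return records

if __name__ == "__main__":
    sessions = [
        {"id": "sess-001", "reward": 0.67,
         "trajectory": [{"action": "run_tests", "ok": True}]},
        {"id": "sess-002", "reward": 0.0,
         "trajectory": [{"action": "deploy", "ok": False}]},
    ]
    for line in to_jsonl(sessions):
        print(line)
```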
Polymath's focus on automated world generation for multi-tool digital workflows fills a gap left by robotics-centric simulators like Isaac Sim and AI2-THOR. By building environments that mirror real enterprise toolchains (GitHub, Slack, Linear, browsers), they create the only realistic training ground for the next generation of AI coding and knowledge-work agents, validated by benchmarks where even frontier models score only ~25%.