Polymath

Product & Competitive Intelligence

Builds simulated worlds for training AI agents on complex digital workflows.

Company Overview

Builds automated simulated worlds for training and evaluating AI agents on complex, long-horizon, multi-tool digital workflows like software engineering, enterprise operations, and knowledge work.

Product Roadmap & Public Announcements

Polymath has publicly released Horizon-SWE, a benchmark simulating an end-to-end software company where agents must plan, code, test, deploy, and monitor software. They've announced automated, human-in-the-loop environment generation and integration with real-world digital tools (Slack, GitHub, Linear, email, browsers). Their public roadmap centers on expanding benchmark coverage across enterprise digital workflows and scaling cloud-native simulation infrastructure.

Signals & Private Analysis

Job postings for Build & DevOps Engineers, Networking Engineers, Data Pipeline Engineers, and Mechanical Designers suggest investment in robotics-adjacent simulation and distributed cloud infrastructure beyond their current digital-workflow focus. GitHub activity and hiring for Founding Research Engineers point toward differentiable simulation and sim-to-real transfer research. Sales hiring (Account Executives) signals imminent enterprise go-to-market motion.

Product Roadmap Priorities

Procedural Environment Generation
Improving | Cost Reduction | Engineering

Automated procedural world generation that creates diverse, realistic digital environments for training AI agents without manual scenario authoring.

In Plain English

Instead of humans hand-building every test scenario for AI agents, Polymath's system automatically generates thousands of realistic digital workplaces—complete with tools, data, and tasks—so agents can learn faster and cheaper.

Analogy

It's like having an AI dungeon master who instantly conjures up thousands of unique office buildings—each with different teams, tools, and problems—so your AI intern can practice working at all of them before its first real day on the job.
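The generation loop described above amounts to seeded sampling over pools of tools and task templates. A minimal sketch in Python, assuming hypothetical names (`Environment`, `make_environment`, and the sample pools) that are illustrative, not Polymath's actual API:

```python
import random
from dataclasses import dataclass

# Hypothetical sketch only: every name below is illustrative,
# not Polymath's actual API.

TOOLS = ["github", "slack", "linear", "email", "browser"]
SERVICES = ["billing", "auth", "search"]
CHANNELS = ["#ops", "#eng", "#oncall"]
TASK_TEMPLATES = [
    "Fix the failing {service} deployment and report status in {channel}.",
    "Triage the open {service} bug ticket and ship a pull request.",
]

@dataclass
class Environment:
    seed: int
    tools: list
    task: str

def make_environment(seed: int) -> Environment:
    """Deterministically generate one simulated workplace from a seed."""
    rng = random.Random(seed)  # seeded RNG makes every scenario reproducible
    tools = rng.sample(TOOLS, k=rng.randint(2, len(TOOLS)))
    task = rng.choice(TASK_TEMPLATES).format(
        service=rng.choice(SERVICES), channel=rng.choice(CHANNELS)
    )
    return Environment(seed=seed, tools=tools, task=task)

# Scaling to thousands of distinct scenarios is just iterating over seeds.
envs = [make_environment(seed) for seed in range(1000)]
```

Because each environment is a pure function of its seed, any scenario an agent fails can be regenerated exactly for debugging or re-evaluation.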

Agent Benchmark Verification
Improving | Product Differentiation | Product

Rigorous, automated evaluation of AI agents on long-horizon, multi-tool software engineering tasks using the Horizon-SWE benchmark with ML-powered verifiers.

In Plain English

Polymath built an automated grading system that tests whether AI coding agents can actually do a software engineer's full job—from reading a ticket to shipping and monitoring code—not just answer trivia questions about programming.

Analogy

It's like building a full-scale simulated restaurant kitchen where you test whether a robot chef can handle an entire dinner service—not just see if it can chop an onion.
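A toy version of this kind of harness checks the agent's final environment state against a set of named verifiers. Horizon-SWE's verifiers are described as ML-powered, so the hard-coded rule checks below are stand-ins, and every function name and state key is hypothetical:

```python
# Hypothetical sketch of an episode-grading harness; all names and
# state keys are illustrative stand-ins for learned verifiers.

def check_tests_pass(state: dict) -> bool:
    return state.get("ci_status") == "green"

def check_deployed(state: dict) -> bool:
    version = state.get("deploy_version")
    return version is not None and version == state.get("target_version")

def check_no_alerts(state: dict) -> bool:
    return not state.get("open_alerts", [])

VERIFIERS = {
    "tests_pass": check_tests_pass,
    "deployed": check_deployed,
    "no_open_alerts": check_no_alerts,
}

def grade(final_state: dict) -> dict:
    """Score one episode: per-verifier results plus an overall pass flag."""
    results = {name: fn(final_state) for name, fn in VERIFIERS.items()}
    results["passed"] = all(results.values())
    return results

episode = {
    "ci_status": "green",
    "deploy_version": "1.4.2",
    "target_version": "1.4.2",
    "open_alerts": [],
}
report = grade(episode)  # every check passes, so report["passed"] is True
```

Grading the end state rather than the transcript is what lets a benchmark test the whole job (plan, code, test, deploy, monitor) instead of individual answers.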

Synthetic RL Training Data
Improving | Decision Quality | Data

Synthetic interaction data generated from agent-environment sessions, used to build high-quality training datasets for improving AI agent capabilities via reinforcement learning.

In Plain English

Every time an AI agent practices in Polymath's simulated worlds, it automatically creates labeled training data showing what worked and what didn't—like a flight simulator that writes its own textbook after every session.

Analogy

It's like a driving school where every student's practice session automatically writes a new chapter in the driver's ed manual—complete with highlighted mistakes and gold-star moments—so the next class learns even faster.
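The data flywheel described above can be sketched as labeling every step of a logged session with the episode's outcome and emitting one JSONL record per step. The field names (`observation`, `action`, `reward`) are illustrative, not Polymath's schema:

```python
import json

# Hypothetical sketch: turn a logged agent-environment session into
# labeled RL training examples; the schema below is illustrative.

def session_to_examples(session: list, succeeded: bool) -> list:
    """Label every step with the episode outcome so an RL trainer can
    upweight successful trajectories (a sparse reward, broadcast per step)."""
    reward = 1.0 if succeeded else 0.0
    return [
        {"observation": step["obs"], "action": step["action"], "reward": reward}
        for step in session
    ]

session = [
    {"obs": "ticket: fix login bug", "action": "open repo"},
    {"obs": "tests failing", "action": "patch auth.py"},
    {"obs": "tests passing", "action": "open pull request"},
]
examples = session_to_examples(session, succeeded=True)
dataset = "\n".join(json.dumps(e) for e in examples)  # one JSONL shard
```

The episode-level success label would come from the same automated verifiers used for benchmarking, which is what makes the data generation hands-free.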

Key Team Members

  • Dylan Ma, Co-Founder
  • Naren Yenuganti, Co-Founder

Competitive Advantage & Moat

Polymath's focus on automated world generation for multi-tool digital workflows fills a gap left by robotics-centric simulators like Isaac Sim and AI2-THOR. By building environments that mirror real enterprise toolchains (GitHub, Slack, Linear, browsers), they create the only realistic training ground for the next generation of AI coding and knowledge-work agents, validated by benchmarks where even frontier models score only ~25%.

Funding & Milestones

  • 2026 | Founded by Dylan Ma and Naren Yenuganti.
  • 2026 | Accepted into Y Combinator Winter 2026 batch.
  • 2026 | Launched the Horizon-SWE benchmark.
  • 2026 | Hiring actively across engineering, research, and sales.

Competitors

  • AI Agent Benchmarks & Simulation: SWE-bench (Princeton/OpenAI), WebArena (CMU), OSWorld (academic).
  • Robotics Simulation: NVIDIA Isaac Sim, Meta Habitat, AI2-THOR, MuJoCo.
  • AI Agent Platforms: Cognition (Devin), Factory AI, All Hands AI (OpenHands).
  • Synthetic Data & Sim Platforms: Synthesis AI, Datagen (Unity), Genesis.