
Technology
|
AI/ML Testing
|
YC W26
|
Valuation:
Undisclosed

Last Updated:
March 24, 2026

Builds an automated testing platform that mimics users in production environments to catch AI agent failures before they reach real users. Uses synthetic test generation and custom ML scorers to evaluate agents across multiple modalities.
Testing platform mimicking users in production environments. Synthetic scenario-driven testing. Multi-modal testing across text, voice, and images.
Team of 2 per YC profile. AI agent testing space growing rapidly. Likely building toward CI/CD pipeline integration.
Uses large language models to automatically generate diverse, realistic, and adversarial synthetic test scenarios that simulate real-world user interactions with AI agents across multiple modalities.
It's like hiring a thousand creative QA testers who never sleep, each dreaming up unique ways to trick and challenge your AI agent.
It's like having an improv comedy troupe that rehearses every possible audience heckle so your AI agent never freezes on stage.
Deploys fine-tuned machine learning models as custom scorers that automatically evaluate AI agent outputs against business-specific quality criteria, replacing subjective human review with consistent, scalable, and configurable evaluation.
Instead of having a room full of experts grade every AI response by hand, Ashr trains a digital judge that scores answers the way your business cares about.
It's like training a food critic who knows exactly what your restaurant's regulars love, so every dish gets scored before it leaves the kitchen.
Orchestrates large-scale, ML-driven swarm tests that simultaneously bombard AI agents with thousands of diverse multi-modal inputs to uncover rare edge cases, performance bottlenecks, and cross-modal failure modes that traditional testing misses.
Imagine unleashing a flash mob of thousands of simulated users—each speaking, typing, uploading, and clicking differently—all at once to see if your AI agent can handle the chaos.
It's like crash-testing a car not just from the front, but from every angle, speed, and weather condition simultaneously—before a single customer gets behind the wheel.
Shreyas Kaps brings 2 years building AI agents in production (finance and DevOps). Rohan Kulkarni (Berkeley EECS) has a successful exit (Ask Geri, acquired). They build testing infrastructure from the perspective of agent builders.