
Technology | Cloud Infrastructure | YC W26 | Valuation: Undisclosed

Last Updated: March 24, 2026

Builds a serverless, globally aggregated GPU cloud with predictive scheduling, live workload migration, and proprietary inference engines for fast, cost-efficient AI model hosting, training, and serving.
Serverless GPU cloud with scale-to-zero and pay-per-second billing. Fractional GPU sharing via GPU Credits. IonRouter multi-model inference with IonAttention Engine. NVIDIA GH200 support. NVIDIA Inception member.
CRIU-based GPU workload migration hints at hybrid on-prem + cloud. Custom kernel-level CUDA optimizations. Possible AMD MI300X support. Enterprise tier with SLA guarantees likely.
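
To make the scale-to-zero, pay-per-second model concrete, here is a minimal billing sketch in Python; the invoice helper, span format, and rate are illustrative assumptions, not published pricing or a real API.

from datetime import datetime, timedelta

def invoice(usage_spans, per_second_rate):
    # Pay-per-second billing with scale-to-zero: the meter only runs
    # while a replica is live, prorated by the GPU fraction it held.
    total = 0.0
    for start, end, gpu_fraction in usage_spans:
        total += (end - start).total_seconds() * per_second_rate * gpu_fraction
    return round(total, 4)

# Hypothetical job: half a GPU for 90 seconds, then scaled to zero.
t0 = datetime(2026, 3, 1, 12, 0, 0)
spans = [(t0, t0 + timedelta(seconds=90), 0.5)]
print(invoice(spans, per_second_rate=0.0008))  # 90 * 0.0008 * 0.5 = 0.036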
Predictive GPU scheduling and autonomous workload placement that use ML agents to forecast resource demand, pre-allocate fractional GPU capacity, and auto-recover failed jobs across a globally distributed compute pool.
An AI dispatcher watches every GPU in the fleet and figures out where to send each job before you even click run, so nothing sits idle and nothing crashes without a backup plan.
It's like having an air traffic controller who not only knows where every plane is, but also predicts turbulence before it happens and reroutes flights while passengers are still sipping their coffee.
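
A minimal sketch of how such predictive placement could work, assuming an exponential moving average stands in for the ML forecasting agents; GpuNode, Scheduler, and every method name here are invented for illustration, not the company's actual interface.

from dataclasses import dataclass, field

@dataclass
class GpuNode:
    node_id: str
    region: str
    free_fraction: float = 1.0      # unclaimed share of this GPU (0.0 to 1.0)

@dataclass
class Scheduler:
    nodes: list
    alpha: float = 0.3                              # EMA smoothing factor
    forecast: dict = field(default_factory=dict)    # region -> predicted demand

    def observe(self, region, gpu_fraction):
        # Feed each real arrival into the per-region forecast; a learned
        # model would replace this exponential moving average.
        prev = self.forecast.get(region, gpu_fraction)
        self.forecast[region] = self.alpha * gpu_fraction + (1 - self.alpha) * prev

    def place(self, region, gpu_fraction):
        # Prefer in-region nodes with the most headroom; fall back to the
        # global pool so jobs land somewhere instead of failing outright.
        fits = [n for n in self.nodes if n.free_fraction >= gpu_fraction]
        if not fits:
            return None                 # nothing fits: queue and retry later
        best = min(fits, key=lambda n: (n.region != region, -n.free_fraction))
        best.free_fraction -= gpu_fraction
        return best.node_id

    def prewarm(self, region):
        # Pre-allocate the forecasted fraction so the next arrival starts hot.
        expected = self.forecast.get(region, 0.0)
        return self.place(region, expected) if expected > 0 else None

    def recover(self, region, gpu_fraction, failed_node_id):
        # Auto-recovery: re-place a failed job on any node but the bad one.
        healthy = [n for n in self.nodes if n.node_id != failed_node_id]
        return Scheduler(healthy, self.alpha, self.forecast).place(region, gpu_fraction)

The design point is the split: forecasting (observe, prewarm) keeps fractional capacity warm before jobs arrive, while placement and recovery remain simple greedy decisions over the global pool.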
IonRouter multi-model inference routing with the proprietary IonAttention Engine, enabling concurrent serving of multiple LLMs with shared KV-cache memory pools and sub-750ms model swaps on a single GPU.
Instead of renting a whole GPU for each AI model, IonRouter lets multiple models share one GPU's memory like roommates splitting rent, swapping in and out so fast users never notice.
It's like a restaurant kitchen where one chef can flawlessly switch between Italian, Japanese, and Mexican dishes mid-service because all the shared ingredients are already prepped and within arm's reach.
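
The residency idea can be pictured with a toy LRU router under a shared memory budget; ModelRouter and its methods are hypothetical stand-ins, and the real IonAttention Engine's shared KV-cache pools and sub-750ms swaps involve far more than this sketch shows.

from collections import OrderedDict

class ModelRouter:
    def __init__(self, hbm_budget_gb):
        self.budget = hbm_budget_gb
        self.resident = OrderedDict()   # model name -> weights size (GB), LRU order

    def infer(self, model, size_gb, prompt):
        # Route the request; swap the model in first if it is not resident.
        if model not in self.resident:
            self._swap_in(model, size_gb)
        self.resident.move_to_end(model)        # mark most recently used
        return f"[{model}] completion for {prompt!r}"

    def _swap_in(self, model, size_gb):
        # Evict cold models until the newcomer fits under the shared budget;
        # a real engine would also remap shared KV-cache pages instead of
        # holding a private cache per model.
        while self.resident and sum(self.resident.values()) + size_gb > self.budget:
            self.resident.popitem(last=False)   # drop least recently used
        self.resident[model] = size_gb

router = ModelRouter(hbm_budget_gb=80)          # e.g. one 80 GB GPU
print(router.infer("llama-8b", 16, "hello"))
print(router.infer("mistral-7b", 14, "bonjour"))  # co-resident, no eviction

Swaps as fast as the claimed sub-750ms presumably depend on keeping evicted weights staged in host memory so a reload is a single copy back to the GPU rather than a cold load from disk.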
Lifecycle-driven GPU memory management system that uses ML-informed heuristics to dynamically allocate, migrate, and reclaim GPU HBM across concurrent workloads, enabling fractional GPU sharing with near-zero performance degradation.
The system watches how each AI job uses GPU memory over its lifetime and automatically shuffles, compresses, and reclaims unused space so multiple teams can share expensive hardware without stepping on each other's toes.
It's like a hyper-efficient hotel concierge who moves your luggage to a nearby storage room the moment you leave for dinner, then has it back in your suite before you return—so the hotel can rent your closet space to someone else in the meantime.
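
A sketch of the lifecycle idea, assuming a simple two-phase tag and spill-to-host reclamation; Phase, Buffer, and HbmManager are hypothetical names, and real HBM management happens below the CUDA allocator rather than in Python.

from dataclasses import dataclass
from enum import Enum, auto

class Phase(Enum):
    ACTIVE = auto()   # tensors the job is touching right now
    IDLE = auto()     # allocated but cold, e.g. a job waiting between batches

@dataclass
class Buffer:
    owner: str
    size_gb: float
    phase: Phase = Phase.ACTIVE

class HbmManager:
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.resident = []      # buffers holding real HBM
        self.spilled = []       # buffers migrated out to host RAM

    def alloc(self, owner, size_gb):
        # Reclaim idle buffers until the request fits; in a real system the
        # spill would be an async DMA copy, not a list move.
        while sum(b.size_gb for b in self.resident) + size_gb > self.capacity:
            idle = next((b for b in self.resident if b.phase is Phase.IDLE), None)
            if idle is None:
                raise MemoryError("HBM exhausted and nothing is reclaimable")
            self.resident.remove(idle)
            self.spilled.append(idle)
        buf = Buffer(owner, size_gb)
        self.resident.append(buf)
        return buf

    def mark_idle(self, buf):
        buf.phase = Phase.IDLE  # lifecycle hint: safe to migrate or compress

mgr = HbmManager(capacity_gb=80)
a = mgr.alloc("team-a", 60)
mgr.mark_idle(a)                # team A pauses between training phases
b = mgr.alloc("team-b", 40)     # fits because team A's 60 GB was spilled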
Suryaa Rajinikanth brings hands-on experience building GPU compute platforms (ex-TensorDock) and mission-critical ML infrastructure (ex-Palantir). Veer Shah brings ML expertise from the U.S. Space Force and NASA. Together they pair rare GPU platform-building depth with government-grade ML deployment experience.