Building human-like AI voices that speak, clone, dub, and converse in 70+ languages

Technology | Voice AI | Series D+

Last Updated: April 29, 2026

ElevenLabs is building a full-stack audio AI platform spanning three product surfaces: foundation models (TTS, STT, dubbing), developer APIs, and an enterprise agent platform (ElevenAgents). The company has recently extended beyond audio into image and video generation within its ElevenCreative suite, moving it into direct competition with pure-play multimodal providers such as Runway, Pika, and Adobe Firefly.
ElevenLabs has grown from a single voice-cloning tool into a full AI platform, and is now expanding beyond audio into image and video. Everything is organized around three pillars: agents that talk, creative tools that make content, and APIs that developers plug into.
Our research indicates the biggest strategic bet is ElevenAgents, the AI agent platform being positioned as the primary enterprise growth engine. The creative suite has quietly expanded from audio into image and video, a major pivot that puts ElevenLabs in direct competition with Runway, Pika, and Adobe Firefly. The company is also productizing AI safety as a real platform pillar rather than treating it as compliance overhead, while continuing heavy investment in the core voice research that established its category lead.
Post-sales, partnership, and AI safety infrastructure are all being built from the ground up at a moment when the company is simultaneously chasing seven-figure enterprise deals and defending category leadership. Competitive pressure is intensifying from hyperscalers (OpenAI, Google, Microsoft, Amazon) and well-funded specialists (Cartesia, Deepgram) who are actively closing the quality and latency gap.
Revenue has scaled from $200M to more than $330M ARR, and valuation has more than tripled from $3.3B to $11B, in roughly 13 months. More than 20 senior roles are being hired simultaneously in a classic blitzscaling pattern.
The global build-out is aggressive and simultaneous: native-language Country GMs are being installed across Germany, the Netherlands, Belgium, and Chile, alongside active enterprise expansion across France, Spain, Poland, UAE, Singapore, Australia, Japan, India, and Latin America.
ElevenCreative is an expansion from pure audio into a full multimodal content creation platform spanning voice, music, image, and video generation. The move repositions ElevenLabs as a direct competitor to Runway, Pika, and Adobe Firefly, while leveraging its proprietary audio moat and emotional prosody lead as the anchor differentiator against pure-play multimodal providers.
ElevenLabs is quietly becoming a full creative studio, where you can generate the voice, the music, the images, and the video for an entire piece of content in one place, all in 70+ languages, with the same emotional quality that made its voices famous.
Like a bakery famous for bread deciding to sell the whole meal. They already have the customers walking in hungry, they just need to prove they can cook everything else as well as they bake.
ElevenAgents is a full-stack platform for building production-grade voice and chat agents that handle real-time customer conversations at enterprise scale. It provides visual workflow builders and reliability controls, and is positioned as the primary cross-sell and enterprise expansion wedge across existing ElevenLabs customers.
An easy-to-use toolkit that lets big companies build AI agents that can hold a natural phone or chat conversation, sound human, and take real actions on a customer's behalf.
Like giving every enterprise the ability to staff an infinite, multilingual call center overnight, except every agent sounds like their best human rep having a good day.
Productized AI Safety Platform: ground-up construction of scalable content moderation, abuse detection, agent guardrails, and content provenance infrastructure. The effort positions safety as a first-class product pillar rather than compliance overhead, protecting the platform from regulatory and reputational risk while serving as a competitive moat against hyperscalers.
ElevenLabs is building the AI equivalent of a TSA: automated systems that catch bad voice clones, harmful content, and misbehaving agents in real time, across voice, music, image, and video, at massive scale.
Like installing seatbelts, airbags, and crash sensors in a Formula 1 car while it's still racing at 200 miles per hour. Necessary, urgent, and impossible to do without slowing down somewhere.
ElevenLabs' leadership pairs enterprise deployment experience (ex-Palantir) with frontier ML research (ex-Google). The two founders have known each other since school in Poland, providing deep personal trust at the CEO and CTO level. Their initial traction also came organically, from hobbyist creators, meme-makers, and indie game developers virally adopting the voice cloning tool. That gave the founders a genuine bottom-up distribution instinct that few enterprise-focused AI teams possess.