Deepgram

Roadmap & Position in Voice AI / Speech Recognition

Voice AI infrastructure for real-time speech-to-text, text-to-speech, and voice agents.

Company Overview

Deepgram offers APIs for real-time speech recognition, text-to-speech, and voice agent orchestration, built on proprietary end-to-end deep learning models trained from scratch. Customers include Twilio in contact centers, Jack in the Box in QSR, IBM in enterprise AI, and NASA in defense and government.

What They're Building

The company's public product roadmap & what they're committed to building.

Nova-3 Multilingual Expansion

Extends the flagship STT model to 30+ languages, with keyterm prompting and vocabulary adaptation.

Flux Conversational STT

Purpose-built ASR for voice agents, with end-of-turn detection and code-switching across 10 languages.

Voice Agent API

A unified orchestration layer that combines STT, TTS, and LLMs with barge-in detection and function calling.

Deepgram for Restaurants

Vertical QSR product for voice ordering with POS integration, built on the OfOne acquisition.

Edge and On-Premises Deployment

Air-gapped and embedded deployments for defense and other regulated environments.

Latest Intelligence

Zeitgeist tracks private signals to determine where the company is heading strategically.

Deepgram: Engineers close enterprise deals here, not salespeople

May 9, 2026

Confidence: High

New Intel: Deepgram is splitting into a self-serve developer platform and a high-touch enterprise motion. Technical depth shapes the sale, which sets it apart from the model most AI API companies use.

Deepgram: Working toward models that skip text entirely

May 9, 2026

Confidence: High

New Intel: Deepgram is working on voice models that skip text entirely, going straight from audio-in to audio-out. If it works, today's voice agent stack starts to look slow.

Deepgram: Building a second deployment surface for defense and IoT

May 9, 2026

Confidence: High

New Intel: Deepgram is building voice AI that can run closer to the edge on Qualcomm chips. That opens defense and IoT markets where cloud-only voice APIs are a weaker fit.

Founder and Key Execs

Scott Stephenson

Co-Founder & CEO (Ph.D. Particle Physics, dark matter research, University of Michigan)

Adam Sypniewski

Co-Founder & CTO (Ph.D. Experimental Cosmology, ML for autonomous vehicles)

Noah Shutty

Co-Founder & CTO (Math & Physics, Stanford/Michigan, early YC W16)

Mehul Patel

VP Product (ex-SoundHound/Houndify)

Founder Force Multiplier

Three physicists who built particle detectors and dark matter experiments before founding the company, then applied the same signal-from-noise discipline to speech recognition. The technical depth shows up in the decision to train models from scratch on owned hardware rather than fine-tune third-party foundations.

Funding History

Notable Open Roles

Research Staff, Voice AI Foundations

Staff-level research role tied to voice AI.

Research Staff, LLMs

Research role tied to foundation models, per the strategic role description.

Research Staff, Data Science

Research role tied to foundation models, per the strategic role description.

Competitors

Direct:

ElevenLabs, OpenAI Whisper, Google Cloud Speech, Amazon Polly, AssemblyAI, Cartesia, PlayHT

Indirect:

Twilio, Sierra, Retell AI, LiveKit

Deepgram's Moat:

Vertical integration from bare-metal HPC training through a Rust-based inference runtime gives Deepgram a 2-5x cost advantage over hyperscaler-dependent competitors, with end-to-end control over latency.

How They're Leveraging AI

Low-Latency Neural Speech Synthesis

Deepgram is redefining text-to-speech as purpose-built infrastructure for real-time voice agents rather than content narration. The Audio Turing Test, the company's flagship benchmark for human-indistinguishable synthetic speech, is the marketing vehicle for this category claim.

Edge-Deployable Speech Models

Air-gapped, FedRAMP-aligned speech infrastructure for defense, intelligence, and regulated enterprise customers who cannot send audio data through any external cloud. This is the deployment model that Cartesia is also pursuing, but Deepgram has a head start with dedicated federal hiring and cleared personnel.

Domain-Specific Speech Model Fine-Tuning

A repeatable vertical solution model, starting with quick-service restaurants via OfOne, that packages domain-tuned speech models, POS and workflow integrations, and forward-deployed engineering into fleet-scale deployments. The playbook: pick a vertical, build a dedicated model variant, embed engineers in customer environments, scale across the industry.

Real-Time Speech Intelligence

Real-time voice AI pipeline for transcription, text-to-speech, and conversational agents, including turn detection and domain vocabulary handling.

AI Use Overview:

Deepgram trains end-to-end STT, TTS, and speech-to-speech models in-house on its own HPC cluster, then serves them through a Rust inference runtime tuned for real-time transcription, voice agents, end-of-turn detection, keyterm prompting, and QSR ordering automation.

More Similar Companies

ElevenLabs

Building human-like AI voices that speak, clone, dub, and converse in 70+ languages.

Having established defensible voice quality and market share through its API, ElevenLabs is now becoming a multimodal generation platform with an enterprise go-to-market engine.

Cartesia

Real-time multimodal voice AI built on State Space Model foundation architecture.

Cartesia owns the SSM architecture its founders invented, a primitive with linear scaling and constant-time inference that compounds in advantage as latency budgets tighten.

AssemblyAI

Speech-to-text and audio intelligence APIs for developers building voice-powered applications.

Voice is the next API primitive after text, and AssemblyAI has an accuracy and developer-experience lead over cloud incumbents with better margins than full-stack voice agent startups carry.

PolyAI

Voice AI platform building conversational agents for customer service call centers.

Voice agents are one of the clearest enterprise LLM use cases with measurable ROI, and PolyAI has real logos, real deployments, and Cambridge speech research DNA.
