Ishiki Labs

Roadmap & Position in Conversational AI

Builds socially aware AI meeting assistants that know when to speak and when to stay silent.

Company Overview

Builds socially aware, real-time AI meeting assistants (Fern) that use multimodal LLMs and proprietary social intelligence to join professional conversations as invisible copilots or active participants, providing expert opinions, coaching, and task delegation.

What They're Building

The company's public product roadmap & what they're committed to building.

Ishiki Labs has publicly announced Fern, its real-time AI meeting assistant with dual-mode operation (Full Presence and Shadow Mode), real-time expert opinions, and task delegation. Fern's response speeds are described as comparable to ChatGPT Voice and Gemini Live. The roadmap includes expansion into sales coaching, deeper meeting platform integrations, and socially aware AI that knows when to speak and when to stay silent.

Latest Intelligence

Zeitgeist tracks private signals to determine where the company is heading strategically.

Competitors

AI Meeting Assistants

Otter.ai, Fireflies.ai, Grain (transcription/summarization-focused).

AI Sales Copilots

Gong, Chorus (now ZoomInfo), Clari Copilot.

Real-Time AI Agents

Google Gemini Live, OpenAI ChatGPT Voice, Recall.ai.

Socially Aware AI

Hume AI (emotion AI), Cogito (behavioral AI for calls).

Ishiki Labs's Moat:

Social intelligence (knowing when to speak, when to stay silent, and how to read conversational cues) is a technically harder problem than transcription or summarization. The team's backgrounds in Meta AR/VR and Citadel Securities real-time systems give it rare expertise in low-latency multimodal processing. Meeting transcription is commoditized; live participation is not.

How They're Leveraging AI

AI Use Overview:

Ishiki Labs uses socially aware multimodal inference from audio and video, real-time sentiment analysis during calls, and multimodal data curation for training.

More Similar Companies

ElevenLabs

Building human-like AI voices that speak, clone, dub, and converse in 70+ languages.

Having established defensible voice quality and market share through its API, ElevenLabs is now becoming a multimodal generation platform with an enterprise go-to-market engine.

Deepgram

Voice AI infrastructure for real-time speech-to-text, text-to-speech, and voice agents.

Deepgram controls the full vertical stack from bare-metal training hardware to a Rust inference runtime, a cost and latency moat that API competitors riding hyperscaler infrastructure cannot replicate without years of capex.

Cartesia

Real-time multimodal voice AI built on State Space Model foundation architecture.

Cartesia owns the SSM architecture its founders invented, a primitive with linear scaling and constant-time inference that compounds in advantage as latency budgets tighten.

AssemblyAI

Speech-to-text and audio intelligence APIs for developers building voice-powered applications.

Voice is the next API primitive after text, and AssemblyAI has an accuracy and developer-experience lead over cloud incumbents with better margins than full-stack voice agent startups carry.