Builds socially aware AI meeting assistants that know when to speak and when to stay silent, using multimodal inference over audio and video, real-time sentiment analysis during calls, and multimodal data curation for training.

Conversational AI | YC W26

Last Updated: March 19, 2026

Builds Fern, a socially aware, real-time AI meeting assistant that leverages multimodal LLMs and proprietary social intelligence to join professional conversations as an invisible copilot or active participant, providing expert opinions, coaching, and task delegation.
Ishiki Labs has publicly announced Fern, its real-time AI meeting assistant with dual-mode operation (Full Presence and Shadow Mode), real-time expert opinions, and task delegation. The company has demonstrated response speeds comparable to ChatGPT Voice and Gemini Live. Public messaging emphasizes expansion into sales coaching, deeper meeting platform integrations, and socially aware AI that knows when to speak and when to stay silent.
Behind the scenes, the founders' deep Meta Reality Labs and Meta AI Research backgrounds in multimodal LLMs, smart glasses, and AR/VR signal a likely push toward embodied AI and wearable integrations. Their IshikiLabs Capture app for structured audio/visual data collection suggests proprietary dataset building for fine-tuning social awareness models. The two-person team and minimal hiring indicate a lean, research-heavy phase focused on core model differentiation before scaling. GitHub and research publication patterns point to work in variational autoencoders and disentangled representation learning, hinting at next-gen personalization and context modeling. YC W26 participation signals an imminent fundraise (likely Series A in late 2026) and go-to-market acceleration.
Fern acts as an invisible, socially aware AI copilot in live meetings, understanding conversational dynamics to deliver contextually appropriate interventions without interrupting natural flow.
It's like having a brilliant colleague who sits in your meeting, reads the room perfectly, and whispers exactly the right insight at exactly the right moment.
Fern's core meeting copilot mode uses a multimodal deep learning pipeline that fuses real-time audio transcription, speaker diarization, and conversational context modeling to understand not just what is being said, but the social dynamics of how it is being said. A proprietary "Conscious Intelligence" layer—built on transformer-based LLMs augmented with turn-taking prediction and social timing models—determines the optimal moment to surface information, suggest actions, or remain silent. The system operates in "Shadow Mode" (private feedback visible only to the user) or "Full Presence" (active participant), allowing flexible deployment. Low-latency inference pipelines ensure responses arrive within the natural cadence of conversation, making the AI feel like a seamless collaborator rather than a disruptive tool. This is fundamentally different from post-meeting summarization tools because it operates in real time and adapts its behavior to social context.
It's like having a world-class executive assistant who can read the room better than most humans but never steals the spotlight.
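
To make the timing logic concrete, here is a minimal Python sketch of how a social-timing gate might behave. None of this is Ishiki Labs' actual implementation: the TurnState fields, the should_surface function, and every threshold are invented for illustration, assuming only the publicly described distinction between Shadow Mode (private, on-screen) and Full Presence (speaking aloud).

```python
# Hypothetical sketch of a social-timing gate. All names and thresholds are
# assumptions for illustration, not Ishiki Labs' real pipeline.
from dataclasses import dataclass

@dataclass
class TurnState:
    speaker_active: bool       # someone is talking right now
    silence_ms: int            # milliseconds since the last speech
    turn_end_prob: float       # model's estimate that the current turn is ending
    suggestion_urgency: float  # 0..1, how time-sensitive the pending insight is

def should_surface(state: TurnState,
                   shadow_mode: bool = True,
                   turn_end_threshold: float = 0.8,
                   pause_ms: int = 600) -> bool:
    """Decide whether to surface a suggestion now or hold it.

    Shadow Mode (private feedback) can interject more freely than
    Full Presence (speaking aloud), which waits for a turn boundary.
    """
    if shadow_mode:
        # Private feedback never interrupts the audio; gate only on urgency.
        return state.suggestion_urgency > 0.3
    # Full Presence: speak only at a likely turn boundary or a clear pause.
    at_boundary = state.turn_end_prob >= turn_end_threshold
    in_pause = (not state.speaker_active) and state.silence_ms >= pause_ms
    return (at_boundary or in_pause) and state.suggestion_urgency > 0.5

# Mid-sentence, high urgency: Full Presence stays silent.
print(should_surface(TurnState(True, 0, 0.2, 0.9), shadow_mode=False))    # False
# Long pause at a likely turn boundary: Full Presence may speak.
print(should_surface(TurnState(False, 800, 0.9, 0.9), shadow_mode=False))  # True
```

The point of the sketch is the asymmetry: the same pending suggestion clears a low bar in Shadow Mode but must wait for a socially safe opening in Full Presence, which is what "knowing when to stay silent" reduces to mechanically.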
Fern provides live, context-aware sales coaching during calls by analyzing prospect sentiment, objection patterns, and conversational momentum to deliver real-time tactical recommendations to sales reps.
It's like having your company's top sales closer whispering winning tactics in your ear during every deal call.
Fern's sales coaching capability extends its socially aware AI framework into the high-stakes domain of live sales conversations. The system ingests real-time audio and applies sentiment analysis, objection detection, and conversational momentum scoring to build a dynamic model of how the call is progressing. Using fine-tuned LLMs trained on successful sales interaction patterns, Fern identifies critical moments—such as when a prospect raises a pricing objection, shows buying signals, or begins to disengage—and surfaces tactical recommendations to the rep in real time via Shadow Mode. Unlike post-call analytics tools like Gong or Chorus that provide retrospective insights, Fern intervenes in the moment, enabling reps to adjust their approach mid-conversation. The system can also delegate tasks in real time, such as pulling up competitive battle cards, pricing calculators, or case studies relevant to the prospect's specific objections, dramatically reducing the cognitive load on the sales rep.
It's like having a GPS for sales calls—instead of telling you where you went wrong after you're lost, it reroutes you in real time before you miss the turn.
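
As an illustration of the in-call coaching loop, the sketch below pairs objection detection with a battle-card lookup. It is a toy stand-in, not Fern's pipeline: a production system would use fine-tuned classifiers rather than keyword matching, and OBJECTION_PATTERNS, BATTLE_CARDS, and the coach function are invented names for this example.

```python
# Hypothetical coaching loop: classify an objection, return a Shadow Mode tip.
# Keyword matching stands in for the fine-tuned models the text describes.
OBJECTION_PATTERNS = {
    "pricing":    ["too expensive", "budget", "cost", "pricing"],
    "competitor": ["we already use", "compared to", "competitor"],
    "timing":     ["not right now", "next quarter", "revisit later"],
}

BATTLE_CARDS = {
    "pricing":    "Surface the ROI calculator; anchor on the cost of inaction.",
    "competitor": "Pull the competitive battle card; probe gaps in their current tool.",
    "timing":     "Offer a low-lift pilot; quantify the cost of delay.",
}

def detect_objection(utterance: str) -> str | None:
    """Return an objection label if any known pattern appears in the utterance."""
    text = utterance.lower()
    for label, patterns in OBJECTION_PATTERNS.items():
        if any(p in text for p in patterns):
            return label
    return None

def coach(utterance: str) -> str | None:
    """Return a real-time tip for the rep, or None if no objection is detected."""
    label = detect_objection(utterance)
    return BATTLE_CARDS.get(label) if label else None

print(coach("Honestly, this feels too expensive for our budget."))
# -> Surface the ROI calculator; anchor on the cost of inaction.
```

Even in this reduced form, the structure mirrors the claim in the text: detection happens per utterance during the call, so the recommendation arrives while the rep can still act on it, unlike retrospective tools.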
The IshikiLabs Capture app collects structured, high-quality audio and visual data from real-world interactions with real-time quality checks, building proprietary training datasets for socially aware AI models.
It's like building a massive library of "how humans actually talk to each other" so the AI can learn social skills the way people do—by watching and listening.
The IshikiLabs Capture application represents a novel approach to solving one of the hardest problems in socially aware AI: obtaining high-quality, structured training data that captures the nuances of human social interaction. The app enables controlled collection of audio and visual data from real-world professional interactions, applying real-time quality checks including audio clarity validation, speaker balance verification, and metadata tagging for conversational context (e.g., meeting type, participant roles, emotional tone). This structured data pipeline feeds directly into the training of Ishiki's proprietary social timing models, turn-taking predictors, and multimodal fusion layers. By building a proprietary dataset rather than relying on public corpora, Ishiki Labs creates a compounding data moat: each interaction collected improves model performance, which improves user experience, which drives more usage and data collection. The approach draws on the founders' research background in disentangled representation learning and variational autoencoders, enabling the extraction of separable social features (timing, tone, intent, emotion) from raw multimodal data for more efficient and interpretable model training.
It's like building a flight simulator for social skills—you need thousands of hours of real flight data before the simulator can teach pilots anything useful.
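
The quality-check step lends itself to a small sketch. The Python below is purely illustrative of what real-time validation might look like; the thresholds, the check_clip function, and the metadata schema are all assumptions, since IshikiLabs Capture's actual checks are not public.

```python
# Hypothetical capture-time validation: audio clarity, speaker balance, and
# context tagging. Thresholds and field names are invented for illustration.
import math

def rms(samples: list[float]) -> float:
    """Root-mean-square level of an audio buffer (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def check_clip(samples: list[float],
               speaker_seconds: dict[str, float],
               min_rms: float = 0.02,
               max_imbalance: float = 0.85) -> dict:
    """Validate one recorded segment before it enters the training pipeline."""
    issues = []
    if rms(samples) < min_rms:
        issues.append("audio_too_quiet")       # clarity validation
    total = sum(speaker_seconds.values()) or 1.0
    if max(speaker_seconds.values(), default=0.0) / total > max_imbalance:
        issues.append("speaker_imbalance")     # one voice dominates the clip
    return {
        "accepted": not issues,
        "issues": issues,
        # Conversational-context metadata tagged at capture time
        # (illustrative schema only):
        "metadata": {"meeting_type": "1:1",
                     "participant_roles": ["rep", "prospect"]},
    }

print(check_clip([0.1, -0.2, 0.15], {"alice": 50.0, "bob": 45.0}))
```

Rejecting bad clips at capture time, rather than after ingestion, is what keeps a proprietary dataset like this one clean enough to compound in value rather than accumulate noise.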
Ishiki Labs combines rare expertise from Meta's most advanced AI and AR/VR programs with real-time systems engineering from Citadel Securities, enabling them to build socially intelligent AI that understands human conversational dynamics at a level competitors focused purely on transcription or summarization cannot match.