Turns doomscrolling into language learning with adaptive video feeds and clickable subtitles.
It uses adaptive content recommendation driven by viewing behavior, multilingual speech-to-text and NLP for subtitle generation, and proficiency estimation that adjusts difficulty.

EdTech | YC W26

Last Updated: March 19, 2026

Mobile language learning app that transforms doomscrolling into immersive education, using ML-powered adaptive video feeds with clickable explainer subtitles to teach languages through short-form native content.
Adaptive difficulty scaling, a personalized video recommendation feed, and clickable explainer subtitles. Available on iOS and Android, with iterative improvements to latency and feed recommendations.
Solo founder iterating rapidly. The pivot from a character-conversation app to a video-feed model suggests data-informed decision-making. Expansion to additional languages and generative AI content creation is likely.
ML-powered recommendation engine that personalizes an infinite scroll of native-language video content to each learner's proficiency level, interests, and engagement patterns in real time.
The app figures out exactly which videos will teach you the most while keeping you hooked, like a TikTok algorithm that actually makes you smarter.
Doomersion's core ML use case is its adaptive video feed recommendation system, which functions as the backbone of the entire product experience. The system likely combines collaborative filtering (what similar learners engaged with), content-based filtering (linguistic complexity, topic tags, speech rate), and deep learning sequence models to predict which video a user should see next. As the user scrolls, implicit signals—watch time, replay rate, subtitle tap frequency, skip velocity—feed back into the model to continuously recalibrate difficulty and topic mix. Unlike Duolingo's rigid lesson trees, this creates a truly personalized immersion experience where no two users see the same feed. The progressive difficulty engine ensures learners are consistently operating in their zone of proximal development, maximizing both engagement and acquisition. This is the feature that transforms passive doomscrolling into active language learning without the user consciously "studying."
It's like having a language tutor who secretly replaced your TikTok For You Page with content that's perfectly calibrated to stretch your brain just enough that you don't notice you're learning.
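A minimal sketch of how such a hybrid scorer might blend these signals. Everything here is an illustrative assumption, not Doomersion's actual system: the class names, the fixed weights, and the "difficulty fit" term that peaks slightly above the user's estimated level (Krashen's "i+1" comprehensible input) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Video:
    video_id: str
    difficulty: float      # estimated linguistic complexity, 0..1
    topic_affinity: float  # content-based match to user interests, 0..1
    collab_score: float    # collaborative-filtering score from similar learners, 0..1

def engagement_signal(watch_ratio: float, subtitle_taps: int, skipped: bool) -> float:
    """Collapse implicit signals into one engagement score in [0, 1]."""
    if skipped:
        return 0.0
    # Heavy subtitle tapping suggests the clip is hard but being actively mined.
    tap_bonus = min(subtitle_taps * 0.05, 0.3)
    return min(watch_ratio + tap_bonus, 1.0)

def score_video(v: Video, user_level: float, recent_engagement: float) -> float:
    """Blend collaborative, content-based, and difficulty-fit terms.

    Difficulty fit peaks slightly above the user's level, and the stretch
    shrinks automatically when recent engagement drops.
    """
    target = user_level + 0.1 * recent_engagement  # stretch more when engaged
    difficulty_fit = 1.0 - abs(v.difficulty - target)
    return 0.4 * v.collab_score + 0.3 * v.topic_affinity + 0.3 * difficulty_fit

feed = [
    Video("vid_a", difficulty=0.35, topic_affinity=0.9, collab_score=0.7),
    Video("vid_b", difficulty=0.80, topic_affinity=0.6, collab_score=0.8),
]
ranked = sorted(feed, key=lambda v: score_video(v, user_level=0.3, recent_engagement=0.8),
                reverse=True)
```

In this toy example, the easier, interest-matched clip outranks the harder one for a beginner; as the learner's level rises, the same formula would flip that ordering without any rule changes.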
Automatic speech recognition pipeline that transcribes native-language video audio in real time and generates clickable, context-aware explainer subtitles with instant translations and grammar breakdowns.
Every video automatically gets smart subtitles you can tap to instantly understand any word or phrase, like having a patient tutor sitting inside your captions.
Doomersion's clickable explainer subtitles represent a sophisticated ML pipeline that begins with automatic speech recognition (ASR) and extends through multiple NLP layers. The system must first accurately transcribe spoken audio from native-language videos—a non-trivial challenge given diverse accents, colloquial speech, background noise, and varying speech rates. Once transcribed, a tokenization and morphological analysis layer segments the text into meaningful linguistic units (words, particles, conjugated forms) appropriate to the target language's structure. A contextual translation model then generates tap-to-reveal definitions that account for polysemy and idiomatic usage—not just dictionary lookups but context-aware explanations. For agglutinative languages like Japanese, this requires sophisticated word boundary detection and kanji reading disambiguation. The entire pipeline must run with near-zero perceptible latency to maintain the doomscrolling flow state. This feature eliminates the traditional friction of pausing content to look up unknown words, which is the single biggest dropout point in immersion-based language learning.
It's like watching a foreign film where every subtitle is secretly a hyperlink to a mini language lesson, except it all happens so fast you forget you're not fluent yet.
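The shape of the post-ASR stage of such a pipeline can be sketched as below. This is a deliberately simplified assumption, not Doomersion's implementation: the whitespace tokenizer and suffix-stripping "lemmatizer" stand in for real morphological analysis (which for Japanese would require word-boundary detection and reading disambiguation), and the topic-keyed glossary stands in for a contextual translation model that resolves polysemy.

```python
from dataclasses import dataclass

@dataclass
class SubtitleToken:
    surface: str   # text as displayed in the caption
    lemma: str     # dictionary form used for lookups
    gloss: str     # tap-to-reveal explanation

# Toy context-sensitive glossary: (lemma, topic) -> explanation.
# A production system would use a contextual translation model instead.
GLOSSARY = {
    ("bank", "finance"): "financial institution",
    ("bank", "nature"): "side of a river",
}

def lemmatize(word: str) -> str:
    """Placeholder lemmatizer; a real pipeline needs morphological analysis."""
    return word.lower().rstrip(".,!?")

def annotate(transcript: str, topic: str) -> list[SubtitleToken]:
    """Turn an ASR transcript into clickable subtitle tokens."""
    tokens = []
    for word in transcript.split():
        lemma = lemmatize(word)
        gloss = GLOSSARY.get((lemma, topic), "")
        tokens.append(SubtitleToken(surface=word, lemma=lemma, gloss=gloss))
    return tokens

subs = annotate("We walked along the bank.", topic="nature")
tapped = next(t for t in subs if t.lemma == "bank")
# tapped.gloss == "side of a river": polysemy resolved via topic context
```

The key design point the sketch illustrates: glosses are precomputed per token at transcription time, so a subtitle tap is a local lookup rather than a round-trip, which is what keeps the interaction at near-zero perceptible latency.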
ML-driven learner proficiency model that continuously estimates each user's vocabulary knowledge and grammatical competence across skills, then orchestrates spaced re-exposure of target language elements through naturally occurring video content rather than flashcards.
The app secretly tracks every word you've learned and makes sure you keep bumping into it in new videos right before you'd forget it—spaced repetition disguised as entertainment.
Traditional language learning apps rely on explicit spaced repetition systems (SRS) like flashcard decks, which feel like studying and suffer from high abandonment rates. Doomersion's most novel ML application is likely an implicit spaced exposure system that embeds SRS principles directly into the content recommendation algorithm. The system maintains a probabilistic knowledge model for each user—a dynamic map of estimated vocabulary and grammar competence across thousands of linguistic items, updated with every video interaction. When the model predicts a word or structure is approaching the forgetting curve threshold, it biases the recommendation engine to surface videos that naturally contain that element, creating organic re-exposure without the user ever seeing a flashcard. This requires solving a multi-objective optimization problem: balancing spaced repetition needs, user interest preferences, difficulty progression, and content freshness simultaneously. The clickable subtitle interaction data provides ground-truth signals—if a user taps a word they previously demonstrated knowledge of, the model registers decay and schedules more aggressive re-exposure. This transforms passive consumption into an invisible, scientifically grounded learning system that feels like entertainment but operates like a precision education tool.
It's like if Netflix secretly rearranged which shows you see next to make sure you never forget a plot point from three seasons ago—except instead of plot points, it's Japanese vocabulary.
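A minimal sketch of how such an implicit SRS could work, under loudly stated assumptions: the exponential forgetting curve, the strength-update multipliers, and the recall threshold below are all illustrative numbers, and the real knowledge model would be far richer than a per-word dictionary. The tap-as-decay-signal logic mirrors the description above.

```python
import math

class KnowledgeModel:
    """Per-user map from vocabulary item to (memory strength, last-seen time)."""

    def __init__(self):
        self.items: dict[str, tuple[float, float]] = {}

    def record_exposure(self, word: str, now: float, tapped: bool) -> None:
        strength, _ = self.items.get(word, (1.0, now))
        # A tap on a previously "known" word signals decay: weaken strength
        # so the scheduler re-surfaces it sooner. Untapped exposure strengthens.
        strength = strength * 0.6 if tapped else strength * 1.5
        self.items[word] = (strength, now)

    def recall_probability(self, word: str, now: float) -> float:
        """Exponential forgetting curve: p = exp(-elapsed_days / strength)."""
        if word not in self.items:
            return 0.0
        strength, last_seen = self.items[word]
        elapsed_days = (now - last_seen) / 86400  # timestamps in seconds
        return math.exp(-elapsed_days / strength)

def exposure_bonus(model: KnowledgeModel, video_vocab: list[str], now: float,
                   threshold: float = 0.6) -> float:
    """Recommendation-score bonus for videos containing near-forgotten items."""
    near_forgotten = [w for w in video_vocab
                      if 0.0 < model.recall_probability(w, now) < threshold]
    return 0.1 * len(near_forgotten)
```

This `exposure_bonus` term is one objective among several; in the multi-objective framing above it would be weighted against interest match, difficulty fit, and content freshness rather than dominating the feed.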
Mostafa taught himself Japanese through six years of immersion and combines engineering and business training from Penn's M&T program and Wharton. He built a product that authentically replicates the immersion journey he lived.