
Technology | LLM Optimization | YC W26 | Valuation: Undisclosed

Last Updated: March 24, 2026

Builds an LLM-native context compression API gateway that uses ML-driven relevance filtering and token-level distillation to reduce LLM input costs by up to 76% while improving accuracy.
Context-Gateway API (latte_v1) with coarse- and fine-grained compression; 10x compression on SEC filings, demonstrated on the FinanceBench benchmark. Open-source Context Gateway proxy for Claude Code, Cursor, and OpenClaw. Python SDK (pip install compresr). Web dashboard for session monitoring, spend caps, and Slack notifications.
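The SDK's public interface isn't documented here, so the sketch below is a hypothetical wiring, assuming the open-source gateway exposes an OpenAI-compatible proxy endpoint at a local address; the URL, model name, and file path are all illustrative, not confirmed API.

```python
# Hypothetical usage sketch -- the proxy URL and endpoint compatibility are
# assumptions, not documented API; consult the compresr docs for specifics.
from openai import OpenAI

# Point an OpenAI-compatible client at a locally running Context Gateway,
# which would compress the prompt before forwarding it upstream.
client = OpenAI(base_url="http://localhost:8080/v1")  # assumed gateway address

filing = open("10k_filing.txt").read()  # e.g. a 230K-token SEC filing

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Summarize the risk factors:\n{filing}"}],
)
print(response.choices[0].message.content)
```

If the proxy really is a drop-in, tools like Claude Code or Cursor would only need a base-URL change, which matches the positioning above.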
Zero-latency background compaction, pre-computed summaries, and logging/history compaction. The Show HN positioning and agentic proxy architecture suggest expansion into persistent memory management for autonomous agents. Competitive risk comes from LLMLingua (Microsoft Research) and from native compression by model providers.
Compresses massive SEC filings (230K+ tokens) down to ~10.5K tokens for LLM analysis, improving accuracy while cutting costs by 76% (a back-of-the-envelope check appears below).
It's like hiring a brilliant paralegal who reads a 500-page SEC filing and hands the analyst only the 25 pages that actually matter—faster, cheaper, and somehow more accurate.
It's like Marie Kondo organizing your filing cabinet before your accountant arrives—less clutter, better answers, and your accountant charges by the hour.
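Checking those figures with an assumed per-token price: the raw input-token reduction (~95%) exceeds the quoted 76% end-to-end saving, plausibly because output tokens and compression overhead are unaffected.

```python
# Sanity check on the SEC-filing figures; the $3/M input-token price is an
# assumption for illustration, not a quoted rate.
original_tokens = 230_000
compressed_tokens = 10_500

reduction = 1 - compressed_tokens / original_tokens
print(f"Input-token reduction: {reduction:.1%}")  # ~95.4%

price_per_token = 3 / 1_000_000  # assumed $3 per million input tokens
print(f"Input cost before: ${original_tokens * price_per_token:.2f}")    # $0.69
print(f"Input cost after:  ${compressed_tokens * price_per_token:.4f}")  # $0.0315
```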
Provides zero-latency background compaction of agent conversation histories and tool outputs, keeping autonomous agents within context limits during long-running tasks (one plausible compaction mechanism is sketched below).
It's like giving an AI agent a photographic memory that automatically forgets the boring parts so it can keep working on hard problems without losing its train of thought.
It's like a road trip GPS that remembers every important turn you need but forgets all the straight highway stretches—so it never runs out of memory mid-journey.
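The compaction mechanism isn't published; the sketch below shows one plausible shape, in which older turns are summarized on a background thread so the agent's foreground calls never wait. All names are invented, and the summarizer is a stub standing in for an ML model.

```python
# Illustrative sketch of zero-latency background compaction; all names are
# invented and summarize() is a stub for model-driven distillation.
import threading

TOKEN_BUDGET = 8_000  # assumed budget; real limits depend on the model

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: ~4 characters per token

def summarize(turns: list[str]) -> str:
    # Stand-in for ML-driven distillation of older turns.
    return f"[summary of {len(turns)} earlier turns]"

class AgentHistory:
    def __init__(self) -> None:
        self.turns: list[str] = []
        self.lock = threading.Lock()

    def append(self, turn: str) -> None:
        with self.lock:
            self.turns.append(turn)
            over_budget = sum(map(estimate_tokens, self.turns)) > TOKEN_BUDGET
        if over_budget:
            # Compact on a background thread so the agent's next model call
            # is never blocked on summarization ("zero-latency" compaction).
            threading.Thread(target=self._compact, daemon=True).start()

    def _compact(self) -> None:
        with self.lock:
            old, recent = self.turns[:-4], self.turns[-4:]
            if old:
                self.turns = [summarize(old)] + recent
```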
Compresses large codebases and repository context fed to LLM-powered coding assistants in IDEs, enabling more accurate code generation and review within token budgets (a simplified stand-in for the file selection is sketched below).
It's like giving your AI coding assistant the ability to skim an entire codebase and pull up only the exact files and functions it needs to write your feature—instead of trying to read the whole repo at once and getting confused.
It's like a librarian who knows exactly which three books on the shelf answer your question, instead of dumping the entire library on your desk and saying "good luck."
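The relevance model is proprietary; as a stand-in, this sketch ranks repository files by naive keyword overlap with the query and greedily packs the best matches into a fixed token budget. Every name and number here is illustrative.

```python
# Naive stand-in for ML-driven relevance filtering over a repository:
# rank files by keyword overlap with the query, then pack a token budget.
from pathlib import Path

TOKEN_BUDGET = 12_000  # assumed context budget for the assistant

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic

def score(query: str, text: str) -> int:
    # Keyword-overlap score; a stand-in for the product's learned ranker.
    return sum(text.lower().count(w) for w in set(query.lower().split()))

def select_context(repo: str, query: str) -> str:
    files = {p: p.read_text(errors="ignore") for p in Path(repo).rglob("*.py")}
    ranked = sorted(files, key=lambda p: score(query, files[p]), reverse=True)
    picked, used = [], 0
    for p in ranked:
        cost = estimate_tokens(files[p])
        if used + cost <= TOKEN_BUDGET:
            picked.append(f"# {p}\n{files[p]}")
            used += cost
    return "\n\n".join(picked)
```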
Ivan Zakazov (CEO) researched LLM context compression during his PhD at EPFL, previously worked at Microsoft and Philips Research, and has published papers directly on prompt compression at EMNLP 2025 and NeurIPS 2024. Oussama Gabouj (CTO) researched efficient ML systems and prompt compression at EPFL's DLab and at AXA, with a paper accepted to EMNLP 2025. Berke Argin (CAIO) studied CS at EPFL and is ex-UBS. Kamel Charaf (COO) holds a Master's in Data Science from EPFL and is ex-Bell Labs. All four founders come from EPFL, giving the team deep technical credibility.