The Token Company

Product & Competitive Intelligence

Compression middleware that cuts LLM costs by up to 66% while improving accuracy by +1.1%.

Company Overview

Builds compression middleware that uses proprietary bear-1 and bear-1.1 ML models to prune and compress tokens sent to LLMs, reducing API costs by up to 66%, cutting latency, and improving output quality for any model provider.

Competitive Advantage & Moat

Product Roadmap & Public Announcements

The Token Company publicly positions its core product as a model-agnostic, API-first compression layer for LLM prompts and outputs, with benchmarked results of 66% token reduction and a +1.1% accuracy improvement. In a deployment with Pax Historia (60K DAU, the 15th-largest token consumer on OpenRouter at 193B tokens/month), users preferred outputs generated from compressed inputs, and user purchases rose 5%. The system compresses 100K tokens in under 100ms. The stated vision is to optimize every LLM request in the world at the token level before it reaches a model.

Signals & Private Analysis

Job postings and GitHub signals suggest investment in adaptive compression algorithms, spanning both lossy techniques (semantic pruning, summarization) and lossless ones (dictionary-based, meta-token). Community and conference activity hints at upcoming support for multimodal compression (images, structured data), RAG pipeline integration, and agentic workflow optimization. The emphasis on non-generative, ultra-fast ML models (<100ms for 100K tokens) suggests a future enterprise play around real-time, high-throughput LLM cost management. Notably, YC partners reached out to Otso directly rather than waiting for him to apply to the batch.
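To make the lossless, dictionary-based direction mentioned above concrete, here is a minimal sketch in which repeated phrases are swapped for short meta-tokens and a dictionary allows exact reconstruction. This toy scheme is an assumption for illustration only; The Token Company's actual meta-token format is not public.

```python
# Toy lossless compression via meta-token substitution (illustrative only).
def compress(text, phrases):
    """Replace each known phrase with a short meta-token; return the
    substitution table alongside the compressed text."""
    table = {p: f"§{i}" for i, p in enumerate(phrases)}
    for phrase, token in table.items():
        text = text.replace(phrase, token)
    return table, text

def decompress(table, text):
    """Reverse the substitution, recovering the original text exactly."""
    for phrase, token in table.items():
        text = text.replace(token, phrase)
    return text

table, packed = compress(
    "the quarterly revenue report shows the quarterly revenue report trend",
    ["the quarterly revenue report"],
)
restored = decompress(table, packed)
```

Because the transformation is reversible, no information is lost; savings come purely from redundancy in the prompt, which is why this class of technique pairs naturally with the lossy pruning described elsewhere in the roadmap.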

Product Roadmap Priorities

Context window information density
Improving · Decision Quality · Data

Intelligent Context Window Optimization: ML-driven compression selects and compresses the most semantically relevant retrieved documents to maximize information density within LLM context windows for RAG workflows.

In Plain English

It fits more of the right information into the LLM's reading window so it gives smarter answers from your documents.

Analogy

It's like packing for a trip with a tiny suitcase—instead of leaving your best outfits behind, a genius packer folds everything perfectly so you bring more of what matters.
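The packing idea above can be sketched as a greedy budget problem: rank retrieved documents by relevance per token and fill the context window with the densest material first. All names and scores here are illustrative assumptions; the bear models' actual selection logic is proprietary.

```python
# Hypothetical sketch of context-window packing for RAG workflows.
def pack_context(docs, budget):
    """docs: list of (text, relevance_score, token_count) tuples.
    Greedily select documents by information density (relevance per
    token) until the token budget is exhausted."""
    ranked = sorted(docs, key=lambda d: d[1] / d[2], reverse=True)
    packed, used = [], 0
    for text, score, tokens in ranked:
        if used + tokens <= budget:
            packed.append(text)
            used += tokens
    return packed, used

docs = [
    ("refund policy details", 0.9, 120),
    ("unrelated blog post", 0.2, 300),
    ("shipping FAQ", 0.7, 80),
]
selected, tokens_used = pack_context(docs, budget=250)
```

The greedy density heuristic is one simple way to "fold everything perfectly"; a production system would likely also compress each selected document rather than include or exclude it whole.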

Real-time token importance scoring
Improving · Cost Reduction · Engineering

Semantic Token Pruning: A proprietary, non-generative ML model identifies and removes redundant or low-importance tokens from LLM prompts in real time, preserving semantic meaning while reducing token count by up to 20%.

In Plain English

It's like a spell-checker for wasted words—automatically trimming the fat from every prompt so you pay less and get answers faster.

Analogy

It's like having a brilliant editor who reads every email you send to your lawyer, crosses out the fluff, and somehow makes your case stronger—all in the time it takes to blink.
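A minimal sketch of the pruning shape described above: score each token's importance, then drop the lowest-scoring tokens up to a fixed budget (here 20%, matching the stated reduction ceiling) while keeping survivors in order. The stopword scorer is a toy stand-in; the company's real scorer is a proprietary non-generative ML model.

```python
# Illustrative semantic token pruning with a toy importance heuristic.
STOPWORDS = {"the", "a", "an", "of", "to", "very", "really", "just"}

def importance(token):
    # Toy heuristic: filler words score low, content words score high.
    return 0.1 if token.lower() in STOPWORDS else 1.0

def prune(tokens, max_drop_ratio=0.20, threshold=0.5):
    """Drop at most max_drop_ratio of the tokens, choosing only those
    scored below threshold, and preserve the original order."""
    budget = int(len(tokens) * max_drop_ratio)
    candidates = sorted(
        (i for i, t in enumerate(tokens) if importance(t) < threshold),
        key=lambda i: importance(tokens[i]),
    )
    dropped = set(candidates[:budget])
    return [t for i, t in enumerate(tokens) if i not in dropped]

tokens = "please summarize the key points of the quarterly report very briefly".split()
pruned = prune(tokens)
```

Capping removals both by a score threshold and by a ratio is one way to bound worst-case meaning loss; a learned scorer would replace the stopword table with per-token predictions.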

Compression-aware cost analytics
Improving · Operational Efficiency · Operations

LLM Cost & Performance Analytics: ML-powered analytics layer that tracks token savings, compression ratios, accuracy impact, and latency improvements across all LLM API calls, enabling data-driven optimization of AI spend.

In Plain English

It's a smart dashboard that shows exactly how much money and time you're saving on every AI call, and where you can save more.

Analogy

It's like getting an itemized electricity bill that not only shows what each appliance costs to run, but also automatically turns off the ones you forgot about.
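The analytics layer described above reduces to rolling per-call records into a few headline metrics. This sketch assumes illustrative field names and pricing; the actual dashboard schema is not public.

```python
# Minimal compression-aware cost analytics over per-call records.
from dataclasses import dataclass

@dataclass
class Call:
    tokens_in: int      # tokens before compression
    tokens_sent: int    # tokens actually sent after compression
    latency_ms: float   # end-to-end latency for the call

PRICE_PER_1K = 0.01  # assumed input price, USD per 1K tokens

def summarize(calls):
    """Aggregate token savings, compression ratio, cost delta, and
    average latency across all recorded LLM API calls."""
    raw = sum(c.tokens_in for c in calls)
    sent = sum(c.tokens_sent for c in calls)
    return {
        "tokens_saved": raw - sent,
        "compression_ratio": sent / raw,
        "cost_saved_usd": (raw - sent) / 1000 * PRICE_PER_1K,
        "avg_latency_ms": sum(c.latency_ms for c in calls) / len(calls),
    }

report = summarize([
    Call(100_000, 34_000, 92.0),
    Call(50_000, 17_000, 45.0),
])
```

A real deployment would also track accuracy impact per model and per route, which is what turns this from a meter into the "automatically turns off the ones you forgot about" behavior in the analogy.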

Company Overview

Key Team Members

  • Otso Veisterä, Founder & CEO

Otso Veisterä is an 18-year-old solo founder, Finnish National Physics Champion, and Z Fellow who combines precocious technical ability with venture-ecosystem insight from an EIR stint at Lifeline Ventures. He is building infrastructure that addresses a universal LLM cost/performance pain point: a lightweight, non-generative ML model that is orders of magnitude faster than the LLMs it optimizes. That speed is a rare technical moat in a space where most competitors rely on prompt-engineering heuristics or full LLM-based summarization.

Funding History

  • 2025 | Otso Veisterä founds The Token Company.
  • 2026 | Accepted into Y Combinator Winter 2026 batch (YC partners reached out directly).
  • 2026 | $500K seed round from Y Combinator and Wave Ventures.
  • 2026 | bear-1 and bear-1.1 compression models launched. Actively hiring ML engineers in San Francisco.

Competitors

  • Prompt Compression Tools: LLMLingua (Microsoft Research, open-source), Selective Context (academic), Compresr (YC W26, Context Gateway).
  • API Cost Optimization Platforms: Helicone, Portkey, LiteLLM (routing/caching).
  • LLM Gateway/Middleware: Martian, Not Diamond (model routing).
  • General Token Management: OpenAI Tokenizer, Anthropic prompt caching (native provider features).