Deploys AI mathematicians that formally verify proofs, grounding outputs in truth rather than guesses.
Combines LLM autoformalization into verified Lean 4 code, multi-agent collaborative proof search, and synthetic proof generation into a self-improving training flywheel.

Technology | Formal Verification | YC W26

Last Updated: March 19, 2026

Cajal is massively scaling formal verification to accelerate scientific discovery, deploying superhuman AI mathematicians in high-impact applied domains, starting with quantum computing and finance. It uses Lean, a proof assistant for formally verifying mathematical statements, to ground AI in truth and validate discovered tools.
Cajal's Tau multi-agent system uses LLMs and Lean to automate and formally verify mathematical proofs, starting with quantum computing and finance applications. The platform grounds AI outputs in mathematical truth, addressing the hallucination problem in domains where correctness is critical.
Formal verification is gaining attention as AI systems are deployed in high-stakes domains, and the Lean ecosystem is growing rapidly with contributions from Microsoft Research and the mathematics community. Cajal is likely to target quant finance firms and quantum computing companies first.
<p>Tau uses LLMs to automatically translate informal mathematical statements and proofs into formally verified Lean 4 code, eliminating manual formalization bottlenecks.</p>
An AI reads a math textbook written in plain English and rewrites every proof in a language a computer can check for absolute correctness.
Cajal's autoformalization pipeline is the foundational ML use case powering the Tau system. Large language models are fine-tuned to parse natural language mathematics—from research papers, textbooks, and informal notes—and generate syntactically and semantically correct Lean 4 formal statements and proof sketches. The system uses a generator-verifier loop: the LLM proposes candidate formalizations, and the Lean kernel immediately checks them for correctness, providing feedback that is used to iteratively refine outputs. This approach dramatically reduces the bottleneck of manual formalization, which traditionally requires deep expertise in both the mathematical domain and the proof assistant's type theory. By automating this translation layer, Cajal enables entire corpora of applied mathematics (e.g., quantum information theory, stochastic calculus for finance) to be made machine-checkable at scale, unlocking downstream applications in verified software, certified algorithms, and reproducible science.
It's like having a bilingual translator who can instantly convert a novelist's messy handwritten draft into perfectly formatted, grammar-checked legal documents—except the "grammar checker" is a mathematical proof engine that never makes mistakes.
<p>Tau deploys multiple specialized AI agents that collaboratively search for novel mathematical proofs, coordinating exploration strategies to discover results no single agent could find alone.</p>
Instead of one AI trying to solve a hard math problem alone, a team of AI specialists each tackle different angles and share clues until they crack it together.
Cajal's multi-agent proof search system is the most novel ML use case in their stack. Tau orchestrates multiple LLM-based agents, each specialized in different proof strategies—such as forward reasoning, backward chaining, lemma generation, counterexample search, and analogy-based conjecture. These agents operate in parallel, sharing intermediate results and partial proofs through a coordination layer that allocates computational resources dynamically based on progress signals. The Lean proof assistant serves as the ground-truth arbiter: any candidate proof that passes Lean's kernel is guaranteed correct, while failed attempts generate structured feedback that agents use to refine their strategies. This multi-agent architecture enables Tau to explore the proof search space far more efficiently than monolithic models, and to discover non-obvious proof paths by combining insights from different reasoning paradigms. The system is designed to scale with compute, meaning that adding more agents and GPU resources directly translates to broader and deeper mathematical exploration—a key differentiator against single-model competitors.
It's like assembling a heist crew where one agent picks the lock, another disarms the alarm, and a third cracks the safe—except the "heist" is solving an unsolved math theorem and the "safe" is a Lean proof kernel that only opens when the answer is provably correct.
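The coordination pattern — parallel strategies sharing a frontier, with a kernel-style check as the arbiter — can be sketched on a toy domain where each open "goal" is a number to be reduced to zero. The agents, lemmas, and progress heuristic here are invented for illustration and are not Cajal's design.

```python
# Toy sketch of multi-agent proof search with the verifier as arbiter.
import heapq
from itertools import count

LEMMAS = (7, 3)  # steps a "forward reasoning" agent may apply

def forward_agent(goals):
    """Attack the first open goal by applying a known lemma."""
    g, rest = goals[0], goals[1:]
    return [tuple(sorted(rest + (g - l,))) for l in LEMMAS if g - l >= 0]

def backward_agent(goals):
    """Backward chaining: split the first goal into two smaller subgoals."""
    g, rest = goals[0], goals[1:]
    return [] if g < 2 else [tuple(sorted(rest + (g // 2, g - g // 2)))]

def search(goal, budget=10_000):
    """Agents share one frontier (the coordination layer); priority favors
    states with less remaining work, a crude progress signal."""
    tie = count()
    frontier = [(goal, next(tie), (goal,) if goal else ())]
    seen = set()
    while frontier and budget > 0:
        budget -= 1
        _, _, goals = heapq.heappop(frontier)
        if not goals:  # "kernel" check: every subgoal has been discharged
            return True
        if goals in seen:
            continue
        seen.add(goals)
        for agent in (forward_agent, backward_agent):
            for succ in agent(goals):
                succ = tuple(g for g in succ if g != 0)  # drop closed goals
                heapq.heappush(frontier, (sum(succ), next(tie), succ))
    return False

print(search(20), search(1))  # True False (20 = 7 + 7 + 3 + 3; 1 is stuck)
```

Neither agent alone solves every solvable instance efficiently; sharing partial states through one queue is what lets one agent's intermediate result become another's starting point.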
<p>Tau generates synthetic formal proof datasets by systematically mutating and recombining verified proofs, creating a self-improving training flywheel that continuously enhances its own theorem-proving capabilities.</p>
The AI creates its own practice problems and answer keys from proofs it already knows are correct, then uses them to train itself to solve harder problems it's never seen before.
Cajal's synthetic data generation pipeline is a critical ML flywheel that underpins Tau's continuous improvement. Starting from verified proofs in existing formal libraries (such as Lean's Mathlib), the system applies a suite of transformations—generalization, specialization, lemma extraction, proof refactoring, and controlled perturbation—to generate large volumes of new, formally verified training examples. Each synthetic example is guaranteed correct because it is either derived from or verified by the Lean kernel. These examples are then used to fine-tune and improve the LLM agents within Tau, creating a self-reinforcing loop: better agents produce more diverse proofs, which generate richer synthetic data, which in turn produce even better agents. This approach addresses one of the fundamental bottlenecks in AI for mathematics—the scarcity of high-quality formal training data—and gives Cajal a compounding data advantage over competitors who rely solely on static, human-curated datasets. The pipeline also enables domain adaptation: by seeding the generator with proofs from a target domain (e.g., quantum error correction), Cajal can rapidly bootstrap Tau's capabilities in new scientific areas.
It's like a chess engine that plays millions of games against itself to get better, except instead of chess moves it's generating and solving math proofs—and every single one is guaranteed to be correct by an unforgiving robot referee.
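The verified-mutation flywheel can be sketched as below. Real transformations operate on Lean proof terms; this toy mutates a tiny arithmetic "theorem" represented as `(a, b, sum)` and keeps only mutants that re-verify. All function names are illustrative assumptions.

```python
# Sketch of synthetic proof-data generation by mutating verified examples.
import random

def verify(thm):
    a, b, s = thm
    return a + b == s  # stand-in for the Lean kernel

def generalize(thm, rng):
    a, b, s = thm
    d = rng.randint(1, 5)
    return (a + d, b, s + d)  # shift an operand and the result together

def perturb(thm, rng):
    a, b, s = thm
    return (a, b, s + rng.choice([-1, 0, 1]))  # may break the statement

def synthesize(seed_thms, n, seed=0):
    """Mutate random seed theorems; the kernel check filters out broken
    mutants, so every emitted training example is verified by construction."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        mutant = rng.choice([generalize, perturb])(rng.choice(seed_thms), rng)
        if verify(mutant):
            out.append(mutant)
    return out

data = synthesize([(1, 2, 3), (4, 4, 8)], n=5)
print(all(verify(t) for t in data))  # True: only verified mutants are kept
```

The filter is the whole trick: mutation operators are allowed to be unsound (like `perturb`), because the kernel check guarantees that nothing unverified ever reaches the training set.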
The team combines ML research experience from Oxford, Cambridge, and UCL with AI safety training, giving it both the technical depth to build theorem-proving AI and the safety mindset to insist on correctness. Lean-based formal verification provides a guarantee that AI outputs are mathematically sound, which is rare among AI-for-science companies.