HELM (Stanford CRFM), BIG-Bench (Google), MMLU/MMLU-Pro, GPQA.
MIRI, Anthropic (interpretability).
Kaggle, MLPerf, SWE-bench.
Turing Test variants, Math Olympiad benchmarks.
Chollet created the only benchmark every frontier lab treats as the AGI litmus test. Controlling the next version (ARC-AGI-2, ARC-AGI-3) means controlling what the industry optimizes for. No competitor can replicate that position without Chollet's credibility.
Through abstract reasoning benchmarks that test fluid intelligence, crowdsourced competitions surfacing novel approaches, and adaptive test design that stays ahead of AI advances.
Autonomous AI engineer that discovers better algorithms than DeepMind at a fraction of the cost.
Aemon beat DeepMind's AlphaEvolve on an NP-hard problem for under $10 in compute. If that result generalizes, Aemon can sell automated R&D to quant funds and biotech labs at a fraction of what they spend on human researchers.
Turns doomscrolling into language learning with adaptive video feeds and clickable subtitles.
Duolingo gamified language learning. Doomersion replaces doomscrolling with it. Adaptive video feeds of native content with clickable subtitles, built by a founder who self-taught Japanese through 6 years of immersion and understands how acquisition actually works.
Gives the 98% of schools without a library system an AI-powered cataloging and search platform.
98% of schools worldwide lack a proper library system. Librar collapses cataloging from weeks to hours using camera-based bulk scanning and a self-healing data backend. The niche is small but uncontested, and the data asset (structured literary metadata) compounds.
AGI lab building hybrid deep learning and program synthesis for autonomous scientific discovery.
Chollet (Keras creator, ARC-AGI inventor) and Knoop (Zapier co-founder) are building hybrid deep learning and program synthesis for autonomous scientific discovery. The thesis is that current LLM scaling will plateau and that learning efficiency through program synthesis is the path to AGI. ARC-AGI-3 is launching as the next benchmark.