How Is ARC Prize Foundation Using AI?

ARC Prize Foundation defines how the world measures progress toward artificial general intelligence: through abstract reasoning benchmarks that test fluid intelligence, crowdsourced competitions that surface novel approaches, and adaptive test design that stays ahead of AI advances.

Company Overview

A non-profit foundation that maintains the ARC-AGI benchmark, a test designed to measure genuine machine intelligence by evaluating fluid reasoning and abstraction. The foundation runs annual competitions with significant cash prizes, and ARC-AGI has become the go-to benchmark for frontier AI labs evaluating progress toward AGI.

Product Roadmap & Public Announcements

ARC-AGI-2 was released in 2025 as a harder version designed to resist brute-force approaches. The 2024 competition on Kaggle drew 1,454 teams, with a top score of roughly 55.5% (well below human performance of approximately 85%), and over $125,000 in prizes was awarded. The foundation plans ARC-AGI-3 and is building an academic network and a coalition of frontier AI labs for collaborative benchmarking.

Signals & Private Analysis

Chollet's conference talks hint at formal taxonomies of intelligence beyond pattern matching. The foundation is building long-term institutional credibility as the definitive arbiter of AGI progress. OpenAI CEO Sam Altman signaled intent to partner on future benchmarks in December 2024.

Machine Learning Use Cases

Abstract Reasoning Evaluation (Product) - For Product Differentiation

Maintains and evolves the ARC-AGI benchmark to evaluate whether AI systems can perform genuine abstract reasoning on novel tasks they've never seen before.

Layman's Explanation

It's an IQ test for AI that checks if machines can actually think creatively instead of just memorizing answers.

Use Case Details

The ARC-AGI benchmark consists of a curated set of visual grid-based puzzles that require test-takers (AI systems) to identify abstract patterns and apply them to novel inputs — tasks that are trivially easy for humans but extremely difficult for current large language models and deep learning systems. Each task presents a few input-output example pairs, and the system must infer the underlying transformation rule and apply it to a new input. The benchmark is specifically designed to resist memorization and brute-force approaches, requiring genuine program synthesis and fluid reasoning. ARC Prize Foundation maintains private holdout test sets to prevent data contamination, publishes open leaderboards, and runs annual competitions with significant cash prizes (over $1M) to incentivize the global research community. The benchmark has become the de facto standard for measuring progress toward AGI-like reasoning capabilities, with results consistently showing that even the most advanced frontier models (GPT-4, Gemini, Claude) score well below average human performance, highlighting the gap between pattern matching and true intelligence.
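
To make the task format concrete, here is a minimal sketch of loading and checking one ARC-style task. The "train"/"test" JSON layout mirrors the publicly released ARC data format; the file name `example_task.json` and the flip-the-grid rule are illustrative assumptions, not part of the benchmark itself.

```python
import json

def load_task(path):
    """Load one ARC-style task: a few demonstration pairs plus held-out test inputs."""
    with open(path) as f:
        return json.load(f)

def hypothesis(grid):
    """A candidate rule, here 'flip the grid vertically' (an illustrative guess).
    Real tasks require inferring a rule like this from the train pairs alone."""
    return grid[::-1]

def fits_all_examples(task, rule):
    """Accept a rule only if it reproduces every demonstration output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])

if __name__ == "__main__":
    task = load_task("example_task.json")  # hypothetical local file
    if fits_all_examples(task, hypothesis):
        # Apply the verified rule to the unseen test input(s).
        predictions = [hypothesis(pair["input"]) for pair in task["test"]]
        print(predictions)
    else:
        print("Hypothesis rejected; try another candidate rule.")
```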

Analogy

It's like giving a genius parrot a Rubik's Cube — sure, it can repeat everything you've ever said, but can it actually solve a new puzzle it's never seen before?

Incentivized Research Crowdsourcing (Strategy) - For Decision Quality

Runs large-scale open competitions with $1M+ prize pools to crowdsource novel algorithmic approaches to general intelligence that go beyond current deep learning paradigms.

Layman's Explanation

They're offering a million-dollar bounty to anyone who can build an AI that's actually smart, not just well-read.

Use Case Details

ARC Prize Foundation operates annual global competitions — hosted primarily on Kaggle — where researchers, engineers, and hobbyists submit AI systems that attempt to solve the ARC-AGI benchmark tasks. The competition structure is deliberately designed to reward novel approaches: since standard LLM fine-tuning and scaling alone cannot achieve high scores, participants are pushed toward program synthesis, neuro-symbolic methods, discrete search, and other non-mainstream ML techniques. The prize structure includes both a grand prize for crossing the human-level threshold and tiered prizes for incremental progress, ensuring broad participation. The foundation publishes all winning solutions as open-source, creating a growing public repository of cutting-edge reasoning algorithms. This crowdsourcing model has proven remarkably effective — the 2024 competition attracted thousands of teams and produced solutions combining test-time training, program search, and ensemble methods that significantly advanced the state of the art. By making the competition open and the results public, ARC Prize accelerates the entire field's understanding of what works (and what doesn't) for achieving general intelligence.
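
As a rough illustration of the program-search idea behind many competition entries, the sketch below enumerates short compositions of hand-written grid transforms and keeps the first program that reproduces every demonstration pair. The tiny DSL and depth limit are simplifying assumptions; competitive entries combine far richer DSLs with test-time training and ensembling.

```python
from itertools import product

# Illustrative "DSL" of grid primitives (real entries use dozens more).
def identity(g):        return [row[:] for row in g]
def flip_vertical(g):   return g[::-1]
def flip_horizontal(g): return [row[::-1] for row in g]
def transpose(g):       return [list(col) for col in zip(*g)]

PRIMITIVES = [identity, flip_vertical, flip_horizontal, transpose]

def search_program(train_pairs, max_depth=2):
    """Enumerate compositions of primitives and return the first one that
    reproduces every demonstration pair exactly."""
    for depth in range(1, max_depth + 1):
        for ops in product(PRIMITIVES, repeat=depth):
            def program(grid, ops=ops):
                for op in ops:
                    grid = op(grid)
                return grid
            if all(program(p["input"]) == p["output"] for p in train_pairs):
                return program
    return None

# Example: the single demonstration implies a 180-degree rotation.
train = [{"input": [[1, 2], [3, 4]], "output": [[4, 3], [2, 1]]}]
solver = search_program(train)
if solver:
    print(solver([[5, 6], [7, 8]]))  # -> [[8, 7], [6, 5]]
```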

Analogy

It's like DARPA's Grand Challenge but for building a brain — throw enough prize money at the world's smartest nerds and eventually someone figures out how to make a car drive itself, or in this case, how to make AI actually reason.

Adaptive Benchmark Design (Engineering) - For Risk Reduction

Develops progressively harder benchmark versions (ARC-AGI-2 and beyond) that adapt as AI capabilities improve, ensuring the benchmark remains a meaningful measure of intelligence rather than becoming saturated like previous AI tests.

Layman's Explanation

They keep making the test harder so AI companies can't just claim their chatbot is a genius because it aced last year's easy exam.

Use Case Details

One of the biggest problems in AI evaluation is benchmark saturation — tests like ImageNet, SuperGLUE, and even MMLU have been "solved" by frontier models, making them useless for measuring further progress. ARC Prize Foundation addresses this by treating the benchmark as a living, evolving artifact. ARC-AGI-2, released in 2025, introduced significantly harder tasks that require deeper levels of abstraction, multi-step reasoning, and compositional generalization. The design philosophy ensures that tasks remain easy for humans (preserving the "common sense" baseline) while being dramatically harder for machines. The foundation employs rigorous task design principles: each puzzle must be solvable by a typical human in under a few minutes, must not require specialized knowledge, and must resist solution by memorization or statistical shortcutting. By continuously raising the bar and maintaining strict anti-contamination protocols (private test sets, novel task generation), ARC Prize ensures that their benchmark remains the most credible and challenging measure of machine intelligence available — a critical service as AI companies increasingly race to claim AGI milestones.
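
One way to picture the anti-shortcut vetting described above is a trivial-baseline filter: a candidate task is rejected if copying the input or a single symmetry operation already explains the demonstrations. The specific baselines below are illustrative assumptions; the foundation's actual task-vetting pipeline is not public at this level of detail.

```python
# Hypothetical trivial baselines a candidate task must NOT be solvable by.
TRIVIAL_BASELINES = {
    "copy_input": lambda g: g,
    "flip_vertical": lambda g: g[::-1],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
}

def is_too_easy(train_pairs):
    """Return the name of any trivial rule that already solves the task, else None."""
    for name, rule in TRIVIAL_BASELINES.items():
        if all(rule(p["input"]) == p["output"] for p in train_pairs):
            return name
    return None

# A candidate whose output merely copies the input should be filtered out.
candidate = [{"input": [[0, 1], [1, 0]], "output": [[0, 1], [1, 0]]}]
shortcut = is_too_easy(candidate)
print(f"rejected: solvable by '{shortcut}'" if shortcut else "passes the trivial-baseline filter")
```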

Analogy

It's like how the SAT keeps getting redesigned because prep companies crack the old version — except here the stakes are whether we actually achieve artificial general intelligence or just build really convincing fakers.

Key Technical Team Members

  • François Chollet, Co-Founder & Creator of the ARC-AGI Benchmark
  • Mike Knoop, Co-Founder & Executive Director
  • Greg Kamradt, Head of Operations

Chollet created the benchmark that exposed reasoning limitations of modern LLMs and is one of the most cited AI researchers (Keras creator, Xception paper with 18,000+ citations). Knoop brings operational scale from Zapier. The benchmark is already adopted by OpenAI, Anthropic, Google DeepMind, and xAI, giving the foundation unmatched credibility and network effects.

Funding History

  • 2019: Chollet publishes the original ARC paper
  • 2024 Jun: ARC Prize competition launched with $1M+ grand prize
  • 2024: Competition runs on Kaggle, 1,454 teams, top score ~55.5%
  • 2024 Nov: Chollet leaves Google after 9+ years
  • 2025 Jan: ARC Prize Foundation formally announced as non-profit
  • 2025: ARC-AGI-2 released
  • 2026: Foundation continues as leading AGI benchmark organization

Competitors

  • Benchmarks: HELM (Stanford CRFM), BIG-Bench (Google), MMLU/MMLU-Pro, GPQA
  • AGI Research: MIRI, Anthropic (interpretability)
  • Competition Platforms: Kaggle, MLPerf, SWE-bench
  • Intelligence Tests: Turing Test variants, Math Olympiad benchmarks