Research · AI

AI

You have probably already caught one of these tools red-handed. It cited a case that does not exist, or stated a fact with total composure and was flatly wrong. The unsettling part was never the mistake. It was the confidence: the answer read clean, every line plausible, and nothing on its face told you to doubt it.

Now run that forward. Point one of these systems at a hard open problem and ask it for a proof. It returns one. The steps look right. It is the kind of write-up a tired referee skims at midnight and waves through. If the proof has to compile in a system like Lean before anyone trusts it, the bluff dies on contact: the machine either closes every step or it does not, and no amount of fluent prose moves that verdict. Where no such gate exists, nobody catches it, and the wrong result travels until someone, much later, builds on it.

That is the gap that should worry you, because you live on the wrong side of it. You are the one who acts on the answer. The machine pays nothing if it was wrong.

So the question this page is built around is synthetic discovery: AI that conducts inquiry, not AI that answers questions. The work below clusters around it, from the operational evaluation framework outward through to the formal kernels that let any of it be checked.


Centerpiece · Synthetic Discovery

Two preprints define the programme, and both attack the same gap from the hero: a confident answer you cannot check. One asks how you would tell whether an AI is conducting discovery; the other asks what a reward signal must look like for the answer to mean anything.

The signal we train on, and the process we evaluate.
Process Discovery tetrad
a recoverable causal core; a process-level test.
Reward Signed compression progress
endpoint audit improvement, no horizon games.
Σ rt = ℰ(θ0) − ℰ(θT) Telescoping identity · Goodhart-resistant by construction

Goodhart-resistant by construction means the machine cannot win by trying harder at the wrong thing. The only way to raise the score is to actually improve what the audit measures.


Foundations · Learning theory, formalized

The centerpiece carries weight only if the learning theory underneath it actually holds. These are the pieces that make the formal layer load-bearing rather than decorative.

Null Measurability at the Symmetrization Interface in VC Learning

D. Gupta · arXiv:2604.25028, April 2026

The hypothesis "the loss is Borel-measurable," repeated through decades of learning-theory proofs, is stronger than necessary. The right object is null-measurability on the symmetrization side. A strict separation witness shows that the relaxation is not cosmetic: there are concept classes whose bad event is null-measurable but not Borel.

Practically, this is what removes a measurability lemma from the path between an architecture and a PAC guarantee.

arXiv:2604.25028 arXiv → Founder mirror →

Formal Learning Theory Kernel Synthetic Discovery

21,728 lines of Lean4, machine-checked with no gaps.

A self-contained Lean4 kernel of formal learning theory, organized around the five-way equivalence at the heart of PAC theory: learnability iff finite VC dim iff bounded compression scheme iff vanishing Rademacher iff polynomial growth function. The Borel-analytic separation result above is one of the kernel's loadbearing lemmas; constructive MWU compression, PAC-Bayes (first Lean4 formalization on record), and Choquet capacitability (previously absent from Mathlib4) sit alongside it.

Formalization is the claim, checked by the Lean4 compiler. Portions are upstreamed where the maintainers' quality bar is met. The compiler is the referee the opening scene was missing: it cannot be charmed, flattered, or rushed, and every one of those 21,728 lines had to satisfy it.

Transformer Learning Theory

Measurability-theoretic foundations for attention architectures.

Lean4 results for attention as a measurable router: attention-routing measurability; the softmax / argmax equivalence under appropriate limits; a universality statement (every measurable router is attention); and a null-measurable necessity result that ties the transformer back to the symmetrization-interface paper above.

Version Space Abstraction in Lean4 cslib

D. Gupta · cslib PR #592, merged 2026-05-29

First upstream contribution from the kernel to Lean4's cslib: a version-space abstraction over PAC learning, following Mitchell (1977) and Angluin (1980). The piece carries the standing of a kernel module that meets cslib maintainers' quality bar, which is the operational criterion we use to decide what counts as finished.


Frontier · Where the harness meets the world

Two strands at the edge: a practical proof-search harness, and an epistemic-theory line that the kernel formalises end to end.

LLM Harness · First Proof Challenge Synthetic Discovery

Operational pipeline against Harvard's benchmark; commodity compute.

An LLM-driven proof-search harness reaching 4 of 10 on Harvard's First Proof Challenge using $32 of commodity API compute on a single MacBook. The headline number is not the point. The shape of the failures is. Where the harness succeeds is where formalization gates exist; where it fails is where they are absent. That is the whole argument of this page shown in one cheap experiment: the harness wins exactly where a gate forces honesty, and loses exactly where nothing does.

Spekkens 2007 · Hidden Capacity Synthetic Discovery

31 machine-checked theorems extending the toy-theory of epistemic quantum states.

A Lean4 formalization of Spekkens's 2007 epistemic toy-theory of quantum states, together with HC(κ): the first continuous measure of interpolation failure in epistemic theories of mechanics. HC = 0 iff the theory's admissible configurations are rectangular. The four entangled states admit HC = 2 bits; the three separable states admit HC = 0; only private-evidence conditioning collapses nonzero HC.


The founder mirror on dhruv.html carries the full deep-dive versions of these cards, including theorems, figures, and Lean snippets.