Process-Level Evaluation of Synthetic Discovery: The Discovery Tetrad
D. Gupta · Zenodo, May 2026 · v1 preprint
There are two ways a result fools you. It looks right and is right, but for reasons that will not hold next time. Or it is faked and still reads clean. A final answer cannot tell either case apart from a genuine discovery, which is why final-answer evaluation in AI science is easy to game. The Discovery Tetrad moves the evaluation upstream to the process and asks the four questions a real investigator's notebook would have to answer: what was the system trying to find (goal structure), what would count as having found it (witness structure), why does the answer hold (explanation), and what would have proved it wrong (refutation pathway)?
The result is a process-level test that does not care whether the answer happens to be right; it cares whether the discovery was reproducible. The construction is small enough to fit on a slide and strict enough to fail systems that look impressive on aggregate benchmarks.
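To make the four questions concrete, here is a minimal Python sketch of a record holding the four components and a check that each one is present. The names `TetradRecord`, `missing_components`, and `passes_process_check` are hypothetical, invented for illustration; the preprint's actual scoring procedure is not described in this section, and a presence check is far weaker than judging whether each component is sound.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TetradRecord:
    """Hypothetical container for the four components named in the abstract."""
    goal_structure: str      # what the system was trying to find
    witness_structure: str   # what would count as having found it
    explanation: str         # why the answer holds
    refutation_pathway: str  # what would have proved it wrong


def missing_components(record: TetradRecord) -> list[str]:
    """Return the names of components that are empty or whitespace-only."""
    return [
        f.name for f in fields(record)
        if not getattr(record, f.name).strip()
    ]


def passes_process_check(record: TetradRecord) -> bool:
    """Assumed pass rule: all four components are present.

    The final answer is deliberately not an input here, mirroring the idea
    that the test looks at the process rather than the outcome.
    """
    return not missing_components(record)


# Illustrative record with no refutation pathway supplied.
incomplete = TetradRecord(
    goal_structure="find a compound with property P",
    witness_structure="a measured value of P above a stated threshold",
    explanation="mechanism M predicts the increase in P",
    refutation_pathway="",
)
print(missing_components(incomplete))
print(passes_process_check(incomplete))
```

In this sketch a system that reports a correct answer but never states what would have refuted it still fails, which is the behavior the paragraph above describes.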