A discovery agent has a model that fits every run it has logged. It can propose the experiment that most reduces its uncertainty, and it is good at that. The move it cannot make on its own is the one that matters: the variable that would explain the residual is not in its vocabulary, so the question that would settle things cannot yet be posed. It keeps running experiments inside a frame that will never contain the answer. The decisive act is not collecting more evidence. It is changing what can be asked, in the place where the current frame is blind.
That move has a name on this page. It is an attempt to make discovery engineerable. We treat ignorance as a first-class object with internal geometry, and inquiry as a controllable operator that changes what can be expressed, tested, and retained. The question to carry through: what would change your mind, and what is the smallest experiment that would settle it?
The lab exists because most real systems fail not from a lack of computation or data, but from fragile understanding under drift. Teams can plan once they have a model, and they can verify once they have disciplined measurement. They repeatedly stall at the same upstream bottleneck: constructing a living, local world model that stays usable as reality changes.
AI should not output "answers"; it should solve real problems. An answer is the fluent paragraph the chatbot hands you, the one that reads perfectly and is sometimes flatly wrong. A solved problem is when something in the world is different afterward, and you can show why it worked and what would have told you it was failing:
The point of AI is not to eliminate uncertainty; the point is to keep uncertainty structured so execution remains safe, fast, and revisable.
(as we engineer it)
We define general intelligence operationally as a loop over three objects:
Construct and maintain an internal world model that explains what is happening and what could happen. This is not "fitting a curve"; it is choosing variables, mechanisms, and representations that make the situation legible.
Use the world model to synthesize actions that either (a) change the world, or (b) collapse uncertainty cheaply.
Check claims against traces; detect drift; keep outputs auditable and corrigible.
Here is the part that carries the weight. Humans do not merely adapt their internal model to the world; they repeatedly try to reshape the world to match internal models. The hunter does it. Every founder does it. That one capability is what produces disproportionate impact, and also disproportionate failure.
The primary object is a local theory-state: what the system currently believes is going on, what it is holding fixed, what it can express, and what it has actually observed. Internally we represent a theory-state in a compact, typed format (URS), and we track not just "uncertainty," but structured ignorance. In plain words: a running picture of what the system currently believes, what it is holding fixed for now, and what it has actually seen with its own eyes versus what it has only assumed.
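To make the four parts of a theory-state concrete, here is a minimal sketch in Python. It is not the URS format, whose schema this page does not describe. The class name `TheoryState`, its field names, and the example claims are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TheoryState:
    """Hypothetical sketch of a local theory-state; NOT the actual URS schema."""

    beliefs: dict[str, float] = field(default_factory=dict)  # claim -> credence in [0, 1]
    held_fixed: set[str] = field(default_factory=set)        # assumptions frozen for now
    vocabulary: set[str] = field(default_factory=set)        # variables the state can express
    observed: set[str] = field(default_factory=set)          # claims backed by a logged trace

    def assumed_only(self) -> set[str]:
        """Claims that are believed or held fixed but not yet backed by any trace."""
        return (set(self.beliefs) | self.held_fixed) - self.observed


# Illustrative values only; none of these claims come from a real system.
state = TheoryState(
    beliefs={"pump_wear_causes_drift": 0.7, "sensor_is_calibrated": 0.9},
    held_fixed={"ambient_temp_constant"},
    vocabulary={"pump_wear", "drift", "ambient_temp"},
    observed={"sensor_is_calibrated"},
)
print(sorted(state.assumed_only()))
```

Keeping `observed` separate from `beliefs` is the design choice that matters here: it lets the state report what has been seen apart from what has only been assumed.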
"Knowledge is what remains after inquiry. Ignorance is not a residual error term; it is the generative substrate. If you can represent the geometry of ignorance, you can engineer inquiry."
[the agent-independent operators]
The calculus of discovery is the agent-independent part of our work: a disciplined way to go from traces to mechanisms and back again, without collapsing into either storytelling (pure narrative) or brute learning (fit-first). It treats inquiry as a sequence of explicit operators:
Propose new variables and mechanisms; move unarticulated regularities into articulated questions.
Design minimum interventions that collapse uncertainty cheaply. This is the agent's single decisive probe: the cheapest experiment that rules a live branch in or out.
Retype traces into stable representations; keep language consistent with data and provenance.
Import external knowledge as traces and representations without forcing it into the model as unquestioned axioms.
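One way to read "collapse uncertainty cheaply" is as expected information gain per unit cost. The sketch below assumes a discrete set of competing hypotheses and probes with known outcome likelihoods. That is a standard Bayesian experimental-design framing swapped in for illustration, not necessarily the scoring the lab uses. The function names, probe names, costs, and probabilities are all hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_info_gain(prior, likelihoods):
    """Expected entropy reduction over hypotheses from running one probe.

    likelihoods[h][o] is P(outcome o | hypothesis h) for this probe.
    """
    n_hyp = len(prior)
    gain = entropy(prior)
    for o in range(len(likelihoods[0])):
        p_o = sum(prior[h] * likelihoods[h][o] for h in range(n_hyp))
        if p_o == 0:
            continue  # this outcome cannot occur under the prior
        posterior = [prior[h] * likelihoods[h][o] / p_o for h in range(n_hyp)]
        gain -= p_o * entropy(posterior)
    return gain


def cheapest_decisive_probe(prior, probes):
    """Pick the probe with the most expected bits per unit cost."""
    return max(
        probes,
        key=lambda p: expected_info_gain(prior, p["likelihoods"]) / p["cost"],
    )


prior = [0.5, 0.5]  # two live branches, equally credible (assumed)
probes = [
    {"name": "A", "cost": 10.0, "likelihoods": [[1.0, 0.0], [0.0, 1.0]]},
    {"name": "B", "cost": 2.0, "likelihoods": [[0.9, 0.1], [0.1, 0.9]]},
    {"name": "C", "cost": 1.0, "likelihoods": [[0.5, 0.5], [0.5, 0.5]]},
]
best = cheapest_decisive_probe(prior, probes)
print(best["name"])
```

In this toy setup probe A settles the question outright but is expensive, probe C is cheap but uninformative, and probe B buys the most bits per unit cost, so it is the one selected.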
What are the four classes for? They tell you what to do next, what would change your mind, and which parts of the model are stable under stress. The goal is not "truth." It is a world model with explicit ignorance.
| Class | Definition |
|---|---|
| Plausibility | Could a claim be made consistent with current structure? A candidate mechanism still fits every run logged so far, so the agent keeps it on the table. |
| Verification | Is there evidence or proof for this claim against the current traces and representations? One held-out probe would confirm the mechanism or break it. Until it runs, fits-the-data is all it is. |
| Invariance | How robust is a claim under future coherent challenges (threat sets, stress tests)? The model held across every regime it was trained on. What happens on the first distribution it was never shown? |
| Efficiency | Validated novelty per unit resource: how much externally checkable knowledge is obtained per unit of budget. Compute and labelled runs are finite. Which single experiment buys the most settled knowledge per unit of both? |
[the agent-dependent counterpart]
The agent-dependent part concerns not only "what is known," but the structure of the agent that is doing the knowing: its representation biases, its inductive priors, and its capacity to invent new representations when existing ones cannot compress the situation.
Some structure is learned, but which structure gets learned depends primarily on the agent, not on the task. Two agents can face the same traces and diverge wildly because they factor the world differently. An agent with a compatible internal structure for a task performs disproportionately well; an agent that can discover new structure (new variables, new grammars, new decompositions) is what we would call "genius" in practice.
Accumulates bedrock content. Tends to constrain you to what is repeatedly experienced and safely generalized.
Ability to posit a structure cheaply from thin traces and then act to make it real. Allows a local world model to be created quickly, even if imperfect, and then enforced on reality.
Ignorance is not a scalar. It has types and boundaries.
What is explicitly known and supported.
What is explicitly known to be unknown.
What is implicitly present in traces and regularities but not yet structured into the current representation. This is the model whose residuals carry a structure no feature captures: the regularity is sitting in the data, and no variable in the current representation names it yet.
What is outside the current language and measurement system. This is the question you could not even ask before the periodic table existed: the property of an element nobody had yet discovered.
A discovery system fails when it hides these distinctions behind a single score. Our approach keeps them explicit so the system can choose the minimum next question.
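A small sketch of why a single score hides these distinctions. The enum member names below are hypothetical labels for the four types described above, and `scalar_score` is a deliberately lossy stand-in for "a single score", not a metric taken from the lab.

```python
from collections import Counter
from enum import Enum


class Ignorance(Enum):
    # Member names are illustrative; the values paraphrase the four types above.
    SUPPORTED = "explicitly known and supported"
    KNOWN_UNKNOWN = "explicitly known to be unknown"
    LATENT = "present in traces, not yet structured"
    INEXPRESSIBLE = "outside current language and measurement"


def profile(items):
    """Count items per ignorance type instead of collapsing to one number."""
    counts = Counter(kind for _, kind in items)
    return {kind: counts.get(kind, 0) for kind in Ignorance}


def scalar_score(items):
    """The lossy alternative: fraction of items that are supported."""
    return sum(kind is Ignorance.SUPPORTED for _, kind in items) / len(items)


# Two hypothetical states that look identical under the scalar score.
state_a = [("x", Ignorance.SUPPORTED), ("y", Ignorance.KNOWN_UNKNOWN)]
state_b = [("x", Ignorance.SUPPORTED), ("z", Ignorance.LATENT)]

print(scalar_score(state_a) == scalar_score(state_b))
print(profile(state_a) == profile(state_b))
```

Both states score 0.5, yet one calls for a measurement of something already named and the other for a new variable. Only the profile keeps that difference visible.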
(each diagram is the compressed form of a real situation)
Takeaway: Stalled action is usually a missing constraint; small probes surface it, and then decisions become obvious.
Every run so far fits the model. The next experiment it would propose only shaves uncertainty inside the variables it already has. The mechanism it is missing does not live among them.
Takeaway: The decisive move is not the next measurement. It is noticing the question the current vocabulary cannot pose, and growing the vocabulary until it can.
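The situation in this diagram can be reproduced in miniature. In the toy sketch below the data, the hidden variable `z`, and the helper names are all invented for illustration. A line fitted on `x` alone recovers the right slope, and its residuals are uncorrelated with `x`, so no further experiment inside the current vocabulary would reveal anything. The same residuals track `z` almost perfectly.

```python
def mean(v):
    return sum(v) / len(v)


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Synthetic runs: y depends on x and on a variable z the model has no name for.
x = [0, 1, 2, 3, 4, 5, 6, 7]
z = [1, -1, -1, 1, 1, -1, -1, 1]  # chosen to be uncorrelated with x
y = [2 * a + 0.3 * b for a, b in zip(x, z)]

slope, intercept = fit_line(x, y)
residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]

print(round(slope, 6))               # slope recovered from x alone
print(abs(corr(residuals, x)) < 1e-9)  # nothing left to learn from x
print(corr(residuals, z) > 0.999)      # the residual is the missing variable
```

The check against `z` is only possible once `z` exists in the vocabulary. That is the step this page argues cannot be reached by collecting more runs over `x`.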
One attempt. The signs in the dirt are partial and might be lying. A wrong read means the group does not eat this week.
Takeaway: When failure is too expensive for trial-and-error, thin traces steer a world model; the plan reshapes the environment.
Takeaway: Breakthroughs come from changing representation, not collecting more examples; structure choice controls what you can see next.
Discovery can be engineered: it is a sequence of operators under explicit ignorance, not a hunch you cannot reproduce.
Robustness is a stronger primitive than confidence: what survives future coherent challenges is what you can safely build on.
Representation is not a formatting choice: it is a generative grammar that controls what mechanisms can be discovered.
A serious system earns the right to plan: it outputs minimum next actions tethered to evidence, and it knows what would change its mind.
If you want to understand what we do, do not ask "what model do you use?" Ask:
Zetesis is built for domains where failure has a cost, drift is real, and justification chains matter.