
Decision Engineering

Running the organisation of the future.

Gopika Kannan & Dhruv Gupta

It happens like this. A hospital wants its beds back a day sooner, and it has a model that finds the patients who are ready to go home early. The model is good. The markers it reads are real ones: vitals, labs, the slope of recovery. On a Thursday it flags an 81-year-old woman as ready to discharge. Every number agrees.

The physician signing the order has met her. He knows the one thing the model was never given: she lives alone, up three flights, and the daughter who meant to stay flew back to Denver on Tuesday. Clinically stable and no one at home are not the same patient. That fact was never a field in the chart. He holds the discharge.

If he is wrong, a bed stays full one more night. If the model is wrong and nobody catches it, she falls on Saturday and is found on Monday. The cost of the confident output does not land on the model, and it does not land on the bed-occupancy dashboard. It lands on a woman who never saw the recommendation and has no way to argue with it.

Every organisation running AI now has a version of that moment. Here is the question most boards have not asked: when your AI is wrong, who catches it?

Not who is responsible for AI on the org chart. Not who signed the vendor contract. Who, in practice, has the situational awareness, the judgment, and the authority to look at a confident-seeming output and say: this is wrong, and here is why. If that question produces a long pause, you have a problem that is not yet showing up in your productivity numbers. It will.

Three questions for the agenda

Which decisions are now effectively AI-owned, formally or informally, and does anyone know?

When our AI systems fail, as they will, who is equipped to catch it?

What percentage of our AI investment is going toward developing the human capacity to govern what we have built?

Decision Engineering is the discipline for answering them: designing the human judgment layer that automation requires to stay safe, resilient, and strategically capable.

TL;DR

The method, in five moves

The fact that held that discharge, that she has no one at home, sat in the one bucket the model cannot reach: known to a person, never written down. Sorting your work by its bucket is the whole method.

Sort your unknowns

What you know (KK), what you know you are missing (KU), what the organisation knows but never wrote down (UK), what no one has thought to ask (UU). Value rises as the buckets narrow.

Widest bucket worth least. Dashed = what AI does for free.

Test the frame

On any assumption treated as settled, ask: What are we taking as given? What would make it wrong? What changes if it is? Then ask once: what is this framing keeping us from asking?

Map who can say no

A right answer still has to travel. Three layers of permission; name the person, not the layer. Hold two lists apart: who must say yes, and who pays if you are wrong.

Outer layer = trust to operate: lost once, not rebuilt.

Write the one sentence

If your problem statement cannot surprise the person who hired you, it is not the real problem yet.

Watch the opened column

Every conversation closes something and opens something. The opened column, the question you did not walk in with, is where the work is won.

Closed	Opened
a gap answered, an assumption confirmed	a new question surfaced, the list itself corrected

I · The framework · Vision paper

The organisation of the future

The held discharge was one decision, on one Thursday, in one corridor. Now zoom out to the whole organisation that runs thousands of them a day.

Picture an organisation in which most of what used to be called execution runs continuously, automatically, and at near-zero marginal cost. The output is no longer in dispute. What is in dispute, every day, is which decisions still need to be made, by which kind of attention, using what kind of knowledge.

The shape of the firm inverts. Earlier firms had a wide base of execution work and a narrow apex of judgment-driven roles. The future organisation has a narrow base of execution oversight and a broad apex of judgment work. The inversion is not gradual. It is a change in the abstraction at which we define productive human work.

Figure 1 · The role-shape inversion

Three kinds of action

The physician did three different things in that one moment. He let the model run. He weighed whether to trust the output it produced. And he decided there was no settled rule yet for a patient who is stable and has no one at home. Those are three different kinds of work, and conflating them is how organisations lose competence at each.

Once machine-led execution runs continuously and at scale, the work that remains falls into three kinds, and the kind determines who, or what, should own it. Organisations of the past blurred the three. Naming them separately is the first move.

Execute

AI owns · human oversight

The task is defined, the method known, the goal set. Complete it. Runs at near-zero marginal cost. The human role here is oversight and exception-handling, not execution.

Choose the method

Human + AI

The goal is set; several methods could achieve it. Choose among them, and judge which will hold when conditions change. AI models the options; the human judges.

Set the goal

Human only

The situation is novel or conditions have changed. No method is obvious because the goal itself must be defined. Requires the willingness to name what is not yet named.
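The three kinds can be read as a routing rule. What follows is a minimal sketch, assuming the distinction can be reduced to two yes/no inputs. The names `goal_set`, `method_known`, and `kind_of_action` are illustrative choices made here, not part of the framework's own vocabulary.

```python
from typing import NamedTuple


class Action(NamedTuple):
    kind: str
    owner: str


def kind_of_action(goal_set: bool, method_known: bool) -> Action:
    """Route a piece of work to one of the three kinds described above.

    The two boolean inputs are an illustrative simplification, not a
    formal test taken from the paper.
    """
    if not goal_set:
        # Novel situation or changed conditions: the goal itself must be defined.
        return Action("Set the goal", "Human only")
    if not method_known:
        # Goal fixed, several candidate methods: AI models options, the human judges.
        return Action("Choose the method", "Human + AI")
    # Task defined, method known, goal set: AI executes, humans handle exceptions.
    return Action("Execute", "AI owns, human oversight")


if __name__ == "__main__":
    # The held discharge: no settled rule yet for "stable, but no one at home".
    print(kind_of_action(goal_set=False, method_known=False))
```

The order of the checks carries the design choice: an unset goal takes precedence, because no method can be chosen sensibly for a goal that has not yet been named.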

The human role, redefined

The human role is not defined by what people are good at in the abstract. It is defined relative to the failure surface of the automated layer underneath. Where automation runs reliably, human participation loses value as the system improves. Where automation fails frequently, or fails with high consequence, humans are needed precisely there. That failure surface is complex in practice, but it always falls into one of three patterns, and none is what conventional risk management is trained to catch. You have already seen the second one: a confident output that is right on the numbers and wrong about the patient.

The confident hallucination

An agent handles most inputs cleanly, then meets a case the training data did not anticipate and produces a fluent, plausible answer that is wrong. There is no broken reasoning chain to debug. The chain is intact. Confidence simply does not match accuracy on this input.

Caught before the answer is sent.

The right answer for the wrong reason

A diagnostic agent recommends the correct treatment. The audit trail shows the recommendation was driven by features with no clinical relevance. The output was right; the structure it rested on will not generalise.

Rejected even though it was correct.

The drifted goal

An autonomous agent is given a goal, satisfies it formally, and produces nothing useful for the actual programme. The optimisation succeeded. The goal specification was ill-formed for the real substrate.

Recognised, and the goal reformulated.

In each, the human work is not debugging. It is a judgment about the model itself: whether its output is admissible to the world the organisation operates on. That is the substance of the human role, and it is continuous and distributed, not a backstop held in reserve.

What it takes to make a decision

Why could the physician make that call when the model could not? Not because he was smarter. Because that decision had a shape the model was never built for.

Experience, education, analytical ability: these are not wrong answers to what makes a good decision-maker. They are answers to different questions. A decision is what a mind produces given a scenario and a set of rules. Describe the scenario as a configuration of objects, and two questions can be asked of any decision: are the relevant objects familiar, and are the rules that connect them clear? Two questions, four cases, each rewarding a different human capacity.

Rules ↓ · Objects →	Objects familiar	Objects unfamiliar
Rules clear	Type 1 · yields to deduction · education & practice · e.g. credit policy renewal	Type 2 · yields to recall · experience · e.g. port-closure rerouting
Rules vague	Type 3 · yields to construction · first-principles reasoning · e.g. custom pricing, no template	Type 4 · yields to commitment · will & agency · e.g. backing an unnamed category

Figures 2 & 3 · The decision matrix, with what each cell yields to and who handles it
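The matrix amounts to a lookup on two questions. The sketch below encodes it directly. The cell contents are taken from the matrix, while the function and field names (`classify_decision`, `yields_to`, `capacity`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionType:
    number: int
    yields_to: str
    capacity: str


# Keys are (objects_familiar, rules_clear), following the matrix above.
DECISION_MATRIX = {
    (True, True): DecisionType(1, "deduction", "education & practice"),
    (False, True): DecisionType(2, "recall", "experience"),
    (True, False): DecisionType(3, "construction", "first-principles reasoning"),
    (False, False): DecisionType(4, "commitment", "will & agency"),
}


def classify_decision(objects_familiar: bool, rules_clear: bool) -> DecisionType:
    """Return the matrix cell for a decision, given the two questions."""
    return DECISION_MATRIX[(objects_familiar, rules_clear)]


if __name__ == "__main__":
    # Custom pricing with no template: familiar objects, vague rules.
    cell = classify_decision(objects_familiar=True, rules_clear=False)
    print(f"Type {cell.number}: yields to {cell.yields_to} ({cell.capacity})")
```

In practice the two answers are rarely clean booleans. The sketch only shows that once they are answered, the human capacity to staff follows mechanically.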

The older view treated unknowns as a single quantity to be reduced: more research, more data, more meetings. Unknowns come in distinct shapes. A decision-maker excellent at one shape is not automatically competent at another. Treating these as substitutable made hiring expensive at the wrong things and short at the right ones.

How decisions distribute across the work

The four decision types do not spread evenly. They cluster, predictably, by action type. As work moves from execution toward goal-setting, the dominant decision type inverts, from the rule-bound to the conviction-driven. AI is highly effective at the top of this table. It is structurally absent at the bottom.

Decision type	Execute	Choose method	Set the goal
Type 1 · familiar + clear · education	dominant	present	rare
Type 2 · unfamiliar + clear · experience	high	dominant	low
Type 3 · familiar + vague · first principles	rare	growing	present
Type 4 · unfamiliar + no rules · conviction	absent	emerging	dominant
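Read column by column, the table gives a first-pass staffing hint for each kind of action. The sketch below transcribes the table and picks out the cell marked "dominant". The helper names `dominant_type` and `staffing_hint` are hypothetical, and the qualitative labels are kept as strings because the source gives no numeric weights.

```python
# Rows and columns transcribed from the table above.
PREVALENCE = {
    "Execute": {1: "dominant", 2: "high", 3: "rare", 4: "absent"},
    "Choose method": {1: "present", 2: "dominant", 3: "growing", 4: "emerging"},
    "Set the goal": {1: "rare", 2: "low", 3: "present", 4: "dominant"},
}

# Human capacity each decision type rewards, as listed in the table's first column.
CAPACITY = {1: "education", 2: "experience", 3: "first principles", 4: "conviction"}


def dominant_type(action: str) -> int:
    """Return the decision type marked 'dominant' for a kind of action."""
    row = PREVALENCE[action]
    return next(t for t, level in row.items() if level == "dominant")


def staffing_hint(action: str) -> str:
    """Name the capacity that the dominant decision type rewards."""
    t = dominant_type(action)
    return f"{action}: staff for Type {t} ({CAPACITY[t]})"


if __name__ == "__main__":
    for action in PREVALENCE:
        print(staffing_hint(action))
```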

The traditional org chart maps people to functions. The future org chart maps decision types to the people best equipped to make them, and places judgment where the cost of getting it wrong is highest. It is an architecture exercise. Decision Engineering is the deliberate practice of recognising which kind of decision is being made, staffing it with the human work it actually requires, and building the organisation around that recognition rather than around the legacy task structure.

II · Why the problem is real now · Evidence

The productivity trap

One held discharge is invisible in any quarterly report. So is its opposite: the confident error nobody caught.

The story of AI in business has been told as a productivity story: faster reports, lower costs, more output per head. The numbers are real. What they hide is a structural vulnerability that no throughput metric is designed to catch. The right question is not what the AI produced. It is whether the decisions the organisation makes are actually better because it has AI.

5% of integrated AI systems created significant, quantifiable business value (MIT Project NANDA, 2025)

88% / 39% use AI in a function / can trace enterprise-level financial impact; 1% call their strategy mature (McKinsey, State of AI 2025)

42% of companies are abandoning most AI projects, up from 17% the year before (S&P Global, 2025)

25% / 50% decision quality / speed gain, but only with measurement frameworks aimed at decision outcomes (Harvard Business Review, 2024)

The difference between the organisations seeing real value and the majority is not the tools they use. It is whether they have invested in the human capacity to direct those tools, evaluate their outputs, and catch them when they fail. The best-performing AI organisations invest roughly 70% of their AI resources in people and processes, and 30% in technology (BCG, January 2026). The tool is not the scarce resource. Once deployed, AI runs at near-zero marginal cost. The scarce resource is the judgment to direct it.

Automation debt

There is a quieter version of the physician's problem: the day you reach for the override and find the judgment that would have caught the error has slowly wasted away, and nothing logged the loss.

Technical debt is well understood: short-term decisions that create future rework. Automation debt deserves the same attention, and gets almost none.

Automation debt · working definition

The growing gap between an organisation's automated operational reach and its human capacity to govern, interrogate, and override that automation. Like technical debt, it accrues silently and compounds over time. Unlike technical debt, it appears on no balance sheet. It becomes visible only when the automated system hits its limits, and the humans responsible for oversight discover they are no longer capable of providing it.

The Boeing 737 MAX is the most consequential available example. MCAS, the automated flight-control system that malfunctioned twice, was a product of a design process that automated a critical decision layer while failing to ensure pilots retained the situational awareness to override it. When the system operated outside its design assumptions, the human override capacity was not there. 346 people died (US House Committee on Transportation and Infrastructure, September 2020). Boeing is an extreme case; the principle is not extreme at all. Every organisation that automates a critical decision layer without maintaining the capacity to govern it accumulates a version of the same risk.

The cost is already being estimated. EY's 2026 talent research put the unrealised value that accumulates when human and machine capabilities fail to co-evolve, which they call talent debt, at more than $1 trillion in the US alone (EY, February 2026). At the individual level, MIT Media Lab researchers found a cognitive debt from AI writing assistance: short-term productivity gains that silently accumulated long-term costs in critical thinking, shown in weaker neural connectivity (Kosmyna et al., 2025).

AI deployment without judgment investment is not progress. It is accumulated risk, dressed up as productivity.

Where AI can and cannot operate

The fact that held the discharge, that she has no one at home, was never a field in any chart. It lived in the one place a model cannot reach, and this map shows where that place lies.

The decision matrix sorts decisions by their unknowns. A second map sorts knowledge by how reachable it is, and it draws the line AI cannot cross. Efficiency gains are table stakes; they accrue in the top two quadrants. Competitive advantage accrues to organisations that invest in the bottom two, which are exclusively human.

KK · what we know we know

Standard procedures, proven methods, established knowledge.

AI dominates · near-zero cost

KU · what we know we don't know

Defined gaps, open questions. Humans set the question; AI runs the analysis.

Human + AI collaboration

UK · what we know but haven't formalised

Tacit expertise and pattern recognition before articulation. The richest and most neglected domain. The human role is to excavate implicit structure and convert it into explicit, testable form.

AI cannot access · human only

UU · what we can't yet ask

Outside current representation systems entirely. Paradigm shifts, genuinely novel territory. The human role is to suspend current certainties long enough for a new question to form.

AI cannot operate · human only

Based on the Zetesis Ignorance Architecture.
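Sorting an item onto this map can be sketched as two questions. The two inputs below, `someone_holds_it` and `explicitly_recorded`, are an assumed simplification introduced here for illustration. They are not taken from the Zetesis Ignorance Architecture itself.

```python
from enum import Enum


class Quadrant(Enum):
    KK = "what we know we know"
    KU = "what we know we don't know"
    UK = "what we know but haven't formalised"
    UU = "what we can't yet ask"


# Who can operate in each quadrant, as labelled on the map above.
WHO_OPERATES = {
    Quadrant.KK: "AI dominates",
    Quadrant.KU: "Human + AI collaboration",
    Quadrant.UK: "Human only",
    Quadrant.UU: "Human only",
}


def sort_knowledge(someone_holds_it: bool, explicitly_recorded: bool) -> Quadrant:
    """Place an item on the map.

    someone_holds_it: a person in the organisation has the knowledge.
    explicitly_recorded: the knowledge, or the gap, has been written down
    or named as an open question.
    """
    if someone_holds_it:
        return Quadrant.KK if explicitly_recorded else Quadrant.UK
    return Quadrant.KU if explicitly_recorded else Quadrant.UU


if __name__ == "__main__":
    # "She has no one at home": the physician knows it, no chart field holds it.
    q = sort_knowledge(someone_holds_it=True, explicitly_recorded=False)
    print(q.name, "->", WHO_OPERATES[q])
```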

A worked example: a diagnostic at scale

The same sorting works on a factory floor, where the stakes are welds instead of a fall.

At a prominent Asian electric-vehicle manufacturer, an uncertainty diagnostic was applied to a causal world-model of welded-subassembly failures. Rather than treat the diagnostic as one decision, the team classified the whole population of failure paths by what it would take to resolve each. The partition told the firm where to invest, and the recommendation differed in kind for each class.

~60% resolvable from existing error codes (AI-amenable)

~12% need targeted additional evidence (moderate investment)

~28% structurally unresolvable without live data integration (a category change, not tuning)

That is Decision Engineering on a manufacturing line: identify the cells, route attention by cell, and invest in the upstream artefact that makes the routing possible.
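The partition step can be sketched as classifying every failure path and then counting. Everything in the block below is hypothetical: the `FailurePath` fields, the classification rule, and the synthetic population, which is sized only to reproduce the reported split and is not the manufacturer's data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePath:
    name: str
    resolvable_from_error_codes: bool
    needs_live_data: bool


def classify(path: FailurePath) -> str:
    """Assign a failure path to one of the three resolution classes."""
    if path.resolvable_from_error_codes:
        return "existing error codes"
    if path.needs_live_data:
        return "live data integration"
    return "targeted additional evidence"


def partition_shares(paths: list[FailurePath]) -> dict[str, int]:
    """Return the share of failure paths per class, as whole percentages."""
    counts = Counter(classify(p) for p in paths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: round(100 * n / total) for cls, n in counts.items()}


# Synthetic population sized to match the reported split; not real plant data.
paths = (
    [FailurePath(f"a{i}", True, False) for i in range(60)]
    + [FailurePath(f"b{i}", False, False) for i in range(12)]
    + [FailurePath(f"c{i}", False, True) for i in range(28)]
)

if __name__ == "__main__":
    for cls, share in partition_shares(paths).items():
        print(f"{share:>3}%  {cls}")
```

The point of the sketch is the shape of the output: a distribution over classes, each with its own kind of investment, and not a single verdict on the diagnostic.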

III · What we are pursuing · Open research

The inquiry

Three questions the discharge opens but does not close, and that we are still working.

The framework above is the settled, citable layer. It opens onto three questions that are not settled, and that the programme is actively working. These are stated as direction and structural bet, not as established results.

Paper II · in development

Engineered inquiry

Setting up a decision is half the task; aligning a group on it is the harder half. For genuinely novel goals, two people with different senses of direction will not converge through analysis alone. What is required is a structured process for asking better questions together. We name this as an open hard problem and treat it as the bridge from the framework to a measurement and action protocol.

Paper III · in development

The epistemic economy

What if ignorance, not knowledge, is the economic object? When execution runs at near-zero marginal cost, the value-creating work moves to the layer where ignorance is shaped, classified, and allocated. The central move is an ignorance damage function, by structural analogy with Nordhaus: a stock that accumulates from activity, a damage term tying it back to output, an optimal-control problem over the investment path.

Forthcoming

The mind the future needs

The decision types rest on assumptions about how minds work: what it means to reason from first principles, to hold conviction under uncertainty, to notice what a model cannot see. Minds shaped by different disciplines and cognitive styles are not equally suited to all four decision types. What kind of mind does the organisation of the future actually need?

IV · Going deeper

The formal programme

For readers who want the apparatus underneath the corridor: why a held discharge has an economics at all. These are working drafts; each states its own status.

The ignorance damage function (the Nordhaus analogy in full)

The canonical knowledge-economy literature (Romer, Aghion, Acemoglu) treats knowledge as the non-rival productive factor and ignorance as residual friction. The programme inverts the frame: ignorance is the economic object, engineering it is the value-creating activity, and knowledge is the settled residue of well-engineered ignorance rather than the upstream input. The apparatus is carried over from Nordhaus's climate economics, term for term. The difference is the result.

Nordhaus (DICE)	This programme
Externality: CO₂ accumulates from activity	Externality: mismatched ignorance accumulates from activity
Damage function Ω(T): temperature reduces output	Damage function Ω(M): mismatched ignorance reduces output
Optimal abatement path; shadow price on carbon	Optimal decision-engineering path; shadow price on ignorance mismatch
Result: climate action is loss-avoiding	Result (argued): decision engineering is growth-positive, not only loss-avoiding

Status: working paper in development. The growth-gradient result is derived in a toy model; full proofs, calibration, and the empirical strategy are forthcoming. This states the research direction and the structural bet, not a settled result.
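One way to read the mapping in the table is as a generic stock-and-damage setup. The block below is a schematic sketch only. Apart from $\Omega$ and $M$, every symbol is an assumption introduced here for illustration: the decay rate $\delta$, the intensity $\sigma$, activity $A_t$, decision-engineering effort $e_t$ with effectiveness $g$ and cost $c$, the production function $F$, and the discount factor $\beta$. The working paper may specify the model differently.

```latex
\begin{align}
  M_{t+1} &= (1-\delta)\,M_t + \sigma A_t - g(e_t), \qquad M_t \ge 0
    && \text{(stock of mismatched ignorance accumulates from activity)} \\
  Y_t &= \Omega(M_t)\,F(K_t, L_t), \qquad \Omega(0)=1,\ \ \Omega'(M)<0
    && \text{(damage term ties the stock back to output)} \\
  \max_{\{e_t\}} \ & \sum_{t=0}^{\infty} \beta^{t}\,\bigl[\,Y_t - c(e_t)\,\bigr]
    && \text{(optimal-control problem over the investment path)}
\end{align}
```

In this reading, the shadow price on ignorance mismatch is the multiplier attached to the first equation, the counterpart of the shadow price on carbon in the left-hand column.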

The economics of automated firms: two results that travel as one-liners

Automation debt is modelled as a negative, off-balance-sheet cost function that accumulates with automation depth and the type-distribution of automated tasks: the mirror image of Brynjolfsson's positive intangible capital, and a third externality alongside the two already named in the literature (excessive automation; welfare-reducing new tasks).

The method-error floor. Perfect execution cannot remove method error; better execution only runs the wrong method more cleanly. Oversight is therefore not a startup cost to be relaxed once automation scales. Its value rises with automation depth, because the same latent defect now propagates through more tasks, faster.

The residual-bearer identity. Any loss an automated process produces that is not charged to the project's own P&L must be named and assigned to whoever bears it: workers, users, regulators, the public. Off-balance-sheet does not mean no-one's-balance-sheet.

Status: working papers in progress. The automation-debt model and the dynamic P&L formulation are developed; the strict-dominance claim for disciplined decision engineering is stated as a conjecture with comparative statics, not an established theorem.
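The method-error floor can be illustrated with a few lines of arithmetic. The sketch assumes, purely for illustration, that an output is wrong when either the method is wrong for the case or execution fails, and that the two are independent. The function names and the 5% method-error rate are hypothetical and not drawn from the working papers.

```python
def defect_rate(method_error: float, execution_error: float) -> float:
    """Probability that an output is wrong.

    Assumes an output is wrong if the method is wrong for the case or
    execution fails, with the two events independent (an illustrative
    assumption, not the papers' model).
    """
    return 1.0 - (1.0 - method_error) * (1.0 - execution_error)


def expected_defects(tasks: int, method_error: float, execution_error: float) -> float:
    """Expected number of wrong outputs across a volume of tasks."""
    return tasks * defect_rate(method_error, execution_error)


if __name__ == "__main__":
    m = 0.05  # hypothetical method-error rate

    # Better execution lowers the defect rate only down to the floor m.
    for e in (0.10, 0.01, 0.0):
        print(f"execution error {e:.2f} -> defect rate {defect_rate(m, e):.4f}")

    # With perfect execution, scaling automation scales the same latent defect.
    for n in (1_000, 100_000):
        defects = expected_defects(n, m, 0.0)
        print(f"{n} tasks, perfect execution -> {defects:.0f} expected defects")
```

Under these assumptions the defect rate never falls below the method error however good execution becomes, and the expected number of defects grows with task volume. That is the sense in which the value of oversight rises with automation depth.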

Relation to Cynefin

The four-case typology shares structural features with Snowden's Cynefin framework (Snowden & Boone, "A Leader's Framework for Decision Making," Harvard Business Review, November 2007). Both sort decisions into four cases and argue that response strategy must be chosen with reference to the case. They diverge on the organising axis: Cynefin organises by cause-effect knowability across systems; Decision Engineering organises by the shape of unknowns over objects and rules. The two converge on decisions whose operative structure is causal, and diverge on decisions where the operative structure is plausibility, coherence, or commitment, including the everyday machine-error decisions the human oversight layer of an AI-saturated organisation has to handle.