Calculemus
Why policy has correct answers and nobody wants to find them
I. The Chlorine Test
What is the correct chlorine level for municipal drinking water?
Nobody treats this as a matter of opinion. The answer is computed from epidemiology, chemistry, and toxicology — balancing pathogen elimination against carcinogenic disinfection by-products. The EPA publishes the number. Municipalities implement it. If a politician proposed setting chlorine levels by popular vote, they would be laughed out of the room.
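What that computation looks like, reduced to a toy: pathogen risk falls with the chlorine dose, by-product risk rises with it, and the correct level is the dose that minimizes the sum. The curves and coefficients below are illustrative placeholders, not EPA parameters.

```python
# Toy model of the chlorine trade-off. Pathogen risk decays with dose,
# disinfection by-product (DBP) risk grows with dose; the computed answer
# is the dose minimizing total risk. Curves are illustrative, not EPA's.
import numpy as np

doses = np.linspace(0.1, 4.0, 400)      # candidate doses, mg/L

pathogen_risk = np.exp(-1.8 * doses)    # infection risk falls with dose
dbp_risk = 0.02 * doses ** 2            # carcinogen risk rises with dose
total_risk = pathogen_risk + dbp_risk

optimal_dose = doses[np.argmin(total_risk)]
print(f"risk-minimizing dose: {optimal_dose:.2f} mg/L")
```

Change the curves and the optimum moves. But given the curves, the answer is a number, not a vote.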
Now substitute "immigration level" or "education spending" or "pension eligibility age." Suddenly the question becomes a matter of values, ideology, and democratic deliberation. One side invokes compassion. The other invokes fiscal responsibility. Pundits debate. Elections turn on it. Nobody computes.
But given any explicit objective function — maximize long-run GDP per capita, maximize civilizational persistence, maximize aggregate wellbeing, even maximize immigrant welfare — every one of these questions becomes an engineering problem with an approximately computable answer. The "value" part is choosing the objective function. The "policy" part is computation. We conflate the two, and the conflation is not an accident. (Most people who disagree on immigration policy agree on the objective — prosperity, safety, stability. They disagree on the mechanism. That's empirical, not axiological. Section IV has the experimental evidence.)
II. We Already Calculate
The claim that policy can be computed is not speculative. We already do it — in every domain where getting the answer wrong kills people fast enough to create feedback.
Aviation safety. After decades of fatal crashes, the FAA and the Commercial Aviation Safety Team transitioned from reactive investigation to prognostic risk computation. Acceptable risk is calculated, not debated. Runway safety areas are dimensioned to cover ninety percent of overrun scenarios. Result: an 83% reduction in commercial aviation fatality risk between 1998 and 2008.
Pharmaceutical approval. Before 1962, drug efficacy was a matter of physician opinion. The thalidomide disaster triggered the Kefauver-Harris Amendment, which mandated statistical proof through randomized controlled trials. A doctor's clinical intuition was legally subordinated to double-blind biostatistics. The question "does this drug work?" became a computation, not a feeling.
Monetary policy. Before Volcker, interest rates were set by political pressure. Nixon leaned on Fed Chairman Burns to keep rates low before the 1972 election. Burns complied. The result was the stagflation of the 1970s. The remedy: compute interest rates from macroeconomic data (the Taylor Rule), and build institutional architecture — 14-year terms, financial autonomy, statutory mandates — to insulate the computation from political override. Turkey's Erdogan recently demonstrated what happens when you reverse this: fire the central bank governors, set rates by presidential opinion, and watch the currency collapse.
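The Taylor Rule itself is a one-line computation. A minimal sketch, using Taylor's illustrative 1993 coefficients (the 0.5 weights, a 2% neutral real rate, a 2% inflation target):

```python
def taylor_rule(inflation: float, output_gap: float,
                neutral_real_rate: float = 2.0,
                inflation_target: float = 2.0) -> float:
    """Taylor (1993): nominal policy rate implied by macro data.

    i = r* + pi + 0.5*(pi - pi*) + 0.5*(output gap), all in percent.
    """
    return (neutral_real_rate + inflation
            + 0.5 * (inflation - inflation_target)
            + 0.5 * output_gap)

# 1970s-style stagflation inputs: high inflation, negative output gap.
print(taylor_rule(inflation=9.0, output_gap=-2.0))  # -> 13.5 percent
```

The rule is not sacred. The point is that the rate becomes a function of data, so deviations from it are visible and attributable.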
Building codes. After the Great Fire of London, after earthquakes, after each catastrophe that made structural failure undeniable — the acceptable load on a beam, the required fire resistance of a wall, the seismic rating of a foundation became computations, not opinions. A city council cannot vote to repeal the laws of physics.
Water treatment. After cholera epidemics killed visibly and quickly, chlorine levels became a computation. After Flint, Michigan — where state-appointed emergency managers switched water sources to save money and regulators failed to require legally mandated corrosion control — lead poisoning reminded everyone why the computation existed.
The pattern: In every domain where bad policy kills people fast enough to create visible feedback within a political cycle, we compute the answer. In every domain where bad policy kills civilizations over decades, we vote on it.
III. The Boundary
What distinguishes the domains we compute from the domains we vote on?
The standard answer: technical domains have clear objective functions and measurable outcomes, while political domains involve genuine value trade-offs and irreducible uncertainty. Aviation safety is "technical." Immigration policy is "political." The boundary is principled.
This is wrong. The boundary is temporal.
Aviation safety involves enormous complexity — millions of interacting components, human factors, weather, maintenance schedules, organizational culture. But when the computation fails, a plane falls out of the sky on the evening news. The feedback loop is measured in hours. The political cost of defending "my opinion about wing stress tolerance" against a crash investigation is infinite. And crucially: the crash resolves the objective-function dispute. Before the crash, airlines had competing objectives — cost reduction, faster turnaround, passenger comfort. After the crash, everyone agrees on "don't crash." Fast feedback doesn't just enable computation. It forces consensus on what to compute for.
Fiscal policy involves comparable or lesser complexity — revenue, expenditure, demographic projections, debt dynamics. The math is not harder. But when the computation fails, the failure arrives as below-replacement fertility persisting for forty years, as infrastructure depreciation compounding at 7% annually, as pension systems counting on workers who will never exist. The feedback loop is measured in decades. No politician bears the cost. No evening news shows the slow collapse. And because the consequences are diffuse and delayed, competing objectives are never forced to resolve — so the domain remains "political," meaning: uncomputed, and the objective-function dispute serves as permanent justification for not computing.
The boundary between computed and uncomputed policy is not determined by whether a correct answer exists. It is determined by whether failure is visible fast enough to force the political system to both agree on what to optimize and surrender its authority to computation. Where the physics imposes quick consequences, we compute. Where the physics is patient, we opine.
IV. The Mask
"We have different values" is the most effective sentence in political discourse. Once a disagreement is classified as a value disagreement, computation becomes illegitimate. You cannot put a number on human dignity. You cannot optimize compassion. The question exits the domain of engineering and enters the domain of tribal allegiance.
The problem: most policy disputes classified as "value disagreements" are not value disagreements at all.
Economist Bryan Caplan analyzed the Survey of Americans and Economists on the Economy — identical empirical questions posed to voters and PhD economists, not "what should we do?" but "what would happen if we did X?" — and found that voters hold systematically biased empirical models. They don't disagree with economists about values. Everyone wants prosperity. They disagree about mechanisms: how trade works, what causes unemployment, whether the economy is improving or declining. The "value disagreement" is an empirical disagreement in which neither side has computed the answer.
Philosopher Michael Huemer tested this further. If political disagreements were genuinely driven by independent moral values, knowing someone's position on abortion would tell you nothing about their position on gun control or minimum wage — these are logically unrelated moral questions. In reality, the positions cluster tightly into binary tribal platforms. This clustering is explained by tribal identity driving empirical beliefs, not by coherent moral philosophy generating independent policy conclusions.
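Huemer's test can be run as a simulation. If positions on unrelated issues came from independent personal values, their cross-issue correlations would sit near zero; if one latent tribal identity drives them all, the correlations are large. The parameters below (five issues, 10% personal deviation from the platform) are invented for illustration:

```python
# Two generative models for policy positions on k logically unrelated issues.
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 5                       # simulated citizens, unrelated issues

# Model A: independent values. Each position is an independent coin flip.
values_model = rng.integers(0, 2, size=(n, k))

# Model B: tribal identity. One latent tribe sets all positions, each
# flipped with 10% probability (personal deviation from the platform).
tribe = rng.integers(0, 2, size=(n, 1))
noise = rng.random((n, k)) < 0.10
tribal_model = np.where(noise, 1 - tribe, tribe)

for name, data in [("independent values", values_model),
                   ("tribal identity", tribal_model)]:
    corr = np.corrcoef(data, rowvar=False)
    off_diag = corr[~np.eye(k, dtype=bool)]
    print(f"{name}: mean cross-issue correlation = {off_diag.mean():.2f}")
    # prints ~0.00 for Model A, ~0.64 for Model B
```

Observed survey data looks like the second model. That is evidence about belief formation, not about values.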
Political scientist James Fishkin ran the experiment directly. In Deliberative Polls, a representative sample of citizens is given balanced expert briefing materials and the opportunity to question specialists. The results, replicated across dozens of global experiments: polarization drops, preferences converge, and the statistical "single-peakedness" of policy preferences increases. The disagreement that looked like incommensurable values turned out to be information asymmetry. When citizens share a factual model, they largely agree on policy.
The same pattern appears in medicine. John Wennberg's Dartmouth Atlas documented massive regional variations in medical procedures — variations that drive policy debates about healthcare equity. Health economist Charles Phelps provided the explanation: the variations cannot be explained by different patient values or different local health priorities. They are "disagreements about the production function" among physicians who hold different empirical models of which treatments work. But when these mechanistic disagreements enter the public sphere, they get moralized into debates about "the value of human life."
Welfare economics already formalizes the distinction: the Social Welfare Function (what to optimize) versus the Production Function (how inputs produce outputs). The first is genuinely normative. The second is empirical. Policy actors routinely shift from falsifiable Production Function claims to unfalsifiable Social Welfare Function claims. "This tax policy will increase GDP by 3%" invites scrutiny. "This tax policy reflects our values of fairness" does not. The shift is strategic.
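In notation, a sketch in the standard Bergson-Samuelson form (symbols generic):

```latex
% Production function: empirical and falsifiable.
% "If we set policy instruments x, outcome y follows."
y = f(x;\ \beta)

% Social welfare function: normative.
% "Which outcomes, i.e. which profiles of wellbeing u_1 ... u_n, do we prefer?"
W = W\big(u_1(y), \dots, u_n(y)\big)
```

A claim about $\beta$ can be tested and lost. A claim about $W$ cannot. The strategic shift is a retreat from $f$ to $W$.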
The reclassification: When a policy question is too empirically complex for a citizen to compute, the brain reclassifies it from "hard math" to "different values." Experimental philosophy confirms the direction: people shift from objectivist to relativist intuitions as the social and cultural distance of a moral question increases (Sarkissian et al., 2011). The reclassification converts a tractable engineering problem into an intractable tribal allegiance signal. It is not conscious deception — it is a systematic perceptual error that happens to serve those who benefit from the answer remaining uncomputed.
V. The Immune System
If most policy disputes are empirical, why don't we compute them?
"Values are subjective" is the philosophical immune system that prevents computation. The mechanism:
- A policy question arises that has an approximately computable answer given an objective function.
- Computing the answer would reveal that current policy serves incumbent interests, not its stated purpose.
- "Values are subjective" is invoked. The question is reclassified from empirical to axiological.
- Once classified as "values," computation is illegitimate — "you can't put a number on human dignity."
- The uncomputed empirical model survives, protected by its axiological camouflage.
This is not conspiracy. It is selection pressure. The historical record confirms it.
In the 1960s, Soviet mathematicians proposed OGAS — a nationwide computer network for economic planning. The technology was feasible. The mathematics was sound. Soviet bureaucrats killed the project because real-time economic computation would have made their patronage networks transparent. They did not argue against the mathematics. They ensured the project never received funding. Historian Benjamin Peters documented the mechanism: structural resistance from incumbents whose power depended on opacity.
In 1971, Stafford Beer began building Project Cybersyn in Chile — a real-time cybernetic network linking factories to central coordination. During the truckers' strike of October 1972, the network's telex system transmitted two thousand messages daily, enabling the government to coordinate distribution of essential goods. The system was partially operational and producing results. It was destroyed in September 1973 — not by computational failure, but as collateral damage of Pinochet's coup. The most ambitious real-time governance computation of the twentieth century never got to finish its experiment.
The pattern across 340 years — from Leibniz's original "calculemus" through Condorcet's social mathematics, Saint-Simon's scientific governance, the 1930s Technocracy Movement, OGAS, Cybersyn, and modern evidence-based policy: every attempt to compute policy answers was terminated before completion. In some cases (OGAS) incumbents killed the computation directly. In others (Cybersyn) the experiment was destroyed by unrelated political upheaval. In all cases, the computation was technically feasible. In no case did the institutional architecture exist to protect it.
Sociologist Linsey McGoey formalized the concept as "strategic ignorance" — the deliberate manufacture of unknowability as a power instrument. When elites benefit from a question remaining unanswered, they do not argue that the answer is wrong. They ensure the question is never asked. "Values are subjective" is the philosophical technology that accomplishes this at civilizational scale.
VI. What Calculemus Requires
The anti-calculemus tradition has two legitimate champions.
Friedrich Hayek argued that distributed tacit knowledge cannot be centralized. A central planner cannot aggregate what millions of market participants know locally. This is correct — against central planning.
James C. Scott documented how high-modernist schemes imposed legibility grids on organic systems and destroyed the local practical knowledge they tried to measure. This is correct — against imposed standardization.
Both arguments demolish a straw man that nobody should be proposing. The mechanist version of calculemus is not central planning. It requires three things:
An explicit objective function. You cannot compute "the right answer" without knowing what you are optimizing for. A bridge engineer needs a load specification before computing beam dimensions. A governance engineer needs an objective function before computing policy. This is what Telocracy provides — not a politically negotiated preference, but a physics-derived constraint: sustained civilizational flourishing over deep time. The objective function is not chosen. It is discovered from the requirements of persistence under entropy.
Designed architecture, not central planning. The answer to Hayek is not "no coordination" but "coordinate at the right layer." Markets already compute — they aggregate distributed knowledge into prices without centralizing it, but only because someone designed the architecture of property rights, contract enforcement, and fraud prevention that makes them work. Prediction markets extend this to policy: they compute the probability that a given policy will achieve a given outcome, using distributed information from participants with skin in the game — but only if someone designs the question specification, resolution criteria, and trading rules. None of this requires a central planner deciding answers. All of it requires centrally designed mechanism architecture within which distributed agents compute. The architecture is engineered; the answers are discovered.
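A minimal sketch of the decision rule this implies, in the futarchy form Hanson proposed: run two conditional markets on the same welfare metric, one conditional on adoption and one on rejection, refund the market whose condition fails, and compare prices. Everything below (the metric, the prices, the minimum edge) is hypothetical:

```python
# Futarchy-style decision rule (Hanson). Two conditional markets price a
# welfare metric, one assuming the policy passes, one assuming it fails.
# Trades in the market whose condition never occurs are refunded, so each
# price estimates E[welfare | condition]. All numbers are hypothetical.

def decide(price_if_adopted: float, price_if_rejected: float,
           min_edge: float = 0.02) -> str:
    """Adopt only if the market expects clearly higher welfare under adoption."""
    if price_if_adopted - price_if_rejected > min_edge:
        return "adopt"
    return "reject"

# Hypothetical prices on "10-year GDP per capita above trend" (0..1 scale):
print(decide(price_if_adopted=0.61, price_if_rejected=0.54))  # -> adopt
```

The engineered parts are the question specification, the resolution criterion, and the refund rule. The discovered part is the price.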
An institution that owns the full mechanism lifecycle. Telocracy reframes the state as a search function over policy-space. But a search function requires an organ that actually runs the search. The Fourth Branch is that organ — a constitutionally independent Mechanism Authority that owns every stage of the computation, from design through evaluation to retirement. Before a bill is drafted, ministries consult the Authority on incentive structure: how will rational agents game this rule? Before parliament votes, every significant bill undergoes formal mechanism audit — not for legal form but for mechanism integrity. After passage, the Authority monitors whether the law produces its stated outcome or its opposite, and publishes post-mortems when mechanisms fail. It prices the cost of inaction — when analysis shows that delaying a decision costs X per year in demographic decline or infrastructure decay, that cost is published and attributed. It monitors emergent mechanisms that nobody legislated but that shape behavior: informal veto networks, bureaucratic equilibria, selection gradients that filter out the people who would fix things. Its scope extends to everything that affects civilizational persistence, not just statutes.
The Authority's power is informational, not executive. It cannot block legislation. But when it flags a bill, parliament must adopt a public override resolution — on the record, permanently. When indicators cross hard thresholds — debt ratios, dependency ratios, infrastructure depreciation rates — automatic triggers force parliamentary consideration without ministerial discretion. Central bank independence demonstrates that this insulation works: long non-renewable terms, financial autonomy, statutory mandates. The Authority adds one safeguard central banks lack: a permanent red team funded to find errors in its own models. The computation is not binding. Ignoring it is expensive. Not a central planner that dictates answers, but a cybernetic governor that closes the feedback loop between policy and outcome.
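The trigger layer is mechanically simple; its entire value is that no minister decides whether the question gets asked. A sketch, with invented indicator names and thresholds:

```python
# Sketch of the automatic-trigger layer: when an indicator crosses its
# hard threshold, parliamentary consideration is forced. Indicator names
# and threshold values are invented for illustration.

THRESHOLDS = {
    "debt_to_gdp": 0.90,               # ratio
    "old_age_dependency": 0.45,        # retirees per worker
    "infra_depreciation_rate": 0.07,   # annual
}

def breached(indicators: dict[str, float]) -> list[str]:
    return [name for name, limit in THRESHOLDS.items()
            if indicators.get(name, 0.0) > limit]

latest = {"debt_to_gdp": 0.94, "old_age_dependency": 0.41,
          "infra_depreciation_rate": 0.08}

for name in breached(latest):
    print(f"TRIGGER: {name} crossed its threshold -> mandatory parliamentary review")
```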
And one principle from Full Accounting: approximately right beats precisely zero. Current policy computes most answers as zero — no estimate of social capital depreciation, no model of demographic sustainability, no calculation of institutional decay rates. Any computation, however rough, is closer to truth than the zero that governance currently uses. The precision objection — "we can't measure social capital precisely" — is an argument for zero, which is the most precisely wrong number available.
VII. When Computation Fails
Calculemus fails in two ways — when you compute the wrong thing, and when you compute without feedback.
Computing the wrong thing. In the Existential Risk Persuasion Tournament (XPT), Philip Tetlock gathered superforecasters and domain experts — gave them shared data, financial incentives, and months of deliberation — and asked them to forecast the probability of AI-caused human extinction by 2100. They failed to converge. Experts estimated 3%. Superforecasters estimated 0.38%. The gap persisted despite every structural incentive to update.
But examine what they were actually disagreeing about. The two groups held different priors (humanity is robust vs. smarter agents dominate), used different epistemic methods (historical base rates vs. inside-view reasoning about unprecedented systems), and modeled social adaptation capacity differently. These are production function disagreements — exactly the kind this essay argues are misclassified as value disputes. The non-convergence was not evidence of a question beyond computation. It was evidence that the study asked the wrong question. No financial incentive can substitute for a physical feedback loop. When the feedback loop is eighty years long, human cognition cannot grade its own homework, and empirical models calcify into tribal positions.
The mechanist response is not to surrender to "value pluralism" at the frontier. It is to change what is being computed. We cannot compute whether AI will cause extinction in 2100 — that question lacks the feedback structure computation requires. We can compute whether the current market architecture incentivizes AI labs to deploy before safety is established. That is a game theory problem, solvable today, from present-day data. We can compute which architectural constraints — liability regimes, compute thresholds, mandatory red-teaming — alter the payoff matrix. When a long-term outcome is uncomputable, you stop predicting the horizon and start engineering the engine. The limit of calculemus is knowing when to shift from outcome prediction to incentive architecture.
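That game-theory computation is small enough to show in full. Two labs, each choosing to rush deployment or wait for safety work; the payoffs are hypothetical, and the liability regime is modeled as an expected cost added to rushing. The computation asks how large that cost must be before waiting becomes the equilibrium:

```python
# Two AI labs each choose "rush" (deploy before safety work) or "wait".
# Payoffs are hypothetical (market share minus expected accident cost).
# A liability regime adds an expected cost to rushing; the question is
# how large it must be before "wait/wait" is the Nash equilibrium.
from itertools import product

def payoff(me: str, other: str, liability: float) -> float:
    base = {("rush", "rush"): 3, ("rush", "wait"): 10,
            ("wait", "rush"): 1, ("wait", "wait"): 6}[(me, other)]
    return base - (liability if me == "rush" else 0.0)

def nash_equilibria(liability: float) -> list[tuple[str, str]]:
    acts = ("rush", "wait")
    return [(a, b) for a, b in product(acts, acts)
            if payoff(a, b, liability) >= max(payoff(x, b, liability) for x in acts)
            and payoff(b, a, liability) >= max(payoff(y, a, liability) for y in acts)]

for liability in (0.0, 5.0):
    print(f"liability={liability}: equilibria = {nash_equilibria(liability)}")
# liability=0.0 -> [('rush', 'rush')]; liability=5.0 -> [('wait', 'wait')]
```

With no liability, rushing dominates and both labs rush; price the expected harm into the rush option and the equilibrium flips. That is incentive architecture, computed from present-day structure rather than predicted from an eighty-year horizon.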
Computing without feedback. Australia's Robodebt system automated welfare debt recovery by averaging annual income across fortnights — a model that did not correspond to reality. Precarious workers earn in bursts, not in steady flows. The averaging was wrong as engineering: it produced hundreds of thousands of false positives. But the deeper failure was architectural. The system was given executive authority — it mandated debt recovery rather than flagging discrepancies for human review. It removed every error-correction mechanism: no feedback loop, no human in the loop, no recourse pathway that could catch the model's failures before they compounded. The result: a $1.2 billion settlement and a Royal Commission that called it "a massive failure of public administration."
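The core arithmetic error fits in a few lines. A simplified reconstruction, with hypothetical numbers and a hypothetical eligibility cutoff standing in for the actual taper rules:

```python
# Simplified reconstruction of the Robodebt averaging error.
# A precarious worker earns A$26,000, all in the first half of the year,
# and lawfully receives benefits only in the workless fortnights.

actual_fortnightly = [2000] * 13 + [0] * 13          # bursty real income
averaged_fortnightly = sum(actual_fortnightly) / 26  # = 1000 every fortnight

INCOME_CUTOFF = 500   # hypothetical: benefits payable only below this income

# Reality: 13 fortnights with zero income, so those payments were lawful.
lawful = sum(1 for x in actual_fortnightly if x < INCOME_CUTOFF)
# The model: 26 fortnights at $1000 each, so every payment looks unlawful.
flagged = 26 if averaged_fortnightly >= INCOME_CUTOFF else 0

print(f"fortnights lawfully paid: {lawful}, fortnights flagged as debt: {flagged}")
```

The bad averaging is an engineering bug. Handing the averaged number executive authority, with no human review of the flagged fortnights, is the architectural failure.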
Both failures are architectural. The XPT computed the wrong thing — a monolithic prediction where mechanism decomposition was required. Robodebt computed without error correction — executive authority with no feedback loop. The lesson is the same: computation without the right architecture is not calculemus. It is numerology with institutional backing.
Both point to the same requirement: error correction. Every model is wrong at the margins. The question is whether the architecture catches the error before it compounds or after. The Fourth Branch is designed with feedback at every layer — a permanent red team funded to find errors in the Authority's own models, override resolutions that create public records when parliament diverges from the computation, and automatic triggers when indicators cross hard thresholds. The computation closes the feedback loop between policy and outcome. When the computation itself is wrong, the architecture catches it.
The scope: Calculemus works for standard policy — resource allocation, institutional design, regulatory thresholds, fiscal sustainability — where historical data exists, feedback loops can be modeled, and outcomes can be approximately measured. This covers the vast majority of what governments actually do. For the middle zone — domains with moderate uncertainty and multi-decade feedback loops, like education reform or urban planning — the answer is not perfect foresight but probabilistic computation with continuous error correction: compute the best current estimate, measure trailing indicators, adjust as the error bars shrink. The cybernetic governor exists precisely for this. At the frontier of radical uncertainty, you shift targets: stop computing outcomes, start computing incentive architectures. And computation without feedback — executive algorithms with no error correction — is not computation at all. But using these boundary conditions to block informational computation of standard policy is like refusing to build bridges because we cannot predict earthquakes.
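The middle-zone loop described above ("compute, measure, adjust") is ordinary sequential updating. A sketch with invented numbers, using a Gaussian update as the simplest stand-in:

```python
# The middle-zone loop: publish an estimate with error bars, then shrink
# them as trailing indicators arrive. Conjugate Gaussian update; all
# numbers are invented for illustration.

def update(prior_mean, prior_var, observation, obs_var):
    """Standard Gaussian posterior after one noisy trailing indicator."""
    w = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + w * (observation - prior_mean)
    post_var = prior_var * obs_var / (prior_var + obs_var)
    return post_mean, post_var

# Prior: a reform's effect on a cohort outcome, estimated at +4.0 (sd 3.0).
mean, var = 4.0, 9.0
for year, indicator in enumerate([1.5, 2.0, 2.8], start=1):  # trailing data
    mean, var = update(mean, var, indicator, obs_var=4.0)
    print(f"year {year}: estimate = {mean:+.2f}, sd = {var ** 0.5:.2f}")
```

Each trailing indicator moves the estimate and shrinks the error bars; the governor acts on the current posterior, not on a forecast frozen at enactment.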
VIII. The Price of Not Computing
What does it cost to treat engineering problems as matters of opinion?
Every domain that transitioned from opinion to computation did so after catastrophe made the cost undeniable. Thalidomide before drug trials. Stagflation before central bank independence. Crashes before aviation safety computation. Cholera before water treatment standards. The computation was available in every case before the catastrophe. The political will to implement it was not.
The domains still governed by opinion — fiscal sustainability, demographic planning, institutional design, education architecture — are accumulating their own catastrophes on the same schedule. Below-replacement fertility persisting for forty years is not a surprise — demographers have tracked it for decades. What remains uncomputed is the policy response: which mechanisms caused it, what would reverse it, and what it costs per year to do nothing. Pension insolvency projected for the 2030s is not a surprise. It is a computed answer that the political system refuses to act on because the feedback loop exceeds the election cycle.
The price of not computing is paid in the currency of civilizational capital — consumed without appearing on any ledger, depleted without triggering any alarm, exhausted without any politician bearing the cost. The computation that would make this depletion visible exists. The institutional architecture to protect it from political override has been specified. The objective function that would make the computation meaningful can be derived from physics.
What remains is the oldest obstacle in the history of calculemus: the structural incentive to not compute, held in place by the philosophical claim that computation is inappropriate for questions that are "really about values." Most of them are not about values. They are about mechanisms that nobody has bothered — or dared — to calculate.
Calculemus.
Governance series: Diagnosis → Telocracy → Calculemus → Institution → Full Accounting → Libertarianism Is an Incomplete Solution
Related:
- Telocracy — Where the objective function comes from
- The Fourth Branch — The institutional architecture to protect computation
- Full Accounting — "Approximately right beats precisely zero"
- Values Aren't Subjective — Why "values are subjective" is the immune system
- What "Vote on Values" Actually Does — Why democratic aggregation measures replicator fitness, not correct answers
- Ethics Is an Engineering Problem — Architecture over disposition
Sources and Notes
Historical calculemus lineage:
- Gottfried Leibniz, "calculemus" (1685) — the dream of resolving disputes by computation.
- Nicolas de Condorcet, Essay on the Application of Analysis to the Probability of Majority Decisions (1785) — social mathematics.
- Henri de Saint-Simon (1820s) — "the government of men should be replaced by the administration of things."
- The Technocracy Movement (1930s); William Akin, Technocracy and the American Dream (1977) on its failure to develop a viable political theory.
OGAS and Cybersyn:
- Benjamin Peters, How Not to Network a Nation: The Uneasy History of the Soviet Internet (2016) — bureaucratic resistance to computational governance.
- Eden Medina, Cybernetic Revolutionaries: Technology and Politics in Allende's Chile (2011) — Project Cybersyn.
Strategic ignorance:
- Linsey McGoey, The Unknowers: How Strategic Ignorance Rules the World (2019).
- McGoey, "Strategic unknowns: Towards a sociology of ignorance," Economy and Society (2012).
Value vs empirical disagreements:
- Bryan Caplan, The Myth of the Rational Voter: Why Democracies Choose Bad Policies (2007) — systematic voter biases on empirical economic questions.
- Michael Huemer, "Why People Are Irrational about Politics" (2015) — refutation of the Divergent Values Theory.
- John Wennberg and Alan Gittelsohn, "Small Area Variations in Health Care Delivery," Science (1973) — discovery of unexplained regional medical practice variation. Charles Phelps, Health Economics (2003) — economic interpretation as production function disagreement.
- Dan Kahan, "Ideology, Motivated Reasoning, and Cognitive Reflection," Judgment and Decision Making (2013) — identity-protective cognition driving empirical beliefs.
Deliberative convergence:
- James Fishkin, Democracy When the People Are Thinking: Revitalizing Our Politics Through Public Deliberation (2018) — Deliberative Polling results showing convergence and increased single-peakedness across dozens of global experiments.
Cognitive reclassification:
- Hagop Sarkissian et al., "Folk Moral Relativism," Mind & Language (2011) — people shift from objectivist to relativist moral intuitions as social and cultural distance increases.
- Geoffrey Goodwin and John Darley, "The Psychology of Meta-Ethics: Exploring Objectivism," Cognition (2008) — individuals treat moral claims as having objective truth value in familiar contexts.
SWF/PF distinction:
- Standard welfare economics formalization. The Social Welfare Function (Bergson 1938, Samuelson 1947) as the normative component; the Production Function as the empirical component.
- Ronald Dworkin, Law's Empire (1986) — distinguishes "empirical disagreements" from "theoretical disagreements" in jurisprudence; the formal philosophical framework for the value/empirical distinction applied to legal interpretation.
Value pluralism (adversarial):
- Isaiah Berlin, "Two Concepts of Liberty" (1958) — genuine incommensurability of terminal values.
- John Rawls, Political Liberalism (1993) — "burdens of judgment" explaining reasonable pluralism even under full information.
- Chantal Mouffe, The Democratic Paradox (2000) — agonistic democracy: the strongest contemporary critique of "technocratic post-politics," arguing that political contestation is constitutive of democracy, not a failure to compute.
Epistemic jurisdiction:
- Giandomenico Majone, "The Regulatory State and Its Legitimacy Problems," West European Politics (1999) — formalizes when and why policy domains are delegated from democratic deliberation to expert computation, and the legitimacy architecture required.
Computed policy domains:
- Aviation: FAA/CAST prognostic safety analysis; 83% fatality risk reduction 1998–2008 per FAA data.
- Pharmaceuticals: Kefauver-Harris Amendment (1962) mandating RCTs.
- Central banking: John Taylor, "Discretion versus Policy Rules in Practice," Carnegie-Rochester Conference Series on Public Policy (1993) — the Taylor Rule.
- Building codes: progressive formalization from Great Fire of London (1666) through International Building Code.
- Water treatment: EPA Safe Drinking Water Act (1974).
Political override catastrophes:
- Nixon–Burns monetary pressure: Burton Abrams, "How Richard Nixon Pressured Arthur Burns," Journal of Economic Perspectives (2006).
- Erdogan and Turkish monetary policy: widely documented 2021–2023.
- Flint water crisis: documented by Michigan Civil Rights Commission (2017).
Computation failures:
- Australia Robodebt: Royal Commission into the Robodebt Scheme (2023) — $1.2 billion settlement, "massive failure of public administration."
- Existential Risk Persuasion Tournament: Tetlock, Karger, et al., "Improving Judgments of Existential Risk" (2023) — non-convergence under radical uncertainty.
Anti-calculemus tradition:
- Friedrich Hayek, "The Use of Knowledge in Society," American Economic Review (1945) — distributed tacit knowledge.
- James C. Scott, Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed (1998) — high-modernist legibility.
- Paul Tucker, Unelected Power: The Quest for Legitimacy in Central Banking and the Regulatory State (2018) — the democratic deficit critique.
Prediction markets and futarchy:
- Robin Hanson, "Shall We Vote on Values, But Bet on Beliefs?" Journal of Political Philosophy (2013).
- Philip Tetlock, Superforecasting: The Art and Science of Prediction (2015) — calibrated forecasters converge on policy-relevant predictions.