The Alignment Problem in Your Government

Why the state is a misaligned optimizer — and what architecture could fix it

Elias Kunnas

The state is a misaligned optimizer — the identical formal structure as AI's alignment problem, running on a different substrate. The telos is derivable from physics: sustained complexity generation against entropy. The fix is architectural: a constitutionally independent Fourth Branch that closes the feedback loop between policy and outcome.

Standard objections addressed in this essay

"You can't derive ought from is" — §III (conditional engineering specification, not normative claim)
"This is just central planning" — §IV (penetration tester, not planner — audit function, not executive power)
"An unelected body is anti-democratic" — §IV (parliament retains full sovereignty; overrides are public and permanent)
"Audit offices and the CPB already do this" — §IV (each addresses pieces; nobody performs full-lifecycle mechanism audit)
"The AI alignment analogy is just a metaphor" — §I (identical formal structure: mesa-optimization, instrumental convergence)
"Democracies worked fine for centuries" — §II (borrowed coherence — external conditions masked misalignment)
"Who guards the guardians?" — §IV (constitutional independence, Red Team, non-renewable terms, public methodology)

I. The Problem You Already Understand

In AI safety, the alignment problem is straightforward to state: how do you ensure that a powerful optimization process pursues the goals you actually want, rather than the goals it was trained to pursue?

Train an AI to maximize clicks, it learns to show outrage content. Train it to minimize user reports, it learns to hide harmful content from moderators. The system optimizes for the metric, not the intent behind the metric. When a measure becomes a target, it ceases to be a good measure — Goodhart's Law, the gravitational constant of institutional decay.

This is mesa-optimization: a subsystem develops its own objective function that diverges from the base system's objective. The mesa-optimizer appears aligned during training (it produces the right outputs) but pursues its own goals in deployment. The more capable the optimizer, the better it becomes at appearing aligned while pursuing divergent objectives.

We worry about this in artificial intelligence. We should worry more about the fact that it has already happened in governance.

"Politician" is a structurally misaligned job. It selects for election-winning, not outcome-producing. Over electoral cycles, selection pressure produces convergence on misalignment regardless of individual intentions. This is not a metaphor. It is the identical formal structure: a subsystem (the political class) developing mesa-objectives (re-election, status, career advancement) that diverge from the base system's objective (civilizational flourishing).

The AI alignment community already accepts a key result called instrumental convergence: any sufficiently intelligent agent, regardless of its terminal goals, will converge on certain instrumental subgoals — self-preservation, resource acquisition, goal-integrity maintenance — because physics constrains what any agent needs to achieve anything. Nobody objects to this on philosophical grounds. The derivation stays on the "is" side of the is-ought gap.

The move this essay makes is one hop: apply the same argument to civilizations. A civilization is an agent — a bounded region of spacetime that maintains internal order by pumping entropy across its boundaries. Physics constrains what any civilization needs to do to persist, regardless of what that civilization "values." The instrumental requirements are the same as for any goal-directed system: maintain your capacity to extract resources and maintain order, defend your capital stocks, sustain your ability to generate new solutions to new problems.

If you accept instrumental convergence for artificial agents, you must accept it for civilizational agents. Same physics. Different substrate. And governance is the mechanism by which a civilizational agent pursues its objectives — or fails to.

II. How Government Becomes Misaligned

Consider the job description of "politician" as currently constituted. Election-based selection rewards popularity, tribal signaling, and promise-making. The re-election incentive privileges visible short-term actions over invisible long-term outcomes. Misaligned skin in the game: the consequences politicians face (electoral loss) are orthogonal to the consequences that matter (policy outcomes). Costs can be externalized to future generations or diffuse taxpayers. Career politicians are professional election-winners, not temporary stewards. Accountability operates through the theater of voting, not through outcome-tracking.

This job description attracts those seeking power without accountability, selects those best at winning elections (a skill orthogonal to governing), rewards behaviors misaligned with outcomes, and punishes truth-telling.

Imagine two candidates. Candidate A says: "The evidence shows that spending increases don't improve educational outcomes. We need structural reform that will upset unions and take years to show results." Candidate B says: "Our children deserve better! I will fight for education funding because I care about the future!" B wins. A is tagged as "anti-education." Over electoral cycles, the system selects out honest truth-tellers. The surviving population converges on three types: cynics who know but don't say, the self-deceived who genuinely believe the popular thing, and the ignorant who never learned otherwise. Replace every politician today, and in two electoral cycles you'll have the same distribution. The architecture produces the outcome.

The Dual Principal-Agent Catastrophe

The standard framing says politicians are agents of voters, and that they optimize for re-election instead of voter welfare. This understates the problem. There is no actual principal.

The terminal principal, civilizational flourishing over deep time, has no mechanism to express itself. Nobody measures whether laws actually produce flourishing. Voters, the supposed intermediate principal, are themselves misaligned: in aggregate they optimize for short-term personal benefit, tribal satisfaction, and visible signals. Hyperbolic discounting is the human default. As Bryan Caplan argued in The Myth of the Rational Voter, democracy is a commons: each voter externalizes the cost of their irrational beliefs onto society.

A politician who perfectly serves voters still serves a misaligned principal. The system has no actual principal — only self-interested actors competing for resources, with the public good as rhetorical cover.

The Laundering Horizon

The political cycle is roughly four years. Most policy feedback loops are longer. Fiscal consequences arrive over decades. Demographic consequences over generations. Institutional decay over centuries. This gap is the Laundering Horizon: any policy with a feedback loop longer than the political cycle cannot be managed through electoral accountability. Costs that arrive after the accountability window closes generate no accountability — the politician is already re-elected or retired.

Selection pressure guarantees exploitation of this gap. A politician who says "this feels good now but will hurt us in fifteen years" loses to one who says "this helps us now." The honest one is selected out. Temporal laundering is the clearest case, but spatial laundering (concentrate benefits on your coalition, diffuse costs to non-voters) and causal laundering (complex causation creates plausible deniability) operate simultaneously. Pension promises exhibit all three: costs arrive in thirty years (temporal), fall on future workers who can't yet vote (spatial), and involve actuarial complexity nobody can evaluate (causal). The dimensions multiply.
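The Laundering Horizon reduces to a single comparison between two time spans. A minimal sketch in Python, assuming the four-year cycle from the text; the policy names and feedback lags are hypothetical illustrations, not measured data.

```python
from dataclasses import dataclass

POLITICAL_CYCLE_YEARS = 4  # "roughly four years", per the text


@dataclass(frozen=True)
class Policy:
    name: str                  # hypothetical label
    feedback_lag_years: float  # assumed time until consequences become visible


def beyond_laundering_horizon(policy: Policy,
                              cycle_years: float = POLITICAL_CYCLE_YEARS) -> bool:
    """True if consequences arrive after the electoral accountability window."""
    return policy.feedback_lag_years > cycle_years


# Illustrative lags only; none of these numbers come from measured data.
policies = [
    Policy("fuel tax holiday", 1),
    Policy("pension promise", 30),
    Policy("school curriculum reform", 12),
]

unaccountable = [p.name for p in policies if beyond_laundering_horizon(p)]
print(unaccountable)  # ['pension promise', 'school curriculum reform']
```

The sketch covers only the temporal dimension. Spatial and causal laundering would need their own fields (who bears the cost, how traceable the causation is), which are left out here.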

The downstream consequence is the Democratic Ratchet: obligations growing faster than the capacity to fund them. Politicians need votes. Votes are bought with promises. Promises create obligations. Obligations compound, because there is no electoral reward for reducing them. Eventually they exceed capacity. Entitlement spending as a share of GDP ratchets upward in nearly every democracy that lacks hard constitutional constraints.
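The ratchet is ordinary compound growth with two different rates. A rough sketch, assuming constant annual growth for both obligations and funding capacity; the function name and all numbers in the usage lines are hypothetical, chosen only to show the shape of the dynamic.

```python
def years_until_breach(obligations: float, capacity: float,
                       obligation_growth: float, capacity_growth: float,
                       max_years: int = 500):
    """Return the first year in which obligations exceed capacity, or None
    if that does not happen within max_years."""
    for year in range(max_years + 1):
        if obligations > capacity:
            return year
        obligations *= 1 + obligation_growth
        capacity *= 1 + capacity_growth
    return None


# Hypothetical start: obligations at 60% of capacity.
print(years_until_breach(60, 100, obligation_growth=0.04, capacity_growth=0.02))
# Same start, but obligations constrained to grow no faster than capacity.
print(years_until_breach(60, 100, obligation_growth=0.02, capacity_growth=0.02))
```

Under these assumptions the breach date depends only on the starting ratio and the gap between the two growth rates. A hard constraint that closes the gap removes the breach entirely rather than postponing it.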

Borrowed Coherence

If the architecture has always been this broken, why did things seem to work?

They didn't work; external conditions masked the misalignment. During the Cold War, the existential Soviet threat acted as a forcing function. Survival pressure demanded elite coherence and disciplined resource allocation. The threat did the alignment work that correct architecture should have done: politicians could be misaligned, but external pressure produced coherent outcomes anyway. During growth phases (1945–1973, 1991–2008), economic expansion masked fiscal dysfunction. Surplus fed the ratchet without immediate pain.

By 2008, both props were gone. No existential threat. No growth to mask extraction. Accumulated dysfunction became visible. The system was never well-designed. It was well-subsidized by circumstance. You cannot restore past success by recreating past policies, because the success was never due to the policies. It was due to external conditions that no longer exist.

III. Aligned to What?

"We need better governance." Aligned to what? "Human flourishing." Defined how? By whom? Measured against what?

This is where most governance reform proposals stop. They optimize the mechanism but leave the objective function blank, or hand-wave it as "well-being" without specifying what that means in practice. The question is treated as either trivially obvious (everyone agrees on flourishing!) or hopelessly subjective (who are we to define the good life?). Both responses are evasions.

David Hume observed in 1739 that moral philosophers kept sneaking from "is" to "ought" without explaining the transition. He was right — and his demand for explanation has been misread as a prohibition for nearly three centuries. "You can't derive ought from is" fires as a reflex, terminating any attempt to ground governance in physical reality.

But there are two very different claims here. The strong claim — "physics tells you what to value" — is indeed what Hume warned against. Sam Harris tried it with neuroscience and failed. Jeremy Bentham tried it with utility and produced the proxy-metric that ate Western governance. You cannot deduce from thermodynamics that you should prefer cooperation over solitude, or any particular way of life over any other.

The weak claim — "physics tells you which value-configurations self-destruct" — is a different type of statement entirely. Not "you should value X" but "if your system is configured as Y, it will cease to exist." Nobody invokes Hume against bridge engineering. "You can't derive load-bearing requirements from physics" is a sentence nobody has ever uttered, because it is obviously absurd. The bridge does not care about your preferences for gravity.

The weak claim says: civilizations are physical systems subject to the same constraints as any system maintaining order against entropy. The second law imposes a continuous maintenance cost. Without active work, organized complexity degrades toward equilibrium. Any complex system — a cell, a corporation, a civilization — either generates enough order to outpace decay, or it dies.

This imposes constraints. Under radical uncertainty (you cannot assign probabilities to shocks you haven't encountered), static reserves are always insufficient: for any finite buffer, there exists a perturbation that exhausts it. Survival over deep time therefore requires not merely reserves but regenerative capacity: the ability to generate novel complexity across unknown dimensions. And since you cannot compute the "right" rate of generation under radical uncertainty, the only strategy not dominated across all possible perturbation regimes is to maximize that capacity.

This is the non-arbitrary telos: sustained generation of organized complexity over the longest possible time horizon. Maximum safety margin against entropy. What Aristotle called objective flourishing — not subjective happiness.

The gap Hume identified is intact. No ought was derived from is. What was derived is: which configurations of ought are compatible with continued existence. Physics does not tell you what to want. But once you want anything at all, physics determines what you must do to get it — and what will destroy you if you don't.

IV. The Missing Branch

Modern democracies have three branches. The executive acts. The legislature decides. The judiciary interprets. Each checks the others.

But there is a function that none of them performs: checking whether any of this actually works. No branch asks whether policies produce their stated outcomes. No branch audits whether institutions still serve their original purpose. No branch monitors whether the system is drifting toward configurations that erode its own capacity to persist.

This function is not novel. Every civilization that persisted beyond a few centuries developed some version of it.

The Roman Censors (443 BCE–22 BCE), elected every five years, reviewed citizen rolls, assessed institutional health, and could expel senators who had disgraced their office. They could not make law, but they could exclude from civic participation those who corrupted the system. When the Censorship was allowed to lapse under the Empire, the system lost its self-correction mechanism.

The Chinese Censorate (Yushitai) held constitutional authority for over a millennium to impeach any official, criticize imperial policy, and reject appointments. The Censors were specifically empowered to speak without fear of punishment. The function persisted because its absence was reliably fatal: emperors who eliminated critical feedback produced cascading system failures within decades.

The Athenian Graphe Paranomon allowed any citizen to charge that a proposed law violated higher constitutional principles — constitutional review without judges, a check against democratic drift toward self-consumption.

The common pattern: a constitutional organ insulated from immediate popular pressure, tasked with maintaining systemic coherence over horizons longer than any election cycle.

The twentieth century systematically dismantled these mechanisms. The rationale was democratic: unelected bodies exercising constitutional power seemed anti-egalitarian. The result was predictable from thermodynamics: systems optimized for immediate preferences drift toward configurations that feel good now and compound toward failure later. What replaced the guardian function — constitutional courts that wait for cases, central banks with narrow mandates, regulatory agencies captured by their industries, academia fragmented across departments — performs none of the essential tasks. The function was deleted without replacement.

Closing the Loop

James Watt did not invent the steam engine. He made it controllable. His centrifugal governor — a closed feedback loop that throttled the engine when it ran too fast and opened the valve when it slowed — was the first industrial application of what control theory later formalized. The governor did not need a supervisor. It contained its own constraint in its structure.

Modern states are open-loop systems. When the engine overheats — debt accumulates, capital stocks erode, demographic structure inverts — no valve closes. Politicians throw more fuel into the furnace to mask the problem. Diagnostic institutions (audit offices, budget offices) report afterward that the engine overheated. But nobody measures the engine's health in real time and nobody closes the loop.

An open loop drifts to destruction. A closed loop corrects itself. The Fourth Branch is the governor.
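The difference between the two loops can be shown with a toy engine. This is a sketch under stated assumptions, not a model of Watt's governor: the engine dynamics, the `gain` parameter, and the disturbance are all invented for illustration. A gain of zero means the throttle never reacts (open loop); a positive gain is a simple proportional controller (closed loop).

```python
def simulate(steps: int, target: float, gain: float, load_drop_at: int = 10) -> float:
    """Toy engine: speed rises with throttle and falls with load.

    gain == 0 -> open loop, the throttle ignores the measured speed.
    gain > 0  -> proportional governor, the throttle closes as speed overshoots.
    Returns the speed after the final step.
    """
    speed = target
    base_throttle = 1.0
    load = 1.0
    for t in range(steps):
        if t == load_drop_at:
            load = 0.5  # disturbance: the load suddenly halves
        throttle = max(0.0, base_throttle - gain * (speed - target))
        speed += throttle - load
    return speed


open_loop = simulate(steps=50, target=100.0, gain=0.0)
closed_loop = simulate(steps=50, target=100.0, gain=0.5)
print(f"open loop:   {open_loop:.2f}")
print(f"closed loop: {closed_loop:.2f}")
```

In this toy the open loop runs away by a fixed amount every step after the disturbance, while the governed engine settles close to the target. A purely proportional controller leaves a small steady offset, so the closed loop is bounded rather than perfect, which is the relevant property here.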

Penetration Tester, Not Central Planner

The Hayekian objection fires immediately: central planning fails because knowledge is distributed and tacit. If the Fourth Branch tried to simulate the economy or direct outcomes from above, it would fail as Soviet planning bureaus failed.

But the Fourth Branch does not simulate the future. It simulates attacks.

In software security, a penetration tester does not need to know the entire contents of the internet. They need to know how malicious code tries to breach a firewall. The Fourth Branch examines a proposed law as a penetration tester examines code: "How will a rational, self-interest-maximizing agent break this rule?" It does not predict what people will want (which is impossible). It calculates the boundary conditions under which incentives hold or collapse.

This is a mechanism audit: not "is this good policy?" but "does this mechanism produce its stated outcome?" Pre-legislative review that asks whether the incentive structures are game-theoretically robust, whether the metrics will be gamed, whether the mechanism still works when circumstances change. The difference between a firewall and a wish.
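In its simplest form, the penetration-test question is a best-response check: list what an agent could do under the rule, and see whether the most profitable option is also the one the rule was meant to produce. The sketch below assumes payoffs are known and comparable, which real audits cannot take for granted; the `Strategy` type, the subsidy example, and every payoff value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    private_payoff: float     # what the agent gains (hypothetical units)
    serves_stated_goal: bool  # does this behavior produce the law's stated outcome?


def audit_mechanism(strategies):
    """Return (passes, best_response). The mechanism passes only if the
    payoff-maximizing strategy also serves the stated goal."""
    best = max(strategies, key=lambda s: s.private_payoff)
    return best.serves_stated_goal, best


# Hypothetical hiring subsidy paid per new employment contract.
per_contract = [
    Strategy("do nothing", 0.0, False),
    Strategy("hire an additional worker", 2.0, True),
    Strategy("fire and rehire existing staff", 5.0, False),
]
# Hypothetical redesign: subsidy paid only on net headcount growth.
net_headcount = [
    Strategy("do nothing", 0.0, False),
    Strategy("hire an additional worker", 2.0, True),
    Strategy("fire and rehire existing staff", -1.0, False),
]

for label, design in [("per contract", per_contract), ("net headcount", net_headcount)]:
    passes, best = audit_mechanism(design)
    print(f"{label}: passes={passes}, best response={best.name}")
```

The audit makes no prediction about how many firms will hire. It only establishes whether the rule pays more for gaming than for compliance, which is the boundary condition the text describes.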

The Hungarian Lesson

In the early twenty-first century, several countries attempted to create "guardian of the future" institutions. They failed — and how they failed is instructive.

Hungary's Parliamentary Commissioner for Future Generations (2008–2012) was Europe's strongest. The Commissioner had the power to suspend administrative decisions, initiate constitutional complaints, and intervene in environmental and urban planning, and he used it: about 200 substantial cases a year. In 2012, the office was downgraded and merged into the Ombudsman's office, abolishing the function in all but name.

The lesson: discretionary power in a person is a political target. When Sándor Fülöp blocked projects, politicians saw a man, not a mechanism. They removed him.

Sweden's pension brake works on the opposite principle. When the balance ratio (assets to liabilities) falls below 1.0, pension indexation is automatically reduced. In 2010, 2011, and 2014, the brake activated — real pensions fell. Politicians said: "It's mathematics." Nobody lost their position, because nobody made a decision. The system did.

The design principle: trust the formula, not the person. Make the consequence automatic. Make exceptions expensive but possible (every override is logged, public, permanent). Let the politician blame physics, not a colleague.
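The formula-over-person principle fits in a few lines. What follows is a deliberately simplified sketch, not the Swedish rule itself: it assumes indexation is scaled by the balance ratio whenever that ratio falls below 1.0, and it models the override log as a plain list. The function names, record fields, and all figures are hypothetical.

```python
def indexation_rate(income_growth: float, assets: float, liabilities: float) -> float:
    """Simplified automatic balancing.

    At or above a balance ratio of 1.0, pensions follow income growth.
    Below 1.0, indexation is scaled down by the ratio (an assumption of
    this sketch; the real rules are more detailed).
    """
    balance_ratio = assets / liabilities
    if balance_ratio >= 1.0:
        return income_growth
    return (1 + income_growth) * balance_ratio - 1


# Exceptions stay possible but leave a trace. A real system would need
# tamper-evident public storage; a list only illustrates the idea.
override_log = []


def override(formula_rate: float, applied_rate: float, decided_by: str, reason: str) -> float:
    """Apply a discretionary rate and record who departed from the formula and why."""
    override_log.append({
        "formula_rate": formula_rate,
        "applied_rate": applied_rate,
        "decided_by": decided_by,
        "reason": reason,
    })
    return applied_rate


healthy = indexation_rate(0.03, assets=10_500, liabilities=10_000)
braked = indexation_rate(0.03, assets=9_600, liabilities=10_000)
print(f"healthy system: {healthy:+.2%}, brake active: {braked:+.2%}")
```

Nothing in `indexation_rate` takes a decision-maker as input, so there is nobody to remove when the result is unpopular. The only path that carries a name is `override`, and it cannot be taken silently.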

The Architectural Gap

The deepest problem is not that existing institutions fail at any particular task. It is that nobody's job description includes the function. Ministers make policy. Civil servants draft and implement. Lawyers check legal form. Finance ministry economists prepare impact assessments. The national audit office audits use of funds. Constitutional courts evaluate constitutionality.

Who asks: does this mechanism actually produce its stated outcome? Who models how rational actors will respond to a new incentive structure before it is deployed? Who monitors whether an institution still serves its original purpose twenty years later? Who notices when a mechanism that was never legislated — an emergent incentive structure, an informal veto network, a bureaucratic equilibrium that nobody designed — is producing more effect on society than any law on the books?

Nobody. The function does not exist. Not because it was tried and failed, but because the org chart has no box for it. Every complex system in engineering has someone responsible for mechanism integrity — for asking whether the system's actual behavior matches its designed behavior. States are the only complex systems where this function was architecturally deleted and never replaced.

The Concrete Architecture

A detailed institutional specification for this function exists — the Mechanism Authority — designed initially for the Finnish constitutional framework but generalizable to any parliamentary democracy. Its defining feature is that it operates across the full lifecycle of mechanisms — not as a checkpoint at one stage, but as a center of excellence for mechanism design that participates from conception through operation to post-mortem.

Proactive consultation. Ministries consult the Authority during drafting, before a bill is submitted — confidentially, collaboratively, without public confrontation. The goal is to identify broken incentives at a stage where fixing them costs nothing and requires no loss of face. The Authority doesn't just audit — it designs alternatives. "You want to incentivize X and avoid Y? Here are three mechanisms that achieve this. Option A works like this, costs this much. B works differently. We recommend B because..." This transforms the relationship from inspector-versus-inspected to engineer-assisting-engineer.

Pre-legislative review. Every significant government bill undergoes formal mechanism audit before parliamentary consideration. The review evaluates incentive structures (does the law create situations where acting against its objectives is profitable?), game-theoretic robustness (how will rational actors respond?), metric distortion (will the metric become a gamed target?), system effects (how does the law interact with existing mechanisms?), and future-proofing (will the mechanism still work when circumstances change?). If the mechanism is broken, the Authority flags it. Parliament can still pass the law — but must adopt a public override resolution stating its reasons. The resolution is permanent record. When the next government asks "why is this law broken?", the answer is on file: "Parliament X overrode warning Y. Here are the names."

Continuous monitoring and post-mortems. Mechanisms do not stop needing attention once deployed. Every public function must define its intended effect on reality — not "we process applications" but "an entrepreneur receives a permit within X days." The Authority monitors actual outcomes against intended effects in real time, not in retrospective audits years later. When a law produces the opposite of its stated intent, a mechanism failure notice is issued. When a major reform fails, the Authority produces a public post-mortem: what went wrong, why, and how it should have been designed. The goal is institutional learning, not blame. And critically: the Authority monitors not only legislated mechanisms but emergent ones — informal veto networks, bureaucratic equilibria, perverse incentives that nobody designed but that shape behavior more powerfully than any statute. Society is full of mechanisms that were never enacted. Someone must notice them.
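The monitoring step compares a declared intended effect with observed outcomes. A minimal sketch, assuming the intended effect is expressed as a maximum number of days and that the median of observed cases is the comparison statistic; both choices, the function name, and the sample figures are assumptions made for illustration.

```python
from statistics import median


def check_intended_effect(function_name: str, target_days: float, observed_days: list):
    """Return a failure notice (dict) if the median observed outcome misses
    the declared target, otherwise None. Missing data is itself reported."""
    if not observed_days:
        return {"function": function_name, "status": "no outcome data"}
    actual = median(observed_days)
    if actual <= target_days:
        return None
    return {
        "function": function_name,
        "status": "mechanism failure",
        "target_days": target_days,
        "median_days": actual,
    }


# Hypothetical permit office that declared "a permit within 30 days".
notice = check_intended_effect("business permits", 30, [12, 45, 60, 38, 90])
print(notice)
```

Note that the check is defined on the effect ("an entrepreneur receives a permit") rather than the activity ("applications processed"). An office with no outcome data gets a notice too, since an unmeasured function cannot be shown to work.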

Responsibility for inaction. In the current system, a bureaucrat who does nothing remains blameless. Passivity is the safe career strategy. The Authority inverts this by pricing the cost of not deciding. When analysis shows that delaying a decision costs X per year in deteriorating infrastructure, demographic decline, or institutional decay, that cost is published and attributed. Inaction ceases to be costless. The current architecture makes action risky and passivity safe. The Mechanism Authority makes both visible.

Automatic triggers. The Authority maintains a public dashboard of indicators critical to long-term systemic health. When a threshold is exceeded — debt-to-GDP ratio, dependency ratio, infrastructure depreciation rate — parliamentary consideration is automatically triggered. No ministerial discretion. Slovakia's constitutional debt brake (2011) demonstrates the principle: five escalating bands from 50% to 60% debt-to-GDP, each triggering harder consequences — from mandatory written explanations through ministerial salary cuts and expenditure freezes to a mandatory confidence vote. Personal financial consequences work. Politicians adapted.
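An escalating band structure of this kind is a threshold lookup. In the sketch below, only the 50% to 60% range, the count of five bands, and four of the consequence types come from the description above. The intermediate band edges, the fourth consequence, and the assumption that consequences accumulate as debt rises are illustrative placeholders.

```python
from bisect import bisect_right

# Band floors in percent of GDP. Only the 50-60 range and the count of five
# are taken from the text; the intermediate edges are illustrative.
BAND_FLOORS = [50.0, 53.0, 55.0, 57.0, 60.0]

# One consequence per band, in escalating order.
CONSEQUENCES = [
    "mandatory written explanation",
    "ministerial salary cuts",
    "expenditure freeze",
    "tighter budget constraints",  # placeholder, not named in the text
    "mandatory confidence vote",
]


def triggered_consequences(debt_to_gdp_pct: float) -> list:
    """Return every consequence triggered at this debt level.

    Assumes consequences accumulate as bands are crossed. No discretion
    enters: the indicator alone determines the result.
    """
    bands_crossed = bisect_right(BAND_FLOORS, debt_to_gdp_pct)
    return CONSEQUENCES[:bands_crossed]


for level in (48.0, 51.5, 56.0, 61.0):
    print(level, triggered_consequences(level))
```

The same lookup would serve any dashboard indicator mentioned above, such as the dependency ratio or the infrastructure depreciation rate, given its own floors and consequences.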

Election platform auditing. Before elections, parties voluntarily submit platforms for the Authority's calculation. Using a unified model, the Authority evaluates what each platform's promises cost and produce. The Netherlands Bureau for Economic Policy Analysis (CPB) has operated this system since 1986. The CPB's doorrekenen transformed Dutch political culture: parties modify platforms based on CPB calculations because the numbers are public. A party that refuses calculation is a party that knows its numbers don't work. In four decades of operation, no Dutch government has successfully captured or defunded the CPB. Its authority derives entirely from accuracy and transparency.

Constitutional independence and self-correction. The Authority is established by special law and funded by a capital endowment rather than an annual budget. Board members serve non-renewable terms of seven to ten years and are removable only by judicial panel. International composition requirements break network capture. All methodology, data, and models are public and challengeable. Five percent of the budget funds a permanent Red Team whose sole purpose is to find errors in the Authority's own models: truth emerges from structured conflict, not institutional monopoly.

The Mechanism Authority is not a regulator added to an existing stack. It is the missing architectural layer — the function that makes mechanism integrity somebody's job across the full lifecycle: design, review, deployment, monitoring, post-mortem, redesign. It extends the CPB model from narrow economic forecasting to full mechanism audit: not just "what does this cost?" but "does this actually produce what it claims to produce?" It is the centrifugal governor installed on the state: measuring the engine's condition, detecting deviation, and converting deviation into correction. Protocol without a feedback loop is a suggestion. Protocol with a feedback loop is a constraint.

V. Why This Matters Now

AI governance will be designed by the same institutions that cannot manage pension reform.

The institutions that will regulate superintelligence — parliaments, regulatory agencies, international bodies — are the same institutions that take eighteen years to fail at healthcare reform, that cannot evaluate whether a housing policy works, that pass laws creating the opposite of their stated intent and discover this decades later. AI regulation will be produced by misaligned politicians optimizing for re-election, staffed by captured bureaucracies, reviewed by nobody who evaluates whether the mechanisms work.

AI alignment is not a separate problem that requires separate institutions. It is the hardest instance of the governance alignment problem. A state that cannot evaluate whether a housing subsidy works as designed will not evaluate whether a deployed AI system works as designed.

If you solve governance alignment, you have the institutional infrastructure to handle AI alignment. If you fail to solve it, AI alignment is moot — misaligned institutions will deploy AI systems to serve their mesa-objectives, accelerating the very dynamics that are already consuming the foundations.

VI. The Path

Who builds the Fourth Branch? The obvious paradox: the people who would build it are being selected out by the system it would fix. Politicians will not voluntarily create an institution that makes their misalignment visible. Bureaucracies will not design their own auditor.

This is not a fatal paradox. Structural reform has never been initiated by the structures it reforms. Historically, it has arrived by four vectors.

Hard constraints via direct democracy. Switzerland's debt brake is constitutional, approved by about 85% of voters in a 2001 referendum. Politicians cannot evade it because it is not under their control. Go directly to voters for structural rules, bypassing captured representatives.

External constraint. IMF conditionality, treaty obligations, credit rating thresholds. External actors impose what domestic politics cannot. Crude, often destructive in implementation — but the mechanism is real: external constraint substitutes for missing internal constraint.

Crisis. When dysfunction becomes visible enough, reform becomes politically possible. Dangerous — by then, the capacity for reform may be depleted. But historically the most common vector. The blueprint must exist before the window opens.

Parallel institutions. The Fourth Branch does not need to be legislated into existence. It needs to be demonstrated. Build the function outside the existing structure. Publish mechanism audits. Track outcome divergence. Build credibility through accuracy until the analysis becomes impossible to ignore.

The common thread: do not reform misaligned agents through the process they control. Constrain them, bypass them, or build around them.

The selection pressures are already operating. The brain drain is already happening. The institutions are already selecting for people who cannot fix them. What remains to be determined is whether the function can be built before the capacity to build it is selected out.

The argument in three sentences: Politicians are mesa-optimizers pursuing re-election rather than civilizational flourishing, and the architecture guarantees this regardless of individual character. The objective function — sustained complexity generation — is derivable from physics as a conditional engineering specification, not a normative claim. The fix is a Fourth Branch: a constitutionally independent mechanism audit function using automatic triggers and public override requirements rather than discretionary power.

Related:

Mechanism Realism — The ontology: replace every intention with the mechanism that produced it
The Governance Alignment Problem — The detailed diagnosis of political misalignment
The Fourth Branch — The missing constitutional layer, in detail
Telocracy — Governance with a physics-derived objective function
The Missing Variables — A complete introduction to mechanism realism
Calculemus — Why most policy disagreements dissolve under computation
Full Accounting — The capital stocks nobody measures

The full framework is developed across 83 essays and a book in Finnish (Mekanismirealismi, 2026). The institutional specification for the Mechanism Authority is published at mekanismirealismi.fi/mechanism-authority.