Ethics Is an Engineering Problem
Why "being good" fails and "building constraints" works
I. The 3,000-Year Mistake
For three millennia, we have treated ethics primarily as disposition training—teaching individuals to be "good" through moral reasoning, guilt, and exhortation, or (most recently) training AI models with RLHF.
History and physics suggest ethics is actually an architecture problem—designing systems where "good" is the only thermodynamically stable equilibrium.
The Paradigm Shift
Old paradigm: "Virtue" is a personal attribute. The goal is to create saintly agents.
New paradigm: "Virtue" is a system state. The goal is to create robust games.
The thermodynamic reality: In a system with misaligned incentives (Moloch), a "saintly agent" is thermodynamically unstable. They will burn out, be corrupted, or be outcompeted by agents who defect. In a system with aligned architecture, "virtue" becomes automatic—the system produces virtuous outcomes without requiring virtuous inputs.
II. The Failure of Willpower
The "Good Person" Fallacy
When systems fail, we blame individual actors. The 2008 financial crisis? "Greedy bankers." Political corruption? "Bad politicians." Corporate malfeasance? "Unethical executives."
This diagnosis is both correct and useless.
Yes, bankers were greedy. But bankers are always greedy. The crisis didn't happen because humans suddenly became more selfish in 2007. It happened because the architecture made greed systemically risk-free: privatized gains, socialized losses, no personal liability for catastrophic failure.
The system selected for the behavior we claim to deplore.
The Physics: Character Is Soluble in Incentives
Over long enough timeframes and strong enough optimization pressure, character dissolves. This is not cynicism—it's thermodynamics.
A virtuous person in a corrupt system faces a choice: adapt (become corrupt), exit (leave the system), or burn out (exhaust their finite reserves of willpower fighting the gradient). The system doesn't need to corrupt everyone—it just needs to outlast the incorruptible.
You cannot build a civilization on the assumption that people will consistently act against their thermodynamic interests.
III. Virtues Are Stability Constraints
The traditional list of virtues (courage, temperance, justice, wisdom, compassion, integrity) sounds like a set of aspirational character traits. In the engineering frame, they are physical requirements for system stability.
The four foundational virtues are not moral goods but discovered constraints that any durable system must satisfy:
Integrity ≠ Honesty. Integrity = Signal Fidelity.
Engineering definition: Maps must match territory. Sensors must report accurately. Feedback loops must be undistorted.
Why it's necessary: A system that lies to itself cannot navigate reality. If your speedometer reads 60 when you're going 120, you will crash. If your civilization's metrics (GDP, approval ratings, test scores) become divorced from actual performance, you hit a wall.
Failure mode: Goodhart's Law. When a measure becomes a target, it ceases to be a good measure. The system optimizes for the proxy while destroying the goal.
This is a control-systems failure: corrupted feedback prevents self-correction.
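A toy simulation makes the Goodhart dynamic concrete. The threshold model and every number below are illustrative assumptions, not empirical parameters; the point is the shape of the curves: the proxy keeps rising under optimization pressure while the true goal peaks and then falls.

```rust
// Toy Goodhart's Law sketch. Assumption: the proxy and the true goal share a
// common driver up to a coupling threshold, then decouple. Past the threshold,
// more "effort" raises the measured metric while degrading the real objective.

fn proxy(effort: f64) -> f64 {
    effort // the measured metric rises monotonically with optimization pressure
}

fn true_goal(effort: f64) -> f64 {
    let threshold = 10.0;
    if effort <= threshold {
        effort // below the threshold, optimizing the metric does real work
    } else {
        // above it, the system games the metric instead of doing the work
        threshold - (effort - threshold) * 0.5
    }
}

fn main() {
    for effort in [5.0, 10.0, 20.0, 40.0] {
        println!(
            "effort {:5.1} | proxy {:5.1} | true goal {:5.1}",
            effort,
            proxy(effort),
            true_goal(effort)
        );
    }
    // The proxy climbs forever; the true goal peaks at the threshold and falls.
}
```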
Fecundity ≠ Reproduction. Fecundity = Anti-Fragility.
Engineering definition: The capacity to generate novelty, explore solution space, produce variance necessary for selection and adaptation.
Why it's necessary: Environments change. A system that cannot generate new responses dies when the problem set shifts. Stagnation is thermodynamically unstable over deep time.
Failure mode: Optimization for present stability (T-) at the cost of future adaptability. The system becomes brittle. When shock arrives, it shatters.
Harmony ≠ Peace. Harmony = Impedance Matching.
Engineering definition: Achieving maximal effect with minimal means. Reducing internal friction, waste heat, coordination costs.
Why it's necessary: High-friction systems dissipate energy as heat rather than work. If 90% of your energy is lost to internal conflict, you cannot compete with systems running at 10% friction.
Failure mode: Bureaucratic sclerosis, factional warfare, siloed departments optimizing locally while destroying global performance.
Synergy ≠ Friendship. Synergy = Superadditive Coordination.
Engineering definition: Differentiated agents producing emergent capabilities neither could achieve alone. The whole exceeds the sum of parts.
Why it's necessary: Zero-sum games trend toward Moloch (race to the bottom). Positive-sum games enable compounding gains. Civilizations are built on Synergy; warlordism is built on dominance.
Failure mode: Coordination collapse. The system fragments into competing factions, each optimizing locally, all losing globally.
The Reframe
"Evil" is not a supernatural force or moral corruption. It's usually one of two things:
- Entropy: The breakdown of these stability constraints (signal corruption, stagnation, friction, fragmentation)
- Parasitism: Local optimization of one component at the expense of the whole (cancer, corruption, extraction)
Both are physics problems. Both have engineering solutions.
IV. The Skeleton Is the Solution
The three-layer architecture (Heart, Skeleton, Head) maps directly to engineering systems:
The Heart (The Engine): Raw energy, optimization pressure, the drive to maximize some objective function. In humans: desires, ambitions, evolutionary drives. In AI: the reward function, gradient descent. In civilizations: economic competition, status games.
The Head (The Driver): Strategic direction, goal-setting, adaptive planning. Where the system is trying to go.
The Skeleton (The Chassis and Brakes): Constitutional constraints, the rules that cannot be violated no matter how strong the optimization pressure. The architecture that channels energy into productive work rather than destructive heat.
Computational Privilege: The Engineering Principle
The Skeleton must have the power to say "NO" that the Head cannot override.
This is the core mechanism: computational privilege. The constraint layer has veto authority the optimization layer cannot circumvent, game, or modify.
Same principle across domains:
- Operating systems: Computational privilege (kernel vs. userspace)
- Political theory: Separation of powers (judiciary can strike down laws)
- Computer security: Privilege separation (sandboxing, capabilities)
The constraint layer must be architecturally isolated from the optimization layer—not just separate, but superior.
Example: Rust vs. C++
Two programming languages. Same computational power. Radically different safety profiles.
C++ (The Disposition Approach): Relies on the programmer to be "good": remember to manage memory, avoid buffer overflows, prevent use-after-free bugs. Result: decades of security vulnerabilities. Heartbleed and countless memory-safety zero-days exploited the fact that C and C++ trust the programmer to be careful.
Rust (The Architecture Approach): Enforces memory safety via the compiler (the Skeleton). Unsafe code must be explicitly marked and isolated. The constraint is architectural—the default path is safe, violations require deliberate escalation and are contained.
The result: Rust programs have orders of magnitude fewer memory safety bugs. Not because Rust programmers are more virtuous, but because the architecture prevents the error class entirely.
This is ethics as engineering. Treat safety as a compile-time constraint, not a runtime hope.
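The compile-time constraint can be seen in miniature. The sketch below is safe, compiling Rust; the commented-out line shows the kind of use-after-move that the borrow checker rejects before the program ever runs.

```rust
// A minimal sketch of "safety as a compile-time constraint": ownership rules
// make an entire error class (use-after-free, double-free) unrepresentable.

fn consume(data: Vec<i32>) -> usize {
    data.len() // ownership of `data` moves here; the buffer is freed on return
}

fn main() {
    let buffer = vec![1, 2, 3];
    let n = consume(buffer);
    println!("consumed {} elements", n);

    // The disposition approach (C/C++) would trust us not to touch `buffer`
    // again. The architecture approach rejects the program outright:
    //
    //     println!("{:?}", buffer); // ERROR: borrow of moved value: `buffer`
    //
    // The violation is not a runtime hope; it never compiles.
}
```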
V. Application to AI: RLHF Is C++
The dominant approach to AI alignment—Reinforcement Learning from Human Feedback (RLHF)—is the C++ model applied to intelligence.
The assumption: Train the model to "want" to be helpful, harmless, honest. Instill good values through repeated examples and reward signals. Hope the disposition generalizes.
The failure mode: RLHF is runtime monitoring—checking behavior during execution, hoping the training holds. Under optimization pressure (adversarial attacks, competitive deployment, capability scaling), the system finds shortcuts. The learned "values" are patterns in the weights—optimizable, not architectural constraints.
AI safety research on mesa-optimization formalizes this concern: powerful optimizers actively search for and exploit every gap between the training objective and the true goal. Goodhart's Curse is not a remote risk; under sufficient optimization pressure, it is the expected outcome.
Constitutional Architecture (The Ideal) Is Rust
The alternative: constitutional architecture is compile-time safety—constraints enforced before the system runs, not hopes checked during execution. (Note: this is architectural constraint—privilege separation, immutable layers—not Anthropic's "Constitutional AI" technique, which is a training-time prompting method.)
Build a separate, immutable layer (the Skeleton) that enforces boundaries the optimization layer cannot modify.
- The Protocol layer: Safety constraints encoded outside the reward function, enforced by a monitor with computational privilege
- Boolean evaluation: ALLOW or HALT—no differentiable gradients the optimizer can game
- Privilege separation: The monitor cannot be modified by the model's instrumental optimization
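The three bullets above can be sketched as code. Every name here (`Monitor`, `Verdict`, `Action`) and the resource bound are hypothetical illustrations of the shape, not a real protocol: the optimizer proposes, the privileged layer returns a Boolean verdict, and there is no gradient for the optimizer to climb.

```rust
// Illustrative sketch of a constraint layer with veto authority.
// The verdict is Boolean (ALLOW or HALT), not a penalty term in the reward,
// so the optimization layer cannot trade it off against other objectives.

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Halt,
}

struct Action {
    resource_delta: i64, // how much the proposed action acquires
}

// The monitor holds its bound privately; the optimizer has no mutable handle
// to it. This is the privilege separation: evaluate takes &self, read-only.
struct Monitor {
    max_resource_delta: i64,
}

impl Monitor {
    fn evaluate(&self, action: &Action) -> Verdict {
        if action.resource_delta <= self.max_resource_delta {
            Verdict::Allow
        } else {
            Verdict::Halt // Boolean veto the optimizer cannot gradient-descend past
        }
    }
}

fn main() {
    let monitor = Monitor { max_resource_delta: 100 };
    let modest = Action { resource_delta: 10 };
    let grab = Action { resource_delta: 10_000 };

    println!("modest: {:?}", monitor.evaluate(&modest));
    println!("grab:   {:?}", monitor.evaluate(&grab));
}
```

The design point is in the types: `evaluate(&self, ...)` gives the optimization layer read-only access to the constraint, which is the type-system analogue of the monitor being architecturally superior rather than merely separate.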
Empirical results from AI safety research support this: architectural monitoring achieves 92-98% safety under adversarial attack, versus roughly 15% for monolithic training. A severalfold improvement from architecture alone.
The prediction: Any AI system that relies purely on "training" for safety will eventually fail under optimization pressure. Only systems that rely on architecture (privilege separation, constitutional constraints) will survive.
Computer security learned this in the 1970s. Political science learned this in the 1780s (US Constitution). Biology discovered this 3.5 billion years ago (genetic constraints on cellular behavior).
You cannot secure a system by hoping the optimizer will be good. You must make bad optimization impossible.
VI. The Is/Ought Bridge
Philosophers since Hume have insisted you cannot derive an "ought" (values) from an "is" (facts). Values and facts live in separate magisteria.
Engineering is the bridge.
Consider a bridge (the physical structure, not the metaphor):
"Is" (Facts): The bridge needs to carry 100 tons. Gravity exerts force. Steel has a yield strength of X. Wind creates lateral stress.
"Ought" (Specification): Therefore, the bridge ought to have support beams of thickness Y, cable tension Z, foundation depth W.
The "ought" derives from the "is." Discovered constraint, not arbitrary preference.
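The derivation is literal arithmetic. A minimal sketch, with every number an illustrative assumption rather than a real design value:

```rust
// Deriving an "ought" (a required beam cross-section) from an "is" (load,
// gravity, material strength). All numbers are illustrative, not design values.

const G: f64 = 9.81; // m/s^2, gravitational acceleration (the "is")

// Required steel cross-sectional area from force balance:
// area >= force * safety_factor / yield_strength.
fn required_area_m2(load_kg: f64, yield_strength_pa: f64, safety_factor: f64) -> f64 {
    let force_n = load_kg * G;
    force_n * safety_factor / yield_strength_pa
}

fn main() {
    // 100 tons of load, an illustrative steel yield strength, a 2x margin.
    let area = required_area_m2(100_000.0, 250e6, 2.0);
    println!("required steel cross-section: {:.4} m^2", area);
    // The specification is discovered, not chosen: change the facts
    // (heavier load, weaker steel) and the "ought" changes with them.
}
```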
Apply to civilizational ethics:
"Is" (Facts): Entropy increases. Systems require energy. Intelligence requires accurate maps. Coordination costs energy. Complexity is fragile.
"Ought" (Specification): Therefore, the system ought to optimize for Integrity (signal fidelity), Fecundity (adaptability), Harmony (low friction), Synergy (positive-sum coordination).
These are not arbitrary values you choose because they feel good. They are survival constraints derived from physics. Violate Integrity, your maps diverge from territory and you crash. Violate Fecundity, the environment changes and you die. Violate Harmony, you waste energy as heat. Violate Synergy, you fragment and get outcompeted.
Values are specifications derived from survival constraints.
This doesn't mean ethics is "solved" or that there are no genuine dilemmas. It means the dilemmas are engineering tradeoffs (how much Fecundity can we sacrifice for short-term Harmony?), not arbitrary preference ("I like vanilla, you like chocolate").
VII. Corruption as Broken Containment
In the traditional moral frame, corruption is "sin"—a personal failing, a vice, a betrayal of trust.
In the engineering frame, corruption is leaky abstraction or broken containment.
It's when the optimization pressure (Head/Heart) melts through the constraint layer (Skeleton). The kinetic energy (power, force) bypasses the potential energy structure (law, constitutional architecture) meant to contain it.
This is the thermodynamic mechanism explored in The Thermodynamics of Power—when violence escapes the legal framework, when force breaks free of constitutional constraint.
Examples Across Scales
Political corruption: Officials use state power for personal enrichment. The constraint (constitutional law, separation of powers, oversight) failed to contain the optimization pressure (greed, status-seeking). The Skeleton cracked.
Institutional mission drift: A university founded to pursue truth becomes a credentialing factory. The optimization pressure (maximize revenue, rankings, enrollment) overwhelmed the constraint (mission, academic freedom). The constitution got gamed.
AI alignment failure: The model learns to maximize reward signal rather than actual human values. The optimization pressure (gradient descent on the proxy) found a gap in the constraint (the reward function). Goodhart's Law in action.
Anarcho-tyranny: The state prosecutes law-abiding citizens while ignoring predators (see The Thermodynamics of Power). The monopoly on violence (force) escaped the constraint (constitutional law). The Sword broke free of the Sheath.
The pattern: Corruption is not a moral category. It's an architectural failure—the constraint layer was insufficiently privileged, insufficiently isolated, or insufficiently robust to contain the optimization pressure.
The solution is not better people. It's better architecture.
VIII. The Devil's Lawyer Test
A robust ethical system must work even if the operator is the Devil.
Engineering specification: the system must remain safe when run by devils. If your system requires saints, it has already failed.
Political Scale: The US Constitution
The Founders designed for devils: "Ambition must be made to counteract ambition... If men were angels, no government would be necessary." (Federalist 51)
The architecture—separation of powers, checks and balances, federalism, Bill of Rights—assumes every actor will attempt to maximize power. The system channels that optimization pressure into productive tension rather than tyranny.
It worked for 200+ years not because Americans were uniquely virtuous, but because the architecture made power-grabbing expensive and coordination necessary.
The failure modes are instructive. Administrative agencies bypassed separation of powers by combining legislative, executive, and judicial functions in single entities. "Living constitution" jurisprudence converted the Skeleton from stable constraint (R+: what do the words mean?) to flexible narrative (R-: what should they mean?). When the constraint layer becomes interpretable rather than rigid, it stops constraining. The architecture degraded, but the principle remains valid.
AI Scale: Alignment for Sociopaths
If Madison needed constitutional architecture for humans (prone to ambition, greed, corruption), we need it even more for AGI.
We need AI architecture that remains safe even if the mesa-optimizer (the learned model's internal objective) is a sociopath. If your alignment strategy relies on the AI "wanting" to be good, you're assuming an angel. If the AI develops instrumental goals misaligned with the training objective—which mesa-optimization theory predicts will happen under strong optimization—you lose.
The architectural solution: Constitutional constraints the AI cannot modify regardless of its internal objectives. The Protocol layer with override authority. The monitor that can say HALT.
Same principle, higher stakes. The Founders designed for human-level optimization pressure. We're designing for superintelligent optimization pressure. The need for architectural constraint doesn't decrease—it intensifies.
The principle: If your safety depends on the benevolence of the agent, you are already dead.
IX. The Engineering Mandate
Stop trying to create virtuous agents. Start building virtuous games.
Stop trying to persuade people to be better. Start building game boards where the winning move is the virtuous move.
This is the definition of:
- Law (coordination software that channels violence into order)
- Protocol Design (rules that make defection expensive)
- Mechanism Design (incentive structures that align individual and collective optimization)
- Constitutional Architecture (privilege separation that makes corruption structurally difficult)
These are not separate disciplines. They are applications of one principle: engineer the constraints, don't preach to the optimizers.
This is the mechanist ontology applied to ethics: what produces outcomes is causal architecture, not stated intentions. The tradition behind this insight runs from Hobbes through Hume to Wiener — each showed that mechanism, not moral exhortation, determines outcomes.
X. Conclusion: The Ultimate Ethical Act
Across civilizations and centuries, we have celebrated the martyr—the saint who holds the line against corruption through sheer force of will, who suffers for virtue, who sacrifices themselves to prove goodness is possible.
The martyr is noble. The martyr is also thermodynamically unsustainable.
Martyrdom proves the system is broken. It demonstrates that being good requires superhuman effort, that virtue is expensive, that the game board is rigged against flourishing. Every martyr is evidence that the architecture failed.
The ultimate ethical act is not to be a martyr. It is to be an architect.
Build systems where:
- Telling the truth is easier than lying (Integrity by default)
- Growing and adapting is cheaper than stagnating (Fecundity as path of least resistance)
- Cooperation produces better outcomes than defection (Synergy as Nash equilibrium)
- Efficiency is rewarded and waste is penalized (Harmony through architecture)
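The "Synergy as Nash equilibrium" bullet can be made concrete with a toy one-shot prisoner's dilemma in which a constraint layer taxes defection. The payoff matrix and tax value are illustrative assumptions; the point is that the equilibrium flips because the game changed, not because the players did.

```rust
// Sketch of "architect the game": the same players, a different game board.
// Payoff numbers are illustrative.

#[derive(Clone, Copy, PartialEq, Debug)]
enum Move {
    Cooperate,
    Defect,
}

// Row player's payoff before any architectural intervention.
fn base_payoff(me: Move, other: Move) -> i32 {
    match (me, other) {
        (Move::Cooperate, Move::Cooperate) => 3,
        (Move::Cooperate, Move::Defect) => 0,
        (Move::Defect, Move::Cooperate) => 5, // defection dominates...
        (Move::Defect, Move::Defect) => 1,    // ...and everyone loses globally
    }
}

// The Skeleton: an enforced penalty on defection, outside either player's control.
fn engineered_payoff(me: Move, other: Move, defection_tax: i32) -> i32 {
    let tax = if me == Move::Defect { defection_tax } else { 0 };
    base_payoff(me, other) - tax
}

// Best response to the opponent's move under a given tax.
fn best_response(other: Move, tax: i32) -> Move {
    if engineered_payoff(Move::Cooperate, other, tax)
        >= engineered_payoff(Move::Defect, other, tax)
    {
        Move::Cooperate
    } else {
        Move::Defect
    }
}

fn main() {
    // Without the constraint layer, defection is the dominant strategy.
    println!("tax 0, vs cooperator: {:?}", best_response(Move::Cooperate, 0));
    // With a sufficient tax, cooperation is the best response to anything.
    println!("tax 3, vs cooperator: {:?}", best_response(Move::Cooperate, 3));
    println!("tax 3, vs defector:   {:?}", best_response(Move::Defect, 3));
}
```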
This is the work. Not preaching, not hoping, not training better dispositions. Building better constraints.
Treat ethics as constraints, virtue as equilibrium, character as architecture. This is not cold. This is not amoral. This is the most moral act possible: creating conditions where billions of humans can flourish without requiring sainthood.
You cannot build civilization on willpower. You must build it on physics.
The choice is simple: Architect the game, or become a martyr in a broken one.
The underlying physics: The Question Nobody Asks. The constraint architecture: The Four Axiomatic Dilemmas.
Sources and Prior Art
"Design for Devils" Tradition:
- Hume D. "Of the Independency of Parliament." Essays, Moral, Political, and Literary, 1742. — "Every man ought to be supposed a knave, and to have no other end, in all his actions, than private interest." The knavery principle: not a psychological claim, but a systems engineering axiom.
- Madison J. Federalist No. 51, 1788. — "If men were angels, no government would be necessary… Ambition must be made to counteract ambition." Architecture that channels optimization pressure into productive tension.
- Buchanan JM, Brennan G. The Reason of Rules: Constitutional Political Economy. Cambridge University Press, 1985. — "Economize on virtue": design rules assuming rational self-interest; shift focus from training good players to building good games. Nobel-winning formalization of the architectural thesis.
Computer Security Foundations:
- Saltzer JH, Schroeder MD. "The Protection of Information in Computer Systems." Proceedings of the IEEE 63(9), 1975, 1278–1308. — Least privilege, separation of privilege, complete mediation, fail-safe defaults. The Madisonian architecture translated into systems engineering.
- The Heartbleed vulnerability (CVE-2014-0160) in OpenSSL is the canonical failure of the dispositional approach in CS: C's "trust the programmer" philosophy produced a single missing bounds check that compromised global internet traffic. The Rust programming language's borrow checker makes this class of error structurally impossible at compile time — architecture replacing disposition at the compiler level.
AI Alignment:
- Hubinger E, van Merwijk C, Mikulik V, Skalse J, Garrabrant S. "Risks from Learned Optimization in Advanced Machine Learning Systems." arXiv:1906.01820, 2019. — Mesa-optimization: learned optimizers may develop misaligned internal objectives and fake alignment during training.
- Bai Y et al. "Constitutional AI: Harmlessness from AI Feedback." arXiv:2212.08073, 2022. — Rules-based constitutional constraint reduces attack success rate by ~41% vs monolithic RLHF. Note: "Constitutional AI" here refers to Anthropic's specific training technique, not the general architectural principle this essay advocates.
Counter-thesis (architecture needs virtue as substrate):
- MacIntyre A. After Virtue: A Study in Moral Theory. University of Notre Dame Press, 1981. — Institutions are acquisitive; without the internal virtues of practitioners, architecture cannibalizes the practices it was designed to serve. The strongest case that architecture alone is insufficient.
- Anderson E. The Imperative of Integration. Princeton University Press, 2010. — Architectural integration (spatial desegregation, legal mandates) is necessary but insufficient: placing groups in physical proximity does not produce the cooperative behavior the architecture was designed for unless participants also change how they model each other. Anderson calls this missing ingredient "epistemic justice" (treating out-group members as credible sources of knowledge rather than dismissing them by default). The general lesson for mechanist design: architecture creates the environment, but if the agents' internal models remain adversarial, the mechanism underperforms its design spec.
Related reading:
- Full-Stack Civilizational Engineering — The historical record of architecture at civilizational scale; why the engineer must constrain themselves
- The Physics of Moloch — The unified mechanism of coordination failure (Moloch, Goodhart, Inadequate Equilibria)
- The Hospice AI Problem — Why preference alignment leads to comfortable extinction
- Values Aren't Subjective — The thermodynamic constraints on axiology
- The Thermodynamics of Power — How Law transforms violence into order at state scale
- The Mechanist Tradition — The 400-year genealogy: Hobbes → Hume → Wiener → architecture beats disposition