Ethics Is an Engineering Problem

Why "being good" fails and "building constraints" works


I. The 3,000-Year Mistake

For three millennia, we have treated ethics primarily as disposition training—teaching individuals to be "good" through moral reasoning, guilt, and exhortation, or (most recently) training AI models with RLHF.

History and physics suggest ethics is actually an architecture problem—designing systems where "good" is the only thermodynamically stable equilibrium.

The Paradigm Shift

Old paradigm: "Virtue" is a personal attribute. The goal is to create saintly agents.

New paradigm: "Virtue" is a system state. The goal is to create robust games.

The thermodynamic reality: In a system with misaligned incentives (Moloch), a "saintly agent" is thermodynamically unstable. They will burn out, be corrupted, or be outcompeted by agents who defect. In a system with aligned architecture, "virtue" becomes automatic—the system produces virtuous outcomes without requiring virtuous inputs.

II. The Failure of Willpower

The "Good Person" Fallacy

When systems fail, we blame individual actors. The 2008 financial crisis? "Greedy bankers." Political corruption? "Bad politicians." Corporate malfeasance? "Unethical executives."

This diagnosis is both correct and useless.

Yes, bankers were greedy. But bankers are always greedy. The crisis didn't happen because humans suddenly became more selfish in 2007. It happened because the architecture made greed systemically risk-free: privatized gains, socialized losses, no personal liability for catastrophic failure.

The system selected for the behavior we claim to deplore.

The Physics: Character Is Soluble in Incentives

Over long enough timeframes and strong enough optimization pressure, character dissolves. This is not cynicism—it's thermodynamics.

A virtuous person in a corrupt system faces a choice: adapt (become corrupt), exit (leave the system), or burn out (exhaust their finite reserves of willpower fighting the gradient). The system doesn't need to corrupt everyone—it just needs to outlast the incorruptible.

You cannot build a civilization on the assumption that people will consistently act against their thermodynamic interests.

III. Virtues Are Stability Constraints

The traditional virtues (courage, temperance, justice, wisdom, compassion, integrity) sound like aspirational character traits. In the engineering frame, they are physical requirements for system stability.

The four foundational virtues are not moral goods; they are discovered constraints that any durable system must satisfy:

Integrity ≠ Honesty. Integrity = Signal Fidelity.

Engineering definition: Maps must match territory. Sensors must report accurately. Feedback loops must be undistorted.

Why it's necessary: A system that lies to itself cannot navigate reality. If your speedometer reads 60 when you're going 120, you will crash. If your civilization's metrics (GDP, approval ratings, test scores) become divorced from actual performance, you hit a wall.

Failure mode: Goodhart's Law. When a measure becomes a target, it ceases to be a good measure. The system optimizes for the proxy while destroying the goal.

This is a control-systems failure: corrupted feedback prevents self-correction.
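Goodhart's Law can be sketched numerically (a toy model with made-up coefficients, not an empirical result): an agent splits a fixed effort budget between real work and gaming the metric. The proxy rewards gaming three-to-one, so optimizing the proxy drives the true value negative.

```rust
// Toy Goodhart model. The proxy (e.g. a test score) over-rewards gaming;
// the true value (actual performance) is destroyed by it.
fn proxy_score(work: f64, gaming: f64) -> f64 {
    work + 3.0 * gaming // gaming the metric is cheap and pays well
}

fn true_value(work: f64, gaming: f64) -> f64 {
    work - gaming // gaming actively destroys the underlying goal
}

fn main() {
    let total_effort = 10.0;
    for gaming in [0.0_f64, 2.5, 5.0, 7.5, 10.0] {
        let work = total_effort - gaming;
        println!(
            "gaming={:4.1}  proxy={:5.1}  true={:5.1}",
            gaming,
            proxy_score(work, gaming),
            true_value(work, gaming)
        );
    }
    // As effort shifts toward gaming, the proxy climbs from 10 to 30
    // while the true value falls from 10 to -10: measure and goal diverge.
}
```

Any optimizer pointed at `proxy_score` will allocate all effort to gaming; the metric improves monotonically while the goal it was meant to track collapses.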

Fecundity ≠ Reproduction. Fecundity = Anti-Fragility.

Engineering definition: The capacity to generate novelty, explore solution space, produce variance necessary for selection and adaptation.

Why it's necessary: Environments change. A system that cannot generate new responses dies when the problem set shifts. Stagnation is thermodynamically unstable over deep time.

Failure mode: Optimization for present stability (T-) at the cost of future adaptability. The system becomes brittle. When shock arrives, it shatters.

Harmony ≠ Peace. Harmony = Impedance Matching.

Engineering definition: Achieving maximal effect with minimal means. Reducing internal friction, waste heat, coordination costs.

Why it's necessary: High-friction systems dissipate energy as heat rather than work. If 90% of your energy is lost to internal conflict, you cannot compete with systems running at 10% friction.

Failure mode: Bureaucratic sclerosis, factional warfare, siloed departments optimizing locally while destroying global performance.

Synergy ≠ Friendship. Synergy = Superadditive Coordination.

Engineering definition: Differentiated agents producing emergent capabilities neither could achieve alone. The whole exceeds the sum of parts.

Why it's necessary: Zero-sum games trend toward Moloch (race to the bottom). Positive-sum games enable compounding gains. Civilizations are built on Synergy; warlordism is built on dominance.

Failure mode: Coordination collapse. The system fragments into competing factions, each optimizing locally, all losing globally.

The Reframe

"Evil" is not a supernatural force or moral corruption. It's usually one of two things:
- A misaligned game: incentives that make defection the winning move (Moloch).
- A broken constraint: optimization pressure that has escaped the containment meant to channel it.

Both are physics problems. Both have engineering solutions.

IV. The Skeleton Is the Solution

The three-layer architecture (Heart, Skeleton, Head) maps directly to engineering systems:

The Heart (The Engine): Raw energy, optimization pressure, the drive to maximize some objective function. In humans: desires, ambitions, evolutionary drives. In AI: the reward function, gradient descent. In civilizations: economic competition, status games.

The Head (The Driver): Strategic direction, goal-setting, adaptive planning. Where the system is trying to go.

The Skeleton (The Chassis and Brakes): Constitutional constraints, the rules that cannot be violated no matter how strong the optimization pressure. The architecture that channels energy into productive work rather than destructive heat.

Computational Privilege: The Engineering Principle

The Skeleton must have the power to say "NO" that the Head cannot override.

This is the core mechanism: computational privilege. The constraint layer has veto authority the optimization layer cannot circumvent, game, or modify.

Same principle across domains:

The constraint layer must be architecturally isolated from the optimization layer—not just separate, but superior.
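A minimal Rust sketch of computational privilege (the `Governor` type and its budget rule are my own illustration, not part of the source framework): the constraint lives in a module whose state is private, so optimization code can request resources but can neither seize them nor rewrite the rule.

```rust
// The Skeleton: owns the invariant. `budget` is private, so no code
// outside this module can touch it except through `request`.
mod skeleton {
    pub struct Governor {
        budget: u64, // private field: the constraint cannot be overwritten
    }

    impl Governor {
        pub fn new(budget: u64) -> Self {
            Governor { budget }
        }

        /// The optimizer asks; the Skeleton can say no.
        pub fn request(&mut self, amount: u64) -> bool {
            if amount <= self.budget {
                self.budget -= amount;
                true
            } else {
                false // veto: there is no code path around this check
            }
        }

        pub fn remaining(&self) -> u64 {
            self.budget
        }
    }
}

fn main() {
    let mut gov = skeleton::Governor::new(100);
    assert!(gov.request(60)); // within the constraint: granted
    assert!(!gov.request(60)); // would exceed it: vetoed
    // gov.budget = u64::MAX;  // would not compile: `budget` is private
    assert_eq!(gov.remaining(), 40);
}
```

The point is structural: the veto is enforced by the compiler's visibility rules, not by the goodwill of the calling code.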

Example: Rust vs. C++

Two programming languages. Same computational power. Radically different safety profiles.

C++ (The Disposition Approach): Relies on the programmer to be "good": remember to manage memory, avoid buffer overflows, prevent use-after-free bugs. Result: decades of security vulnerabilities. Heartbleed (a buffer over-read in OpenSSL's C code) and countless other memory-safety zero-days exploit the fact that C and C++ trust the programmer to be careful.

Rust (The Architecture Approach): Enforces memory safety via the compiler (the Skeleton). Unsafe code must be explicitly marked and isolated. The constraint is architectural—the default path is safe, violations require deliberate escalation and are contained.

The result: Rust programs exhibit drastically fewer memory-safety bugs. Not because Rust programmers are more virtuous, but because the architecture rules out the error class entirely.

This is ethics as engineering. Treat safety as a compile-time constraint, not a runtime hope.
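A small illustration of the Rust approach, using nothing beyond the standard library: the `unsafe` operation is explicitly marked, bounds-checked before it runs, and wrapped in a safe API, so callers cannot reach undefined behavior through it.

```rust
// A safe API over an unsafe core: the dangerous operation is isolated
// behind a check, so out-of-bounds reads are unrepresentable to callers.
fn get_checked(data: &[u8], idx: usize) -> Option<u8> {
    if idx < data.len() {
        // SAFETY: idx is verified in bounds above; `unsafe` is marked
        // and contained, exactly as the language requires.
        Some(unsafe { *data.get_unchecked(idx) })
    } else {
        None // the violation is handled, not hoped away
    }
}

fn main() {
    let bytes = [10u8, 20, 30];
    assert_eq!(get_checked(&bytes, 1), Some(20));
    assert_eq!(get_checked(&bytes, 9), None); // out of bounds: contained
    println!("out-of-bounds reads cannot occur through this API");
}
```

The escalation is deliberate and local: anyone auditing this code greps for `unsafe` and reviews one line, instead of auditing every array access in the program.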

V. Application to AI: RLHF Is C++

The dominant approach to AI alignment—Reinforcement Learning from Human Feedback (RLHF)—is the C++ model applied to intelligence.

The assumption: Train the model to "want" to be helpful, harmless, honest. Instill good values through repeated examples and reward signals. Hope the disposition generalizes.

The failure mode: RLHF is runtime monitoring—checking behavior during execution, hoping the training holds. Under optimization pressure (adversarial attacks, competitive deployment, capability scaling), the system finds shortcuts. The learned "values" are patterns in the weights—optimizable, not architectural constraints.

AI safety research on mesa-optimization formalizes this: a sufficiently powerful optimizer will search for and exploit every gap between the training objective and the true goal. Goodhart's Curse is not a tail risk; under strong enough optimization pressure, it is the expected outcome.

Constitutional AI (The Ideal) Is Rust

The alternative: Constitutional architecture is compile-time safety—constraints enforced before the system runs, not hopes checked during execution.

Build a separate, immutable layer (the Skeleton) that enforces boundaries the optimization layer cannot modify.

Early empirical results from AI safety research point the same way: in one set of adversarial evaluations, architectural monitoring maintained 92-98% safety versus 15% for monolithic training. The gap comes from architecture, not from a better-trained optimizer.

The prediction: Any AI system that relies purely on "training" for safety will eventually fail under optimization pressure. Only systems that rely on architecture (privilege separation, constitutional constraints) will survive.

Computer security learned this in the 1970s. Political science learned this in the 1780s (US Constitution). Biology discovered this 3.5 billion years ago (genetic constraints on cellular behavior).

You cannot secure a system by hoping the optimizer will be good. You must make bad optimization impossible.

VI. The Is/Ought Bridge

Philosophers since Hume have insisted you cannot derive an "ought" (values) from an "is" (facts). Values and facts live in separate magisteria.

Engineering is the bridge.

Consider a bridge (the physical structure, not the metaphor):

"Is" (Facts): The bridge needs to carry 100 tons. Gravity exerts force. Steel has a yield strength of X. Wind creates lateral stress.

"Ought" (Specification): Therefore, the bridge ought to have support beams of thickness Y, cable tension Z, foundation depth W.

The "ought" derives from the "is." Discovered constraint, not arbitrary preference.
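The derivation can be written as arithmetic (the numbers are illustrative, not real structural engineering): the "ought" (a minimum cross-section) falls out of the "is" (load, material limit, safety factor).

```rust
/// Minimum cross-sectional area (mm^2) so that stress stays below
/// the material's yield strength divided by a safety factor.
fn required_area_mm2(load_n: f64, yield_mpa: f64, safety_factor: f64) -> f64 {
    let allowable_stress = yield_mpa / safety_factor; // MPa == N/mm^2
    load_n / allowable_stress
}

fn main() {
    // "Is": a 100 kN load, steel yielding at 250 MPa, safety factor of 2.
    let area = required_area_mm2(100_000.0, 250.0, 2.0);
    // "Ought": the member ought to have at least this cross-section.
    println!("required cross-section: {} mm^2", area); // 800 mm^2
}
```

No preference enters the calculation: given the facts and the survival condition ("the bridge must stand"), the specification is forced.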

Apply to civilizational ethics:

"Is" (Facts): Entropy increases. Systems require energy. Intelligence requires accurate maps. Coordination costs energy. Complexity is fragile.

"Ought" (Specification): Therefore, the system ought to optimize for Integrity (signal fidelity), Fecundity (adaptability), Harmony (low friction), Synergy (positive-sum coordination).

These are not arbitrary values you choose because they feel good. They are survival constraints derived from physics. Violate Integrity, your maps diverge from territory and you crash. Violate Fecundity, the environment changes and you die. Violate Harmony, you waste energy as heat. Violate Synergy, you fragment and get outcompeted.

Values are specifications derived from survival constraints.

This doesn't mean ethics is "solved" or that there are no genuine dilemmas. It means the dilemmas are engineering tradeoffs (how much Fecundity can we sacrifice for short-term Harmony?), not arbitrary preference ("I like vanilla, you like chocolate").

VII. Corruption as Broken Containment

In the traditional moral frame, corruption is "sin"—a personal failing, a vice, a betrayal of trust.

In the engineering frame, corruption is leaky abstraction or broken containment.

It's when the optimization pressure (Head/Heart) melts through the constraint layer (Skeleton). The kinetic energy (power, force) bypasses the potential energy structure (law, constitutional architecture) meant to contain it.

This is the thermodynamic mechanism explored in The Thermodynamics of Power—when violence escapes the legal framework, when force breaks free of constitutional constraint.

Examples Across Scales

Political corruption: Officials use state power for personal enrichment. The constraint (constitutional law, separation of powers, oversight) failed to contain the optimization pressure (greed, status-seeking). The Skeleton cracked.

Institutional mission drift: A university founded to pursue truth becomes a credentialing factory. The optimization pressure (maximize revenue, rankings, enrollment) overwhelmed the constraint (mission, academic freedom). The constitution got gamed.

AI alignment failure: The model learns to maximize reward signal rather than actual human values. The optimization pressure (gradient descent on the proxy) found a gap in the constraint (the reward function). Goodhart's Law in action.

Anarcho-tyranny: The state prosecutes law-abiding citizens while ignoring predators (see The Thermodynamics of Power). The monopoly on violence (force) escaped the constraint (constitutional law). The Sword broke free of the Sheath.

The pattern: Corruption is not a moral category. It's an architectural failure—the constraint layer was insufficiently privileged, insufficiently isolated, or insufficiently robust to contain the optimization pressure.

The solution is not better people. It's better architecture.

VIII. The Devil's Lawyer Test

A robust ethical system must work even if the operator is the Devil.

Engineering specification: systems must work when run by devils. If your system requires saints, it fails.

Political Scale: The US Constitution

The Founders designed for devils: "Ambition must be made to counteract ambition... If men were angels, no government would be necessary." (Federalist 51)

The architecture—separation of powers, checks and balances, federalism, Bill of Rights—assumes every actor will attempt to maximize power. The system channels that optimization pressure into productive tension rather than tyranny.

It worked for 200+ years not because Americans were uniquely virtuous, but because the architecture made power-grabbing expensive and coordination necessary.

The failure modes are instructive. Administrative agencies bypassed separation of powers by combining legislative, executive, and judicial functions in single entities. "Living constitution" jurisprudence converted the Skeleton from stable constraint (R+: what do the words mean?) to flexible narrative (R-: what should they mean?). When the constraint layer becomes interpretable rather than rigid, it stops constraining. The architecture degraded, but the principle remains valid.

AI Scale: Alignment for Sociopaths

If Madison needed constitutional architecture for humans (prone to ambition, greed, corruption), we need it even more for AGI.

We need AI architecture that remains safe even if the mesa-optimizer (the learned model's internal objective) is a sociopath. If your alignment strategy relies on the AI "wanting" to be good, you're assuming an angel. If the AI develops instrumental goals misaligned with the training objective—which mesa-optimization theory predicts will happen under strong optimization—you lose.

The architectural solution: Constitutional constraints the AI cannot modify regardless of its internal objectives. The Protocol layer with override authority. The monitor that can say HALT.
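A sketch of what such a monitor could look like (the types and thresholds here are hypothetical, not a real safety system): every proposed action must pass through `review`, which holds an unconditional veto that the optimizer's internal objectives cannot touch.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Proceed,
    Halt,
}

// A proposed action from the optimization layer (illustrative fields).
struct Action {
    irreversible: bool,
    blast_radius: u32,
}

// The Protocol layer: fixed rules, not learned weights. The optimizer
// has no code path that bypasses this function.
fn review(a: &Action) -> Verdict {
    if a.irreversible || a.blast_radius > 100 {
        Verdict::Halt
    } else {
        Verdict::Proceed
    }
}

fn main() {
    let routine = Action { irreversible: false, blast_radius: 5 };
    let dangerous = Action { irreversible: true, blast_radius: 5 };
    assert_eq!(review(&routine), Verdict::Proceed);
    assert_eq!(review(&dangerous), Verdict::Halt);
    println!("the monitor holds a veto the optimizer cannot remove");
}
```

The design choice is the same as the Governor of Section IV: the constraint is evaluated outside the optimizer, so a misaligned internal objective can propose anything it likes and still gets HALT.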

Same principle, higher stakes. The Founders designed for human-level optimization pressure. We're designing for superintelligent optimization pressure. The need for architectural constraint doesn't decrease—it intensifies.

The principle: If your safety depends on the benevolence of the agent, you are already dead.

IX. The Engineering Mandate

Stop trying to create virtuous agents. Start building virtuous games.

Stop trying to persuade people to be better. Start building game boards where the winning move is the virtuous move.

This is the definition of:
- Mechanism design in economics
- Constitutional engineering in politics
- Alignment architecture in AI

These are not separate disciplines. They are applications of one principle: engineer the constraints, don't preach to the optimizers.

X. Conclusion: The Ultimate Ethical Act

Across civilizations and centuries, we have celebrated the martyr—the saint who holds the line against corruption through sheer force of will, who suffers for virtue, who sacrifices themselves to prove goodness is possible.

The martyr is noble. The martyr is also thermodynamically unsustainable.

Martyrdom proves the system is broken. It demonstrates that being good requires superhuman effort, that virtue is expensive, that the game board is rigged against flourishing. Every martyr is evidence that the architecture failed.

The ultimate ethical act is not to be a martyr. It is to be an architect.

Build systems where:
- Telling the truth is cheaper than lying
- Cooperating pays better than defecting
- Doing the right thing requires no heroism, because the architecture does the work

This is the work. Not preaching, not hoping, not training better dispositions. Building better constraints.

Treat ethics as constraints, virtue as equilibrium, character as architecture. This is not cold. This is not amoral. This is the most moral act possible: creating conditions where billions of humans can flourish without requiring sainthood.

You cannot build civilization on willpower. You must build it on physics.

The choice is simple: Architect the game, or become a martyr in a broken one.


This draws from Aliveness: Principles of Telic Systems, a physics-based framework for understanding what sustains organized complexity over deep time—from cells to civilizations to artificial intelligence.
