The Halting Problem of Law

Why nobody knows what the laws are—and why the system prefers it that way

I. The Constitutional Hallucination

A citizen wants to know what the law says about their situation. They go to Finlex, Finland's official legal database. At the bottom of every consolidated statute, they find this disclaimer:

"Oikeusministeriö ei vastaa virheistä." (The Ministry of Justice does not guarantee accuracy.)

The official government database doesn't know what the law says. And it explicitly refuses to be held accountable for errors.

This isn't a Finnish peculiarity. EUR-Lex, the European Union's legal database, states that consolidated texts have "no legal effect"—only the original amending acts published in the Official Journal are authoritative. To know EU law definitively, you must read the base act plus every subsequent amendment PDF and mentally apply the changes yourself.

The working assumption is that consolidation—combining a base law with its amendments to produce current text—is a clerical chore. Something computers should handle easily. The reality is that it's computationally impossible to do reliably.

II. Syntactic vs. Semantic: The Architectural Flaw

In software, version control works because changes are syntactic operations:

"Replace line 47 with X"
"Delete lines 10-15"
"Insert Y after line 20"

Any computer can apply these. Same input produces same output. Deterministic.
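Here is a minimal sketch of such a syntactic patch in Python. It assumes a document stored as a list of lines and three hypothetical operation types that mirror the examples above. Nothing in `apply_patch` consults a reader, so the same base and the same operations can only produce one result.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical operation types mirroring "replace / delete / insert".
@dataclass(frozen=True)
class Replace:
    line: int      # 1-indexed line to overwrite
    text: str

@dataclass(frozen=True)
class Delete:
    start: int     # 1-indexed, inclusive
    end: int       # 1-indexed, inclusive

@dataclass(frozen=True)
class InsertAfter:
    line: int      # insert after this 1-indexed line (0 = at the top)
    text: str

Op = Union[Replace, Delete, InsertAfter]

def apply_op(lines: List[str], op: Op) -> List[str]:
    """Apply one syntactic edit and return a new list (the input is untouched)."""
    out = list(lines)
    if isinstance(op, Replace):
        out[op.line - 1] = op.text
    elif isinstance(op, Delete):
        del out[op.start - 1 : op.end]
    elif isinstance(op, InsertAfter):
        out.insert(op.line, op.text)
    else:
        raise TypeError(f"unknown operation: {op!r}")
    return out

def apply_patch(lines: List[str], ops: List[Op]) -> List[str]:
    for op in ops:
        lines = apply_op(lines, op)
    return lines

base = [f"line {i}" for i in range(1, 6)]
patch = [Replace(2, "new line 2"), Delete(4, 5), InsertAfter(1, "inserted")]
result = apply_patch(base, patch)
print(result)  # ['line 1', 'inserted', 'new line 2', 'line 3']
```

One convention must be fixed up front. Here, each operation's line numbers refer to the text left by the previous operation. Whatever the convention, the format decides it once, not case by case.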

Legal amendments are semantic operations:

"Replace 'child' with 'person' everywhere except in §4(2)"
"The provisions of the 1985 law §12, as they were on 1.1.2010, shall apply"
"Add exception for cases where the general principle of administrative law applies"

These require interpretation. What does "everywhere" cover—references in other laws? Implicit references? Cross-references that reference this section? What was the 1985 law's state on a specific date, and how was that derived? What are "general principles of administrative law"—they're not enumerated anywhere, they evolve through case law, and different jurists disagree.

The problem isn't append-only architecture (that works fine in databases). The problem is that amendments are semantic instructions rather than syntactic diffs. Semantic operations require human judgment. Human judgment produces inconsistent results.
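By contrast, here is a sketch of two implementations of the first semantic instruction, run on a hypothetical three-section statute. The statute text, `swap`, `reading_a` and `reading_b` are all invented for illustration. Both honour the words "everywhere except in §4(2)". They differ on whether a sentence elsewhere that refers back to §4(2) falls inside the exception.

```python
import re
from typing import Dict

Statute = Dict[str, str]

# Hypothetical statute, used only to illustrate the ambiguity.
statute: Statute = {
    "§3": "A child may apply for the benefit.",
    "§4(2)": "A child under 7 is exempt.",
    "§5": "The child referred to in §4(2) must be registered.",
}

def swap(text: str) -> str:
    # Whole-word replacement; this choice is itself an interpretation.
    return re.sub(r"\bchild\b", "person", text)

def reading_a(s: Statute) -> Statute:
    """Literal scope: skip only the text of §4(2) itself."""
    return {sec: txt if sec == "§4(2)" else swap(txt) for sec, txt in s.items()}

def reading_b(s: Statute) -> Statute:
    """Purposive scope: also skip provisions that point back to §4(2)."""
    return {sec: txt if sec == "§4(2)" or "§4(2)" in txt else swap(txt)
            for sec, txt in s.items()}

print(reading_a(statute)["§5"])  # The person referred to in §4(2) must be registered.
print(reading_b(statute)["§5"])  # The child referred to in §4(2) must be registered.
```

Even the word-boundary rule in `swap` is a choice. A plain substring replace would also turn "children" into "personren". A syntactic diff would have spelled out every changed line and left nothing to choose.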

III. The Compile-Time Problem

In software development:

The compiler forces ambiguity resolution before execution
If code is ambiguous, it doesn't compile
Errors are caught before deployment

In law:

The "compiler" is the Judge
The Judge only runs during a lawsuit (runtime)
And only on the specific provisions that caused the "crash"

The vast majority of legal code is never compiled. It exists in a quantum superposition of "maybe valid, maybe not" until someone gets sued. A 2025 French study attempted to automate consolidation using GPT-4 and achieved only 63% accuracy on complex bills—the remaining 37% included "hallucinations" where the model invented plausible but incorrect legal phrasing.

If state-of-the-art AI can't reliably consolidate legislation, the problem isn't inadequate tooling. The problem is that the input data is architecturally incompatible with deterministic processing.

IV. Error Accumulation

Model consolidation as a state machine:

S₀ = Initial Law
δ = Transition Function (Amendment)
S₁ = δ(S₀, A₁)
S₂ = δ(S₁, A₂)
...
Sₙ = δ(Sₙ₋₁, Aₙ) = δ(…δ(δ(S₀, A₁), A₂)…, Aₙ)

If δ is ambiguous (semantic operation), uncertainty compounds through the chain. Finnish tax law has undergone 50+ amendments over decades. Each δ introduces interpretation variance. By amendment 50, the "current state" is unknowable without an oracle (a court ruling).
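When δ is deterministic, the recurrence is an ordinary left fold. The sketch below assumes that states and amendments are plain strings, and that an ambiguous δ can be modelled as returning the set of defensible next states. `apply_literally` and `two_readings` are hypothetical stand-ins.

```python
from functools import reduce
from typing import Callable, List, Set

State = str
Amendment = str

def consolidate(s0: State, amendments: List[Amendment],
                delta: Callable[[State, Amendment], State]) -> State:
    """Deterministic case: S_n = delta(S_{n-1}, A_n), i.e. a left fold."""
    return reduce(delta, amendments, s0)

def reachable_states(s0: State, amendments: List[Amendment],
                     readings: Callable[[State, Amendment], Set[State]]) -> Set[State]:
    """Ambiguous case: each step yields a set of defensible next states,
    so the 'current law' is the set of all paths through the tree."""
    states = {s0}
    for a in amendments:
        states = {nxt for s in states for nxt in readings(s, a)}
    return states

def apply_literally(s: State, a: Amendment) -> State:
    return f"{s}+{a}"

def two_readings(s: State, a: Amendment) -> Set[State]:
    # Hypothetical: every amendment admits a narrow and a broad reading.
    return {f"{s}+{a}:narrow", f"{s}+{a}:broad"}

amendments = [f"A{i}" for i in range(1, 11)]
print(consolidate("S0", amendments, apply_literally))         # exactly one state
print(len(reachable_states("S0", amendments, two_readings)))  # 1024
```

With two readings per amendment, the bound is 2ⁿ. At 50 amendments that is 2⁵⁰, about 1.1 × 10¹⁵ candidate texts. Real readings often converge, so this is an upper bound. Still, nothing inside the chain selects one path; only an oracle does.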

The test: Give the same base law and amendments to five law firms. Ask each to produce a consolidated text. Will you get five bit-for-bit identical files?

Almost certainly not. The "current state of the law" is not a deterministic value—it's a probability distribution across possible interpretations.
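The test can be given a back-of-the-envelope number. Assume, hypothetically, that a competent firm picks the dominant reading of any single amendment 99% of the time, independently of its other choices. For five firms to produce the same text after 50 amendments, all 250 choices must line up:

```python
def p_all_agree(p_same_reading: float, firms: int, amendments: int) -> float:
    """Probability that every firm adopts the dominant reading of every
    amendment, assuming independent choices. Agreement on a minority
    reading is ignored, so under these assumptions this is a lower bound."""
    return p_same_reading ** (firms * amendments)

print(round(p_all_agree(0.99, 5, 50), 3))   # 0.081
print(round(p_all_agree(0.999, 5, 50), 3))  # 0.779
```

Even at 99.9% agreement per choice, more than one in five runs of the experiment would produce at least one divergent text.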

V. The Intent Trap

In software:

// This function calculates tax (intent)
function tax(x) {
    return x * 0.25; // (reality)
}

Intent is a comment. Code is reality. If the comment says "calculates tax" but the code returns a random number, the code wins.

In law, intent (esityöt/preparatory documents) is treated as part of the source code. "What did the legislators mean?" requires consulting committee reports, parliamentary debates, and expert opinions.

The bug: you cannot mechanically merge intent. You can only merge text. By treating intent as resolvable, the system guarantees non-determinism. Even if you could query the original legislators:

They may disagree with each other
They may be dead
Their "intent" may have been a political compromise with no coherent meaning

VI. The Oracle Is Also Broken

The Supreme Court is supposed to be the final oracle—the authority that resolves ambiguity definitively. But oracles produce split decisions. 5-4 rulings are common. The nine "most qualified" legal minds in the country cannot agree on what the law means.

If they can't agree, what does "correct answer" even mean?

Worse: the law is temporally unstable. Roe v. Wade (1973) stood for 49 years as "settled law" before Dobbs (2022) reversed it. Plessy v. Ferguson (1896) lasted 58 years before Brown v. Board (1954). Same constitutional text, different answers.

Law(case, t) = f(court_composition_at_t)
Law(case, t+50) ≠ Law(case, t)

One justice dying literally changes what the law "is." This isn't discovery of truth; it's creation by majority vote among a politically selected group, revocable by future majority votes.
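The formula can be read literally as a toy model. A ruling is the majority of the justices' votes, and each justice is a function from case to vote. This is a deliberate oversimplification: the justices are fixed dispositions with hypothetical names. It still shows the structural point. The case text is held constant, one seat changes, and the output flips.

```python
from typing import Callable, Dict, Mapping

Case = Mapping[str, str]
Justice = Callable[[Case], bool]   # True = votes for the petitioner

def law(case: Case, court: Dict[str, Justice]) -> bool:
    """Law(case, t) = f(court_composition_at_t): a simple majority of votes."""
    votes = [vote(case) for vote in court.values()]
    return sum(votes) > len(votes) / 2

# Hypothetical justices with fixed dispositions; real ones are not this simple.
def for_petitioner(case: Case) -> bool:
    return True

def for_respondent(case: Case) -> bool:
    return False

case = {"text": "same constitutional text"}
court_t = {f"J{i}": (for_petitioner if i < 5 else for_respondent) for i in range(9)}
court_t_plus_n = {**court_t, "J0": for_respondent}   # one seat changes hands

print(law(case, court_t), law(case, court_t_plus_n))  # True False
```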

Prediction models achieve 70%+ accuracy forecasting Supreme Court rulings based solely on political variables. If an algorithm can predict rulings without reading the legal arguments, the written opinions are post-hoc rationalization, not causal reasoning.

The system is:

Non-deterministic: Multiple valid outputs exist at time T
Temporally unstable: Output at T can be invalidated at T+n
Non-monotonic: Past "settled" answers can be declared wrong

Stare decisis is a norm, not a guarantee. The law is whatever five people agree it is today, until five different people disagree tomorrow.

VII. The Economic Cost of Pretending

The Confederation of Finnish Industries estimates regulatory compliance costs Finnish businesses 5-7 billion euros annually. A significant portion isn't the cost of following clear rules—it's the cost of figuring out what the rules say.

This "interpretation tax" includes:

Legal fees for navigating ambiguity
Delayed decisions while awaiting clarification
Abandoned projects in uncertain regulatory zones
Risk premiums for operating where law is unclear

The Standard Cost Model used by the OECD breaks administrative burden into "familiarization costs" (understanding obligations) and "substantive costs" (fulfilling them). If law isn't consolidated reliably, familiarization costs explode. Reading base_act.pdf + amendment_A.pdf + amendment_B.pdf + amendment_C.pdf takes far longer than reading consolidated_law.pdf.
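The Standard Cost Model computes burden as price times quantity. Price is tariff times time; quantity is the affected population times frequency. The worked example below uses hypothetical inputs throughout: a €100 hourly tariff, 1,000 affected firms reading the rules once a year, and 2 hours for a consolidated text versus an assumed 8 hours for a base act plus three amending acts.

```python
def scm_cost(tariff_per_hour: float, hours: float,
             businesses: int, frequency_per_year: float) -> float:
    """Standard Cost Model: price (tariff x time) x quantity (population x frequency)."""
    price = tariff_per_hour * hours
    quantity = businesses * frequency_per_year
    return price * quantity

# All inputs are hypothetical, chosen only to show the arithmetic.
consolidated = scm_cost(100, 2, 1_000, 1)   # one consolidated_law.pdf
fragmented = scm_cost(100, 8, 1_000, 1)     # base act + three amendment PDFs
print(consolidated, fragmented)             # 200000 800000
```

The formula is linear in time, so any multiplier on reading time carries straight through to the familiarization cost. The 4× here is an assumption, not a measurement.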

Because EUR-Lex and Finlex explicitly disclaim accuracy, businesses operate in a liability grey zone. Follow the government's consolidated text, get fined because it contained an error, and your defense ("I relied on the official database") fails because the database warned it wasn't authoritative.

This forces "double verification"—checking official gazettes against consolidated texts—or purchasing expensive private legal databases. The state produces raw material (acts) but refuses to guarantee the finished product (consolidated law), creating a market for legal certainty that shouldn't need to exist.

VIII. The "Flexibility Is Good" Defense

Lawyers will object: "We need discretion! Rigid rules can't handle edge cases! Flexibility serves justice!"

The counter: Discretion belongs in application (judicial), not consolidation (clerical).

These are different operations:

Consolidation: What does the law say? (Should be deterministic)
Application: How does this law apply to this case? (Can involve judgment)

We need to know WHAT the text is before deciding HOW to apply it. Currently we argue about both simultaneously. This is like debating what the source code says and what it should do at the same time.

In functional programming, you isolate pure functions from IO—side effects are contained in explicit monads, not scattered throughout. Good legal architecture does the same: isolate the discretionary. Mark exactly where human judgment enters. Keep everything else deterministic.
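Here is a minimal sketch of that separation. It assumes a hypothetical three-bracket income tax and a hypothetical `Judgment` record. `tax_due` is a pure function of its input, the "specification is the law" end of the spectrum. The one discretionary question (was an expense reasonable?) enters only as an explicit, attributable value; it is not buried inside the calculation. Python has no IO monad, so the boundary here holds by convention: the pure function never asks anyone anything.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Pure layer: a hypothetical schedule of (bracket upper bound or None, rate in %).
BRACKETS: List[Tuple[Optional[int], int]] = [(20_000, 0), (50_000, 20), (None, 40)]

def tax_due(taxable: int) -> float:
    """Deterministic: the same taxable income always yields the same tax."""
    due, lower = 0.0, 0
    for upper, rate in BRACKETS:
        top = taxable if upper is None else min(taxable, upper)
        if top <= lower:
            break
        due += (top - lower) * rate / 100
        lower = top
    return due

# Impure boundary: discretion enters only as an explicit, auditable value.
@dataclass(frozen=True)
class Judgment:
    question: str      # e.g. "Was this expense reasonable?"
    answer: bool
    decided_by: str
    reasons: str

def assess(income: int, expense: int, judgment: Judgment) -> Tuple[float, Judgment]:
    """The calculation stays pure; the judgment call is an input and is
    returned alongside the result so it can be audited."""
    deduction = expense if judgment.answer else 0
    return tax_due(income - deduction), judgment

j = Judgment("Was the travel expense reasonable?", True, "Assessor 17", "Required by employer")
print(tax_due(60_000))               # 10000.0
print(assess(65_000, 5_000, j)[0])   # 10000.0
```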

Formalizability varies by domain, not provision type. Empirical data:

Tax & Benefits: 90-99% formalizable. OpenFisca encodes 3,963 legislative elements of France's socio-fiscal system. U.S. Tax Court: 99.2% of cases settle without trial; the outcome is predictable enough that parties don't fight. Catala runs in production at the French tax authority.
Administrative regulations: 60-70%. Information-theoretic analysis shows regulations have high compression factors (code-like structure). But Chevron data: in contested cases, only 32.8% of statutes were "clear"—66.4% required interpretation.
Sentencing: 64-80% in the mandatory-guidelines era, with a floor. U.S. Sentencing Guidelines adherence: 80.6% (1991) → 63.9% (2001) → ~42% (2024, advisory era). Even mandatory guidelines hit a "20% discretionary floor": judges found reasons to depart.
Constitutional/Civil: <10%. High vocabulary entropy, low compression. "Due process" and "reasonable care" are semantic, not syntactic. These function like literature, not code.

The Chevron 66.4% "ambiguous" rate measures contested cases—the hard edge. The Tax Court 99.2% settlement rate measures the base—where the law is clear enough that parties don't litigate. Both are true: most tax law is deterministic, but the edges of administrative law are genuinely ambiguous.

By making consolidation non-deterministic, we've imported judgment into what should be a mechanical operation. The flexibility lawyers value in application has leaked into the infrastructure layer, corrupting the foundation.

IX. Power Is the Substrate

Legal non-determinism isn't an accident. It's a power architecture.

Ambiguity favors the powerful:

Those with resources can argue for favorable interpretations
Expensive lawyers find "creative readings"
The poor get default (unfavorable) interpretations
Same text, different outcomes based on who's arguing

If law is an NP-hard search problem, "justice" becomes a function of computational power. In legal context, compute equals billable hours. A litigant with unlimited resources can pay a team to exhaustively search the decision tree for winning arguments, obscure precedents, or procedural loopholes.

This creates regressive distribution of legal error. The poor are subject to "rough justice"—heuristics, plea bargains, summary judgments—because they can't afford the compute required to verify their rights. The rich purchase "precision justice"—exhaustive procedural verification and exploration of every interpretive branch.
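Here is a toy calculation of "compute equals billable hours". Every number in it is a hypothetical assumption, not data. The interpretive decision tree has 4 plausible moves at each of 6 stages (4,096 complete lines of argument), 5 of those lines win, and a bigger budget simply means more lines examined at random.

```python
from math import comb

def p_find_winner(leaves: int, winning: int, examined: int) -> float:
    """Probability that examining `examined` distinct lines of argument,
    drawn at random without replacement from `leaves`, uncovers at least
    one of the `winning` ones (hypergeometric tail)."""
    if examined >= leaves:
        return 1.0 if winning > 0 else 0.0
    return 1 - comb(leaves - winning, examined) / comb(leaves, examined)

LEAVES = 4 ** 6   # hypothetical: 4 plausible moves at each of 6 stages
WINNING = 5       # hypothetical: 5 of those 4,096 paths win

small_budget = p_find_winner(LEAVES, WINNING, 50)     # a few billable hours
large_budget = p_find_winner(LEAVES, WINNING, 2_000)  # a litigation team
print(f"{small_budget:.1%} vs {large_budget:.1%}")
```

Under these made-up numbers, the small budget finds a winning argument roughly 6% of the time and the large one more than 95% of the time. The text of the law is identical in both runs.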

Deterministic law would constrain power:

Formal specifications leave no room for "interpretation"
Version control prevents rewriting history
The powerful couldn't purchase favorable readings
Law would actually bind equally

This explains why the obvious fix hasn't happened. It's not ignorance or technical difficulty. Ambiguity is load-bearing for the current power structure. Legal interpretation is a power-hiding mechanism. "We're just finding the correct meaning" obscures "we're deciding who wins."

Making law deterministic would be power-revealing. It would show who benefits from current rules. It would constrain discretion. It would make law actually bind the powerful. It would reduce the value of expensive lawyers.

The resistance to formal legal infrastructure isn't conservative inertia. It's that the Head (power) will resist any Skeleton-strengthening move (constitutional constraint).

X. The Full Law-Power Chain

Non-determinism enters at every stage:

1. Creation: The legislature writes semantic amendments. Ambiguity may be intentional (defer hard choices) or accidental (rushed drafting). Either way, interpretation is delegated downstream.
2. Publication: Official databases disclaim accuracy. Citizens can't know what the law says. The state refuses to guarantee its own rules.
3. Interpretation: Courts resolve ambiguity, but they produce split decisions, reverse precedents, and vary by composition. The "oracle" is probabilistic.
4. Application: Who can afford to explore the decision tree? Rich litigants purchase precision justice; the poor get rough justice. Same text, different outcomes based on compute budget.
5. Oversight: Who holds courts accountable? Judicial independence protects against political interference but also shields patterns of discretion from scrutiny. (See: Theatrical Accountability.)
6. Amendment: Can the system correct itself? Constitutional lock-in and veto players prevent adaptation.

This essay focuses on stages 2-3 (publication and interpretation). But even perfect consolidation wouldn't solve stages 4-6. And there's a deeper problem: words don't control behavior. The causal chain isn't "law → behavior" but "incentives + architecture + selection → behavior." Law is epiphenomenal: it describes outcomes rather than causing them.

Solving the Halting Problem of Law is necessary but not sufficient. You need to know what the law says (this essay's focus). But you also need mechanisms that make the law effective—skin in the game, automatic triggers, exit rights. A right without a mechanism is a wish.

XI. The Estonian Half-Solution

Estonia has made progress. Since 2010, the electronic Riigi Teataja has been the only official publication, and consolidated texts are legally binding.

But Estonia still has consolidation. Amendments are still semantic instructions that must be interpreted and applied to produce consolidated text. The state just takes responsibility for the result instead of disclaiming it.

This is better than nothing—it shifts liability from citizen to state, which creates an incentive for accuracy. But it doesn't solve the underlying problem. The consolidation step still exists. Interpretation still happens. Errors are still possible. Estonia has moved the risk, not eliminated it.

The real solution is more radical: eliminate consolidation entirely.

Instead of: Base Law → Semantic Amendment → Interpretation → Consolidated Text

Do: Database State → Direct Edit → New Database State

No interpretation step. No consolidation. The law IS the database. Amendments are commits, not instructions. Parliament votes on the diff, not a description of the diff. The "consolidated text" is just the current state of the repository—always authoritative, always current, no interpretation required.
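Here is a minimal sketch of "amendments are commits". It assumes each amendment is a literal map from section identifiers to replacement text. The `StatuteRepo` and `Commit` names are hypothetical, not an existing system. The current law is obtained by replaying commits. Each commit stores the SHA-256 digest of its parent, so silently rewriting an earlier commit breaks the chain. That also covers the "immutable version control" idea in the next section. A real deployment would add digital signatures and trusted timestamps, which are omitted here.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Edits = Dict[str, Optional[str]]   # section id -> new text, None = repeal

@dataclass(frozen=True)
class Commit:
    parent: str      # digest of the previous commit, "" for the first
    message: str
    edits: Edits     # the literal diff that was voted on

    @property
    def digest(self) -> str:
        payload = json.dumps([self.parent, self.message, self.edits], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

@dataclass
class StatuteRepo:   # hypothetical name, for illustration only
    commits: List[Commit] = field(default_factory=list)

    def commit(self, message: str, edits: Edits) -> str:
        parent = self.commits[-1].digest if self.commits else ""
        new = Commit(parent, message, dict(edits))
        self.commits.append(new)
        return new.digest

    def state(self) -> Dict[str, str]:
        """The authoritative text is the replay of all commits: no interpretation step."""
        law: Dict[str, str] = {}
        for c in self.commits:
            for section, text in c.edits.items():
                if text is None:
                    law.pop(section, None)
                else:
                    law[section] = text
        return law

    def verify(self) -> bool:
        """False if any earlier commit was altered after its successor was made."""
        expected = ""
        for c in self.commits:
            if c.parent != expected:
                return False
            expected = c.digest
        return True

repo = StatuteRepo()
repo.commit("Enact hypothetical Act 1/2030",
            {"§1": "The tax rate is 20%.", "§2": "Applies to residents."})
repo.commit("Amendment 1", {"§1": "The tax rate is 22%.", "§2": None})
print(repo.state())   # {'§1': 'The tax rate is 22%.'}
```

Tampering is detectable because each commit's digest covers its parent's digest. The chain alone does not protect the newest commit; publishing or signing the head digest would add that protection.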

Estonia proves the problem is political, not technical. But Estonia is step zero (accept liability for consolidation). The full solution is step one (no consolidation needed).

XII. Architecture That Forces Honesty

The system currently pretends courts "discover" eternal truths. How do you architect a system where this pretense becomes impossible?

Principle: Don't ask them to stop pretending. Make pretending architecturally infeasible.

1. Immutable version control. Every change cryptographically timestamped and signed. You can't claim "the law always meant X" when git log shows otherwise. History becomes undeniable fact, not narrative.

2. Court composition tagging. Every decision tagged with judges and vote split. "Decided by [names], vote [5-4]." Automatically visible: this isn't "the law"—it's what these five people decided. When composition changes, flag all rulings at reversal risk.

3. Mandatory probability disclosure. Before ruling, prediction market odds recorded. After ruling, accuracy tracked. System shows: "This court's rulings match market predictions 73% of the time." Undeniable proof the system is probabilistic.

4. Interpretation cloud. For ambiguous sections, show ALL valid interpretations. "This clause has 3 valid readings. Courts have chosen [A] 60%, [B] 30%, [C] 10%." You can't pretend one truth when the UI shows three options with percentages.

5. Precedent stability flags. Algorithm detects: "This ruling was 5-4. Two majority justices replaced. Reversal probability: 67%." You can't call it "settled law" when labeled "UNSTABLE - HIGH REVERSAL RISK."

6. Sunset clauses on precedent. Rulings expire in 20 years unless explicitly renewed. Forces acknowledgment: "We're choosing to keep this" not "This is eternal truth." Each generation must actively affirm, not passively inherit.

7. Formal specifications where possible. For calculation laws (taxes, benefits): the specification IS the law. No room for interpretation: the code runs or it doesn't. Projects like Catala have formalized parts of French tax and benefits law and of the U.S. tax code, revealing bugs and inconsistencies that human drafters missed.
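Items 2, 4 and 5 above reduce to simple computations over public data. The sketch below assumes hypothetical `Ruling` records with named majority and dissent. Its stability rule is a deliberately crude stand-in, not the reversal-probability model the text imagines: it flags a ruling when the justices who joined it no longer form a majority of the sitting court. The cloud just turns counts of past readings into percentages.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable

@dataclass(frozen=True)
class Ruling:
    case: str
    majority: FrozenSet[str]
    dissent: FrozenSet[str]

    @property
    def split(self) -> str:
        return f"{len(self.majority)}-{len(self.dissent)}"

def stability_flag(r: Ruling, sitting: FrozenSet[str]) -> str:
    """Crude, hypothetical rule: UNSTABLE when the justices who joined the
    majority no longer form a majority of the sitting court."""
    remaining = len(r.majority & sitting)
    label = "UNSTABLE" if remaining <= len(sitting) // 2 else "stable"
    return f"{label}: decided {r.split}, {remaining} of the majority still sitting"

def interpretation_cloud(readings: Iterable[str]) -> Dict[str, int]:
    """Share (in whole percent) of past decisions adopting each reading."""
    counts = Counter(readings)
    total = sum(counts.values())
    return {reading: round(100 * n / total) for reading, n in counts.most_common()}

old = Ruling("Hypothetical v. Example",
             frozenset({"J1", "J2", "J3", "J4", "J5"}),
             frozenset({"J6", "J7", "J8", "J9"}))
today = frozenset({"J1", "J2", "J3", "N1", "N2", "J6", "J7", "J8", "J9"})
print(stability_flag(old, today))  # UNSTABLE: decided 5-4, 3 of the majority still sitting
print(interpretation_cloud(["A"] * 6 + ["B"] * 3 + ["C"]))  # {'A': 60, 'B': 30, 'C': 10}
```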

The meta-principle: architecture forces honesty. The probabilistic, power-contingent, temporally unstable nature of law becomes visible through the interface itself. You cannot maintain the pretense when the system shows you the probability cloud.

XIII. The Solution

Recall the empirical partition from Section VIII:

Tax & Benefits (90-99%) — Catala/OpenFisca territory. Fully formalizable.
Administrative regulations (60-70%) — High compression, but contested edges remain ambiguous.
Sentencing (64-80%) — Algorithmic with a "20% discretionary floor."
Constitutional/Civil (<10%) — High entropy. Literature, not code.

Each domain needs different treatment:

Tax & Benefits: Fully deterministic. Law lives in a version-controlled database. Amendments are direct edits, not semantic instructions. No consolidation step—the repository state IS the law. Parliament votes on diffs. The Catala approach: specification IS the law, natural language is derived. This is proven technology—France runs it in production.

Administrative regulations: Formalize the 60-70% that has code-like structure. For the contested edges, bound discretion explicitly: not "reasonable" but "factors A, B, C with weights." Log every application. Make patterns visible.

Sentencing: Accept the 20% discretionary floor—judges will depart when the algorithm produces injustice. But require explicit justification. Tag every departure with reasoning. Enable statistical analysis: which judges depart, for whom, how often?

Constitutional/Civil: Accept that "cruel and unusual" must evolve. But make interpretation visible. Track how meaning drifts over time. Enable statistical analysis of who gets which interpretation. The discretion remains—the hiding doesn't.
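The administrative and sentencing prescriptions above are concrete enough to sketch. All of the following are assumptions: the factor names, weights and threshold are placeholders; a "departure" is any sentence that differs from the guideline figure; and `Sentence` is a hypothetical record type. The point is structural. Discretion is confined to named inputs, a departure cannot be recorded without a reason, and per-judge departure rates fall out of the log.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Mapping

# Placeholder factors and weights standing in for an open-ended "reasonable".
WEIGHTS: Mapping[str, float] = {"A": 0.5, "B": 0.3, "C": 0.2}
THRESHOLD = 0.6

def bounded_decision(scores: Mapping[str, float]) -> bool:
    """Discretion is limited to scoring the named factors in [0, 1]."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected factors {sorted(WEIGHTS)}, got {sorted(scores)}")
    if any(not 0.0 <= v <= 1.0 for v in scores.values()):
        raise ValueError("factor scores must lie in [0, 1]")
    return sum(WEIGHTS[f] * scores[f] for f in WEIGHTS) >= THRESHOLD

@dataclass(frozen=True)
class Sentence:
    judge: str
    guideline_months: int
    imposed_months: int
    reason: str = ""

    @property
    def departed(self) -> bool:
        return self.imposed_months != self.guideline_months

def record(log: List[Sentence], s: Sentence) -> None:
    """A departure cannot enter the log without a stated reason."""
    if s.departed and not s.reason.strip():
        raise ValueError(f"{s.judge}: departure from guideline requires a reason")
    log.append(s)

def departure_rates(log: List[Sentence]) -> Dict[str, float]:
    totals: Dict[str, int] = defaultdict(int)
    departures: Dict[str, int] = defaultdict(int)
    for s in log:
        totals[s.judge] += 1
        departures[s.judge] += int(s.departed)
    return {judge: departures[judge] / totals[judge] for judge in totals}

log: List[Sentence] = []
record(log, Sentence("Judge X", 24, 18, "first offence, cooperation"))
record(log, Sentence("Judge X", 36, 36))
record(log, Sentence("Judge Y", 12, 12))
print(departure_rates(log))  # {'Judge X': 0.5, 'Judge Y': 0.0}
```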

The meta-principle: Isolate the impure. Like Haskell separates IO from pure functions, good legal architecture separates the discretionary from the deterministic. Mark exactly where human judgment enters. Keep everything else mechanical.

Estonia is step zero: accept liability for consolidation. The full solution is step one: eliminate consolidation entirely for the 90%+ formalizable domains (tax, benefits). Then bound and log the rest.

The halting problem of law is solvable—for most of it. Tax and benefits: 90-99% formalizable, proven in production. Administrative regulations: 60-70%. Even sentencing can reach 80% with bounded discretion. Only constitutional and civil law remain genuinely open-textured. Eliminate consolidation where possible. Bound and log discretion everywhere else. The goal isn't eliminating all judgment—it's isolating judgment to where it belongs and making its exercise visible.

The transition to computational jurisprudence is inevitable. The question is whether that code will be a black box of control for the powerful, or an open-source platform for democratic accountability.

References

EUR-Lex Consolidation Disclaimer: eur-lex.europa.eu/consolidation — "Consolidated text is meant purely as a documentation tool and has no legal effect."
Finnish Legal Database (Finlex): finlex.fi — Ministry of Justice does not guarantee accuracy of consolidated texts.
Estonian Riigi Teataja: riigiteataja.ee — Since 2010, consolidated texts are legally binding. A half-solution: moves liability to the state, but still requires consolidation.
French AI Consolidation Study (2025): arXiv:2501.16794 — GPT-4 achieved ~63% accuracy on complex legislative consolidation.
Catala Programming Language: INRIA Catala Project — Domain-specific language for formalizing tax and benefits law. Academic paper. 1.5:1 code-to-law expansion ratio.
OpenFisca: openfisca.readthedocs.io — 3,963 coded legislative elements of France's socio-fiscal system. Powers Mes Aides and LexImpact.
Chevron Deference Empirics: Bednar, "Chevron on the Eve of Loper Bright" — 32.8% of contested statutes clear (Step 1), 66.4% ambiguous (Step 2).
U.S. Sentencing Guidelines Data: USSC Downward Departures Report — Adherence: 80.6% (1991) → 63.9% (2001). The "20% discretionary floor."
Legal Entropy Analysis: Friedrich, "Complexity and Entropy in Legal Language" — Regulations: high compression (code-like). Constitutions: high entropy (literature-like).
Regulatory Costs (Finland): Confederation of Finnish Industries (EK) — €5-7 billion annual compliance costs.
OECD Rules as Code: "Cracking the Code: Rulemaking for Humans and Machines"
Supreme Court Prediction: Katz, Bommarito & Blackman — "A general approach for predicting the behavior of the Supreme Court" — 70%+ accuracy from political variables alone.

This essay provides the theoretical foundation for Lainsäädäntöinfrastruktuuri, a proposal for version-controlled legislation with formal specifications. It draws from Aliveness: Principles of Telic Systems.

Related reading:

Laws Are the Wrong Abstraction — Even if you solve the Halting Problem, words don't control behavior. The full case for governance-as-engineering.
Theatrical Accountability — Stage 5 of the law-power chain: the five-layer stack of elite immunity
The Vibes Constitution — How Finland built a mood board instead of a legal system
Ethics Is an Engineering Problem — Why architecture determines outcomes more than intention