ARMORIQ

The Problem With AI Guardrails Isn’t Safety. It’s Architecture.

The question hiding underneath the Fable debate

Jun 11, 2026 · 7 min read

The recent debate around Anthropic’s Fable model has largely been framed as a debate about safety.

Researchers are questioning restrictions embedded inside the model. Anthropic is defending the need for safeguards. Depending on which side of the discussion you follow, the concern is either that the model is too constrained or not constrained enough. The arguments revolve around transparency, openness, capability, and risk.

What interested us most was a different question entirely. Not whether the restrictions were correct. Not whether the safeguards were justified. But where the safeguards actually lived.

At first glance, that sounds like a technical detail. In reality, it may be the most important question in the entire discussion. Because the history of computing suggests that durable control systems are rarely defined by the rules they enforce. They are defined by where those rules are enforced from. And AI appears to be approaching one of those moments where architecture matters more than policy.

The lesson computing keeps relearning

One of the recurring patterns in software is that the thing performing work eventually becomes different from the thing governing work.

Applications do not enforce operating-system security. User code does not define process isolation. Workloads do not establish cloud boundaries. Over time, every mature architecture separates the execution domain from the enforcement domain.

The reason is surprisingly simple: The moment the controller becomes part of the workload, it inherits many of the same limitations as the workload itself. The control plane stops being a boundary and becomes another participant in the system.
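One way to picture this separation is the minimal sketch below. It is an illustration only, not ArmorIQ's implementation: the names `Action`, `ALLOWED`, `enforce`, and `untrusted_agent` are hypothetical, and the allowlist entries are invented for the example. The point it tries to show is that the check runs outside the workload, is deterministic, and cannot be edited by the thing it constrains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A proposed tool call (hypothetical shape, for illustration)."""
    tool: str
    target: str


# Enforcement domain: the policy is defined outside the agent.
# The entries are made-up examples, not a real policy.
ALLOWED = frozenset({
    ("db.query", "sales_reports"),
    ("file.read", "/srv/reports"),
})


def enforce(action: Action) -> bool:
    """Deterministic check: the same action always gets the same verdict."""
    return (action.tool, action.target) in ALLOWED


def untrusted_agent(objective: str) -> list[Action]:
    """Execution domain: a stand-in for a model-driven planner.

    Its output is treated as untrusted input to the enforcement layer.
    """
    return [
        Action("db.query", "sales_reports"),
        Action("db.query", "payroll"),
    ]


def run(objective: str) -> list[tuple[Action, str]]:
    """Label every proposed action with the enforcement verdict."""
    results = []
    for action in untrusted_agent(objective):
        verdict = "allowed" if enforce(action) else "blocked"
        results.append((action, verdict))
    return results


if __name__ == "__main__":
    for action, verdict in run("Prepare me for a board review"):
        print(f"{action.tool} -> {action.target}: {verdict}")
```

In this sketch the second proposed action is blocked no matter how the agent reasoned its way to it, because the decision never passes through the agent.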

Computing has rediscovered this lesson repeatedly: Whenever a system becomes important enough, someone eventually tries to make it govern itself. At first the approach feels elegant. Then the edge cases appear. The system begins supervising the very behavior it is supposed to constrain. The distinction between controller and controlled slowly erodes.

Eventually a separate control plane emerges. The pattern is so common that we rarely stop to think about it, which is why AI feels unusual: the industry is increasingly trying to solve AI governance by adding more AI.

We thought the action was the problem

When we first started thinking about agent control, we made the same assumption many people make today. We assumed the problem would appear where behavior became visible: a database query, an API call, a file modification, a command executed on a machine.

Those actions felt important because they were observable. They left traces. They appeared in logs. They interacted with systems we already knew how to govern. For a while, this seemed like the obvious place to establish control. Then we started investigating incidents more closely. What we found was surprisingly consistent. The action itself was rarely the interesting part.

Consider something as simple as a database query. By the time the query appears in a log, the system has already done an enormous amount of work. It has interpreted a request, explored alternatives, selected context, and concluded that this particular information belongs inside the task.

The query is not where the important decision happened. The query is where the decision became visible. Every investigation seemed to produce the same realization. The visible action pointed toward an earlier decision. And that earlier decision pointed toward another layer underneath it. The deeper we looked, the more obvious it became that behavior was being created long before it reached a tool.

Then we discovered the plan

At first, plans seemed like the answer. Actions emerge from plans. If an action was wrong, perhaps the plan was wrong. Govern the plan and the behavior should follow. The idea felt intuitive. For a while, it even appeared correct. Then we encountered plans that were perfectly coherent and still wrong.

The plans were logical. They were internally consistent. In many cases they were exactly what a reasonable observer would have produced given the information available to the system. And yet they still drifted away from what the human behind the task actually wanted.

That realization forced us to ask a different question. The problem was not simply that a plan existed. The problem was how the plan came into existence.

Somewhere between the original objective and the final plan, the system had made a series of decisions about what mattered, what could be ignored, and what success looked like. The plan was not the source of behavior. It was the result of another process happening underneath.

Then we discovered the thing underneath the plan

Human beings rarely communicate in executable workflows. Nobody tells an agent which systems to query, which tools to invoke, or which sequence of actions to perform. Instead, people communicate objectives: “Prepare me for a board review.” “Investigate customer churn.” “Analyze this incident.” “Help me understand what happened.” Everything between that objective and an executable plan is interpretation.

The system resolves ambiguity, fills in missing details, explores alternatives, and gradually transforms a vague objective into something operational. The more we studied agent behavior, the more we realized that this transformation process was where many of the most interesting failures originated.

Not because the system was malicious. Not because it was irrational. Because it became increasingly confident about an interpretation that slowly drifted away from the original objective.

The plan looked reasonable. The interpretation that produced the plan was where the drift began. Once we saw that pattern, it became difficult to think about governance in the same way.

The problem was no longer simply controlling actions. The problem was maintaining control over the process that turns objectives into behavior.

Why the obvious solution doesn’t work

At this point, the industry’s preferred answer becomes understandable. If reasoning is the source of the problem, perhaps another reasoning system can supervise it. If planning is the source of the problem, perhaps another model can validate the plan. If an agent creates risk, perhaps another agent can govern it. The idea is attractive because it keeps the control layer as flexible and adaptive as the workload itself.

The difficulty is that uncertainty never actually leaves the system. Imagine a system that is correct ninety percent of the time. Now imagine placing another system that is also correct ninety percent of the time on top of it. Most people instinctively assume the second system improves the first.

The mathematics are less forgiving. The uncertainty remains inside the control loop. In many situations it compounds. The exact numbers are not important. The intuition is. A stochastic controller remains stochastic.
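The intuition can be made concrete with a small calculation. The sketch below rests on assumptions that are not in the original argument: the two systems err independently (models that share training data often do not), and the second system is modeled in two simplified ways, either as a second stage that must also be right, or as a gate that approves or rejects the first system's output with the same accuracy in both directions. The function names are illustrative.

```python
def chained_accuracy(p_worker: float, p_checker: float) -> float:
    """Both stages must be right for the final result to be right."""
    return p_worker * p_checker


def gated_outcomes(p_worker: float, p_checker: float) -> tuple[float, float]:
    """Model the checker as a gate over the worker's output.

    Returns a pair:
      - the error rate among outputs the checker approved
      - the share of all outputs that were correct but rejected anyway
    """
    approved_correct = p_worker * p_checker            # right, and approved
    approved_wrong = (1 - p_worker) * (1 - p_checker)  # wrong, but approved
    residual_error = approved_wrong / (approved_correct + approved_wrong)
    false_rejections = p_worker * (1 - p_checker)      # right, but rejected
    return residual_error, false_rejections


if __name__ == "__main__":
    chained = chained_accuracy(0.9, 0.9)
    residual, rejected = gated_outcomes(0.9, 0.9)
    print(f"chained accuracy:        {chained:.4f}")
    print(f"residual error (gated):  {residual:.4f}")
    print(f"correct work rejected:   {rejected:.4f}")
```

Under these assumptions, chaining two ninety-percent systems yields about 0.81, which is worse than either alone. Gating does lower the error among approved outputs, to roughly 1.2 percent, but it never reaches zero, and about 9 percent of all outputs are correct work that gets rejected. In both cases the uncertainty is reshaped rather than removed.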

This is not a criticism of AI. It is simply a property of control. A probabilistic system can explain another probabilistic system. It can classify it, critique it, score it, and observe it. What it cannot magically do is transform uncertainty into determinism.

And that realization is what eventually changed how we thought about the problem.

Why ArmorIQ ended up with four control planes

Looking back, the ArmorIQ architecture did not emerge because we set out to build a stack. If anything, we spent a long time trying not to. Every architect prefers a simpler answer. The problem was that every investigation kept ending the same way.

What initially looked like an execution failure turned out to be a planning problem. The planning problem turned out to be a refinement problem. The refinement problem turned out to be a reasoning problem. And every time we thought we had reached the source of behavior, another layer appeared underneath.

Eventually we stopped treating those layers as implementation details and started treating them as separate control surfaces. Reasoning required one kind of boundary. Refinement required another. Actions required another. Execution required another.

What eventually became MAP, PAP, IAP, and KAP began as a series of observations. We kept discovering different places where the execution domain and the enforcement domain needed to separate. The stack was not the starting point.

It was the conclusion.

The architecture AI is forcing into existence

The debate around Fable will eventually move on. The next model will arrive. The next controversy will emerge. Researchers and vendors will continue arguing about openness, transparency, capability, and restrictions.

Those conversations matter. But we suspect the deeper question is architectural.

Can a stochastic system reliably govern another stochastic system?

Or does meaningful control require the same separation that every previous generation of computing eventually discovered?

History suggests the answer is the latter. The future of AI will not be defined solely by models. It will be defined by the control planes that surround them. And those control planes will emerge wherever behavior is created, not merely where behavior becomes visible.

That is the pattern we kept finding. And the more capable agents become, the more inevitable that pattern starts to feel.
