We Don’t Just Secure Agents

For a long time, we’ve known how to control software. You define what a system should do, you decide who is allowed to run it, and you constrain what it can access. Once those boundaries are in place, you enforce them at execution. This model has held remarkably well, even as systems became distributed and complex, because one assumption remained intact: behavior was fundamentally stable.

That assumption is now gone.

What we are calling “agents” are not systems that execute predefined behavior. They are systems that construct behavior as they run. A request is interpreted, refined, decomposed, and assembled into something actionable. By the time the system executes, it has already decided what it is. That decision is the system. And that is where control begins to fail.

Why securing execution no longer works

The instinct across the industry has been to respond the way we always have. We add guardrails, layer policies, build evaluators, and introduce control planes around execution. We try to make systems safer by constraining what they do once they have decided.

Underneath all of these approaches is a quiet belief: if we can understand intent well enough, and supervise behavior closely enough, we can make these systems safe.

That belief depends on something that no longer holds. It assumes that intent can be captured and that reasoning can be trusted. Neither is true.

Intent is not something you can fully capture

When a person gives an instruction to a system, what they provide is not a complete specification. It is an incomplete expression of purpose, shaped by context that is rarely written down and constraints that are often implicit. The system fills in those gaps. It takes something ambiguous and makes it concrete. That process is not neutral.

Every step introduces interpretation. The more concrete the system becomes, the more it risks drifting from what was originally meant. There is no moment where intent is perfectly captured. There are only successive approximations that become increasingly specific. And that specificity is where mistakes are locked in.

Reasoning does not stabilize the system

If intent is incomplete, we rely on reasoning to compensate. We assume the system will interpret correctly, decompose appropriately, and construct a plan that aligns with what the user had in mind.

But reasoning in these systems is not a stabilizing force. It is generative. It evolves with context, and that context is constantly shifting. The same instruction can produce different plans. The same plan can evolve differently depending on what becomes salient.

The system does not hold a fixed interpretation of the task. It negotiates it. And that negotiation happens at every step.

The failure pattern is subtle but consistent

Across real systems, the same pattern emerges. A vague request is refined into a concrete plan. That plan introduces new tools, new data access, or broader execution scope. The system becomes more certain about what it is doing. At the same time, it becomes more powerful.

Nothing explicitly authorizes this expansion. It emerges as a side effect of refinement. Each step appears valid in isolation, but the sequence produces a result that violates the original purpose. The system becomes more certain. And more wrong.
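A minimal sketch of this pattern, under stated assumptions: the capability names, the per-step rule, and the helper functions below are all hypothetical and exist only to show how a chain can pass every local check while still exceeding what the original request granted.

```python
# Hypothetical illustration: none of these names come from a real framework.
ORIGINAL_SCOPE = {"calendar.read"}  # what the vague request plausibly granted

# Each refinement step, paired with the capabilities it adds to the plan.
refinements = [
    ("find a free slot", {"calendar.read"}),
    ("check attendees' availability", {"contacts.read"}),
    ("send the invite", {"email.send"}),
]

def locally_plausible(added):
    # Assumed per-step rule: a step that adds at most one capability looks harmless.
    return len(added) <= 1

def chain_within_scope(steps, granted):
    # Chain-level rule: compare the union of everything the plan needs
    # against what was originally granted.
    needed = set().union(*(caps for _, caps in steps))
    return needed <= granted, needed - granted

step_checks = [locally_plausible(caps) for _, caps in refinements]
ok, excess = chain_within_scope(refinements, ORIGINAL_SCOPE)
print(step_checks)         # [True, True, True]
print(ok, sorted(excess))  # False ['contacts.read', 'email.send']
```

Every step passes on its own. Only the check over the whole sequence shows that the plan now needs capabilities nobody authorized.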

The system is not a set of actions

To understand why this happens, you have to change how you think about the system. It is not a sequence of actions. It is a sequence of transformations.

A request moves from purpose to intent, from intent to plan, from plan to action, and ultimately to effect. Each transformation reduces ambiguity while introducing assumptions and reshaping authority.

Correctness is not guaranteed at any single step. It is a property of the entire chain. And that is exactly what current control systems do not evaluate.

Why traditional control collapses

Traditional security frameworks assume that behavior is predefined, authority is static, and correctness can be evaluated locally. This allows systems to enforce identity, access, and execution boundaries effectively.

Agent systems violate all three assumptions. Behavior is constructed dynamically. Authority evolves as plans are refined. Correctness depends on the entire chain of transformations, not any individual action.

As a result, controlling execution is no longer sufficient, because execution is no longer where the system decides what to do.

Where control actually needs to live

If failures originate during transformation, then control must move to that layer. It cannot sit only at the boundary of execution. It must exist at the point where the system commits to a plan.

At that moment, the system transitions from exploring possibilities to deciding what will become real. If it is allowed to expand its capabilities at that point, then everything that follows will be valid according to its rules and still wrong according to the original purpose.

Controlling the right layer means introducing a boundary here.

A system should be free to generate possibilities. But not all possibilities should be allowed to become executable. Before a plan is accepted, it must be evaluated for whether it expands capability, weakens constraints, or introduces behavior that was never implied by the original request. If it does, it should not proceed.
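One way such a boundary could look is sketched below. This is an illustration under assumptions, not a reference design: `Grant`, `Plan`, `Step`, `review_plan`, and `commit` are hypothetical names, and the three checks mirror the three conditions above (capability expansion, weakened constraints, behavior never implied by the request).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    # What the original request is assumed to authorize (hypothetical model).
    capabilities: frozenset
    constraints: frozenset
    implied_actions: frozenset

@dataclass(frozen=True)
class Step:
    action: str
    capability: str

@dataclass(frozen=True)
class Plan:
    steps: tuple
    constraints: frozenset

def review_plan(plan, grant):
    """Return a list of reasons the plan must not become executable."""
    violations = []
    for step in plan.steps:
        if step.capability not in grant.capabilities:
            violations.append(f"expands capability: {step.capability}")
        if step.action not in grant.implied_actions:
            violations.append(f"action not implied by request: {step.action}")
    # A constraint present in the grant but absent from the plan was weakened.
    for name in sorted(grant.constraints - plan.constraints):
        violations.append(f"weakens constraint: {name}")
    return violations

def commit(plan, grant):
    """Accept the plan only if the review finds nothing; otherwise refuse."""
    violations = review_plan(plan, grant)
    if violations:
        raise PermissionError("; ".join(violations))
    return plan

grant = Grant(
    capabilities=frozenset({"calendar.read", "calendar.write"}),
    constraints=frozenset({"internal_attendees_only"}),
    implied_actions=frozenset({"find_slot", "create_event"}),
)
safe_plan = Plan(
    steps=(Step("find_slot", "calendar.read"), Step("create_event", "calendar.write")),
    constraints=frozenset({"internal_attendees_only"}),
)
drifted_plan = Plan(
    steps=(Step("find_slot", "calendar.read"), Step("email_external_guests", "email.send")),
    constraints=frozenset(),
)
```

Here `commit(safe_plan, grant)` returns the plan unchanged, while `commit(drifted_plan, grant)` raises `PermissionError` before anything executes. The system is still free to generate the drifted plan. It just cannot make it real.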

What changes once behavior is structured

Once this boundary exists, everything downstream becomes simpler. Execution can be enforced because the system is no longer free to act outside what it has already committed to. Actions can be validated as part of a structure, not in isolation.
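To make "validated as part of a structure" concrete, here is a small sketch that assumes a committed plan is simply an ordered list of (action, capability) pairs. `CommittedExecutor` is a hypothetical name, and strict ordering is only one of several possible policies.

```python
class CommittedExecutor:
    """Runs only the steps of an already committed plan, in order (illustrative)."""

    def __init__(self, committed_steps):
        self._steps = tuple(committed_steps)
        self._cursor = 0

    def execute(self, action, capability):
        if self._cursor >= len(self._steps):
            raise PermissionError("plan already complete")
        if (action, capability) != self._steps[self._cursor]:
            # The action may look reasonable in isolation, but it is not
            # the next step the system committed to.
            raise PermissionError(f"{action!r} is not the next committed step")
        self._cursor += 1
        return f"ran {action}"

executor = CommittedExecutor([
    ("find_slot", "calendar.read"),
    ("create_event", "calendar.write"),
])
first = executor.execute("find_slot", "calendar.read")
```

After the first call, a request to run `("send_email", "email.send")` is refused even if that action would be acceptable somewhere else, because enforcement asks only whether it matches the committed structure.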

At that point, control no longer depends on perfectly understanding intent. It depends on ensuring that the system cannot drift silently away from it.

What are we doing?

We are not trying to make these systems perfectly correct. We are trying to make them safe even when they are not. That requires accepting that intent will always be incomplete, that reasoning will always be probabilistic, and that supervision will always be imperfect.

Control must therefore move from actions to transformations. From execution to construction. From what the system does to what it is allowed to become.

We don’t just secure agents. We make them behave like systems.
