Loop Safety Without Reinventing the Stack

The first time a loop feels different

The first time you watch a looping agent operate, it doesn’t seem fundamentally different from any other agent. It searches for information. It evaluates results. It decides what to do next. If you’ve spent any time around modern AI systems, none of that feels remarkable. Then something subtle happens. The agent doesn’t stop.

Hours later it is still working. It has gathered new information, revised its assumptions, adjusted its plan, and continued pursuing the same objective. The original request may be long forgotten by the person who issued it, but the agent is still carrying it forward.

That moment is where loop engineering begins to feel different.

Traditional software tends to be transactional. A request arrives, a result is produced, and the interaction ends. Looping agents behave more like processes. They continue operating, adapting, and making decisions long after the initial request has disappeared from view. This is exactly what makes them powerful. It is also what makes them difficult to govern.

Because the longer a system continues acting on an objective, the more opportunities it has to become something slightly different from what it was originally asked to be.

The failures rarely look dramatic

When people imagine an unsafe agent, they often imagine something obvious. A dangerous command. A catastrophic action. A clear violation of policy. Reality is usually much less dramatic.

The most interesting failures we have observed in looping systems tend to emerge gradually. A travel agent checks prices again because prices have changed. A research agent expands its search because it found a promising lead. A documentation agent revisits files because the code evolved. If you examine any individual decision in isolation, it is often difficult to find anything objectionable.

In fact, many of these decisions are exactly the sort of initiative we want intelligent systems to demonstrate. The difficulty only becomes visible when we step back and look at the entire sequence.

What begins as a hotel search becomes continuous monitoring of dozens of routes. What begins as a focused research task slowly expands into adjacent areas. What begins as a documentation update gradually touches parts of a system that nobody originally considered within scope.

The problem is rarely a single action. The problem is the accumulation of actions. Loops fail differently because behavior emerges across time rather than in a single moment.
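To make the distinction concrete, here is a minimal sketch of the difference between checking each action and checking the accumulation. Everything in it is an assumption made for illustration: the `ScopeTracker` class, its method names, the `routes/` prefix, and the budget of three resources are hypothetical, not part of any system described here.

```python
from dataclasses import dataclass, field


@dataclass
class ScopeTracker:
    """Hypothetical helper: tracks what a loop has touched across iterations."""

    allowed_prefixes: tuple[str, ...]
    max_distinct_resources: int
    touched: set[str] = field(default_factory=set)

    def step_ok(self, resource: str) -> bool:
        # Per-step check: is this single action inside the allowed area?
        return resource.startswith(self.allowed_prefixes)

    def within_budget(self, resource: str) -> bool:
        # Cumulative check: record the resource, then compare the loop's
        # total footprint against the budget set when the loop started.
        self.touched.add(resource)
        return len(self.touched) <= self.max_distinct_resources


tracker = ScopeTracker(allowed_prefixes=("routes/",), max_distinct_resources=3)
requests = ["routes/NYC-LON", "routes/NYC-PAR", "routes/NYC-ROM", "routes/NYC-BER"]
results = [(tracker.step_ok(r), tracker.within_budget(r)) for r in requests]
```

Every request passes the per-step check, yet the fourth one exceeds the cumulative budget. Only the second check sees the sequence rather than the moment.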

We assumed loops would need a new assurance plane

When we first started thinking seriously about loop safety, our instinct was probably the same as everyone else’s. Loops felt different. Therefore they probably required something new.

The software industry has a long history of responding to new problems by introducing new layers. New abstractions create new categories. New categories create new control planes. At first glance, loop engineering appeared to fit that pattern perfectly.

The more we studied looping systems, however, the less convinced we became. What surprised us was not how different the failures looked. What surprised us was how familiar they felt.

Every loop failure seemed to decompose into a problem we had already seen somewhere else. A loop that drifted beyond its objective looked remarkably similar to a planning system that had expanded its authority during refinement. A loop that repeatedly invoked tools in unexpected ways resembled an execution path that had drifted away from its original intent. A loop that accumulated unexpected side effects resembled execution patterns we had already encountered in other contexts.

The loops were new. The underlying problems were not. That observation ended up changing the direction of our thinking.

The plan was already evolving

One of the assumptions hidden inside many conversations about agent safety is that plans are static. Real systems rarely behave that way.

A useful agent encounters new information. It discovers assumptions that were wrong. It finds opportunities that were invisible at the beginning of the task. The plan evolves because the environment evolves.

The question is not whether plans should change. The question is whether those changes remain accountable to the objective that originally justified them. Loops make this challenge impossible to ignore because they stretch it across time.

A loop that runs for several hours may revise its plan dozens of times. A loop that runs for days may revise it hundreds of times. Each modification seems reasonable. Each adaptation improves the agent’s understanding of the task. Yet every modification also creates an opportunity for drift.

At some point, the interesting question stops being whether the current plan is valid. The interesting question becomes whether the current plan still belongs to the same objective that started the loop. That distinction turns out to matter more than almost anything else.

The thing that matters is continuity

One of the more surprising realizations from studying looping systems was how rarely the problem involved a single decision. As with the gradual failures described earlier, there was seldom a clear moment where something obviously went wrong.

What we encountered repeatedly were systems that looked perfectly reasonable at every individual step. A loop would revise a plan because new information appeared. Then it would revise the plan again because another assumption turned out to be incorrect. Later it would delegate part of the task to a specialized workflow. A few iterations after that, it would discover a more efficient way to achieve the objective and adjust its approach yet again.

If you examined any one of those decisions in isolation, it would be difficult to object. In many cases they represented exactly the sort of adaptability we want from intelligent systems. The difficulty only became visible when we stepped back and looked at the entire sequence.

Somewhere along the way the loop had stopped pursuing the original objective and started pursuing a different one. The transition was rarely dramatic. There was no obvious point where the system crossed a line. The drift accumulated gradually, hidden inside dozens of individually reasonable decisions. Looking backward, the path appeared coherent. Looking forward, nobody would have intentionally designed it.

That observation changed how we thought about loop safety. The problem was not that plans evolved. Useful systems must evolve. The problem was losing continuity between the current plan and the purpose that originally justified it.
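One way to picture that continuity is as a lineage record: each plan revision names its parent and the objective it claims to serve, and a check walks the chain back to the root. The sketch below is illustrative only. `PlanRevision`, `traces_to_objective`, and the objective labels are hypothetical names, and the check covers bookkeeping alone; it cannot tell whether a revision that keeps the same label has drifted in substance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanRevision:
    revision_id: str
    parent_id: Optional[str]  # None only for the root plan
    objective_id: str         # the objective this revision claims to serve
    reason: str               # why the plan changed


def traces_to_objective(
    revisions: dict[str, PlanRevision], current_id: str, objective_id: str
) -> bool:
    """Walk parent links from the current plan back to the root.

    Returns False if a link is missing, the chain cycles, or any revision
    on the path names a different objective.
    """
    seen: set[str] = set()
    node_id: Optional[str] = current_id
    while node_id is not None:
        if node_id in seen or node_id not in revisions:
            return False
        seen.add(node_id)
        revision = revisions[node_id]
        if revision.objective_id != objective_id:
            return False
        node_id = revision.parent_id
    return True


history = {
    "r0": PlanRevision("r0", None, "book-hotel", "initial plan"),
    "r1": PlanRevision("r1", "r0", "book-hotel", "prices changed"),
    "r2": PlanRevision("r2", "r1", "monitor-routes", "found cheaper flights"),
}
still_on_task = traces_to_objective(history, "r1", "book-hotel")
drifted = not traces_to_objective(history, "r2", "book-hotel")
```

Here `r1` still traces back to the hotel booking, while `r2` has quietly adopted a different objective, which is the kind of transition that is hard to spot one step at a time.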

Why loop safety is really an orchestration problem

The first instinct when encountering a new failure mode is usually to invent a new control mechanism. For a while we followed the same path. Loops appeared different enough that they seemed to deserve their own assurance plane. The more we explored the problem, however, the more we found ourselves tracing failures back into familiar territory.

What initially looked like a loop problem would eventually reveal itself as a refinement problem. A refinement problem would turn out to be an execution problem. An execution problem would reveal a lineage problem. The loops were new. The underlying failures were not. That realization gradually changed the question we were asking.

Instead of asking how to govern loops, we started asking how existing assurances behaved when stretched across time. Viewed through that lens, a loop stops looking like a new category of system. It begins to look like a sequence of smaller workflows connected by continuity. Each iteration still reasons. Each iteration still refines a plan. Each iteration still executes actions against real systems. The challenge is not inventing entirely new controls for those activities. The challenge is ensuring that the controls remain connected as the loop evolves.

What appeared to be a governance problem slowly transformed into an orchestration problem.

The stack was already there

This realization eventually led us back to a surprisingly simple conclusion. The existing assurance stack already contained most of the primitives required for loop safety.

Purpose still needs to remain bounded as it becomes plans. Plans still need to remain connected to actions. Actions still need to remain bounded when they reach real systems. None of those requirements disappear simply because an agent repeats the process hundreds of times. What changes is the duration over which those guarantees must hold.

A looping agent forces us to think about continuity rather than individual decisions. Plans must evolve without losing their lineage. Authority must adapt without expanding beyond its original boundaries. Actions must remain accountable to the workflows that produced them even as those workflows change.
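The requirement that authority adapts without expanding can be sketched as a subset check applied on every iteration. This is a minimal illustration under stated assumptions: the `narrow_authority` function and the `docs:read` / `docs:write` scope strings are hypothetical, and authority is modeled as a plain set of labels rather than anything a real system would necessarily use.

```python
def narrow_authority(original: frozenset[str], requested: set[str]) -> frozenset[str]:
    """Return the authority for the next iteration, refusing any expansion.

    The request may drop scopes from the original grant but never add to it.
    """
    extra = requested - original
    if extra:
        raise PermissionError(f"authority expansion refused: {sorted(extra)}")
    return frozenset(requested)


grant = frozenset({"docs:read", "docs:write"})

# An iteration that only needs to read can shrink its authority.
next_grant = narrow_authority(grant, {"docs:read"})

# An iteration that asks for something outside the original grant is refused.
try:
    narrow_authority(grant, {"docs:read", "deploy:write"})
    expansion_refused = False
except PermissionError:
    expansion_refused = True
```

Comparing each request against the original grant, rather than against the previous iteration's grant, is a deliberate choice in this sketch: it keeps a long chain of small adjustments from adding up to an expansion.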

Viewed this way, loop safety is not the creation of a new assurance plane. It is the careful orchestration of the planes we already have. The challenge is not inventing a new control layer for loops. The challenge is ensuring that intent, authority, and accountability survive every iteration of the loop. Because the most dangerous loops are rarely the ones that fail dramatically. They are the ones that slowly become something nobody intended.
