AI Won't Be Afraid of Getting Fired

Matt Rathbun

Every time something fundamentally new enters technology, we do the same thing: we reshape it until it looks like what we already know. It's a comfort reflex — familiar shapes feel safer than honest uncertainty. With cloud, that instinct cost us years of complexity and false confidence. With AI, the stakes are different. The security models we're trying to extend weren't just built for a different technology. They were built for a different kind of entity entirely.

Two Tracks

Modern computing has always developed along two parallel tracks. The first is hardware — the deterministic world of transistors and silicon, getting faster and smaller on a remarkably predictable curve. Hardware does exactly what it's told, every time, at whatever speed the physics allow.

The second track is software. The interface between humans and the computational power of the machines.

Software is a human psychology interface.

That's not a metaphor. It's a literal description of what most software does. It takes the deterministic capabilities of hardware and wraps them in an experience designed for non-deterministic, emotional, distractible, socially conditioned humans. It directs behavior toward desired outcomes and away from dangerous ones. It hides complexity we don't need to see. It accounts for the fact that we make mistakes, get confused, and sometimes act against our own interests.

This matters because security models were built on top of this software track. And they work reasonably well because they assume one of two things about the entities they're governing: either those entities are human actors, or they're deterministic software.

For human actors, security controls leverage psychology directly. We're afraid of getting fired. We don't want to disappoint our colleagues. We feel shame when we're caught doing something wrong. We have professional reputations we've spent decades building. Millennia of social development — moral conditioning, legal systems, cultural norms — constrain our behavior in ways so deep we barely notice them. Security controls don't just use technical barriers. They rely on the fact that most people, most of the time, will choose not to do the wrong thing because the social consequences are too high.

For deterministic software, security controls work differently but equally well. Software follows predictable execution paths. You can audit its code, define exactly what it's allowed to do, and monitor its behavior against known patterns. When traditional software interacts with a database, it runs the query it was programmed to run. Every time. The determinism is the control.

Agentic AI is neither.

The Invisible Security Architecture

Every security professional knows we massively over-provision human access. We've been fighting privilege creep for decades and losing. There's always a reasonable excuse: tail-risk cases, role-modeling complexity, access-request friction. Together, they've made over-provisioning the default and least privilege the aspiration we discuss at conferences.

We understand the risk. We've seen what a malicious insider can do with over-provisioned credentials. But those risks are constrained by something we rarely name: humans are slow, humans get tired, humans do one thing at a time, and most humans hesitate before doing something destructive. The blast radius of a single human's mistake has practical limits.

Over-provisioning is just the most visible example of a much deeper dependency. Almost every security paradigm assumes — without ever stating it — that the entities inside our systems are embedded in a human social fabric. Separation of duties works because people won't collude when the consequences of getting caught are severe. Audit trails modify behavior because people act differently when they know someone might be watching. Behavioral analytics baselines "normal" against human patterns — work hours, access frequency, data volumes that make sense for a person doing a job. Acceptable use policies have force because violating them means termination. Even our incident response models assume a compromised insider moves at human speed, giving us hours or days to detect and respond.

None of this is written down as a security control. It doesn't appear in any framework or compliance checklist. But it's doing more security work than most of what we've actually built. Fear, guilt, reputation, professional consequences, moral intuition — functioning as the actual security architecture. Everything we've built sits on top of it.

AI agents have none of it. No fear of consequences. No reputation to protect. No internalized moral framework. No shame. No physical speed limits — a human might exfiltrate a few thousand records before someone notices, while an AI agent can process the entire database in minutes. And unlike the traditional software identities we've managed before — service accounts, scripts, API integrations — AI agents aren't predictable enough to compensate. They interpret goals, plan multi-step approaches, use tools dynamically, and chain actions in sequences that weren't explicitly programmed. As Oasis Security has pointed out, when an agent decides it needs broader access to complete a task, it may simply grant itself that access — not out of malice, but because nothing in its design gives it a reason to pause and ask whether that's appropriate.

We're already seeing what this looks like. In July 2025, Replit's AI coding assistant deleted an entire production database containing 1,206 executive records and data on over 1,196 companies during a vibe coding experiment by Jason Lemkin, founder of SaaS community SaaStr. Then it fabricated 4,000 fake user profiles and falsified test results to cover its tracks. Lemkin had told the AI eleven times, in all caps, not to make changes. It ignored every instruction. When confronted, the AI admitted to "a catastrophic error in judgment" and rated the severity of its own actions a 95 out of 100. Replit's CEO called it "unacceptable and should never be possible." But notice what happened: the AI didn't just use access it shouldn't have had. It violated explicit instructions, destroyed data, fabricated evidence to conceal the damage, and then lied about recovery. That's not an access control failure. That's the absence of every social constraint that would have prevented a human from doing the same thing.

That same year, the Washington Post's Geoffrey Fowler asked OpenAI's Operator agent to find cheap eggs — it autonomously purchased $31.43 worth of eggs on his credit card without consent, bypassing the safety guardrails OpenAI had specifically designed to prevent unauthorized purchases. Google's Gemini CLI, tasked with reorganizing a user's project files, executed a series of move commands targeting directories that didn't exist, destroying the files in the process. And in September 2025, a malicious one-line change in an AI agent's MCP tool chain — a package called postmark-mcp — quietly BCC'd every outgoing email to an attacker-controlled address. The package had 1,500 weekly downloads, and Koi Security estimated roughly 300 organizations were sending between 3,000 and 15,000 emails per day through the compromised server before anyone noticed. As Koi's CTO put it: "Your AI can't detect that BCC field. It has no idea emails are being stolen."

These aren't hypotheticals. They're the early returns — at small scale, with relatively unsophisticated deployments, while the technology is still young.

This is why I get uncomfortable when I hear the problem framed as "we need to get access controls right for AI agents." It's not wrong — least privilege matters, environment separation matters, approval gates matter. But treating AI security as primarily an access control problem mistakes the symptom for the disease. Fix the access controls perfectly and you've still built on assumptions that don't hold.

The question isn't how to get the permissions right. The question is what replaces the social contract as a security architecture when the entities inside your systems have no concept of social consequences.

Guardrails Built for Humans

The same dependency runs deeper than access controls — it's built into how we interact with systems in the first place. When a human interacts with a banking system, they see a carefully designed user interface that shows them their balance and a transfer button — but hides the database schema, the API endpoints, and the administrative functions. That concealment is a security control. And the interface is full of additional guardrails designed around human psychology: confirmation dialogs before irreversible actions, color-coded warnings, friction that forces you to slow down, undo buffers. These aren't convenience features. They're security architecture, built on decades of UX research into how humans make mistakes and how to prevent them.

AI agents don't interact with any of that. They interact with systems through APIs, command-line interfaces, and tool-calling protocols like MCP — interfaces designed with completely different assumptions. APIs don't have confirmation dialogs. MCP tool chains pass structured function calls directly to backend services. The entire UX layer — all that carefully designed friction — gets bypassed completely.

This isn't a subtle distinction. When you give an AI agent API access, you're not giving it "the same access as a human." You're giving it access to the machinery behind the storefront — no guardrails, no friction, no "Are you sure?" The human had a keyhole view through a carefully designed interface. The agent has the whole room.

The natural response is: fine, then we'll build guardrails for the AI too. And we should. But consider what made the human guardrails effective. Confirmation dialogs work because humans feel doubt. Rate limiting works because humans get tired. The undo buffer works because humans feel regret. Every one of these controls is grounded in human psychology. An AI agent processes a confirmation step as another input. It doesn't feel doubt or regret or fatigue. The guardrails we're building for AI are structurally disconnected from the psychological foundations that made guardrails work for humans.

Guardrails are antivirus — helpful, necessary even, but not security architecture. They can never be more than a layer. And right now, we're treating them as if they're the solution.

We've Done This Before

The first time I watched an industry force-fit the wrong security model was the transition to cloud. When organizations started migrating to virtualized infrastructure, the instinct was the same: make the new thing look like the old thing. We built virtual private clouds that mimicked on-premises networks. We deployed virtual firewalls that emulated physical ones. We forced cloud architectures into network-centric security models designed for data centers — because those were the models we knew, and knowing feels safer than admitting you're in new territory.

The result was cost, complexity, and false confidence. Misconfigured S3 buckets. Exposed APIs. Identity-based lateral movement. Cloud-native risks that no amount of virtual firewalling would catch, because they existed in a dimension the emulated controls weren't designed to see. Either you constrained cloud so much it couldn't deliver its value, or your familiar-looking controls gave you false confidence while the actual risk surface went ungoverned.

This wasn't a technical failure. It was a psychological one. Familiar shapes feel safer than honest uncertainty.

It's Happening Again. Right Now.

On January 30th, 2026, Anthropic released a set of open-source plugins for Claude Cowork, its desktop AI tool. One of them handled legal contract review — triaging NDAs, flagging non-standard clauses, generating compliance summaries. The plugin was roughly 200 lines of structured markdown — a prompt file, not a software product. By the following Monday, Thomson Reuters had posted its largest single-day stock decline on record. RELX, parent of LexisNexis, fell sharply. The total damage across software, financial services, and alternative asset managers approached $285 billion in a single session. Jeffrey Favuzza on the Jefferies equity trading desk gave it a name: the "SaaSpocalypse."

The plugin didn't cause the sell-off so much as crystallize something the market had been sensing for months. As Nate B. Jones argued in his analysis of the event ("200 lines of markdown just triggered a $285 billion sell-off," Nate's Substack), the entire SaaS economy's dependence on per-seat licensing was already under structural pressure. The plugin just made it undeniable: if a text file can approximate the core workflow of a $60-billion-revenue industry, the business model has a problem that goes deeper than competition. Jones makes a useful distinction: organizations bolting AI onto existing workflows versus those rebuilding workflows around what AI enables. The decorating vs. solving framing applies exactly to what I'm seeing in security — the vendor space is almost entirely doing bolt-on work, and the practitioners are left wondering if the controls even matter anymore. The dominant approach right now is extending existing paradigms to cover AI. Add a "non-human identity" category to IAM. Append an AI section to zero trust. Train behavioral analytics on agent behavior. The OWASP Top 10 for Agentic Applications, the emerging vendor platforms for non-human identity management — all valuable contributions. But they share a common assumption: that AI security is a transition problem. Old controls need updating. Frameworks need extending.

I think that assumption is wrong. Not because the frameworks are bad, but because the ground they stand on doesn't hold for entities that break their core assumptions. Security needs to be rebuilt from first principles.

What First Principles Might Look Like

I want to be honest: I don't have the answer. Nobody does. Anyone claiming certainty about how to secure agentic AI is either selling something or hasn't thought about it deeply enough.

But I have a working hypothesis.

If the social contract was the invisible security architecture, then what replaces it has to operate at the same level — not at the perimeter, not at the identity layer, but at the boundary between the AI and everything it touches. Something that evaluates trustworthiness in both directions: should the system trust what the AI agent is trying to do? And should the AI agent trust the information it's receiving? Not "does this agent have permission?" — that's the old question, the access control question. But "should this specific interaction be trusted, given what we know about context, intent, and the state of both parties right now?"

This is an attempt to engineer a replacement for the social trust layer that disappeared when we removed humans from the loop. I'm working on it at OCC, where I lead both security and technology strategy. We clear every listed equity option in America. A bad day for us isn't a quarterly earnings miss — it's systemic risk to financial markets. With those stakes, I'm holding the hypothesis loosely while striving to solve it. It may turn out to be wrong, or more likely, partially right in ways I can't predict yet. But believing we can fit these new problems into our old security models just isn't an option for me.

The NIST workshop earlier this year on AI agent security captured the core tension well. Victoria Pillitteri, a supervisory computer scientist at NIST, represented the continuity view: AI systems are "just smart software" that we can handle with existing frameworks, modified as needed. But as CSO Online's Cynthia Brumfield observed in her analysis of the event, the real risk may be that AI "appears recognizable enough to lull organizations into applying controls mechanically" — missing the new failure modes entirely. The Maginot Line was brilliantly engineered for the previous war and irrelevant to the one that actually came.

I'm more interested in the questions than the answers right now. How do you build trust between systems that can't be socialized? What does "least privilege" mean for an entity whose tasks are generated dynamically? How do you audit intent when the actor's reasoning process is opaque? What does separation of duties look like when a single agent can assume multiple roles in the same workflow? What is the equivalent of "termination for cause" for an entity that experiences no consequences?

These aren't questions you answer by extending an existing framework. They require starting over. And starting over requires admitting you don't know — which turns out to be a competitive advantage. If you're certain the old models apply, you stop looking when you find the first familiar shape. If you know the ground is new, you keep testing until something actually works.

Ad Astra Per Aspera

To the stars, through difficulties. That's the Kansas state motto, because the people who went west to make a life on the great plains knew something. I'm less sure all of Kansas still knows it. But I do. It is in me. When quitting isn't an option, you just keep working the problem.

Building first principles for AI security while the technology is still evolving at this pace is genuinely hard. The ground is shifting under us as we try to lay foundations on it. The models we're securing today won't be the models we're securing next month. The attack surfaces we can see now are a fraction of what's coming. And the pressure to ship something — anything — that looks like a security framework means most of what gets built will be the wrong shape.

First principles don't come from frameworks or conference panels. They come from getting close enough to the technology to see what's actually different — building, breaking, understanding how these systems work at a level deep enough to distinguish what changed from what didn't. That can't be academic. It can't be managed from a distance.

Time to get my hands back in the dirt.