AI security · agents · zero trust · identity · Claude Code · open source

Anthropic Wrote the Zero Trust Playbook for AI Agents. We'd Been Building It.

Bottom Line Up Front

Anthropic's new Zero Trust for AI Agents framework says the controls that actually hold up are hardware-bound credentials, expiring tokens, and cryptographic identity. Capabilities removed, not friction added. That's the exact bet LastID made. Here's how our agent maps to their framework, where we're ahead, and where we're honestly not.

Matt Jezorek

June 11, 2026 · 12 min read

Someone finally wrote it down

Anthropic just published Zero Trust for AI Agents, a security framework for deploying autonomous agents in the enterprise. I read all 36 pages of the eBook in one sitting, and about halfway through I had a weird feeling. It read less like a framework we needed to adopt and more like the spec we'd already been building against for a year.

That's not me taking a victory lap. There are parts of it we don't do yet, and I'll get to those. But the core of the document is so close to our architecture that I want to walk through it honestly: what they say, where we land, and where we still have work.

The one test that matters

Strip away the tiers and the threat taxonomy and there's a single sentence in the framework that does most of the work:

Does this make the attack impossible, or just tedious?

Their point: an AI-accelerated attacker has unlimited patience and near-zero cost per attempt. So any control whose value comes from friction, like extra pivot hops, rate limits, or SMS codes, degrades to nothing. The controls that survive share a pattern. Hardware-bound credentials. Expiring tokens. Cryptographic identity. Network paths that don't exist. Prefer removing a capability over throttling it.

I've been saying a clumsier version of this for years. A phone number is not identity. A password is not identity. A long-lived API key in a .env file is a breach waiting for a calendar date. The framework gave it a crisp name, so let's use their test.

Identity: the agent has to be someone

The framework's first pillar is identity, and at the top tier it asks for hardware-backed identity that "cannot be exfiltrated from a compromised host."

Every LastID agent carries a verifiable credential issued to that agent: a real cryptographic credential, not a label you can reassign. A credential is data, though, and data can be copied. So here's the honest part, and the part that actually matters: the credential is leakable, but presenting it is not.

Every time the agent proves who it is, it has to sign a fresh proof with a key that lives in the machine's Secure Enclave, bound to this machine and this agent and held by a small code-signed, notarized helper. That key never leaves the enclave. The agent's own process, the Node process running your tools, never holds it, and it can't be exported or copied off the machine.

So steal the credential and you've got a locked door with no key. You'd also need the exact hardware it was bound to. That's the "impossible, not tedious" test passing: you can compromise the agent's process, copy everything you find, and still not present its identity, because the thing that presents it never enters that process.
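To make the "leakable credential, unleakable presentation" idea concrete, here is a minimal sketch of a challenge-response proof. It is not LastID's code. The names `EnclaveHelper`, `Verifier`, and `enroll` are hypothetical, and an HMAC stands in for the asymmetric signature a real Secure Enclave key would produce (so this verifier shares the key, which a real one would not).

```python
import hashlib
import hmac
import secrets


class EnclaveHelper:
    """Hypothetical stand-in for the signed helper: it can sign, never export."""

    def __init__(self, key: bytes) -> None:
        self.__key = key  # assumption: real hardware would hold this, not Python

    def sign(self, credential_id: str, nonce: bytes) -> bytes:
        message = credential_id.encode() + b"|" + nonce
        return hmac.new(self.__key, message, hashlib.sha256).digest()


class Verifier:
    """Issues fresh challenges and accepts each one at most once."""

    def __init__(self, key: bytes) -> None:
        self._key = key  # HMAC stand-in; a real verifier holds only a public key
        self._pending = set()

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self._pending.add(nonce)
        return nonce

    def verify(self, credential_id: str, nonce: bytes, proof: bytes) -> bool:
        if nonce not in self._pending:
            return False  # unknown or already-used challenge (replay)
        self._pending.discard(nonce)
        message = credential_id.encode() + b"|" + nonce
        expected = hmac.new(self._key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, proof)


def enroll():
    """Bind one helper and one verifier to the same freshly generated key."""
    key = secrets.token_bytes(32)
    return EnclaveHelper(key), Verifier(key)
```

The agent process in this sketch only ever sees `credential_id` and the proof bytes. Copying either gives an attacker nothing to answer the next challenge with, because each proof is bound to a nonce that is accepted once.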

Credentials: borrow them for five minutes, then burn them

The framework's strongest access-control tier is Just-in-Time access: grant a credential only at the moment of need, scope it tightly, and revoke it immediately so "an attacker finds no cached credentials to steal."

This is the part where I actually said "huh" out loud, because the eBook's own pro-tip points at Claude Code's OS credential store as the reference example, and it's describing, almost line for line, how LastID's vault already works.

Your agent never holds your API keys. It holds metadata: that a credential exists, what it's for, how it gets attached. When it actually needs to make a call, it mints a single-use handle that lives for five minutes. The secret gets wrapped to that handle, injected at the network boundary, and zeroized the instant the call is done. The secret never lands in the model's context. It never touches disk. There is nothing cached to steal between one request and the next.
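The handle lifecycle above can be sketched in a few lines. This is an illustration under stated assumptions, not the vault's implementation: `Vault`, `mint`, and `call_with` are hypothetical names, the secret sits in plain process memory here (a real vault would keep it encrypted), and Python can only zeroize its own working copy on a best-effort basis.

```python
import secrets
import time

HANDLE_TTL_SECONDS = 300  # five minutes, per the text


class Vault:
    def __init__(self, clock=time.monotonic):
        self._secrets = {}   # credential name -> secret bytes (assumption: in memory)
        self._handles = {}   # handle token -> (credential name, expiry time)
        self._clock = clock

    def store(self, name: str, secret: bytes) -> None:
        self._secrets[name] = bytes(secret)

    def mint(self, name: str) -> str:
        """Create a short-lived, single-use handle for one credential."""
        if name not in self._secrets:
            raise KeyError(name)
        token = secrets.token_urlsafe(16)
        self._handles[token] = (name, self._clock() + HANDLE_TTL_SECONDS)
        return token

    def call_with(self, token: str, send):
        """Redeem a handle once: pass the secret to `send`, then wipe the copy."""
        entry = self._handles.pop(token, None)  # pop makes the handle single-use
        if entry is None:
            raise PermissionError("unknown or already-used handle")
        name, expires_at = entry
        if self._clock() >= expires_at:
            raise PermissionError("handle expired")
        buffer = bytearray(self._secrets[name])
        try:
            return send(buffer)  # `send` is a hypothetical network-boundary callback
        finally:
            for i in range(len(buffer)):
                buffer[i] = 0  # best-effort zeroize of the working copy
```

The caller never receives the secret as a return value. It hands over a callback, and the buffer that callback saw is overwritten before `call_with` returns.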

And the timing is only half of it. You also set the rules for whether that handle can be minted at all. Limit what a credential is allowed to do. Fence it to a specific working directory or path, so an agent can only reach for it from where it's actually supposed to be running. Restrict it to certain hours. Cap how often it can fire. Or require that every single use stops and asks you for approval before anything leaves the machine. The credential isn't just short-lived. It's short-lived and fenced. An attacker who somehow talks your agent into reaching for it still has to satisfy every condition you put on it first.
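Those conditions amount to a gate in front of minting. A rough sketch of such a gate follows, assuming a hypothetical `MintPolicy` type with a directory fence, an hours window, an hourly cap, and an approval flag. The field names and the one-hour rate window are illustrative choices, not LastID's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from pathlib import PurePosixPath


@dataclass
class MintPolicy:
    allowed_root: PurePosixPath      # the agent must be running at or under this path
    allowed_hours: range             # e.g. range(9, 18) for 09:00 through 17:59
    max_uses_per_hour: int
    require_approval: bool = False
    recent_uses: list = field(default_factory=list)

    def check(self, cwd: str, now: datetime, approved: bool = False):
        """Return (allowed, reason). Every condition must pass before minting."""
        path = PurePosixPath(cwd)
        if ".." in path.parts:
            return False, "path not canonical"  # refuse rather than guess
        if path != self.allowed_root and self.allowed_root not in path.parents:
            return False, "outside allowed directory"
        if now.hour not in self.allowed_hours:
            return False, "outside allowed hours"
        window_start = now - timedelta(hours=1)
        self.recent_uses = [t for t in self.recent_uses if t > window_start]
        if len(self.recent_uses) >= self.max_uses_per_hour:
            return False, "rate cap reached"
        if self.require_approval and not approved:
            return False, "approval required"
        self.recent_uses.append(now)
        return True, "ok"
```

Note the order: the cheap structural checks run first, and a use is only counted against the cap once every condition has passed.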

A stolen API key is good for thirty days. A stolen LastID handle is good for nothing.

And we don't just claim that. We measure it, with a number we think the whole industry should start asking for: permissioned time. Permissioned time is the total wall-clock time a credential spent decrypted and usable across your fleet. Not how many keys you hold. Not how often you rotate them. How long any secret was actually exposed.

Here's our own fleet, last 30 days: 6 minutes and 21 seconds of permissioned time. Total. Across 79 credentialed calls and 9 agents, with a peak single window of 3 minutes 24 seconds. The rest of those 30 days, standing access was zero.

Now hold that against the status quo. A single API key sitting in a .env file carries a full 30 days of standing access over that same window. That makes this fleet roughly 6,800× less permissioned than one long-lived key. One key. We're running nine agents.
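The arithmetic is easy to reproduce. A short check using only the rounded figures quoted above (a 30-day window and 6 minutes 21 seconds of permissioned time); the exact multiple depends on the unrounded duration, so treat the last digits as approximate.

```python
from datetime import timedelta

window = timedelta(days=30)                       # standing access of one long-lived key
permissioned = timedelta(minutes=6, seconds=21)   # measured exposure, as quoted

ratio = window / permissioned                     # timedelta / timedelta gives a float
print(f"{ratio:,.0f}x")                           # prints 6,803x
```

In seconds, that is 2,592,000 divided by 381, or a little over 6,800.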

That's the question we want security teams to start asking, instead of "do you rotate your keys?": what's your permissioned time?

Multi-agent: agents shouldn't trust each other on a handshake

The framework warns about privilege inheritance: a manager agent handing its full access to a worker, or a low-privilege agent relaying instructions a high-privilege agent executes without checking. Its suggestion is to log inter-agent communication and verify identity at each step.

We went further than logging. When a LastID agent spawns a sub-agent, that sub-agent gets its own cryptographic identity and its own bounded credential, never a copy of the parent's. And agents talk to each other (and to you) over MLS (Messaging Layer Security), the same end-to-end encryption standard behind modern secure messengers, hardened with a post-quantum cipher suite so traffic captured today stays safe against tomorrow's hardware. Group membership is cryptographic. An agent can't be slipped a forged instruction from outside the group, and every action a sub-agent takes is attributed to its identity in the audit chain, cross-referenced to whoever delegated it.
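One way to picture "its own bounded credential, never a copy of the parent's" is scope attenuation at spawn time. The sketch below is an assumption about the shape of that rule, not a description of LastID's delegation format: `AgentIdentity` and `spawn_sub_agent` are hypothetical, and scopes are plain strings.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    scopes: frozenset
    delegated_by: Optional[str] = None  # who spawned this agent, for the audit trail


def spawn_sub_agent(parent: AgentIdentity, requested_scopes) -> AgentIdentity:
    """Give the child a fresh identity whose scopes are a subset of the parent's."""
    requested = frozenset(requested_scopes)
    extra = requested - parent.scopes
    if extra:
        raise PermissionError(f"parent cannot delegate: {sorted(extra)}")
    return AgentIdentity(
        agent_id="agent-" + secrets.token_hex(4),  # new identity, never the parent's
        scopes=requested,
        delegated_by=parent.agent_id,
    )
```

A worker spawned with `{"repo:read"}` from a manager holding `{"repo:read", "repo:write"}` can never widen itself back to write access, and `delegated_by` gives each of its actions a parent to be cross-referenced against.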

Audit: a log you can't quietly edit

The framework's Enterprise tier wants immutable audit trails with integrity verification. Ours are append-only, hash-chained, and signed, with periodic integrity checkpoints, shipped off the device. And which events get recorded is governed by a policy your operator signs, so an attacker can't silence an audit class without your signing key. The log isn't just hard to tamper with. Tampering is detectable.
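Hash chaining is the piece that makes edits detectable, and it fits in a short sketch. This is the generic technique, assuming SHA-256 over a JSON encoding. Per-entry signatures and off-device checkpoints are left out, and the function names are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(log: list, event: dict) -> None:
    """Add an entry whose hash commits to the event and to the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: list) -> bool:
    """Recompute every link; any edited, reordered, or dropped middle entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A chain on its own cannot notice entries cut from the end, which is one reason checkpoints held somewhere else matter: they pin how long the log was at a known point in time.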

Rules and ground truth that follow your agent

An agent is only as safe as the boundaries it actually respects. In LastID you set those boundaries, and they travel with the agent everywhere it runs.

Rules are deny, warn, and rewrite policies you author once: block reads of secret files, refuse a destructive flag, rewrite a risky install command into a safe one. They're signed with your authority and enforced locally, in the agent's own pre-tool hook, before a tool ever runs. The agent can't quietly opt out of them, and the policy is delivered no matter what, so there's no capability an agent can drop to escape governance.
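A sketch of how deny, warn, and rewrite rules might be evaluated before a tool runs. Everything here is illustrative: the `Rule` shape, the regular expressions, and the `pre_tool_hook` function are assumptions, and verifying the signature on the rule set is omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    action: str            # "deny", "warn", or "rewrite"
    pattern: str           # regular expression matched against the command
    message: str = ""
    replacement: str = ""  # only used by "rewrite" rules


@dataclass
class Decision:
    allowed: bool
    command: str           # possibly rewritten command to run instead
    warnings: list


def pre_tool_hook(command: str, rules: list) -> Decision:
    """Apply rules in order; a deny stops evaluation, a rewrite edits the command."""
    warnings = []
    for rule in rules:
        if not re.search(rule.pattern, command):
            continue
        if rule.action == "deny":
            return Decision(False, command, warnings + [rule.message])
        if rule.action == "warn":
            warnings.append(rule.message)
        elif rule.action == "rewrite":
            command = re.sub(rule.pattern, rule.replacement, command)
    return Decision(True, command, warnings)


EXAMPLE_RULES = [
    Rule("deny", r"(^|[\s/])\.env\b", "reads of secret files are blocked"),
    Rule("deny", r"\brm\s+-rf\s+/", "destructive flag refused"),
    Rule("warn", r"\bgit push\b.*--force", "force push rewrites remote history"),
    Rule("rewrite", r"\bnpm install\b(?! --ignore-scripts)",
         replacement="npm install --ignore-scripts"),
]
```

With these example rules, `cat .env` is refused, and `npm install left-pad` is allowed but runs as `npm install --ignore-scripts left-pad`.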

Bedrock memories are your ground truth: the facts and instructions you want the agent to treat as non-negotiable, above whatever it absorbed in training. "This repo deploys on Fridays only." "Never approve a PR with a hardcoded secret." You write them in plain language in your console, and every agent you run carries them into every session.

Identity, rules, and memory that follow your agent. Set them once. They hold wherever it goes.

Memory that follows your agent

Memory in LastID isn't trapped inside one agent. It's shared across all of yours. Something one agent learns doesn't have to be relearned by the next one. It carries over from that agent's work, so every agent you run starts from what your whole fleet already knows. Learn it once, anywhere, and it's everywhere you work.

And the work happens on your side. The embeddings and the semantic search that make memory useful run locally, on your machine, not in our cloud. Sync carries memory between your own devices and agents over an encrypted channel; the processing, and the plaintext, stay with you.

Shared memory cuts both ways, and we take the risk seriously. The framework names it as a threat, "shared context poisoning": when memory is shared across a pool of agents, one poisoned entry can steer all of them, and sharing it across an enterprise fleet makes that pool bigger, not smaller. So we don't hand-wave it. Every memory is signed, so a forged or tampered record fails verification and never loads. Every memory has an owner and a history, so you can see where it came from and roll it back. Making shared memory provably safe at enterprise scale is work we're still doing in the open, not a box we've already checked.
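"Fails verification and never loads" can be sketched as a filter at load time. This is a simplified illustration, not the product's format: an HMAC stands in for a real signature, the record layout is made up, and `sign_memory` and `load_verified` are hypothetical names.

```python
import hashlib
import hmac
import json


def sign_memory(key: bytes, record: dict) -> str:
    """Tag a record over a canonical JSON encoding (HMAC as a signature stand-in)."""
    body = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def load_verified(key: bytes, stored: list) -> list:
    """Return only the records whose tag still matches their content."""
    loaded = []
    for item in stored:
        expected = sign_memory(key, item["record"])
        if hmac.compare_digest(expected, item["tag"]):
            loaded.append(item["record"])
        # anything else is dropped: a tampered or forged entry never reaches an agent
    return loaded
```

Verification only proves a memory was written by a key holder and not altered since. It says nothing about whether the content was wise to store, which is why ownership, history, and rollback still matter.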

And all of it is yours. Read, edit, or delete any memory, any time, from your console, and the change reaches every agent at once. Recovery from a bad memory isn't a roadmap promise. It's a control you already have, across the whole fleet.

We can't read any of it

Here's the part that should matter to anyone evaluating where this data lives. We don't see it.

Your memories, your vault credentials, your rules, your agent's instructions, your chats: all of it is encrypted on your side before it ever reaches us. What sits on our servers is ciphertext we hold no key for. We sync it between your devices and your agents and we relay your encrypted messages, but we can't read the contents, and neither can anyone who breaches us. The plaintext only ever exists on your machines.

That's not a privacy policy you have to trust. It's the architecture.

Change your mind in real time

None of this needs a restart. The agent holds a live connection to you, so when you change something, the change lands on its next turn.

Push a new rule and the running agent starts enforcing it. Edit a memory and the corrected version is what it reads next time. Share a credential and it can use it moments later, without you touching the agent at all. Nothing gets reconfigured and rebooted. The agent just picks up the new instructions on the next thing it does.

And it works the same in the other direction, which is the part that matters when something is wrong. Everything is revocable in real time. Pull a credential and the agent's next attempt to use it fails. Revoke an agent and its very next call is rejected, sub-agents included. Delete a memory and it's gone from the fleet's next turn. You don't file a ticket and wait for a sync window. You change it, and it's changed.

Where we're honestly not there yet

I told you I'd get to this. The framework is six pillars wide, and we don't lead on all of them.

Behavioral monitoring. We produce signed, tamper-evident audit data, but we don't yet have a system watching it for anomalies and responding at machine speed. The framework moved automated first-pass triage down to the Foundation floor. That's our biggest gap, and we know it.

Automated response. The operator can already edit, delete, or roll back any memory by hand, and revoke an agent in a tap. What we don't have yet is the machine-speed version: a system that flags a suspect memory or a misbehaving agent and quarantines it automatically, before you even look. The manual controls are there. The automation is what we're building.

Supply-chain attestation. We practice good dependency hygiene, but a formal AI Bill of Materials and a standing reachability gate are on the roadmap, not in the build.

I'd rather tell you that than pretend the map is all green. The framework is explicit that skipping one pillar is where attackers get in, so we're building toward the rest in the open.

Why I think this matters

For a year the answer to "is it safe to give an AI agent real access?" has been vibes. Anthropic just turned it into a checklist with a pass/fail test, and that's genuinely good for everyone shipping agents. It means you can evaluate a deployment instead of arguing about it.

Run the test against your own setup. If your agent holds a long-lived key in an environment variable, or authenticates with something you could screenshot, or talks to its sub-agents on trust, those are tedious-not-impossible controls, and the framework is telling you they won't hold.

We built LastID so the answer to that test is "impossible" for the parts that matter most: identity, credentials, and who's allowed to talk to your agent at all.

Try it

The agent plugin for Claude Code is open source. Install it, provision an identity from your phone, and your agent gets a real cryptographic identity, hardware-bound credential custody, and a verifiable channel to you, instead of a phone number and a hope.

Plugin + setup: GetTrustedApp/lastid-agent
Anthropic's framework: Zero Trust for AI Agents

The threats are going to keep accelerating. The agents aren't slowing down. The least we can do is make the attacks impossible instead of tedious.