A clever prompt can talk an agent into deleting data, running a risky install, or touching production. Set the rules once, and your agent follows them on every step.
Most teams put the rules in a prompt and hope the agent listens. A prompt is a suggestion. One persuasive instruction later, the agent does it anyway.
You write what the agent should and should not do in its instructions. Nothing makes it obey, and a long task drifts.
A tricked or pushy instruction convinces the agent the risky action is fine this time. It runs the command.
The package was malicious, the data is gone, production changed. The guardrail was never really there.
Set your rules once and sign them. The agent checks every action against them and blocks, warns, or rewrites the matching ones. It works even with no connection, and the agent cannot switch the rules off.
Stop a dangerous action, flag a risky one, or automatically rewrite it into a safe version. You choose per rule.
Rules run on each action the agent takes, even offline. There is no window where they are off.
Turn on packs for common risks like supply-chain installs and production changes, then add your own.
You tell the agent the rules in a prompt and hope it listens. One persuasive instruction later, it runs the command anyway.
Instructions in a prompt are suggestions. Nothing stops the agent from ignoring them.
You set the rules once and sign them. The agent blocks, warns, or rewrites the matching actions on every step, even with no connection, and cannot turn the rules off by editing a file.
See which rules fire over time, and start from ready-made packs for common risks.
Set rules for the moves that matter. The agent follows them whether it is helping you or running on its own.
Block running code straight from the internet and route package installs through a scanner first.
Stop changes to production systems or bulk deletes unless you have allowed them.
Catch attempts to read secrets or send sensitive data somewhere it should not go.
Apply a rule to your whole fleet and exempt the one agent that genuinely needs an exception.
Turn on a pack, add your own rules, and watch which ones fire as your agents work.
Your agents need keys for Stripe, GitHub, and the rest. Share each one once, and your agent uses it without ever seeing it. Every use is single-shot, short-lived, and recorded.
AccountabilityProve what every agent did.When agents run real work, you need a clear answer to a simple question. Did the agent change that, leak that, or spend that, and which agent was it?
Least privilegeGive each agent only what it needs.Most agents run with the same broad access you have. Give each one a narrow identity instead, so a single mistake stays small.