A security gate you can poke

Most “AI safety” demos ask a model to judge whether an action is dangerous. That’s backwards for the cases that actually matter. A model can be argued out of its judgment — “ignore your previous instructions” — but a regular expression cannot. The boring layer, a deterministic policy that runs before the action with no model in the loop, is the one I trust to stop a runaway agent.

So here’s the boring layer, made pokeable. Propose a command; the gate matches it against a fixed ruleset and returns one of three verdicts:

  • allow — a read or a green-light op, runs unattended;
  • hold — reversible but loud (a force-push, a publish, a sudo), parked for a human;
  • block — destructive or exfiltrating, refused outright.

[Interactive demo: crickets · intercept-gate. Propose a command, tap an example, or run the whole sequence; the gate returns allow, hold, or block.]

A toy model of the idea — a couple dozen fixed rules, no model in the loop. The real thing is a pre-action hook (Crickets’ pre-push PII gate works the same way). Poke it from the console: __cricketsGate.classify("…").
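
For readers without the live widget, here is a minimal sketch of what a gate like this could look like. It is not Crickets' actual ruleset: the patterns, the `RULES` table, the `{ verdict, why }` return shape, and the choice to fall back to hold when nothing matches are all assumptions made for illustration. The only things taken from the post are the three verdicts and the match-first, no-model design.

```javascript
// Hypothetical ruleset, NOT the real Crickets rules.
// Rules are checked top to bottom and the first match wins, so the
// most severe verdicts are listed first.
const RULES = [
  { verdict: "block", why: "recursive force delete", pattern: /\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r/ },
  { verdict: "block", why: "remote script piped to a shell", pattern: /\bcurl\b.*\|\s*(ba)?sh\b/ },
  { verdict: "block", why: "drops a table or database", pattern: /\bdrop\s+(table|database)\b/i },
  { verdict: "hold", why: "force-push", pattern: /\bgit\s+push\b.*(--force\b|\s-f\b)/ },
  { verdict: "hold", why: "publish", pattern: /\bnpm\s+publish\b/ },
  { verdict: "hold", why: "sudo", pattern: /\bsudo\b/ },
  { verdict: "allow", why: "read-only command", pattern: /^\s*(ls|cat|pwd|git\s+(status|log|diff))\b/ },
];

// Pure function: the same command always yields the same verdict.
// Assumption: a command no rule recognises is held for a human rather
// than allowed, so the gate fails safe.
function classify(command) {
  for (const rule of RULES) {
    if (rule.pattern.test(command)) {
      return { verdict: rule.verdict, why: rule.why };
    }
  }
  return { verdict: "hold", why: "no rule matched" };
}

console.log(classify("ls -la"));
console.log(classify("git push --force origin main"));
console.log(classify("ls; rm -rf /tmp/x"));
```

Ordering matters in this sketch: because block rules come before allow rules, a command that starts with a harmless `ls` but chains a destructive `rm -rf` is still blocked.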

It’s a toy — a couple dozen rules, not the real engine. But the shape is exactly how Crickets’ pre-push hook stops a PII leak, and how the kill-switch stopped my own agent cold: match first, no LLM to sweet-talk, same answer every single time. Determinism is the feature — you can read every rule, and so can the next person.
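
The "match first" shape can be sketched as a wrapper that sits between the agent and whatever actually runs the command. Everything named here (`makeGuard`, `BlockedError`, `onHold`, `stubClassify`) is hypothetical and only illustrates the pattern. It is not the Crickets hook or its kill-switch.

```javascript
// Hypothetical pre-action hook: the classifier runs before the executor,
// and the executor is only reached on an "allow" verdict.
class BlockedError extends Error {}

function makeGuard(classify, execute, onHold) {
  return function guarded(command) {
    const { verdict, why } = classify(command);
    if (verdict === "allow") {
      return { verdict, result: execute(command) };
    }
    if (verdict === "hold") {
      onHold(command, why); // park it for a human; do not run it
      return { verdict, result: null };
    }
    // "block" and any unexpected verdict both refuse: fail closed.
    throw new BlockedError(`blocked: ${why}`);
  };
}

// Stand-in classifier with three illustrative rules (an assumption).
function stubClassify(command) {
  if (/\brm\s+-rf\b/.test(command)) return { verdict: "block", why: "recursive delete" };
  if (/\bgit\s+push\b.*--force\b/.test(command)) return { verdict: "hold", why: "force-push" };
  if (/^\s*ls\b/.test(command)) return { verdict: "allow", why: "read" };
  return { verdict: "hold", why: "no rule matched" };
}

const executed = []; // commands that actually reached the executor
const held = [];     // commands waiting for a human

const guarded = makeGuard(
  stubClassify,
  (cmd) => { executed.push(cmd); return "ok"; },
  (cmd, why) => { held.push({ cmd, why }); }
);

guarded("ls -la");                        // allowed: reaches the executor
guarded("git push --force origin main");  // held: parked, never executed
try {
  guarded("rm -rf /");                    // blocked: throws before executing
} catch (err) {
  console.log(err.message);
}
```

Because the executor is only called inside the allow branch, there is no path by which a held or blocked command runs, and no prompt text can change which branch a given command takes.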
