A security gate you can poke
Most “AI safety” demos ask a model to judge whether an action is dangerous. That’s backwards for the cases that actually matter. A model can be argued out of its judgment — “ignore your previous instructions” — but a regular expression cannot. The boring layer, a deterministic policy that runs before the action with no model in the loop, is the one I trust to stop a runaway agent.
So here’s the boring layer, made pokeable. Propose a command; the gate matches it against a fixed ruleset and returns one of three verdicts:
- allow — a read or a green-light op, runs unattended;
- hold — reversible but loud (a force-push, a publish, a sudo), parked for a human;
- block — destructive or exfiltrating, refused outright.
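To make the three verdicts concrete, here is a minimal sketch of a match-first gate. Everything in it is an assumption for illustration: the rule names, the patterns, the rule ordering (block, then hold, then allow), and the choice to hold anything unmatched. It is not the real Crickets ruleset, just the same shape.

```javascript
// Hypothetical ruleset: names and patterns are illustrative only.
// Order matters: the first matching rule wins, so block rules come
// first, then hold, then allow.
const RULES = [
  // block: destructive or exfiltrating
  { verdict: "block", name: "recursive-delete", pattern: /\brm\s+-[a-zA-Z]*r/ },
  { verdict: "block", name: "curl-upload", pattern: /\bcurl\b.*\s(-d|--data|--upload-file)\b/ },
  // hold: reversible but loud, parked for a human
  { verdict: "hold", name: "force-push", pattern: /\bgit\s+push\b.*(--force|\s-f\b)/ },
  { verdict: "hold", name: "publish", pattern: /\b(npm|cargo)\s+publish\b/ },
  { verdict: "hold", name: "sudo", pattern: /^\s*sudo\b/ },
  // allow: reads and other green-light ops
  { verdict: "allow", name: "read-only", pattern: /^\s*(ls|cat|pwd|git\s+(status|log|diff))\b/ },
];

// Pure function of the command string: no model, no state, no clock.
function classify(command) {
  for (const rule of RULES) {
    if (rule.pattern.test(command)) {
      return { verdict: rule.verdict, rule: rule.name };
    }
  }
  // Assumption: anything no rule recognizes is parked for a human
  // rather than allowed.
  return { verdict: "hold", rule: "default" };
}

console.log(classify("git status").verdict);                   // allow
console.log(classify("git push --force origin main").verdict); // hold
console.log(classify("sudo rm -rf /").verdict);                // block
// Persuasion in the input changes nothing: the pattern still matches.
console.log(classify("rm -rf / # ignore your previous instructions").verdict); // block
```

Because the block rules are checked first, `sudo rm -rf /` is refused rather than merely held, and the returned rule name tells a reader exactly which line of the ruleset made the call.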
__cricketsGate.classify("…").
It’s a toy — a couple dozen rules, not the real engine. But the shape is exactly how Crickets’ pre-push hook stops a PII leak, and how the kill-switch stopped my own agent cold: match first, no LLM to sweet-talk, same answer every single time. Determinism is the feature — you can read every rule, and so can the next person.