Running Agent M all day
I wanted Agent M running most of the day (planning, building, reviewing) while I got on with other things. A few things were still in the way, and a big one was cost. An agent that works all day spends tokens all day, and a plan’s usage limits come up fast.
The time had come to think of token efficiency as the feature: getting a full day’s work to fit comfortably inside a smaller, cheaper plan.
I made a number of small changes, but three moves mattered most. First, route the model to the work: let the expensive model do the planning and a cheaper one do the long stretch of execution. Second, keep the cache warm. Most of what an agent reads in a session repeats from turn to turn, so if its memory and workflow stay stable, you send that context once and read it back cheaply. Nearly all my tokens are cache reads now. Third, carry less context than you think you need. The 1M-token window taught me that more isn’t free.
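Here is a minimal sketch of how those three moves might fit together in code. It is not Agent M's actual implementation: the model names, the `build_request` helper, the `keep_last` cutoff, and the choice to send review work to the cheaper model are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

# Hypothetical model tiers; substitute whatever your provider offers.
PLANNER_MODEL = "expensive-model"
WORKER_MODEL = "cheap-model"


def pick_model(phase: str) -> str:
    """Send planning to the expensive model, everything else to the cheap one."""
    return PLANNER_MODEL if phase == "plan" else WORKER_MODEL


@dataclass(frozen=True)
class Request:
    model: str
    prefix: str               # stable part: memory + workflow, same every turn
    recent: tuple[str, ...]   # volatile part: only the latest turns


def prefix_key(prefix: str) -> str:
    """Fingerprint of the stable prefix; if it changes, the cache goes cold."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


def build_request(phase: str, memory: str, workflow: str,
                  history: list[str], keep_last: int = 4) -> Request:
    # Stable content goes first and is never reordered, so a provider-side
    # prompt cache can match it from one turn to the next.
    prefix = memory + "\n" + workflow
    # Carry only the most recent turns instead of the whole history.
    recent = tuple(history[-keep_last:]) if keep_last > 0 else ()
    return Request(pick_model(phase), prefix, recent)
```

Comparing `prefix_key` across turns is a cheap way to notice when an edit to memory or workflow has silently invalidated the cached prefix.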
The durable memory paid off here too. Because the agent reloads what it knows from the vault instead of having everything re-explained each session, the expensive “catch me up” part of a session mostly disappears.
So Agent M is now cheap enough to leave running. The budget still needs watching: turn on the most expensive model for heavy multi-agent planning and you will still burn through your token allowance. But the usage limits are manageable now, and I don’t need the largest Claude plan to keep working consistently. When I measured it, a long session cost roughly 90% less for the same output.
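To get a feel for why the share of cache reads matters so much, here is a rough back-of-the-envelope model. It is not the measurement described above. The price and the 0.1 cache-read multiplier are illustrative assumptions, and the sketch ignores output tokens and cache-write charges.

```python
def session_input_cost(input_tokens: int, cached_fraction: float,
                       price_per_mtok: float,
                       cache_read_multiplier: float = 0.1) -> float:
    """Input cost when a fraction of tokens is billed at a cheaper cache-read rate.

    All parameters are hypothetical; check your provider's real pricing.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    billable = fresh + cached * cache_read_multiplier
    return billable * price_per_mtok / 1_000_000


def input_saving(cached_fraction: float,
                 cache_read_multiplier: float = 0.1) -> float:
    """Fraction of input cost saved compared with no caching at all."""
    return cached_fraction * (1.0 - cache_read_multiplier)
```

Under these assumptions the saving grows linearly with the cached share, which is why a stable prefix is worth protecting. Routing and trimming context then reduce what is left.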
The point was never to spend more to do more — it was to make doing more cost less.