Temper puts hard limits on what your coding agent can touch and pushes it to work the way a careful human would, committing only what holds up. It drives Claude Code or Codex on your own subscription; you read the diff and merge.
built under its own discipline
Temper is developed under its own gates. You can verify every number here yourself: clone the repo and run one command.
cat package.json
npm test
npm run critic-check

Stress-tested on a 2,935-file production codebase.
the problem
Left unchecked, coding agents add entropy: they re-implement what already exists, leave dead code behind, widen the scope of a change, and quietly silence the linter to make an error go away.
Temper puts a deterministic gate between the agent and your git history. Work that introduces new entropy never gets committed. It gets re-prompted with the evidence, or, if a problem stays stuck, escalated to you instead of burning iterations.
Deterministic by default. LLMs only for irreducible judgment.
Every gate is plain, fast code, except the one call a machine can't make reliably: "did this re-implement something that already exists?"
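To make "plain, fast code" concrete, here is a minimal sketch of what one deterministic check could look like: a scan of a unified diff for linter or type-checker suppressions on newly added lines. The function name `findNewSuppressions`, the `SuppressionHit` type, and the pattern list are all hypothetical; this is an illustration under those assumptions, not Temper's actual gate.

```typescript
// Hypothetical sketch of a deterministic gate; not Temper's real implementation.
type SuppressionHit = { lineNumber: number; text: string };

// Comments that switch off a linter or type checker (illustrative, not exhaustive).
const SUPPRESSION_PATTERNS: RegExp[] = [
  /eslint-disable/,
  /@ts-ignore/,
  /@ts-expect-error/,
  /@ts-nocheck/,
];

// Scan a unified diff and report suppressions that appear on added lines only.
function findNewSuppressions(unifiedDiff: string): SuppressionHit[] {
  const hits: SuppressionHit[] = [];
  const lines = unifiedDiff.split("\n");
  lines.forEach((line, index) => {
    // "+++ " marks a file header, not an added line.
    const isAdded = line.charAt(0) === "+" && line.slice(0, 4) !== "+++ ";
    if (isAdded && SUPPRESSION_PATTERNS.some((pattern) => pattern.test(line))) {
      hits.push({ lineNumber: index + 1, text: line.slice(1).trim() });
    }
  });
  return hits;
}

// Usage: one suppression is added, one pre-existing suppression is removed.
const sampleDiff = [
  "--- a/src/slug.ts",
  "+++ b/src/slug.ts",
  "@@ -1,3 +1,4 @@",
  " export function slugify(s: string) {",
  "+  // eslint-disable-next-line no-useless-escape",
  "   return s.toLowerCase();",
  "-  // @ts-ignore",
  " }",
].join("\n");

const suppressionHits = findNewSuppressions(sampleDiff);
```

Only the added `eslint-disable-next-line` comment is reported; the removed `@ts-ignore` line is ignored, since removing a suppression is not new entropy.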
how it works
The gates run cheapest-first. A violation is shown in full and fed back as root-cause evidence for the next attempt. If the same failure domain keeps recurring, Temper stops and hands it to you rather than quietly burning through its iteration budget.
Nothing is committed unless every gate is green. What lands in your history is measurably clean.
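The loop described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the type names (`GateVerdict`, `GateSketch`, `LoopOutcome`), the numeric `cost` field used for ordering, and the `maxRepeats` threshold for escalation are hypothetical and may not match how Temper models gates internally.

```typescript
// Hypothetical shapes for illustration; Temper's real types may differ.
type GateVerdict = { ok: true } | { ok: false; domain: string; evidence: string };
type GateSketch = { label: string; cost: number; check: (diff: string) => GateVerdict };
type LoopOutcome =
  | { kind: "committed"; attempts: number }
  | { kind: "escalated"; domain: string; attempts: number }
  | { kind: "budget-exhausted"; attempts: number };

// Run gates cheapest-first and stop at the first violation.
function firstViolation(gates: GateSketch[], diff: string): GateVerdict {
  const ordered = gates.slice().sort((a, b) => a.cost - b.cost);
  for (const gate of ordered) {
    const verdict = gate.check(diff);
    if (!verdict.ok) return verdict;
  }
  return { ok: true };
}

// agent: produces a diff, given the previous attempt's evidence (null on the first try).
function runToGreen(
  agent: (feedback: string | null) => string,
  gates: GateSketch[],
  maxAttempts: number,
  maxRepeats: number,
): LoopOutcome {
  let feedback: string | null = null;
  let lastDomain: string | null = null;
  let repeats = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const diff = agent(feedback);
    const verdict = firstViolation(gates, diff);
    if (verdict.ok) return { kind: "committed", attempts: attempt };
    // Count consecutive failures in the same failure domain.
    repeats = verdict.domain === lastDomain ? repeats + 1 : 1;
    lastDomain = verdict.domain;
    if (repeats >= maxRepeats) {
      return { kind: "escalated", domain: verdict.domain, attempts: attempt };
    }
    feedback = verdict.evidence; // specific evidence, not a blind "try again"
  }
  return { kind: "budget-exhausted", attempts: maxAttempts };
}

// Usage with two toy gates and an agent that fixes its scope once told about it.
const demoGates: GateSketch[] = [
  { label: "tests", cost: 30, check: () => ({ ok: true }) },
  {
    label: "scope",
    cost: 1,
    check: (diff) =>
      diff.indexOf("vendor/") === -1
        ? { ok: true }
        : { ok: false, domain: "scope", evidence: "diff touches vendor/, outside the Plan" },
  },
];
const demoAgent = (feedback: string | null): string =>
  feedback === null ? "touches:vendor/" : "touches:src/";
const demoOutcome = runToGreen(demoAgent, demoGates, 5, 2);
```

In this sketch the first attempt fails the cheap scope gate before the expensive test gate runs, and the second attempt commits. An agent that repeats the same scope violation would be escalated after two attempts instead of using all five.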
the method
None of this is novel. These are established practices from teams shipping with AI, and Temper enforces them in code so they hold on every run.
The cheapest place to catch a bad change is before it exists. Temper has the agent draft a Plan (scope, acceptance, and the assumptions it rests on) for you to approve first. Catching a wrong approach in a one-page Plan is cheaper than catching it in a thousand-line diff.
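As a rough illustration of why an approved Plan makes later checks mechanical, the sketch below models a Plan as data and checks a change against its scope. The `PlanSketch` shape, its field names, and the prefix-based notion of scope are assumptions for this example; the source does not describe the real format of `PLAN.md`.

```typescript
// Hypothetical Plan shape; field names are assumptions, not Temper's schema.
type PlanSketch = {
  title: string;
  scope: string[]; // path prefixes the change is allowed to touch
  acceptance: string[]; // conditions that must hold when the work is done
  assumptions: string[]; // what the chosen approach rests on
  approved: boolean; // set by the human reviewer, never by the agent
};

// Return every changed file that falls outside the Plan's declared scope.
function outOfScopeFiles(plan: PlanSketch, changedFiles: string[]): string[] {
  return changedFiles.filter(
    (file) => !plan.scope.some((prefix) => file.indexOf(prefix) === 0),
  );
}

// Usage: one file is inside the declared scope, one is not.
const demoPlan: PlanSketch = {
  title: "add a foo widget",
  scope: ["src/widgets/", "test/widgets/"],
  acceptance: ["foo widget renders", "existing tests still pass"],
  assumptions: ["widgets are registered in src/widgets/index.ts"],
  approved: true,
};
const strayFiles = outOfScopeFiles(demoPlan, [
  "src/widgets/foo.ts",
  "src/auth/session.ts",
]);
```

Here `strayFiles` contains only `src/auth/session.ts`, which is the kind of scope widening a gate can reject without any judgment call.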
Let it iterate freely. Let nothing into your history that didn't pass deterministic checks. The loop is the cheap part; the gate (scope, dead code, duplication, your tests) is what makes it safe to leave running.
Each violation is fed back as specific, root-cause evidence for the next attempt, not a blind "try again." If one problem keeps recurring unchanged, Temper stops and hands it to you rather than burning its iteration budget.
Everything that can be checked is checked by plain, fast code. The one call a machine can't make reliably ("did this re-implement something that already exists?") gets an LLM, with guardrails measured against the rates at which AI judges actually fail.
mode b · the overnight queue
Queue an ordered sequence of Plans and let Temper work through them unattended overnight. If a run is interrupted, it resumes from where it stopped.
# Temper run report
- Queue: .temper/phases
- Branch: temper/phases (from main; NOT merged)
- Outcome: all-green

| # | phase       | status    | commit    |
|---|-------------|-----------|-----------|
| 1 | Add slugify | committed | 0d8515ba1 |
| 2 | uniqueSlug  | committed | cce8e3f9d |
| 3 | Public API  | committed | 6c17ebbe8 |

Committed: 3/3 — review temper/phases, then merge.
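Resuming an interrupted queue comes down to checkpointing after every phase and skipping whatever is already committed. The sketch below shows one way that could work. The names `QueueStateSketch`, `runQueue`, and `persist` are hypothetical, state is kept in memory rather than on disk, and stopping at the first failed phase is an assumption of this example, not documented Temper behavior.

```typescript
// Hypothetical queue state; Temper's on-disk format is not described in the source.
type PhaseStatus = "pending" | "committed" | "failed";
type QueueStateSketch = { phases: { title: string; status: PhaseStatus }[] };

// Run phases in order, skipping any that an earlier run already committed.
function runQueue(
  state: QueueStateSketch,
  runPhase: (title: string) => boolean,
  persist: (state: QueueStateSketch) => void,
): QueueStateSketch {
  for (const phase of state.phases) {
    if (phase.status === "committed") continue; // finished in an earlier run
    const passed = runPhase(phase.title);
    phase.status = passed ? "committed" : "failed";
    persist(state); // checkpoint after every phase
    if (!passed) break; // assumption: later phases build on this one
  }
  return state;
}

// Usage: the first run is interrupted during the second phase, the second run resumes.
let savedJson = "";
const persistDemo = (state: QueueStateSketch): void => {
  savedJson = JSON.stringify(state);
};
const executed: string[] = [];
const initialQueue: QueueStateSketch = {
  phases: [
    { title: "Add slugify", status: "pending" },
    { title: "uniqueSlug", status: "pending" },
    { title: "Public API", status: "pending" },
  ],
};

try {
  runQueue(
    initialQueue,
    (title) => {
      if (title === "uniqueSlug") throw new Error("simulated interruption");
      executed.push(title);
      return true;
    },
    persistDemo,
  );
} catch {
  // The process died mid-phase; only the checkpoint in savedJson survives.
}

const resumed = runQueue(
  JSON.parse(savedJson) as QueueStateSketch,
  (title) => {
    executed.push(title);
    return true;
  },
  persistDemo,
);
```

After the second run, every phase is committed and each phase body ran exactly once: the resumed run started at `uniqueSlug` rather than redoing `Add slugify`.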
get started
# one-time: clone, then put `temper` on your PATH
git clone https://github.com/michaelrowejones/Temper && cd Temper && npm link

# from inside the repo you want to work on:
temper init                      # entry-point-aware gate config
temper plan "add a foo widget"   # draft a Plan from the codebase
$EDITOR ./PLAN.md                # review + approve it
temper run ./PLAN.md             # run it to a green gate
Temper must run in your own terminal. It cannot run inside a host-managed session, because nested credentials return a 401 (ADR-0003).
Temper runs in your own terminal, on your own subscription. Approve one Plan, and only clean work ever lands.