Daily burn heatmap
Every day colored by tokens spent, on a log scale.
Build guide
Your AI spend is the most honest report card you have. This guide turns that spend into a dashboard you can build yourself: one screen for seeing where tokens go across Codex, Claude, and ChatGPT.
Burn rate is not a vanity stat. It is the clearest signal of whether you are getting fluent or just getting expensive.
What you build
The dashboard is not complicated. It is five different views over normalized daily totals. The discipline is in the data shape and the labels.
Every day colored by tokens spent, on a log scale.
A cleaner read on whether your usage is climbing or getting leaner.
The work families that eat the budget: shipping, research, review, video, admin.
A huge token number translated into something a human can feel.
The receipts: per tool, per day, exact and estimated side by side.
Visual pass
The page should feel like the system it is asking you to build: logs moving into rows, rows becoming views, views getting verified before anything ships.
Exact logs and honest estimates need to land in one normalized file. That is the whole backbone.
The target is not a fancy chart collection. It is a readable daily operating surface for AI spend.
Collect, normalize, build, verify, ship. If the totals do not reconcile, the loop is not finished.
Why build it
Most people pay the bill or hit the wall and assume that is just what AI costs. It is not. The first step to spending less is being able to see what you spend.
Burn hides in habits: raw PDFs, runaway conversations, bad model choice, and expensive context. A dashboard turns invisible waste into a number you watch.
Tokens per outcome is the clearest tell of whether you are getting better with the tools or just spending harder.
As models get more capable, sloppy workflows cost more. Measuring early gives you a way to get sharper before the burn scales up.
The anatomy
Start with daily totals and build outward. The visual layer should make spikes visible, but the raw rows still need to reconcile.
The old mistake is making this pretty before making it true. Normalize the data first, then give every visualization the same exact-vs-estimated honesty. A dashboard that confidently displays bad math is worse than no dashboard.
The driverfield is the move that makes it useful. It turns "I spent a lot" into "I spent a lot on shipping, research, review, or video."
Before you start
If you already build with a coding agent and can run a dev server, you have the practical skills you need.
Step 1
Some tools log real token usage. Others force you to estimate. The honest move is making that fidelity visible in the interface.
| Source | Fidelity | How to pull it |
|---|---|---|
| Codex | exact | Codex app and CLI sessions write local logs with real token usage. Have your agent total input and output by local day. |
| Claude Code | exact | Claude Code stores per-session JSONL with input, output, and cache counts. Include API or agent calls if you use them. |
| Claude chat | estimated | The web and desktop chat path has no tidy local token export. Estimate from message counts and average lengths, then label it honestly. |
| ChatGPT | estimated | Request your data export and tokenize conversation text by date. Treat this as calibrated estimation unless you have exact provider logs. |
Do not let estimates cosplay as measurements. That is the fastest way to make the dashboard feel useful while quietly training you on bad numbers.
Step 2
Every view is just a different read of this shape. Get this right and the UI becomes straightforward.
// daily-burn.json - one row per day, in your local timezone
[
{
"date": "2026-05-24",
"codex_tokens": 184320,
"claude_code_tokens": 512880,
"claude_code_calls": 47,
"claude_chat_est": 38000,
"chatgpt_est": 21000,
"total": 756200,
"driver": "shipping",
"evidence": "NAT-1035 build, OB1 PR review"
}
]Step 3
The difference between a useful agent build and a random dashboard is precision: file shape, view order, hard rules, and verification.
Paste this from the project folder with daily-burn.json in the root.
# Token-Burn Dashboard - Build Prompt You are building a single-page token-burn dashboard. I have a file `daily-burn.json` in this folder: an array of daily rows. Each row has: date (YYYY-MM-DD), codex_tokens, claude_code_tokens, claude_code_calls, claude_chat_est, chatgpt_est, total, driver (a short work-family label), and evidence (an optional note). ## Stack - A minimal Next.js App Router app, or a single static HTML file if that is simpler. - No backend. Read `daily-burn.json` at build time. - No heavyweight chart library unless you need one. Prefer small, dependency-light SVG/canvas. - Deployable to Vercel with zero config. ## Build these five views, in this order 1. Daily burn heatmap. A calendar grid with weeks as columns and weekday rows. Color each day by `total` on a logarithmic color scale, so quiet days and spikes are both legible. Include a legend. On hover, show the date, total, and driver. 2. Weekly trend line. Sum tokens per ISO week and plot a line chart with a log y-axis. Mark the highest week. 3. What's driving the burn. Group by `driver`. Show a sorted dot plot or bar list: each driver's total tokens, percent share, and the evidence note. Add quick windows: today, last 7 days, last 30 days, peak day, active days. 4. Scale equivalents. Convert the all-time `total` into 2-3 human-relatable Fermi comparisons, like novels worth of text or hours of reading. Keep the math visible and clearly approximate. 5. 30-day moving-average table. A table of the last 30 days: date, total, and each source column: codex, claude_code exact, claude_code_calls, claude_chat_est, chatgpt_est. Show a 7-day moving average column for `total`. ## Rules - Label exact vs estimated data visibly. Never present an estimate as a measurement. - Add a time-range selector affecting all views: 90 days / 180 days / 1 year / all-time. - Make it responsive and readable on a phone. - Keep it a single self-contained page if you can. - Comment the color-scale and tokenizer math. ## When done 1. Run the dev server and confirm all five views render from `daily-burn.json`. 2. List exactly which files you created. 3. Tell me how to add a new day of data. 4. Give me the one command to deploy to Vercel.
Step 4
Each view has a job and a common failure mode. Check those before you polish anything.
The at-a-glance conscience. It should make a runaway day obvious without flattening quiet days.
Use a log color scale. Linear heatmaps lie when one spike dominates.
Smooth the daily noise into a direction. The question is whether usage is compounding or getting leaner.
Use log y-axis again and label the peak week.
Turn 'I spent a lot' into 'I spent a lot on shipping, review, or research.'
Driver labels matter. Keep the vocabulary small.
Translate huge token totals into rough human-scale comparisons.
Show the math or it reads as a gimmick.
The receipts drawer. This is where exact and estimated sources sit side by side.
Make the trust level obvious at every row.
Step 5
The dashboard does not need a backend for v1. Keep the data file local, build the page, and deploy only the normalized totals you are comfortable exposing.
npm installnpm run dev, then open localhost:3000vercel from the folder.Verify
Before you call it done, make the dashboard prove it is telling the truth.
Pick one day. Add each source column by hand. It should match the day's total.
Readers should never need to guess whether a number is measured or inferred.
A 10x day and a 1x day should both be visible. If the map looks flat, the scale is wrong.
Switch 90 days to all-time and confirm the heatmap, trend, drivers, and table all respond.
Troubleshooting
If the dashboard feels wrong, debug the input assumptions before you redesign the output.
Logs are often UTC. Pick one timezone, convert on ingest, and bucket by local date before totals are calculated.
That almost always means a linear color scale. Switch to log and use enough ramp stops to show quiet days.
A bad tokens-per-message constant can swamp the dashboard. Calibrate it against one real conversation.
Make it yours
Once the dashboard is accurate, add the features that push behavior instead of merely reporting it.
Set a daily target and mark days that blow past it. A line you can cross is a line you start to notice.
Tag days with what shipped and divide. Falling cost per outcome is the fluency curve worth bragging about.
A small nightly script can read each tool's logs and append a new day, so the dashboard stays current without touching JSON.
This guide is adapted from the original Limited Edition Jonathan build guide and the companion thesis that token burn rate is a revealing metric of AI fluency. Read the related essay on Nate's newsletter.