Build guide

Build your own token-burn dashboard.

Your AI spend is the most honest report card you have. This guide turns that spend into a dashboard you can build yourself: one screen for seeing where tokens go across Codex, Claude, and ChatGPT.

Burn rate is not a vanity stat. It is the clearest signal of whether you are getting fluent or just getting expensive.

  • A weekend (build time)
  • $0 (local data + Vercel)
  • 5 views (heatmap, trend, drivers, scale, table)

  • Codex: exact
  • Claude Code: exact
  • Claude chat: estimated
  • ChatGPT: estimated
  • daily-burn.json: one row per day
  • driver labels: behavior signal

One tidy table, five honest reads.

The dashboard is not complicated. It is five different views over normalized daily totals. The discipline is in the data shape and the labels.

01

Daily burn heatmap

Every day colored by tokens spent, on a log scale.

02

Weekly trend line

A cleaner read on whether your usage is climbing or getting leaner.

03

Burn drivers

The work families that eat the budget: shipping, research, review, video, admin.

04

Scale equivalents

A huge token number translated into something a human can feel.

05

Moving-average table

The receipts: per tool, per day, exact and estimated side by side.

Make the invisible workflow visible.

The page should feel like the system it is asking you to build: logs moving into rows, rows becoming views, views getting verified before anything ships.

[Figure: AI usage sources flowing into one daily burn data file.]
Source map

Map the sources before you touch the UI.

Exact logs and honest estimates need to land in one normalized file. That is the whole backbone.

[Figure: dark cyan Unlock AI dashboard mockup showing token burn heatmaps, trend data, source fidelity, and spend drivers.]
Target surface

Keep the finished dashboard visible while you build.

The target is not a fancy chart collection. It is a readable daily operating surface for AI spend.

[Figure: the build loop from usage logs to normalized rows, dashboard views, verification, and deployment.]
Build loop

Work in loops until the math gets boring.

Collect, normalize, build, verify, ship. If the totals do not reconcile, the loop is not finished.

A bill tells you what happened. A burn dashboard changes behavior.

Most people pay the bill or hit the wall and assume that is just what AI costs. It is not. The first step to spending less is being able to see what you spend.

01

You cannot improve what you cannot see.

Burn hides in habits: raw PDFs, runaway conversations, bad model choice, and expensive context. A dashboard turns invisible waste into a number you watch.

02

It is a fluency meter, not a bill.

Tokens per outcome is the clearest tell of whether you are getting better with the tools or just spending harder.

03

The lesson gets more expensive.

As models get more capable, sloppy workflows cost more. Measuring early gives you a way to get sharper before the burn scales up.

The heatmap is the conscience. The table is the truth.

Start with daily totals and build outward. The visual layer should make spikes visible, but the raw rows still need to reconcile.

The old mistake is making this pretty before making it true. Normalize the data first, then give every visualization the same exact-vs-estimated honesty. A dashboard that confidently displays bad math is worse than no dashboard.

The driver field is the move that makes it useful. It turns "I spent a lot" into "I spent a lot on shipping, research, review, or video."

Nothing exotic. Local files, one agent, one deploy.

If you already build with a coding agent and can run a dev server, you have the practical skills you need.

Tools

  • Codex App, Claude Code, Cursor, or another coding agent.
  • Node 20+ and a terminal.
  • A Vercel account if you want to publish it.
  • The usage data from whichever tools you want to track.

Rules

  • Keep raw exports out of public repos.
  • Commit only the normalized totals you are comfortable sharing.
  • Label estimated values everywhere.
  • Build local first. Deploy only when the math reconciles.

Get the data, then admit how good it is.

Some tools log real token usage. Others force you to estimate. The honest move is making that fidelity visible in the interface.

| Source | Fidelity | How to pull it |
| --- | --- | --- |
| Codex | exact | Codex app and CLI sessions write local logs with real token usage. Have your agent total input and output by local day. |
| Claude Code | exact | Claude Code stores per-session JSONL with input, output, and cache counts. Include API or agent calls if you use them. |
| Claude chat | estimated | The web and desktop chat path has no tidy local token export. Estimate from message counts and average lengths, then label it honestly. |
| ChatGPT | estimated | Request your data export and tokenize conversation text by date. Treat this as calibrated estimation unless you have exact provider logs. |

Do not let estimates cosplay as measurements. That is the fastest way to make the dashboard feel useful while quietly training you on bad numbers.

Normalize everything into one daily row.

Every view is just a different read of this shape. Get this right and the UI becomes straightforward.

daily-burn.json (one row per day, in your local timezone):
[
  {
    "date": "2026-05-24",
    "codex_tokens": 184320,
    "claude_code_tokens": 512880,
    "claude_code_calls": 47,
    "claude_chat_est": 38000,
    "chatgpt_est": 21000,
    "total": 756200,
    "driver": "shipping",
    "evidence": "NAT-1035 build, OB1 PR review"
  }
]
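
One way to keep this shape honest is to derive `total` instead of typing it. The sketch below is a minimal Python helper under that assumption; `build_row` and `FIDELITY` are hypothetical names, not part of the guide's original tooling, and the fidelity labels simply mirror the source table above.

```python
# Fidelity of each token column, mirroring the source table in this guide.
FIDELITY = {
    "codex_tokens": "exact",
    "claude_code_tokens": "exact",
    "claude_chat_est": "estimated",
    "chatgpt_est": "estimated",
}


def build_row(date, counts, driver, evidence="", claude_code_calls=0):
    """Return one normalized daily row; `total` is derived, never typed by hand."""
    row = {"date": date}
    for column in FIDELITY:
        row[column] = int(counts.get(column, 0))
    row["claude_code_calls"] = claude_code_calls  # a call count, not tokens
    row["total"] = sum(row[column] for column in FIDELITY)
    row["driver"] = driver
    row["evidence"] = evidence
    return row


row = build_row(
    "2026-05-24",
    {"codex_tokens": 184320, "claude_code_tokens": 512880,
     "claude_chat_est": 38000, "chatgpt_est": 21000},
    driver="shipping",
    evidence="NAT-1035 build, OB1 PR review",
    claude_code_calls=47,
)
print(row["total"])  # 756200
```

Note that `claude_code_calls` stays out of the sum: it counts calls, so adding it to token columns would quietly inflate the day.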

Let the agent build the dashboard, but give it the right spec.

The difference between a useful agent build and a random dashboard is precision: file shape, view order, hard rules, and verification.

Token-burn dashboard build prompt

Paste this from the project folder with daily-burn.json in the root.

# Token-Burn Dashboard - Build Prompt

You are building a single-page token-burn dashboard. I have a file `daily-burn.json` in this folder: an array of daily rows. Each row has: date (YYYY-MM-DD), codex_tokens, claude_code_tokens, claude_code_calls, claude_chat_est, chatgpt_est, total, driver (a short work-family label), and evidence (an optional note).

## Stack
- A minimal Next.js App Router app, or a single static HTML file if that is simpler.
- No backend. Read `daily-burn.json` at build time.
- No heavyweight chart library unless you need one. Prefer small, dependency-light SVG/canvas.
- Deployable to Vercel with zero config.

## Build these five views, in this order

1. Daily burn heatmap. A calendar grid with weeks as columns and weekdays as rows. Color each day by `total` on a logarithmic color scale, so quiet days and spikes are both legible. Include a legend. On hover, show the date, total, and driver.

2. Weekly trend line. Sum tokens per ISO week and plot a line chart with a log y-axis. Mark the highest week.

3. What's driving the burn. Group by `driver`. Show a sorted dot plot or bar list: each driver's total tokens, percent share, and the evidence note. Add quick windows: today, last 7 days, last 30 days, peak day, active days.

4. Scale equivalents. Convert the all-time sum of `total` into 2-3 human-relatable Fermi comparisons, like novels worth of text or hours of reading. Keep the math visible and clearly approximate.

5. 30-day moving-average table. A table of the last 30 days: date, total, and each source column: codex_tokens (exact), claude_code_tokens (exact), claude_code_calls, claude_chat_est, chatgpt_est. Show a 7-day moving average column for `total`.

## Rules
- Label exact vs estimated data visibly. Never present an estimate as a measurement.
- Add a time-range selector affecting all views: 90 days / 180 days / 1 year / all-time.
- Make it responsive and readable on a phone.
- Keep it a single self-contained page if you can.
- Comment the color-scale and tokenizer math.

## When done
1. Run the dev server and confirm all five views render from `daily-burn.json`.
2. List exactly which files you created.
3. Tell me how to add a new day of data.
4. Give me the one command to deploy to Vercel.

Review the five views like a builder, not a spectator.

Each view has a job and a common failure mode. Check those before you polish anything.

01

Daily burn heatmap

The at-a-glance conscience. It should make a runaway day obvious without flattening quiet days.

Use a log color scale. Linear heatmaps lie when one spike dominates.
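
A log color scale can be sketched as a bucket function. This is an illustrative Python version, assuming a six-stop ramp; `log_bucket` and the `stops` parameter are hypothetical names, and the real dashboard would do the same math in whatever language renders the grid.

```python
import math


def log_bucket(total, lo, hi, stops=6):
    """Map a daily total to a color-ramp index 0..stops-1 on a log10 scale.

    `lo` and `hi` are the smallest and largest nonzero daily totals in range.
    Zero-token days get -1 so the UI can render them as empty cells.
    """
    if total <= 0:
        return -1
    if hi <= lo:
        return stops - 1
    span = math.log10(hi) - math.log10(lo)
    position = (math.log10(total) - math.log10(lo)) / span
    position = min(max(position, 0.0), 1.0)
    return min(int(position * stops), stops - 1)


totals = [20_000, 150_000, 2_000_000]
print([log_bucket(t, min(totals), max(totals)) for t in totals])  # [0, 2, 5]
```

On a linear scale the 150,000-token day would land in the same bucket as the 20,000-token day. On the log scale it gets its own color.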

02

Weekly trend line

Smooth the daily noise into a direction. The question is whether usage is compounding or getting leaner.

Use a log y-axis again and label the peak week.
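
The weekly roll-up is a small grouping step. A minimal Python sketch, assuming rows shaped like daily-burn.json; `weekly_totals` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(rows):
    """Sum `total` per ISO week; returns {(iso_year, iso_week): tokens}, sorted."""
    weeks = defaultdict(int)
    for row in rows:
        iso = date.fromisoformat(row["date"]).isocalendar()
        weeks[(iso[0], iso[1])] += row["total"]
    return dict(sorted(weeks.items()))


rows = [
    {"date": "2026-05-22", "total": 300_000},  # Friday, ISO week 21
    {"date": "2026-05-24", "total": 756_200},  # Sunday, still ISO week 21
    {"date": "2026-05-25", "total": 120_000},  # Monday, ISO week 22
]
weeks = weekly_totals(rows)
peak_week = max(weeks, key=weeks.get)
print(weeks)      # {(2026, 21): 1056200, (2026, 22): 120000}
print(peak_week)  # (2026, 21)
```

Keying on the ISO year as well as the week number keeps late-December and early-January days from colliding.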

03

What is driving the burn

Turn "I spent a lot" into "I spent a lot on shipping, review, or research."

Driver labels matter. Keep the vocabulary small.
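
Grouping by driver is the whole computation behind this view. A rough Python sketch, with `driver_shares` as a hypothetical name and the sample rows invented for illustration:

```python
from collections import defaultdict


def driver_shares(rows):
    """Return [(driver, tokens, percent_share)] sorted by tokens, largest first."""
    by_driver = defaultdict(int)
    for row in rows:
        by_driver[row["driver"]] += row["total"]
    grand_total = sum(by_driver.values())
    ranked = sorted(by_driver.items(), key=lambda item: item[1], reverse=True)
    return [
        (driver, tokens, round(100 * tokens / grand_total, 1) if grand_total else 0.0)
        for driver, tokens in ranked
    ]


sample = [
    {"driver": "shipping", "total": 600_000},
    {"driver": "research", "total": 300_000},
    {"driver": "shipping", "total": 100_000},
]
print(driver_shares(sample))
# [('shipping', 700000, 70.0), ('research', 300000, 30.0)]
```

A small vocabulary matters here: "ship", "shipping", and "Shipping" would split into three drivers and flatten the ranking.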

04

Scale equivalents

Translate huge token totals into rough human-scale comparisons.

Show the math or it reads as a gimmick.
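
Showing the math can be as simple as naming the constants. Every constant in this Python sketch is an assumption chosen for illustration, not a figure from the guide: roughly 0.75 English words per token, a 90,000-word novel, and 250 words per minute of reading.

```python
WORDS_PER_TOKEN = 0.75    # assumption: rough English average, varies by tokenizer
WORDS_PER_NOVEL = 90_000  # assumption: a typical-length novel
READING_WPM = 250         # assumption: average silent reading speed


def scale_equivalents(total_tokens):
    """Translate a token count into rough, clearly approximate comparisons."""
    words = total_tokens * WORDS_PER_TOKEN
    return {
        "words": round(words),
        "novels": round(words / WORDS_PER_NOVEL, 1),
        "reading_hours": round(words / READING_WPM / 60, 1),
    }


print(scale_equivalents(12_000_000))
# {'words': 9000000, 'novels': 100.0, 'reading_hours': 600.0}
```

Printing the constants next to the result on the page keeps the comparison a Fermi estimate rather than a claim.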

05

Moving-average table

The receipts drawer. This is where exact and estimated sources sit side by side.

Make the trust level obvious at every row.
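
The moving-average column could be computed like this. A minimal Python sketch, assuming the rows are sorted by date with one row per calendar day, zero-burn days included; `moving_average` and the `moving_avg` key are hypothetical names.

```python
def moving_average(rows, window=7):
    """Attach a trailing `window`-row mean of `total` to each row.

    Assumes one row per calendar day, sorted by date, so the last 7 rows
    are the last 7 days. Early rows average over the days available so far.
    """
    out = []
    for i, row in enumerate(rows):
        recent = [r["total"] for r in rows[max(0, i - window + 1): i + 1]]
        out.append({**row, "moving_avg": round(sum(recent) / len(recent))})
    return out


days = [{"total": 100}, {"total": 200}, {"total": 300}]
print([r["moving_avg"] for r in moving_average(days, window=2)])  # [100, 150, 250]
```

If the file skips quiet days, the average silently stretches over more than a week, so filling gaps with zero rows first is the safer design.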

Run it locally. Ship it when the math is boring.

The dashboard does not need a backend for v1. Keep the data file local, build the page, and deploy only the normalized totals you are comfortable exposing.

Local
  • npm install
  • npm run dev, then open http://localhost:3000
  • Confirm all five views render from your data.
Vercel
  • Push the project to GitHub.
  • Import it at Vercel or run vercel from the folder.
  • Set it private if your usage data is personal.

A subtly wrong token dashboard is worse than none.

Before you call it done, make the dashboard prove it is telling the truth.

The totals reconcile.

Pick one day. Add each token column by hand, skipping claude_code_calls, which counts calls rather than tokens. The sum should match the day's total.
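
The same check can run over every row at once. A small Python sketch, assuming the column names from daily-burn.json; `unreconciled` is a hypothetical helper name.

```python
TOKEN_COLUMNS = ("codex_tokens", "claude_code_tokens", "claude_chat_est", "chatgpt_est")


def unreconciled(rows):
    """Return (date, expected, recorded) for every row whose total is off."""
    problems = []
    for row in rows:
        expected = sum(row.get(col, 0) for col in TOKEN_COLUMNS)
        if expected != row["total"]:
            problems.append((row["date"], expected, row["total"]))
    return problems


good = {"date": "2026-05-24", "codex_tokens": 184320, "claude_code_tokens": 512880,
        "claude_chat_est": 38000, "chatgpt_est": 21000, "total": 756200}
bad = {**good, "date": "2026-05-25", "total": 756000}
print(unreconciled([good, bad]))  # [('2026-05-25', 756200, 756000)]
```

An empty list is the boring result you are looking for.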

Exact and estimated numbers are labeled everywhere.

Readers should never need to guess whether a number is measured or inferred.

The log scales actually read.

A 10x day and a 1x day should both be visible. If the map looks flat, the scale is wrong.

The time-range selector changes every view.

Switch 90 days to all-time and confirm the heatmap, trend, drivers, and table all respond.
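
One way to guarantee every view responds is to filter the rows once and feed all views from the result. A Python sketch of that filter, where `RANGES` and `in_range` are hypothetical names and "1 year" is assumed to mean 365 days:

```python
from datetime import date, timedelta

RANGES = {"90 days": 90, "180 days": 180, "1 year": 365, "all-time": None}


def in_range(rows, label, today):
    """Keep rows whose date falls inside the selected window ending at `today`."""
    days = RANGES[label]
    if days is None:
        return list(rows)
    cutoff = today - timedelta(days=days - 1)
    return [r for r in rows if cutoff <= date.fromisoformat(r["date"]) <= today]


history = [{"date": d} for d in ("2025-01-01", "2026-02-23", "2026-02-24", "2026-05-24")]
today = date(2026, 5, 24)
print(len(in_range(history, "90 days", today)))   # 2
print(len(in_range(history, "all-time", today)))  # 4
```

If one view reads the unfiltered file directly, the selector will appear to work while that view quietly ignores it.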

The problems are usually data problems.

If the dashboard feels wrong, debug the input assumptions before you redesign the output.

fix

Timezones smear your days

Logs are often UTC. Pick one timezone, convert on ingest, and bucket by local date before totals are calculated.
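
Converting on ingest might look like the sketch below. It assumes events arrive as (timestamp, tokens) pairs with naive UTC ISO timestamps, which is a hypothetical shape rather than any tool's real log format; `bucket_by_local_day` is likewise an illustrative name.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_by_local_day(events, tz):
    """Sum tokens per local calendar date.

    `events` is an iterable of (utc_iso_timestamp, tokens) pairs, a
    hypothetical shape standing in for whatever the real logs provide.
    """
    days = defaultdict(int)
    for stamp, tokens in events:
        utc = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)
        days[utc.astimezone(tz).date().isoformat()] += tokens
    return dict(days)


# A fixed UTC-7 offset keeps the sketch dependency-free; zoneinfo.ZoneInfo
# with a named zone would also follow daylight saving changes.
local = timezone(timedelta(hours=-7))
events = [("2026-05-25T03:30:00", 5_000), ("2026-05-25T18:00:00", 2_000)]
print(bucket_by_local_day(events, local))
# {'2026-05-24': 5000, '2026-05-25': 2000}
```

The first event is logged on May 25 in UTC but belongs to the evening of May 24 locally, which is exactly the smear this fix removes.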

fix

The heatmap looks flat

That almost always means a linear color scale. Switch to log and use enough ramp stops to show quiet days.

fix

Estimates dwarf the real numbers

A bad tokens-per-message constant can swamp the dashboard. Calibrate it against one real conversation.
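
Calibration is one division. In this Python sketch the conversation figures are hypothetical placeholders: swap in the token and message counts from one conversation you actually measured.

```python
def tokens_per_message(real_tokens, real_messages):
    """Derive the per-message constant from one conversation you measured."""
    return real_tokens / real_messages


def estimate_day(message_count, per_message):
    """Estimated tokens for a day, given its message count and the constant."""
    return round(message_count * per_message)


# Hypothetical calibration: one tokenized conversation, 18,400 tokens over 40 messages.
constant = tokens_per_message(18_400, 40)
print(constant)                    # 460.0
print(estimate_day(82, constant))  # 37720
```

Recalibrating now and then is cheap, and the result still belongs in an estimated column.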

V1 shows the spend. V2 changes the spend.

Once the dashboard is accurate, add the features that push behavior instead of merely reporting it.

A burn budget

Set a daily target and mark days that blow past it. A line you can cross is a line you start to notice.
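
Marking the days that cross the line is a one-pass filter. A minimal Python sketch; `over_budget` is a hypothetical name and the 500,000-token target is an arbitrary example, not a recommendation.

```python
def over_budget(rows, daily_target):
    """Return (date, total, overage) for each day above the target."""
    return [
        (r["date"], r["total"], r["total"] - daily_target)
        for r in rows
        if r["total"] > daily_target
    ]


burn = [
    {"date": "2026-05-23", "total": 410_000},
    {"date": "2026-05-24", "total": 756_200},
]
print(over_budget(burn, daily_target=500_000))
# [('2026-05-24', 756200, 256200)]
```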

Tokens per outcome

Tag days with what shipped and divide. Falling cost per outcome is the fluency curve worth bragging about.
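
The division could be sketched as follows. It assumes each row gains an `outcomes` count, a hypothetical field that is not part of the daily-burn.json shape in this guide.

```python
def tokens_per_outcome(rows):
    """Total tokens divided by total shipped outcomes; None if nothing shipped."""
    shipped = sum(r.get("outcomes", 0) for r in rows)
    if shipped == 0:
        return None
    return round(sum(r["total"] for r in rows) / shipped)


week = [
    {"total": 600_000, "outcomes": 2},
    {"total": 300_000, "outcomes": 1},
    {"total": 0},  # a day with no burn and no outcomes tagged
]
print(tokens_per_outcome(week))  # 300000
```

Computing it per week rather than per day avoids dividing by zero on days when nothing shipped.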

Auto-ingest

A small nightly script can read each tool's logs and append a new day, so the dashboard stays current without touching JSON.
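
The append step of such a script might look like this. It is a sketch only: `upsert_day` is a hypothetical helper, and reading each tool's logs is left out because those formats differ per tool.

```python
import json
from pathlib import Path


def upsert_day(path, new_row):
    """Insert or replace the row for new_row["date"], keeping the file sorted."""
    path = Path(path)
    rows = json.loads(path.read_text()) if path.exists() else []
    rows = [r for r in rows if r["date"] != new_row["date"]]
    rows.append(new_row)
    rows.sort(key=lambda r: r["date"])
    path.write_text(json.dumps(rows, indent=2) + "\n")
    return rows
```

Replacing an existing date instead of blindly appending makes the script safe to rerun, so a second nightly run cannot double-count a day. A call such as `upsert_day("daily-burn.json", row)` would sit at the end of the nightly job.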

This guide is adapted from the original Limited Edition Jonathan build guide and the companion thesis that token burn rate is a revealing metric of AI fluency. Read the related essay on Nate's newsletter.