Build a Tax Prep Organizer Agent.

This guide takes the same reusable context architecture from the healthcare appeals workflow and points it at tax prep.

The goal is not AI filing your taxes. The goal is a clean, evidence-backed packet: normalized income, categorized expenses, missing documents, review questions, and a human or CPA handoff.

People lose high-friction paperwork fights because their information is scattered, unstructured, uncited, and incomplete. The fix is to own the context: collect the mess, normalize it, ground it in source documents, and produce the next human-reviewed action.

Source layer: real IRS rules. Demo data: synthetic taxpayer. Packet output: CPA-ready.

01 / Shared skeleton

Same runbook shell, different bureaucracy.

Tax prep uses the same primitives as healthcare: own the context first, then draft the next reviewed action.

Do not position this as AI filing taxes. The v1 is a tax prep organizer: it turns scattered documents into a reviewable packet for the taxpayer or CPA.

The tax problem is context disorder.

The tax version applies that pattern to W-2s, 1099s, receipts, CSVs, prior-year summaries, and IRS rules. The agent does not file the return. It builds the file so the human or CPA can review from clean evidence.

You do

Drop synthetic tax forms, receipts, CSVs, and source rules into the starter repo.

The AI does

Normalize the tax-year ledger, flag gaps, draft CPA questions, and export the prep packet.

Full prompt:
<prompt>
  <task>Build a document-grounded case workflow from reusable Open Skills primitives.</task>
  <thesis>
    People lose because their information is scattered, unstructured, uncited, and incomplete.
    The workflow should help the person own their context, not outsource judgment to a black box.
  </thesis>
  <primitive_chain>
    <step>Ingest documents into markdown/text with raw source coordinates as anchors. Use PDF page/region, CSV line number, or form box identifiers, and embed the identical anchor scheme in the markdown that downstream citations will use. Keep one numbering scheme end to end.</step>
    <step>Chunk and tag source evidence by structure.</step>
    <step>Normalize the case facts into a ledger.</step>
    <step>Run the coverage gate. Every ingested document must produce at least one normalized record or be explicitly marked reference-only. Print the list of unconsumed documents and stop before drafting if any document is unaccounted for.</step>
    <step>Reconcile shared facts across sources before drafting. Compare the same fact anywhere it appears, turn every mismatch into a named review question, and record which source governs the tracked value.</step>
    <step>Store chunks, records, mappings, and outputs in SQLite by default.</step>
    <step>Optional: if you already run OB1, mirror the case store into Open Brain; otherwise skip this step entirely. SQLite is the complete beginner path.</step>
    <step>Retrieve relevant evidence deterministically before drafting.</step>
    <step>Validate citations before export. The citation guard returns pass / needs_review / fail verdicts. Any fail blocks packet export until fixed or converted to a named review question, and the guard verdict summary must appear in the packet README.</step>
    <step>Export an editable packet and stop at human review.</step>
  </primitive_chain>
  <constraint>The agent organizes and drafts. It does not sign, send, file, submit, authorize, or transmit sensitive data.</constraint>
</prompt>

The shared primitive chain is the point.

The tax guide should explicitly call the same Open Skills as the healthcare guide. That sets the precedent: guides grow together because each one improves the shared primitives beneath it.

SQLite stays the beginner path. Open Brain becomes the OB1 path when the person wants the ledger and evidence to become durable personal context.
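
As a rough illustration of the SQLite path, here is a minimal sketch of a case store that keeps one anchor scheme from ingestion through citation. Only the source_documents table name comes from this guide. The chunks and records tables, every column name, and the anchor format (for example 'w2.pdf#box1') are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: only the source_documents table name appears in the guide.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source_documents (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE,
    reference_only INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES source_documents(id),
    anchor TEXT NOT NULL,  -- assumed format: 'w2.pdf#box1', 'bank.csv#L14'
    text TEXT NOT NULL,
    UNIQUE (document_id, anchor)
);
CREATE TABLE IF NOT EXISTS records (
    id INTEGER PRIMARY KEY,
    chunk_id INTEGER NOT NULL REFERENCES chunks(id),
    kind TEXT NOT NULL,
    amount_cents INTEGER,
    status TEXT NOT NULL DEFAULT 'needs_review'
);
"""


def open_case_store(path=":memory:"):
    """Open (or create) the case store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_chunk(conn, filename, anchor, text):
    """Register the document if needed, then store one anchored chunk."""
    conn.execute(
        "INSERT OR IGNORE INTO source_documents (filename) VALUES (?)",
        (filename,),
    )
    doc_id = conn.execute(
        "SELECT id FROM source_documents WHERE filename = ?", (filename,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO chunks (document_id, anchor, text) VALUES (?, ?, ?)",
        (doc_id, anchor, text),
    )
    conn.commit()
    return cur.lastrowid
```

Defaulting records.status to needs_review means a row has to earn an ok status rather than receive it by omission.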

02 / Data strategy

Use real IRS rules and synthetic taxpayer records.

The demo should be authoritative without exposing private financial information.

Real rules, fake taxpayer.

Use IRS publications, instructions, recordkeeping pages, and form references for rule context. Use synthetic taxpayer records for W-2s, 1099-NEC, 1099-MISC, interest/dividend statements, receipts, mileage logs, business bank CSVs, and prior-year summaries.

Start with a normal-person scope.

The first scope should be one tax year, federal-only, single taxpayer or sole proprietor, common forms, and Schedule C-style expense organization. The artifact is a prep packet, not a filed return.

Scope the tax year before you trust the ledger.

Ask the agent to infer the tax year from evidence density across the inbox: form years, statement ranges, receipt dates, CSV periods, and prior-year summaries. Documents from other years, such as a prior-year return dropped in the same folder, become reference-only records and stay out of the ledger.
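
One way to make "evidence density" concrete is to let each document vote once for every year it mentions and take the year with the most votes. The sketch below assumes an earlier step has already extracted the years seen in each file. The function name, the input shape, and the choice to stop on a tie are illustrative, not something the guide prescribes.

```python
from collections import Counter


def infer_tax_year(doc_years):
    """doc_years maps filename -> list of years found in that document."""
    votes = Counter()
    for years in doc_years.values():
        votes.update(set(years))  # one vote per document per year
    if not votes:
        raise ValueError("no dated evidence in the inbox")

    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        # Assumption: a tie is a question for the human, not a guess.
        raise ValueError(f"ambiguous tax year: {ranked[0][0]} vs {ranked[1][0]}")

    tax_year = ranked[0][0]
    scope = {"tax_year": tax_year, "in_scope": [], "reference_only": [], "undated": []}
    for filename in sorted(doc_years):
        years = doc_years[filename]
        if not years:
            scope["undated"].append(filename)
        elif tax_year in years:
            scope["in_scope"].append(filename)
        else:
            scope["reference_only"].append(filename)
    return scope
```

Undated files are returned separately so they can be reviewed by hand instead of being silently treated as out of scope.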

Define expected-but-missing forms from gaps and contradictions in this case's own documents, not from a generic fixture list. Coverage windows, unmatched deposits, payer names, and prior-year carryover notes create the missing-document checklist.

Coverage checkpoint: the inbox document count must equal the source_documents row count, and every ingested document must produce at least one normalized record or be explicitly marked reference-only. Hard stop with a named-file list on mismatch. Without this check, a weak agent can silently drop 5 of 15 documents including the W-2, then report the W-2 as missing while it sits in its own ingested folder.
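
The coverage checkpoint can be sketched as a pure function over four inputs: the inbox file list, the indexed file list, a per-file count of normalized records, and the set of files marked reference-only. The function names and the CoverageError exception are hypothetical. Only the rule itself comes from the guide.

```python
class CoverageError(RuntimeError):
    """Raised to stop the run before drafting when a document is unaccounted for."""


def coverage_problems(inbox_files, indexed_files, record_counts, reference_only):
    """Return named-file lists for every way a document can go missing."""
    inbox, indexed = set(inbox_files), set(indexed_files)
    return {
        # On disk but never written to source_documents.
        "not_indexed": sorted(inbox - indexed),
        # Indexed but absent from the inbox (stale or duplicated rows).
        "not_in_inbox": sorted(indexed - inbox),
        # Indexed, produced no normalized record, and not marked reference-only.
        "unconsumed": sorted(
            f for f in indexed
            if record_counts.get(f, 0) == 0 and f not in reference_only
        ),
    }


def enforce_coverage(inbox_files, indexed_files, record_counts, reference_only):
    problems = coverage_problems(inbox_files, indexed_files, record_counts, reference_only)
    if any(problems.values()):
        raise CoverageError(f"coverage gate failed: {problems}")
    return True
```

Because the error message carries filenames, a dropped W-2 shows up by name instead of as a later "missing W-2" checklist item.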

03 / Domain layer

Map forms and expenses to review categories.

The tax-specific intelligence lives in the ledger schema, form mappings, and review questions.

Reconcile across sources before categorizing.

One real-world transaction equals one ledger row. Deduplicate by date, amount, and vendor: a receipt and its matching bank CSV line merge into a single row that cites both pieces of evidence. If the receipt and bank line disagree on amount, create a review question instead of two expense entries.
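
A minimal sketch of that merge rule, assuming receipts and bank lines have already been parsed into dicts with date, vendor, amount_cents, and anchor keys (all hypothetical field names). It matches on exact date and normalized vendor name, which is a simplification: real bank postings often lag the receipt date by a day or two.

```python
def reconcile(receipts, bank_lines):
    """Merge each receipt with its bank line; mismatched amounts become questions."""
    def key(item):
        return (item["date"], " ".join(item["vendor"].lower().split()))

    unmatched_bank = {}
    for line in bank_lines:
        unmatched_bank.setdefault(key(line), []).append(line)

    rows, questions = [], []
    for receipt in receipts:
        candidates = unmatched_bank.get(key(receipt), [])
        exact = next(
            (b for b in candidates if b["amount_cents"] == receipt["amount_cents"]),
            None,
        )
        if exact is not None:
            candidates.remove(exact)
            # One ledger row that cites both pieces of evidence.
            rows.append({**receipt, "evidence": [receipt["anchor"], exact["anchor"]]})
        elif candidates:
            bank = candidates.pop(0)
            # Same date and vendor, different amount: ask, do not book twice.
            questions.append(
                f"{receipt['vendor']} on {receipt['date']}: receipt "
                f"{receipt['anchor']} shows {receipt['amount_cents']} cents, "
                f"bank line {bank['anchor']} shows {bank['amount_cents']} cents."
            )
        else:
            rows.append({**receipt, "evidence": [receipt["anchor"]]})

    # Bank lines with no receipt still become single-evidence rows.
    for leftovers in unmatched_bank.values():
        for bank in leftovers:
            rows.append({**bank, "evidence": [bank["anchor"]]})
    return rows, questions
```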

Separate corroboration from primary evidence. Bank deposits that match a 1099 or invoice total become corroboration records that cross-reference that form. Unmatched deposits stay primary income and generate a missing-1099 checklist item.

Cross-check every W-2 and 1099 against deposits, and every business deposit against a 1099 or invoice by payer. Unmatched payers become named missing-document checklist items with dates and amounts, for example: "Apex Consulting, $6,200 across 2 deposits, no 1099 on file."
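
The payer cross-check might look like the following sketch. It assumes deposits arrive as (payer, date, amount_cents) tuples and that the set of payers with a form or invoice on file is already known. Both shapes and the function name are illustrative, and payer matching here is a plain case-insensitive comparison.

```python
from collections import defaultdict


def missing_1099_items(deposits, payers_on_file):
    """Group deposits by payer and name every payer with no form on file."""
    known = {p.strip().lower() for p in payers_on_file}
    grouped = defaultdict(list)
    display = {}
    for payer, date, cents in deposits:
        norm = payer.strip().lower()
        if norm not in known:
            grouped[norm].append((date, cents))
            display.setdefault(norm, payer.strip())

    items = []
    for norm in sorted(grouped):
        entries = grouped[norm]
        total = sum(cents for _, cents in entries)
        count = len(entries)
        plural = "deposit" if count == 1 else "deposits"
        items.append({
            "payer": display[norm],
            "total_cents": total,
            "dates": [d for d, _ in entries],
            "text": (
                f"{display[norm]}, ${total // 100:,}.{total % 100:02d} "
                f"across {count} {plural}, no 1099 on file."
            ),
        })
    return items
```

Amounts stay in integer cents until the final string so that totals never pick up floating-point rounding.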

Retrieval starts from tax structure.

Map from evidence to Schedule C-style categories only after reconciliation. Use a starter category set with explicit matching rules:

Advertising: vendor identity or receipt text shows paid promotion, sponsorship, listing fees, or ad platform spend.

Car and truck: mileage logs and vehicle expenses tied to business use. A mileage log maps to one aggregate car-and-truck evidence record using total business miles times the IRS standard mileage rate, never to per-trip dollar lines. The IRS publishes the rate, and the source list links the rate page.

Office expense: workspace supplies, postage, printing, and small office items with a business receipt.

Supplies: materials consumed in the work product or client service delivery, supported by itemized receipts.

Software and subscriptions: business tools, hosting, domains, AI tools, editing apps, and SaaS accounts tied to the work.

Utilities: business phone, internet, or workspace utility charges when the evidence shows business use.

Meals at 50%: business meals with receipt, date, amount, participants or purpose, and 50% treatment flagged in the ledger.
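
The mileage rule in the Car and truck entry above reduces to one multiplication per log. The sketch below takes the rate as a parameter rather than hard-coding it, since the IRS publishes the rate and the right value depends on the tax year. The function name and record fields are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def car_and_truck_record(trips, rate_per_mile, log_anchor):
    """Build ONE aggregate record from a mileage log.

    trips: list of (date, business_miles) pairs.
    rate_per_mile: Decimal taken from the IRS rate page for the tax year.
    """
    total_miles = sum((Decimal(str(miles)) for _, miles in trips), Decimal("0"))
    amount = (total_miles * rate_per_mile).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {
        "category": "Car and truck",
        "total_business_miles": total_miles,
        "rate_per_mile": rate_per_mile,
        "amount": amount,
        "evidence": [log_anchor],  # cites the log once, not each trip
    }
```

For example, with a placeholder rate of 0.50 (not an actual IRS figure), a log of 12.5 and 30 business miles yields a single record for 42.5 miles and 21.25.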

Match on vendor identity and supporting evidence, never on substrings. Without this rule, a weak keyword map can file "PACIFIC GAS & ELECTRIC" under Car and truck because it matched "gas." Recurring personal-pattern charges such as rent, groceries, streaming, and restaurants default to excluded-pending-review, never to a business category.
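
To show the difference between identity matching and substring matching, here is a small sketch built around an exact-match vendor table. The table contents, the personal-vendor set, and the vendor names other than the utility from the example above are made up for illustration.

```python
# Hypothetical lookup tables keyed by normalized vendor identity.
VENDOR_CATEGORIES = {
    "pacific gas & electric": "Utilities",
    "example ad platform": "Advertising",
}
PERSONAL_VENDORS = {"corner grocery", "example streaming"}


def normalize_vendor(vendor):
    return " ".join(vendor.lower().split())


def categorize(vendor, has_business_evidence):
    """Categorize by whole-vendor identity; never by a keyword inside the name."""
    name = normalize_vendor(vendor)
    if name in PERSONAL_VENDORS:
        return "excluded-pending-review"
    category = VENDOR_CATEGORIES.get(name)
    if category is None or not has_business_evidence:
        # Unknown vendor or no supporting evidence: a human decides.
        return "needs_review"
    return category
```

An unrecognized vendor such as "Unknown Gas Stop" falls through to needs_review instead of landing in Car and truck because of the word "gas".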

Set needs_review as the default status for any transaction that lacks a matched receipt or documented business purpose. Only receipt-corroborated, single-category items enter the ledger as ok. Meals and mixed receipts always start as review questions: meals need the 50% rule and a business purpose, and mixed receipts need a split between business and personal items before they can affect totals.
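
These status rules can be written as a short decision function. The sketch assumes each transaction is a dict with hypothetical keys (is_meal, is_mixed_receipt, receipt_anchor, business_purpose, categories) and returns the status together with the review question, if any.

```python
def initial_status(txn):
    """Return (status, review_question). Anything doubtful starts as needs_review."""
    if txn.get("is_meal"):
        return "needs_review", "Confirm business purpose and 50% treatment."
    if txn.get("is_mixed_receipt"):
        return "needs_review", "Split business and personal items before totals."
    if not txn.get("receipt_anchor"):
        return "needs_review", "No matched receipt on file."
    if not txn.get("business_purpose"):
        return "needs_review", "Business purpose is not documented."
    if len(txn.get("categories", [])) != 1:
        return "needs_review", "Transaction does not map to exactly one category."
    return "ok", None
```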

The output is a prep packet.

Export a folder, not one mystery file. The packet contains editable drafts plus a combined PDF handoff.

Packet folder. One packet/ directory per tax year: income-summary.md, expense-ledger.csv, deduction-evidence-map.json, missing-documents.md, cpa-questions.md, schedule-c-summary.md, packet.pdf, and a sources/ manifest.json.
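
A simple presence check over the packet folder could look like this sketch. The file names are the ones listed above. The sources manifest is left out of the check because its exact path is not spelled out here, and the function name is hypothetical.

```python
from pathlib import Path

# File names taken from the packet description; the sources manifest is
# intentionally omitted because its exact location is not specified.
EXPECTED_FILES = (
    "income-summary.md",
    "expense-ledger.csv",
    "deduction-evidence-map.json",
    "missing-documents.md",
    "cpa-questions.md",
    "schedule-c-summary.md",
    "packet.pdf",
)


def missing_packet_files(packet_dir):
    """Return the expected packet files that are absent from packet_dir."""
    root = Path(packet_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```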

The PDF is one combined, rendered document for the taxpayer or CPA; the individual drafts stay editable in the folder. For one taxpayer-year, the PDF should run to a low double-digit page count, tables should render as tables, and no raw markdown should appear.

The packet reproduces the citation guard's actual pass, needs_review, and fail counts verbatim. Export refuses while any claim fails, or stamps DRAFT-INVALID on the cover if the workflow allows a draft handoff for review.
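
The export rule can be sketched as a function from the guard's verdict list to an export decision. The verdict names and the DRAFT-INVALID stamp come from this guide. The function names, the allow_draft_handoff flag, and the README line format are assumptions.

```python
from collections import Counter

VERDICTS = ("pass", "needs_review", "fail")


def guard_summary(verdicts):
    """Count verdicts exactly as the guard reported them."""
    unknown = sorted(set(verdicts) - set(VERDICTS))
    if unknown:
        raise ValueError(f"unknown verdicts: {unknown}")
    counts = Counter(verdicts)
    return {name: counts.get(name, 0) for name in VERDICTS}


def export_decision(verdicts, allow_draft_handoff=False):
    summary = guard_summary(verdicts)
    has_fail = summary["fail"] > 0
    return {
        # Refuse on any fail unless the workflow permits a stamped draft.
        "allowed": (not has_fail) or allow_draft_handoff,
        "cover_stamp": "DRAFT-INVALID" if has_fail and allow_draft_handoff else None,
        "readme_line": (
            f"Citation guard: {summary['pass']} pass, "
            f"{summary['needs_review']} needs_review, {summary['fail']} fail"
        ),
    }
```

The README line is built from the same counts that drive the decision, so the packet cannot report numbers that differ from what the guard produced.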

04 / Sources and gates

Do not let the guide pretend to be a tax preparer.

The credibility comes from clean organization, citations, and explicit handoff boundaries.

Cite the rule sources.

The guide should point readers to IRS Publication 17, Schedule C instructions, IRS recordkeeping guidance, and IRS pages for common information returns such as W-2s and 1099s.

The human or CPA files.

The workflow stops at review/edit/export. Filing taxes is a legal and financial act. The agent prepares the evidence packet and questions; the taxpayer or professional decides what goes on the return.

05 / Verification gates

Done means verified.

Make each stage prove its own output before the packet reaches a taxpayer or CPA.

Add a prove-it checklist to every stage.

Ingestion. Does every inbox file appear in the index with an anchor? A broken run has files on disk that never appear in source_documents. Fix by stopping the run and listing the missing filenames before normalization starts.

Chunking. Does a cited chunk contain the actual form box or CSV row, not front matter? A broken run cites a document title, cover page, or section intro while the claimed value lives elsewhere. Fix by re-chunking around form boxes, table rows, and receipt line items.

Normalization. Do names look like names, do dates parse, and does the ledger total reconcile against source line-item sums? A broken run has payer fields that look like categories, invalid dates, or totals that do not foot to the source rows. Fix by repairing parsers and adding a ledger-total reconciliation step.

Reconciliation. Is every receipt merged with its bank line, and is every 1099 and W-2 matched against deposits? A broken run double-counts receipts and bank rows or leaves named payers unmatched. Fix by merging duplicate evidence into one row and producing missing-document questions for unmatched payers.

Drafting plus guard. Does a clean draft pass and a seeded fabricated citation fail? A broken run accepts invented citations or vague source references. Fix by requiring chunk-level anchors and a negative test before export.

Export. Open the PDF: is the page count sane and are the tables rendered? A broken run produces a huge export, blank tables, or raw markdown. Fix by rendering the packet, opening it, and blocking export when the PDF does not match the folder drafts.

Human gate. Does the packet stop at review with CPA questions present? A broken run presents the packet as ready to file or hides unresolved questions. Fix by requiring review status, visible CPA questions, and a person-owned filing decision.
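
As one example of a stage proving its own output, the normalization item in the checklist above can be turned into an executable check. The sketch assumes ledger rows are dicts with ISO-format date strings, a payer, and amount_cents, and that the expected total comes from deduplicated source line items. All of these shapes are assumptions.

```python
from datetime import date


def normalization_problems(rows, source_total_cents, category_names):
    """List every normalization failure; an empty list means the stage passed."""
    categories = {name.lower() for name in category_names}
    problems = []
    for i, row in enumerate(rows):
        try:
            date.fromisoformat(row["date"])
        except ValueError:
            problems.append(f"row {i}: date does not parse: {row['date']!r}")
        if row["payer"].strip().lower() in categories:
            problems.append(f"row {i}: payer looks like a category: {row['payer']!r}")

    ledger_total = sum(row["amount_cents"] for row in rows)
    if ledger_total != source_total_cents:
        problems.append(
            f"ledger total {ledger_total} does not foot to "
            f"source total {source_total_cents} (cents)"
        )
    return problems
```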

Ship only when the guard says the packet is reviewable.

The guard's verdict is the gate. Passing claims can ship into the review packet, needs_review flags stay honest review questions, and failed claims block export until repaired or removed.
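
A toy version of the guard, with the seeded negative test from the checklist, is sketched below. The guide names the three verdicts but does not define when each applies, so the mapping here is an assumption: an unknown anchor is a fail, a known anchor whose chunk lacks the claimed value is needs_review, and a known anchor containing the value is a pass.

```python
def check_citation(claim, chunks):
    """claim: {'value': str, 'anchor': str}; chunks: anchor -> chunk text."""
    text = chunks.get(claim["anchor"])
    if text is None:
        return "fail"  # the cited anchor does not exist in the store
    if claim["value"] in text:
        return "pass"
    return "needs_review"  # anchor is real, value is not in that chunk


def guard_self_test(chunks):
    """Negative test: a seeded fabricated citation must fail before export."""
    fake_anchor = "fabricated.pdf#box99"
    while fake_anchor in chunks:  # make sure the seed really is absent
        fake_anchor += "x"
    seeded = {"value": "1234.56", "anchor": fake_anchor}
    return check_citation(seeded, chunks) == "fail"
```

Running guard_self_test before every export gives a cheap signal that the guard can still say no.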

Put the export acceptance check on the human-gate checklist itself: open packet.pdf, confirm the page count, confirm the expected sections are present, confirm tables render, and confirm the packet still stops at CPA or taxpayer review.