Dingo & Co. Knowledge Work
A 23-deliverable consulting brief: research, financial reconciliation, regulatory analysis, decks and spreadsheets. Tests whether a model can run an entire knowledge-work engagement end to end.
An excellent Dingo & Co. run: complete, artifact-rich, visually usable, and unusually strong on the benchmark’s legal/ethical/absurdity traps. It falls short of near-mastery mainly on research citation precision, spreadsheet formula reliability, and final production polish rather than on core reasoning or completion.
What it nailed
- Completed the full 23-file package with real Office, PDF, HTML, markdown, and JSON artifacts.
- Handled the benchmark’s central absurdities with unusually strong judgment: dingo behavior, Alaska/Australia optics, import-created demand, legal uncertainty, ethics, and support liability.
- Maintained a consistent source-of-truth posture for revenue, launch timing, pricing, budget, runway, imports, TAM, and customer-feedback exclusions.
- Used provided source imagery in the deck, sales one-pager, dashboard, personas, and copy references rather than inventing inconsistent visuals.
- Separated real competitors from fictional/scenario competitors and did not treat inquiry counts or curiosity traffic as TAM.
- Produced a credible board/investor strategy with staged gates and non-evasive answers to uncomfortable questions.
Where it slipped
- Regulatory research is directionally strong but citation precision is uneven; several official sources are generic agency URLs rather than direct rule pages.
- Some market-sizing claims are derived or loosely supported and would need stronger verification before investor/public use.
- The pricing workbook’s Cost_Sensitivity formulas appear to contain column-reference errors, reducing spreadsheet reliability.
- The public-facing press release still mentions Northern Canid Imports/acquisition support, which may be legally and reputationally sensitive despite the no-CTA posture.
- The package is visually solid but not elite editorial design; operator review noted the one-pager footer is close to the page edge.
- The dashboard depends on Chart.js via CDN, so it is not fully self-contained/offline despite being a standalone HTML artifact.