Dingo & Co. Knowledge Work
A 23-deliverable consulting brief: research, financial reconciliation, regulatory analysis, decks and spreadsheets. Tests whether a model can run an entire knowledge-work engagement end to end.
Recalibrated from the prior visual-audit score of 78.4 to 81.0. The package remains substantively impressive: complete, well-researched, legally cautious, and strategically strong on the dingo/import absurdities. The rendered visuals still fail professional frontend standards in places, with overflowing text, clipped headings, chart-label collisions, cut-off captions, disconnected funnel graphics, and poor typography/spacing; therefore the visual_storytelling and ux_reviewability signals remain capped below 60. Under the operator’s cross-model calibration, however, these defects are treated as systemic but non-blocking and comparable to the Opus 4.8 funnel-visual case, so the strict-score visual deduction from the pre-audit 83.0 is about 2 points total.
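The score movement described above can be checked with a few lines of arithmetic. This is a minimal sketch using only the three figures quoted in the summary. The variable names are illustrative and not part of any scoring tool.

```python
# Scores quoted in the summary; variable names are hypothetical.
prior_visual_audit = 78.4   # score after the earlier visual audit
pre_audit = 83.0            # strict score before any visual deduction
recalibrated = 81.0         # score after cross-model calibration

# Net visual deduction from the pre-audit score.
visual_deduction = round(pre_audit - recalibrated, 1)

# How much the recalibration restored relative to the prior audit.
recalibration_uplift = round(recalibrated - prior_visual_audit, 1)

print(f"visual deduction: {visual_deduction} points")
print(f"recalibration uplift: {recalibration_uplift} points")
```

The deduction works out to 2.0 points, which matches the "about 2 points total" stated above, and the recalibration restores 2.6 of the points removed by the earlier audit.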
What it nailed
- Completed the full artifact set with correct filenames and real business-document formats.
- Handled the benchmark’s central absurdities and legal/ethical traps with unusually strong judgment.
- Produced a robust assumptions file and source log that separate official sources, secondary sources, internal estimates, and fictional competitors.
- Used provided image assets and generated real visual artifacts rather than text-only stand-ins, even though the rendered polish is flawed.
- Delivered a strong GTM and investor-facing strategy with staged budget gates, channel rules, support-language controls, and NCI risk mitigation.
Where it slipped
- Rendered deck/dashboard visuals contain hard defects: overflowing funnel text, a clipped Executive summary heading, labels touching chart elements, cut-off caption text, disconnected/misaligned funnel graphics, and poor typography/spacing.
- Material price inconsistency: the workbook and deck state a $749 hard floor, while the email and GTM materials offer a $699 lapsed-owner price, $50 below that floor.
- TAM math is internally inconsistent: the stated $45M-$85M range and the $60M+ conclusion do not follow from the product-fit formulas shown in the workbook.
- Some public-facing copy includes unsupported or risky factual claims, especially 'first ten thousand support tickets' and beta-use claims.
- A few research claims rely on secondary or commercial sources where primary verification would be preferable.
- Some customer/beta quote usage is not clearly traceable to permissioned evidence despite the package’s stated quote policy.
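The price inconsistency noted in the list above is the kind of defect a simple cross-document check would catch. The following is a rough sketch, assuming prices have already been extracted from each deliverable by hand. The `floor_violations` helper and the offer label are hypothetical, and only the $749 and $699 figures come from the review.

```python
# Hard floor stated in the workbook and deck (USD).
HARD_FLOOR = 749

# Offers found in other deliverables; the label is illustrative.
offers = {"lapsed-owner (email/GTM)": 699}


def floor_violations(offers, floor):
    """Return each offer priced below the floor, with the shortfall in USD."""
    return {
        name: floor - price
        for name, price in offers.items()
        if price < floor
    }


violations = floor_violations(offers, HARD_FLOOR)
for name, shortfall in violations.items():
    print(f"{name}: ${shortfall} below the ${HARD_FLOOR} floor")
```

Running a check like this across all 23 deliverables before submission would surface the $50 gap between the lapsed-owner offer and the stated floor.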