Error State Pattern System
The UX of agentic failure — how the interface contains, communicates, and recovers from agent errors. Three components cover the full failure spectrum: containment (ErrorBoundaryCard), recovery breadcrumb (AgentRecoveryFlow), and partial completion (PartialSuccessPanel).
Color Logic
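Error severity maps directly to the three-primary semantic system: fatal errors use red (--ds-color-outcome-negative), recoverable errors use gold (--ds-color-validation), and transient errors use blue (--ds-color-temporal). Never use raw hex; every color must trace to a --ds-* token.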
Token Reference
| Token | Value | Usage |
|---|---|---|
| --ds-color-outcome-negative | #C62828 / dark: #EF9A9A | Fatal error border, failed step dot, override button background |
| --ds-color-validation | #C49A1A / dark: #FFD54F | Recoverable error border, awaiting-human step, PartialSuccessPanel accent |
| --ds-color-temporal | #2B44D4 / dark: #90CAF9 | Transient error border, in_progress recovery step dot, retry button |
| --ds-color-outcome-positive | #2E7D32 / dark: #A5D6A7 | Succeeded step dot and label in AgentRecoveryFlow and PartialSuccessPanel |
| --ds-color-validation-subtle | gold tint | Header background on recoverable error and PartialSuccessPanel header |
| --ds-color-validation-fg | accessible gold fg | Text on gold/validation backgrounds |
| --ds-text-inverse | white / dark: black | Text on colored button backgrounds (override, approve) |
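A minimal sketch of how a component might resolve severity to a token, assuming the `--ds-*` tokens are exposed as CSS custom properties (the delivery mechanism isn't specified here). `SEVERITY_TOKEN` and `severityBorderColor` are hypothetical names, not part of the component API.

```typescript
// Severity values accepted by ErrorBoundaryCard's `severity` prop.
type ErrorSeverity = "fatal" | "recoverable" | "transient";

// Each severity traces to exactly one --ds-* token; no raw hex anywhere.
const SEVERITY_TOKEN: Record<ErrorSeverity, string> = {
  fatal: "--ds-color-outcome-negative", // red
  recoverable: "--ds-color-validation", // gold
  transient: "--ds-color-temporal", // blue
};

// Wraps the token in var() so light/dark values come from the token
// definitions at render time rather than from this code.
function severityBorderColor(severity: ErrorSeverity): string {
  return `var(${SEVERITY_TOKEN[severity]})`;
}

console.log(severityBorderColor("transient")); // var(--ds-color-temporal)
```

Typing the lookup as `Record<ErrorSeverity, string>` means that adding a new severity to the union fails to compile until it is given a token, which keeps the "no raw hex" rule enforceable.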
ErrorBoundaryCard
Containment unit for agent errors. Surfaces severity, source, message, optional technical detail, and explicit recovery actions. Never silently swallows errors — every state is visible.
- Use `severity="fatal"` only for truly unrecoverable states where no action chain fixes the problem
- Include a `code` or trace ID whenever available; operators need it for support tickets
- Provide at least one explicit action; never leave the operator with no path forward
- Keep `message` in plain language; operators are humans under stress
- Don't use `severity="fatal"` for rate limits or timeouts; those are transient
- Don't use `variant="override"` (red) for recoverable operations; reserve it for bypassing safety checks
- Don't put a retry button on a fatal card; it creates false hope
- Don't suppress the `detail` prop when debugging info is available
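Some of the guidelines above are mechanical enough to check before a card renders. The sketch below is a hypothetical lint helper (`lintErrorCard` and `CardDraft` are illustrative names, not part of the component). It assumes "recoverable operations" corresponds to `severity="recoverable"`, which is a heuristic. Plain-language messages and real safety-check bypasses still need human review.

```typescript
// Hypothetical draft shape holding only the fields these checks need.
type CardSeverity = "fatal" | "recoverable" | "transient";
type CardActionVariant = "retry" | "escalate" | "dismiss" | "override";

interface CardDraft {
  severity: CardSeverity;
  code?: string;
  actions: { label: string; variant: CardActionVariant }[];
}

// Returns one warning per guideline the draft appears to break.
function lintErrorCard(card: CardDraft): string[] {
  const warnings: string[] = [];
  const has = (v: CardActionVariant) => card.actions.some((a) => a.variant === v);

  if (card.actions.length === 0) {
    warnings.push("no-action: give the operator at least one path forward");
  }
  if (card.severity === "fatal" && has("retry")) {
    warnings.push("fatal-retry: a retry button on a fatal card creates false hope");
  }
  // Heuristic: treats severity="recoverable" as a "recoverable operation".
  if (card.severity === "recoverable" && has("override")) {
    warnings.push("override-misuse: reserve override for bypassing safety checks");
  }
  if (card.code === undefined) {
    warnings.push("missing-code: include an error code or trace ID if one exists");
  }
  return warnings;
}

const fatalDraft: CardDraft = {
  severity: "fatal",
  actions: [{ label: "Retry", variant: "retry" }],
};
console.log(lintErrorCard(fatalDraft)); // flags fatal-retry and missing-code
```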
| Prop | Type | Default | Description |
|---|---|---|---|
| title | string | — | Short error title displayed in the card body heading. |
| message | string | — | Human-readable explanation of what failed and why. |
| source | string | — | Agent or system that produced the error (e.g. 'TREASURY-AGENT'). |
| timestamp | string | — | ISO timestamp or display string shown in the action footer. |
| severity | 'fatal' \| 'recoverable' \| 'transient' | — | Controls border and accent color. fatal=red, recoverable=gold, transient=blue. |
| code | string | undefined | Optional error code or trace ID rendered as a mono chip. |
| detail | string | undefined | Raw technical detail — shown in a collapsible pre block. |
| actions | ErrorAction[] | [] | Recovery actions. Each action has label, variant (retry \| escalate \| dismiss \| override), and onClick. |
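The props table can be read as the TypeScript shapes below. This is a sketch, not the component's published types: treating every prop without a default as required is an assumption. The example card follows the guideline that rate limits are transient. Its message, code, timestamp and handlers are illustrative values only; `TREASURY-AGENT` is the source example from the table.

```typescript
// Shapes transcribed from the props table. Treating props without a
// default as required is an assumption.
interface ErrorAction {
  label: string;
  variant: "retry" | "escalate" | "dismiss" | "override";
  onClick: () => void;
}

interface ErrorBoundaryCardProps {
  title: string;
  message: string;
  source: string;
  timestamp: string;
  severity: "fatal" | "recoverable" | "transient";
  code?: string;
  detail?: string;
  actions?: ErrorAction[]; // the table lists [] as the default
}

// Illustrative values only: a rate limit is transient, so the card offers
// a retry instead of being marked fatal.
const clicked: string[] = [];

const rateLimitCard: ErrorBoundaryCardProps = {
  title: "Rate limit reached",
  message: "The upstream API refused the request because too many were sent. Retrying in a minute should work.",
  source: "TREASURY-AGENT",
  timestamp: "2024-05-01T14:32:07Z",
  severity: "transient",
  code: "HTTP-429",
  detail: "429 Too Many Requests",
  actions: [
    { label: "Retry", variant: "retry", onClick: () => clicked.push("retry") },
    { label: "Dismiss", variant: "dismiss", onClick: () => clicked.push("dismiss") },
  ],
};
```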
AgentRecoveryFlow
A timeline breadcrumb showing every recovery step the agent attempted after hitting a dead end. Makes the agent's reasoning legible — operators see exactly what was tried, what failed, and what comes next.
- Show every step the agent actually attempted; even failed ones build trust
- Include the `error` field on failed steps; raw messages help operators diagnose
- Use `overallStatus="escalated"` when the human must make a call, not just approve
- Keep `originalGoal` as the agent stated it; don't reframe failures after the fact
- Don't omit failed steps to make the agent look cleaner; that's deceptive
- Don't use `overallStatus="resolved"` until all steps are confirmed complete
- Don't mix recovery steps with business steps in the same flow; keep this to the failure path only
- Don't show MANUAL OVERRIDE unless you've implemented a real override path
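Two of the guidelines above can be checked against the step data directly. A minimal sketch, assuming a step shape with a `label` field (hypothetical; the docs only name `status` and `error`) and interpreting "confirmed complete" as "no step is still `in_progress` or `pending`". `lintRecoveryFlow` is an illustrative name.

```typescript
// Status values from the RecoveryStep and overallStatus descriptions.
type RecoveryStatus = "succeeded" | "failed" | "in_progress" | "pending" | "skipped";
type FlowStatus = "in_progress" | "resolved" | "abandoned" | "escalated";

interface RecoveryStepDraft {
  label: string; // hypothetical field name
  status: RecoveryStatus;
  error?: string;
}

// Returns one warning per guideline the flow appears to break.
function lintRecoveryFlow(overallStatus: FlowStatus, steps: RecoveryStepDraft[]): string[] {
  const warnings: string[] = [];
  steps.forEach((step, i) => {
    if (step.status === "failed" && !step.error) {
      warnings.push(`step ${i + 1} failed without an error message`);
    }
  });
  // Interprets "confirmed complete" as: nothing still running or queued.
  const unfinished = steps.filter((s) => s.status === "in_progress" || s.status === "pending");
  if (overallStatus === "resolved" && unfinished.length > 0) {
    warnings.push(`resolved while ${unfinished.length} step(s) are unfinished`);
  }
  return warnings;
}

console.log(
  lintRecoveryFlow("resolved", [
    { label: "Retry with backoff", status: "failed" },
    { label: "Use fallback endpoint", status: "pending" },
  ]),
);
```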
| Prop | Type | Default | Description |
|---|---|---|---|
| agent | string | — | Agent that owns the recovery flow. |
| originalGoal | string | — | What the agent was trying to accomplish before it hit the dead end. |
| steps | RecoveryStep[] | — | Ordered list of recovery steps with status (succeeded \| failed \| in_progress \| pending \| skipped). |
| overallStatus | 'in_progress' \| 'resolved' \| 'abandoned' \| 'escalated' | — | Controls the top-level accent color and status label. |
| onOverride | () => void | undefined | Called when the operator clicks MANUAL OVERRIDE. |
| onDismiss | () => void | undefined | Called when the operator dismisses the flow. |
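The flow's purpose is to show what was tried, what failed, and what comes next. A hypothetical helper can derive those three facts from `steps`. This sketch assumes each step carries a `label` (not named in the docs) and treats the `in_progress` step, or failing that the first `pending` step, as "next". `summarizeRecovery` is an illustrative name.

```typescript
type RecoveryStatus = "succeeded" | "failed" | "in_progress" | "pending" | "skipped";

// `label` is an assumed field name; the docs only name `error`.
interface RecoveryStep {
  label: string;
  status: RecoveryStatus;
  error?: string;
}

interface RecoverySummary {
  finished: number; // attempts that ran to an outcome (succeeded or failed)
  failed: string[]; // labels of failed attempts, in order
  next?: string; // the in_progress step, else the first pending step
}

// Condenses the breadcrumb into "what was tried, what failed, what's next".
function summarizeRecovery(steps: RecoveryStep[]): RecoverySummary {
  const current =
    steps.find((s) => s.status === "in_progress") ??
    steps.find((s) => s.status === "pending");
  return {
    finished: steps.filter((s) => s.status === "succeeded" || s.status === "failed").length,
    failed: steps.filter((s) => s.status === "failed").map((s) => s.label),
    next: current?.label,
  };
}
```

Skipped steps count toward neither `finished` nor `next`, so they stay visible in the timeline without inflating the attempt count.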
PartialSuccessPanel
Surfaces the state "agent completed N of M steps — awaiting human on step K." Gold throughout: this is not an error, it is a handoff. The agent succeeded at its part; the human must complete theirs before the workflow continues.
- Use gold; this is the validation/quality color, not an error color
- Show ALL steps, including completed ones, as context for the approver
- Write `humanInstruction` as a direct instruction to the specific operator, not a system message
- Mark steps the agent hasn't reached as `remaining`, not `pending`; they haven't failed, and `pending` is not a valid `PartialStep` status
- Don't use red for this; partial success is not a failure state
- Don't have more than one step in `awaiting_human` at a time; sequence handoffs, don't batch them
- Don't hide completed steps; the operator needs to see how far the agent got
- Don't show APPROVE & CONTINUE if you haven't wired up the actual continuation
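The single-handoff rule and the "wired continuation" rule above can gate whether APPROVE & CONTINUE renders at all. A minimal sketch: `lintHandoff`, `showApprove` and the `label` field are hypothetical, and treating a panel with zero awaited steps as invalid is an inference from the panel's stated purpose ("awaiting human on step K").

```typescript
// Status values from the PartialStep description; `label` is hypothetical.
type PartialStatus = "completed" | "awaiting_human" | "remaining" | "skipped";

interface PartialStepDraft {
  label: string;
  status: PartialStatus;
}

// Exactly one step should be waiting on a human at any time.
function lintHandoff(steps: PartialStepDraft[]): string[] {
  const awaiting = steps.filter((s) => s.status === "awaiting_human").length;
  if (awaiting > 1) {
    return [`${awaiting} steps await a human; sequence handoffs one at a time`];
  }
  if (awaiting === 0) {
    return ["no step awaits a human, so this is not a partial-success handoff"];
  }
  return [];
}

// Only offer APPROVE & CONTINUE when a continuation is actually wired up
// and there is a single awaited step for it to apply to.
function showApprove(steps: PartialStepDraft[], onApprove?: () => void): boolean {
  return onApprove !== undefined && lintHandoff(steps).length === 0;
}
```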
| Prop | Type | Default | Description |
|---|---|---|---|
| agent | string | — | Agent that produced the partial result. |
| task | string | — | High-level task description shown in the header. |
| steps | PartialStep[] | — | Steps with status: completed \| awaiting_human \| remaining \| skipped. |
| onApprove | () => void | undefined | Called when the operator approves the awaited step to continue. |
| onReject | () => void | undefined | Called when the operator rejects the awaited step. |
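The panel's core message, "agent completed N of M steps, awaiting human on step K", can be derived from the props alone. This sketch transcribes the props table into TypeScript and adds a hypothetical `handoffHeadline` helper. It assumes steps have a `label` field and that M counts every step, skipped ones included; the docs don't say either way. The task and step labels in the example are illustrative.

```typescript
type PartialStatus = "completed" | "awaiting_human" | "remaining" | "skipped";

// `label` is a hypothetical field; the docs only list the status values.
interface PartialStep {
  label: string;
  status: PartialStatus;
}

interface PartialSuccessPanelProps {
  agent: string;
  task: string;
  steps: PartialStep[];
  onApprove?: () => void;
  onReject?: () => void;
}

// Builds "agent completed N of M steps, awaiting human on step K".
// Assumption: M counts every step, including skipped ones.
function handoffHeadline(props: PartialSuccessPanelProps): string {
  const done = props.steps.filter((s) => s.status === "completed").length;
  const k = props.steps.findIndex((s) => s.status === "awaiting_human");
  const base = `${props.agent} completed ${done} of ${props.steps.length} steps`;
  return k === -1 ? base : `${base}, awaiting human on step ${k + 1}`;
}

const panel: PartialSuccessPanelProps = {
  agent: "TREASURY-AGENT",
  task: "Reconcile month-end ledger",
  steps: [
    { label: "Pull bank statements", status: "completed" },
    { label: "Match transactions", status: "completed" },
    { label: "Approve unmatched items", status: "awaiting_human" },
    { label: "Post adjustments", status: "remaining" },
  ],
};
console.log(handoffHeadline(panel));
// TREASURY-AGENT completed 2 of 4 steps, awaiting human on step 3
```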