Foundation — Agentic Design Language
LLM Latency States
LLM inference has temporal characteristics that REST-style request/response loading does not: streaming output, variable latency, token-by-token generation, and tool-call interruptions. Loading states designed for REST APIs are semantically wrong for LLM responses. The design language must represent these characteristics accurately — false immediacy is dishonest.
Components: StreamingResponse · ThinkingState · MCPToolCall · StreamingDot · TokenCounter
Four Computational States
Every LLM operation passes through one or more of these states. Each state has a distinct visual treatment. Collapsing them into a single spinner erases signal.
1. Streaming
Indicator: Blinking cursor (StreamingDot) at the end of partial content
Meaning: The LLM is generating tokens — content is arriving character by character.
Stops: The cursor disappears the instant the final token arrives. No fade.
Anti-pattern: Revealing complete content character-by-character with setInterval. This is fake streaming.
Token: --ds-color-temporal

2. Thinking
Indicator: Pulsing blue left-border with an elapsed time counter
Meaning: The agent is reasoning before generating a response (chain-of-thought).
Stops: The border pulse stops instantly when reasoning concludes. No transition.
Anti-pattern: A generic spinner labeled 'Loading...' — it gives no information about what kind of computation is happening.
Token: --ds-color-temporal

3. Tool call
Indicator: Tool name + input parameters + an in-progress state badge
Meaning: The agent is executing a function call, database query, or API request.
Stops: Transitions to a result display when the tool returns. Shows success or error.
Anti-pattern: Hiding tool calls as internal implementation details. Tool calls are observable agent actions — they belong in the audit trail.
Token: --ds-color-temporal

4. Completed
Indicator: Static left-border (structural, not temporal blue)
Meaning: The agent has reached a conclusion and returned a final response.
Stops: N/A — this IS the stopped state. No active indicators.
Anti-pattern: Keeping any temporal-blue indicators visible after completion. Blue means 'computation is happening.' After completion, there is no computation.
Token: --ds-border-structure
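The four states can be modeled as a small state machine driven by events from the inference backend. A minimal TypeScript sketch (the type and function names here are illustrative, not part of the published component API); collapsing this union into a single boolean "loading" flag is exactly the anti-pattern the section warns against:

```typescript
// Illustrative sketch: the four computational states as a discriminated union,
// advanced by events from the inference backend.
type LatencyState =
  | { kind: "thinking"; startedAt: number }
  | { kind: "streaming"; content: string }
  | { kind: "toolCall"; tool: string; input: unknown }
  | { kind: "completed"; content: string };

type AgentEvent =
  | { type: "reasoning_started"; at: number }
  | { type: "token"; text: string }
  | { type: "tool_started"; tool: string; input: unknown }
  | { type: "done" };

function advance(state: LatencyState, event: AgentEvent): LatencyState {
  switch (event.type) {
    case "reasoning_started":
      return { kind: "thinking", startedAt: event.at };
    case "token": {
      // Append each token the moment it arrives — never buffer and replay.
      const prior = state.kind === "streaming" ? state.content : "";
      return { kind: "streaming", content: prior + event.text };
    }
    case "tool_started":
      return { kind: "toolCall", tool: event.tool, input: event.input };
    case "done": {
      // Completion is an event, not a transition: all temporal indicators stop here.
      const content = state.kind === "streaming" ? state.content : "";
      return { kind: "completed", content };
    }
  }
}
```

Because each state carries its own data (timestamps, partial content, tool parameters), the UI can render the correct indicator for each without guessing.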
Token Reference
| Token | Value | Meaning | Use in latency states |
|---|---|---|---|
| --ds-color-temporal | #2B44D4 | Active computation — LLM is thinking, streaming, or tool-calling | ALL latency indicators while in-flight. Stops at completion. |
| --ds-border-structure | — | Static structural border — no computation active | Completed StreamingResponse border (replaces temporal blue on done) |
| --ds-type-mono-font | — | Monospace font for data and timing | Elapsed time counter, token count, tool input parameters |
| --ds-text-muted | 4.83:1 contrast | Secondary / metadata text | Tokens-per-second, elapsed time display, tool call label |
| --ds-motion-stream | — | Animation token for streaming cursor | StreamingDot blink animation (1s infinite, stops on completion) |
| --ds-color-outcome-negative | #C62828 | Computation failed | Failed tool call, hallucination warning, error state |
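The table's core rule — temporal blue while in flight, structural border on completion, negative on failure — can be reduced to one decision. A sketch, assuming a hypothetical helper name (the token values are the ones listed above):

```typescript
// Illustrative mapping from the token table to a left-border decision.
const tokens = {
  temporal: "var(--ds-color-temporal)",          // #2B44D4 — active computation
  structure: "var(--ds-border-structure)",       // static border — computation finished
  negative: "var(--ds-color-outcome-negative)",  // #C62828 — computation failed
} as const;

type Status = "thinking" | "streaming" | "toolCall" | "completed" | "failed";

// Hypothetical helper: every in-flight status shares the temporal token;
// only completion and failure switch away from it.
function borderToken(status: Status): string {
  if (status === "failed") return tokens.negative;
  return status === "completed" ? tokens.structure : tokens.temporal;
}
```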
Do / Don't
DO
Distinguish streaming from loading
StreamingDot communicates 'content is being generated in real time.' Skeleton communicates 'a known shape will arrive.' These are different experiences — use the right component.
DON'T
Fake streaming with setTimeout character reveals
Simulated streaming delays the user's ability to act on partial output and misrepresents the model's actual generation pattern. If the response has arrived, display it immediately.
DO
Show reasoning traces while the model is thinking
Chain-of-thought reasoning is computation the user paid for. ThinkingState accepts a trace prop — surface it. Visible reasoning builds appropriate trust in the output.
DON'T
Keep temporal-blue indicators visible after completion
--ds-color-temporal means computation is happening. An agent that is done thinking should not show a blue pulsing border. Completion is an event, not a transition.
DO
Show tool calls as observable events
MCPToolCall displays what function ran, what parameters it received, and what it returned. This is the audit trail for agent actions — it must be visible.
DON'T
Label all states as 'Loading...'
Streaming ≠ thinking ≠ tool-calling ≠ loading. A generic spinner with 'Loading...' communicates nothing about the nature, progress, or expected duration of the computation.
DO
Display elapsed time on extended computations
ThinkingState shows a live elapsed counter and surfaces estimated remaining time after 10 seconds. This is honest communication about cost and duration.
DON'T
Dump a complete response after a long blank wait
8 seconds of spinner followed by 500 words of text gives the user no signal that anything was happening. Stream content as it arrives.
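The "honest communication" DOs above imply that footer metrics are derived from real timestamps, never hardcoded. A sketch of that derivation (the helper name is an assumption, not part of the component API):

```typescript
// Illustrative helper: derive streaming-footer metrics from actual timestamps.
// tokenCount is the number of tokens received so far; times are epoch milliseconds.
function streamingMetrics(tokenCount: number, startedAtMs: number, nowMs: number) {
  const elapsedMs = Math.max(0, nowMs - startedAtMs);
  const tokensPerSecond =
    elapsedMs > 0 ? Math.round((tokenCount / elapsedMs) * 1000) : 0;
  return { elapsedMs, tokensPerSecond };
}
```

Recomputing these on every token (or on a short interval) keeps the elapsed counter and throughput display truthful for the whole lifetime of the stream.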
Usage
import { StreamingResponse } from "@/components/patterns/StreamingResponse";
import { ThinkingState } from "@/components/patterns/ThinkingState";

// StreamingResponse — pass real partial content; never simulate
<StreamingResponse
  content={partialContent}   // grows as tokens arrive
  isStreaming={!isDone}      // false the moment generation completes
  tokensPerSecond={42}
  elapsed={elapsedMs}
/>

// ThinkingState — show while agent is reasoning before responding
<ThinkingState
  isThinking={isReasoning}
  startedAt={reasoningStartedAt}
  trace={chainOfThoughtLines}   // add lines as they arrive
  estimatedSecondsRemaining={12}
/>

// Rule: never pass isStreaming={true} to a response that has already arrived.
// If the full response is available, render it directly — isStreaming={false}.

Component API
Prop reference for the two latency-state components.
StreamingResponse
| Prop | Type | Default | Description |
|---|---|---|---|
| content | string | — | Current text content — may be partial while isStreaming is true. Grows as tokens arrive. Never pass a static complete response with isStreaming=true. |
| isStreaming | boolean | false | True while the LLM is actively generating. Set to false the moment generation completes — this stops the cursor animation and hides the token rate counter. |
| tokensPerSecond | number | — | Display-only token throughput shown in the streaming footer. |
| elapsed | number | — | Elapsed time in milliseconds since the request started. Used to render the elapsed timer in the streaming footer. |
| tokenTimeoutMs | number | 30000 | Milliseconds of silence (no new tokens) before the timeout UI is shown. |
| onRetry | () => void | — | Called when the user requests a retry after a token timeout. |
| onCancel | () => void | — | Called when the user discards the partial response after a token timeout. |
| isTruncated | boolean | — | True when the stream ended but the output was cut short (max tokens, context limit). Shows a continuation prompt. |
| onContinue | () => void | — | Called when the user requests continuation of a truncated response. |
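The tokenTimeoutMs behavior in the table can be framed as a pure predicate over the last-token timestamp, so the host component can evaluate it from any timer tick. A sketch (the helper name is illustrative; only tokenTimeoutMs and its 30000 default come from the table):

```typescript
// Illustrative sketch of the token-silence watchdog behind tokenTimeoutMs.
// Times are epoch milliseconds; lastTokenAtMs resets whenever a token arrives.
function streamHealth(lastTokenAtMs: number, nowMs: number, tokenTimeoutMs = 30000) {
  const silenceMs = Math.max(0, nowMs - lastTokenAtMs);
  return {
    silenceMs,
    // When true, show the timeout UI and wire its buttons to onRetry / onCancel.
    timedOut: silenceMs >= tokenTimeoutMs,
  };
}
```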
ThinkingState
| Prop | Type | Default | Description |
|---|---|---|---|
| isThinking | boolean | false | True while the agent is actively reasoning before responding. Controls the pulse animation. |
| trace | string[] | — | Chain-of-thought lines as they arrive. Add lines incrementally — each new line animates in. |
| startedAt | string | — | ISO timestamp when thinking started. Used to calculate and display elapsed time. |
| estimatedSecondsRemaining | number | — | Shown after 10s threshold as a hint. Not a countdown — displayed as a rough estimate. |
| timeoutSecs | number | 120 | Seconds of thinking before the timeout escalation UI is triggered. |
| onTimeout | () => void | — | Called once when elapsed time first reaches timeoutSecs. |
| onRetry | () => void | — | Called when the user requests a retry from the timeout state. |
| onEscalate | () => void | — | Called when the user chooses to escalate after a thinking timeout. |
| isInterrupted | boolean | — | True when reasoning was interrupted before reaching a conclusion. |
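The timing rules scattered across the ThinkingState table — elapsed time from the ISO startedAt, the 10 s threshold for showing estimatedSecondsRemaining, and the timeoutSecs escalation — compose into one derivation. A sketch under those assumptions (the helper name is illustrative):

```typescript
// Illustrative sketch of ThinkingState's timing rules.
// startedAt is the ISO timestamp prop; nowMs is epoch milliseconds.
function thinkingTimers(startedAt: string, nowMs: number, timeoutSecs = 120) {
  const elapsedSecs = Math.max(0, (nowMs - Date.parse(startedAt)) / 1000);
  return {
    elapsedSecs,
    // Surface estimatedSecondsRemaining only after the 10 s threshold.
    showEstimate: elapsedSecs >= 10,
    // Fire onTimeout once, the first time this flips to true.
    timedOut: elapsedSecs >= timeoutSecs,
  };
}
```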