Foundation — Agentic Design Language
LLM Latency States
LLM inference has temporal characteristics that REST-style request/response loading does not: streaming output, variable latency, token-by-token generation, and tool-call interruptions. Loading states designed for REST APIs are semantically wrong for LLM responses. The design language must represent these characteristics accurately — false immediacy is dishonest.
Components: StreamingResponse · ThinkingState · MCPToolCall · StreamingDot · TokenCounter
Four Computational States
Every LLM operation passes through one or more of these states. Each state has a distinct visual treatment. Collapsing them into a single spinner erases signal.
1. Streaming
Indicator: Blinking cursor (StreamingDot) at the end of partial content
Meaning: The LLM is generating tokens — content is arriving character by character.
Stops: The cursor disappears the instant the final token arrives. No fade.
Anti-pattern: Revealing complete content character-by-character with setInterval. This is fake streaming.
Token: --ds-color-temporal

2. Thinking
Indicator: Pulsing blue left-border with an elapsed time counter
Meaning: The agent is reasoning before generating a response (chain-of-thought).
Stops: The border pulse stops instantly when reasoning concludes. No transition.
Anti-pattern: A generic spinner labeled 'Loading...' — it gives no information about what kind of computation is happening.
Token: --ds-color-temporal

3. Tool call
Indicator: Tool name + input parameters + an in-progress state badge
Meaning: The agent is executing a function call, database query, or API request.
Stops: Transitions to a result display when the tool returns. Shows success or error.
Anti-pattern: Hiding tool calls as internal implementation details. Tool calls are observable agent actions — they belong in the audit trail.
Token: --ds-color-temporal

4. Completed
Indicator: Static left-border (structural, not temporal blue)
Meaning: The agent has reached a conclusion and returned a final response.
Stops: N/A — this IS the stopped state. No active indicators.
Anti-pattern: Keeping any temporal-blue indicators visible after completion. Blue means 'computation is happening.' After completion, there is no computation.
Token: --ds-border-structure
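The four states can be modeled as a small state machine driven by events from the inference backend. A minimal TypeScript sketch (the type and function names here are illustrative, not part of the published component API); collapsing this union into a single boolean "loading" flag is exactly the anti-pattern the section warns against:

```typescript
// Illustrative sketch: the four computational states as a discriminated union,
// advanced by events from the inference backend.
type LatencyState =
  | { kind: "thinking"; startedAt: number }
  | { kind: "streaming"; content: string }
  | { kind: "toolCall"; tool: string; input: unknown }
  | { kind: "completed"; content: string };

type AgentEvent =
  | { type: "reasoning_started"; at: number }
  | { type: "token"; text: string }
  | { type: "tool_started"; tool: string; input: unknown }
  | { type: "done" };

function advance(state: LatencyState, event: AgentEvent): LatencyState {
  switch (event.type) {
    case "reasoning_started":
      return { kind: "thinking", startedAt: event.at };
    case "token": {
      // Append each token the moment it arrives — never buffer and replay.
      const prior = state.kind === "streaming" ? state.content : "";
      return { kind: "streaming", content: prior + event.text };
    }
    case "tool_started":
      return { kind: "toolCall", tool: event.tool, input: event.input };
    case "done": {
      // Completion is an event, not a transition: all temporal indicators stop here.
      const content = state.kind === "streaming" ? state.content : "";
      return { kind: "completed", content };
    }
  }
}
```

Because each state carries its own data (timestamps, partial content, tool parameters), the UI can render the correct indicator for each without guessing.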
Token Reference
| Token | Value | Meaning | Use in latency states |
|---|---|---|---|
| --ds-color-temporal | #2B44D4 | Active computation — LLM is thinking, streaming, or tool-calling | ALL latency indicators while in-flight. Stops at completion. |
| --ds-border-structure | — | Static structural border — no computation active | Completed StreamingResponse border (replaces temporal blue on done) |
| --ds-type-mono-font | — | Monospace font for data and timing | Elapsed time counter, token count, tool input parameters |
| --ds-text-muted | 4.83:1 contrast | Secondary / metadata text | Tokens-per-second, elapsed time display, tool call label |
| --ds-motion-stream | — | Animation token for streaming cursor | StreamingDot blink animation (1s infinite, stops on completion) |
| --ds-color-outcome-negative | #C62828 | Computation failed | Failed tool call, hallucination warning, error state |
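The table's core rule — temporal blue while in flight, structural border on completion, negative on failure — can be reduced to one decision. A sketch, assuming a hypothetical helper name (the token values are the ones listed above):

```typescript
// Illustrative mapping from the token table to a left-border decision.
const tokens = {
  temporal: "var(--ds-color-temporal)",          // #2B44D4 — active computation
  structure: "var(--ds-border-structure)",       // static border — computation finished
  negative: "var(--ds-color-outcome-negative)",  // #C62828 — computation failed
} as const;

type Status = "thinking" | "streaming" | "toolCall" | "completed" | "failed";

// Hypothetical helper: every in-flight status shares the temporal token;
// only completion and failure switch away from it.
function borderToken(status: Status): string {
  if (status === "failed") return tokens.negative;
  return status === "completed" ? tokens.structure : tokens.temporal;
}
```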
Do / Don't
DO
Distinguish streaming from loading
StreamingDot communicates 'content is being generated in real time.' Skeleton communicates 'a known shape will arrive.' These are different experiences — use the right component.
DON'T
Fake streaming with setTimeout character reveals
Simulated streaming delays the user's ability to act on partial output and misrepresents the model's actual generation pattern. If the response has arrived, display it immediately.
DO
Show reasoning traces while the model is thinking
Chain-of-thought reasoning is computation the user paid for. ThinkingState accepts a trace prop — surface it. Visible reasoning builds appropriate trust in the output.
DON'T
Keep temporal-blue indicators visible after completion
--ds-color-temporal means computation is happening. An agent that is done thinking should not show a blue pulsing border. Completion is an event, not a transition.
DO
Show tool calls as observable events
MCPToolCall displays what function ran, what parameters it received, and what it returned. This is the audit trail for agent actions — it must be visible.
DON'T
Label all states as 'Loading...'
Streaming ≠ thinking ≠ tool-calling ≠ loading. A generic spinner with 'Loading...' communicates nothing about the nature, progress, or expected duration of the computation.
DO
Display elapsed time on extended computations
ThinkingState shows a live elapsed counter and surfaces estimated remaining time after 10 seconds. This is honest communication about cost and duration.
DON'T
Dump a complete response after a long blank wait
8 seconds of spinner followed by 500 words of text gives the user no signal that anything was happening. Stream content as it arrives.
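The "honest communication" DOs above imply that footer metrics are derived from real timestamps, never hardcoded. A sketch of that derivation (the helper name is an assumption, not part of the component API):

```typescript
// Illustrative helper: derive streaming-footer metrics from actual timestamps.
// tokenCount is the number of tokens received so far; times are epoch milliseconds.
function streamingMetrics(tokenCount: number, startedAtMs: number, nowMs: number) {
  const elapsedMs = Math.max(0, nowMs - startedAtMs);
  const tokensPerSecond =
    elapsedMs > 0 ? Math.round((tokenCount / elapsedMs) * 1000) : 0;
  return { elapsedMs, tokensPerSecond };
}
```

Recomputing these on every token (or on a short interval) keeps the elapsed counter and throughput display truthful for the whole lifetime of the stream.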
Usage
import { StreamingResponse } from "@/components/patterns/StreamingResponse";
import { ThinkingState } from "@/components/patterns/ThinkingState";

// StreamingResponse — pass real partial content; never simulate
<StreamingResponse
  content={partialContent}   // grows as tokens arrive
  isStreaming={!isDone}      // false the moment generation completes
  tokensPerSecond={42}
  elapsed={elapsedMs}
/>

// ThinkingState — show while agent is reasoning before responding
<ThinkingState
  isThinking={isReasoning}
  startedAt={reasoningStartedAt}
  trace={chainOfThoughtLines}   // add lines as they arrive
  estimatedSecondsRemaining={12}
/>

// Rule: never pass isStreaming={true} to a response that has already arrived.
// If the full response is available, render it directly — isStreaming={false}.

Component API
Prop reference for the two latency-state components.
StreamingResponse
| Prop | Type | Default | Description |
|---|---|---|---|
| content | string | — | Current text content — may be partial while isStreaming is true. Grows as tokens arrive. Never pass a static complete response with isStreaming=true. |
| isStreaming | boolean | false | True while the LLM is actively generating. Set to false the moment generation completes — this stops the cursor animation and hides the token rate counter. |
| tokensPerSecond | number | — | Display-only token throughput shown in the streaming footer. |
| elapsed | number | — | Elapsed time in milliseconds since the request started. Used to render the elapsed timer in the streaming footer. |
| tokenTimeoutMs | number | 30000 | Milliseconds of silence (no new tokens) before the timeout UI is shown. |
| onRetry | () => void | — | Called when the user requests a retry after a token timeout. |
| onCancel | () => void | — | Called when the user discards the partial response after a token timeout. |
| isTruncated | boolean | — | True when the stream ended but the output was cut short (max tokens, context limit). Shows a continuation prompt. |
| onContinue | () => void | — | Called when the user requests continuation of a truncated response. |
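The tokenTimeoutMs behavior in the table can be framed as a pure predicate over the last-token timestamp, so the host component can evaluate it from any timer tick. A sketch (the helper name is illustrative; only tokenTimeoutMs and its 30000 default come from the table):

```typescript
// Illustrative sketch of the token-silence watchdog behind tokenTimeoutMs.
// Times are epoch milliseconds; lastTokenAtMs resets whenever a token arrives.
function streamHealth(lastTokenAtMs: number, nowMs: number, tokenTimeoutMs = 30000) {
  const silenceMs = Math.max(0, nowMs - lastTokenAtMs);
  return {
    silenceMs,
    // When true, show the timeout UI and wire its buttons to onRetry / onCancel.
    timedOut: silenceMs >= tokenTimeoutMs,
  };
}
```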
ThinkingState
| Prop | Type | Default | Description |
|---|---|---|---|
| isThinking | boolean | false | True while the agent is actively reasoning before responding. Controls the pulse animation. |
| trace | string[] | — | Chain-of-thought lines as they arrive. Add lines incrementally — each new line animates in. |
| startedAt | string | — | ISO timestamp when thinking started. Used to calculate and display elapsed time. |
| estimatedSecondsRemaining | number | — | Shown after 10s threshold as a hint. Not a countdown — displayed as a rough estimate. |
| timeoutSecs | number | 120 | Seconds of thinking before the timeout escalation UI is triggered. |
| onTimeout | () => void | — | Called once when elapsed time first reaches timeoutSecs. |
| onRetry | () => void | — | Called when the user requests a retry from the timeout state. |
| onEscalate | () => void | — | Called when the user chooses to escalate after a thinking timeout. |
| isInterrupted | boolean | — | True when reasoning was interrupted before reaching a conclusion. |
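The timing rules scattered across the ThinkingState table — elapsed time from the ISO startedAt, the 10 s threshold for showing estimatedSecondsRemaining, and the timeoutSecs escalation — compose into one derivation. A sketch under those assumptions (the helper name is illustrative):

```typescript
// Illustrative sketch of ThinkingState's timing rules.
// startedAt is the ISO timestamp prop; nowMs is epoch milliseconds.
function thinkingTimers(startedAt: string, nowMs: number, timeoutSecs = 120) {
  const elapsedSecs = Math.max(0, (nowMs - Date.parse(startedAt)) / 1000);
  return {
    elapsedSecs,
    // Surface estimatedSecondsRemaining only after the 10 s threshold.
    showEstimate: elapsedSecs >= 10,
    // Fire onTimeout once, the first time this flips to true.
    timedOut: elapsedSecs >= timeoutSecs,
  };
}
```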