
Foundation — Agentic Design Language

LLM Latency States

LLM inference has temporal characteristics that REST-style request/response loading does not: streaming output, variable latency, token-by-token generation, and tool-call interruptions. Loading states designed for request/response APIs are semantically wrong for LLM responses. The design language must represent these characteristics accurately — false immediacy is dishonest.

Components: StreamingResponse · ThinkingState · MCPToolCall · StreamingDot · TokenCounter

Four Computational States

Every LLM operation passes through one or more of these states. Each state has a distinct visual treatment. Collapsing them into a single spinner erases signal.

01
STREAMING · StreamingResponse

Visual: Blinking cursor (StreamingDot) at end of partial content

Meaning: LLM is generating tokens — content is arriving character by character

Stops: The cursor disappears the instant the final token arrives. No fade.

Anti-pattern: Revealing complete content character-by-character with setInterval. This is fake streaming.

Token: --ds-color-temporal
02
THINKING · ThinkingState

Visual: Pulsing blue left-border with elapsed time counter

Meaning: Agent is reasoning before generating a response (chain-of-thought)

Stops: Border pulse stops instantly when reasoning concludes. No transition.

Anti-pattern: A generic spinner labeled 'Loading...' — gives no information about what kind of computation is happening.

Token: --ds-color-temporal
03
TOOL-CALLING · MCPToolCall

Visual: Tool name + input parameters + in-progress state badge

Meaning: Agent is executing a function call, database query, or API request

Stops: Transitions to result display when the tool returns. Shows success or error.

Anti-pattern: Hiding tool calls as internal implementation details. Tool calls are observable agent actions — they belong in the audit trail.

Token: --ds-color-temporal
04
COMPLETE

Visual: Static left-border (structural, not temporal blue)

Meaning: Agent has reached a conclusion and returned a final response

Stops: N/A — this IS the stopped state. No active indicators.

Anti-pattern: Keeping any temporal-blue indicators visible after completion. Blue means 'computation is happening.' After completion, there is no computation.

Token: --ds-border-structure
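The state-to-token rule above can be captured in one place: every in-flight state uses temporal blue, and only completion uses the structural border. A minimal TypeScript sketch (the type and function names are illustrative, not part of the published component API):

```typescript
// The four computational states of an LLM operation.
type LatencyState = "streaming" | "thinking" | "tool-calling" | "complete";

// Resolve the design token that drives the state's indicator color.
// Only "complete" maps to the structural border; every active state
// uses temporal blue.
function indicatorToken(state: LatencyState): string {
  return state === "complete" ? "--ds-border-structure" : "--ds-color-temporal";
}
```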


Token Reference

| Token | Value | Meaning | Use in latency states |
| --- | --- | --- | --- |
| --ds-color-temporal | #2B44D4 | Active computation — LLM is thinking, streaming, or tool-calling | ALL latency indicators while in-flight. Stops at completion. |
| --ds-border-structure | | Static structural border — no computation active | Completed StreamingResponse border (replaces temporal blue on done) |
| --ds-type-mono-font | | Monospace font for data and timing | Elapsed time counter, token count, tool input parameters |
| --ds-text-muted | 4.83:1 contrast | Secondary / metadata text | Tokens-per-second, elapsed time display, tool call label |
| --ds-motion-stream | | Animation token for streaming cursor | StreamingDot blink animation (1s infinite, stops on completion) |
| --ds-color-outcome-negative | #C62828 | Computation failed | Failed tool call, hallucination warning, error state |

Do / Don't

DO

Distinguish streaming from loading

StreamingDot communicates 'content is being generated in real time.' Skeleton communicates 'a known shape will arrive.' These are different experiences — use the right component.

DON'T

Fake streaming with setTimeout character reveals

Simulated streaming delays the user's ability to act on partial output and misrepresents the model's actual generation pattern. If the response has arrived, display it immediately.
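Real streaming appends tokens exactly when the model produces them; the render always reflects what has actually been received. A sketch, assuming the token stream is exposed as an AsyncIterable (the stream source and the onUpdate callback are hypothetical, not the component's API):

```typescript
// Accumulate real tokens as they arrive. No timers, no simulated delays:
// the UI updates exactly when the model produces output.
async function consumeStream(
  tokens: AsyncIterable<string>,
  onUpdate: (partial: string, isStreaming: boolean) => void,
): Promise<string> {
  let content = "";
  for await (const token of tokens) {
    content += token;
    onUpdate(content, true); // partial content, cursor visible
  }
  onUpdate(content, false); // final token arrived: cursor stops instantly
  return content;
}
```

Contrast with the anti-pattern: a setInterval reveal would keep emitting updates after the full response is already in memory.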

DO

Show reasoning traces while the model is thinking

Chain-of-thought reasoning is computation the user paid for. ThinkingState accepts a trace prop — surface it. Visible reasoning builds appropriate trust in the output.

DON'T

Keep temporal-blue indicators visible after completion

--ds-color-temporal means computation is happening. An agent that is done thinking should not show a blue pulsing border. Completion is an event, not a transition.

DO

Show tool calls as observable events

MCPToolCall displays what function ran, what parameters it received, and what it returned. This is the audit trail for agent actions — it must be visible.

DON'T

Label all states as 'Loading...'

Streaming ≠ thinking ≠ tool-calling ≠ loading. A generic spinner with 'Loading...' communicates nothing about the nature, progress, or expected duration of the computation.

DO

Display elapsed time on extended computations

ThinkingState shows a live elapsed counter and surfaces estimated remaining time after 10 seconds. This is honest communication about cost and duration.
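The 10-second threshold and the elapsed counter reduce to a small predicate plus a formatter. A sketch of that logic (constant and function names are assumptions, not the component's internals):

```typescript
const ESTIMATE_THRESHOLD_MS = 10_000; // surface the remaining-time hint after 10 s

// Whether the rough remaining-time estimate should be visible yet.
function shouldShowEstimate(elapsedMs: number): boolean {
  return elapsedMs >= ESTIMATE_THRESHOLD_MS;
}

// Render elapsed milliseconds as the live counter text, e.g. "12.4s".
function formatElapsed(elapsedMs: number): string {
  return `${(elapsedMs / 1000).toFixed(1)}s`;
}
```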

DON'T

Dump a complete response after a long blank wait

8 seconds of spinner followed by 500 words of text gives the user no signal that anything was happening. Stream content as it arrives.

Usage

import { StreamingResponse } from "@/components/patterns/StreamingResponse";
import { ThinkingState } from "@/components/patterns/ThinkingState";

// StreamingResponse — pass real partial content; never simulate
<StreamingResponse
  content={partialContent}     // grows as tokens arrive
  isStreaming={!isDone}        // false the moment generation completes
  tokensPerSecond={42}
  elapsed={elapsedMs}
/>

// ThinkingState — show while agent is reasoning before responding
<ThinkingState
  isThinking={isReasoning}
  startedAt={reasoningStartedAt}
  trace={chainOfThoughtLines}  // add lines as they arrive
  estimatedSecondsRemaining={12}
/>

// Rule: never pass isStreaming={true} to a response that has already arrived.
// If the full response is available, render it directly — isStreaming={false}.
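The props above can be derived from a single event stream rather than tracked ad hoc, which makes the "completion is an event" rule hard to violate. A minimal reducer sketch (the event shape and field names are assumptions, not the library's API):

```typescript
interface ResponseState {
  content: string;
  isStreaming: boolean;
}

type StreamEvent =
  | { type: "start" }
  | { type: "token"; text: string }
  | { type: "done" };

// Fold stream events into the props StreamingResponse needs.
function reduceResponse(state: ResponseState, event: StreamEvent): ResponseState {
  switch (event.type) {
    case "start":
      return { content: "", isStreaming: true };
    case "token":
      return { ...state, content: state.content + event.text };
    case "done":
      // Completion is an event: streaming flips off immediately.
      return { ...state, isStreaming: false };
  }
}
```

Because isStreaming only becomes true on "start" and flips off on "done", a fully arrived response can never be rendered with the cursor still blinking.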

Component API

Prop reference for the two latency-state components.

StreamingResponse

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| content | string | | Current text content — may be partial while isStreaming is true. Grows as tokens arrive. Never pass a static complete response with isStreaming=true. |
| isStreaming | boolean | false | True while the LLM is actively generating. Set to false the moment generation completes — this stops the cursor animation and hides the token rate counter. |
| tokensPerSecond | number | | Display-only token throughput shown in the streaming footer. |
| elapsed | number | | Elapsed time in milliseconds since the request started. Used to render the elapsed timer in the streaming footer. |
| tokenTimeoutMs | number | 30000 | Milliseconds of silence (no new tokens) before the timeout UI is shown. |
| onRetry | () => void | | Called when the user requests a retry after a token timeout. |
| onCancel | () => void | | Called when the user discards the partial response after a token timeout. |
| isTruncated | boolean | | True when the stream ended but the output was cut short (max tokens, context limit). Shows a continuation prompt. |
| onContinue | () => void | | Called when the user requests continuation of a truncated response. |
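The tokenTimeoutMs behavior amounts to comparing the current time against the last token's arrival time, and it only applies while streaming. A sketch of the check (the helper name is illustrative):

```typescript
// True when the stream has gone silent long enough to show the timeout UI.
// Only meaningful mid-stream; a finished response never times out.
function isTokenTimedOut(
  lastTokenAtMs: number,
  nowMs: number,
  isStreaming: boolean,
  tokenTimeoutMs: number = 30_000,
): boolean {
  return isStreaming && nowMs - lastTokenAtMs >= tokenTimeoutMs;
}
```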

ThinkingState

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| isThinking | boolean | false | True while the agent is actively reasoning before responding. Controls the pulse animation. |
| trace | string[] | | Chain-of-thought lines as they arrive. Add lines incrementally — each new line animates in. |
| startedAt | string | | ISO timestamp when thinking started. Used to calculate and display elapsed time. |
| estimatedSecondsRemaining | number | | Shown after the 10 s threshold as a hint. Not a countdown — displayed as a rough estimate. |
| timeoutSecs | number | 120 | Seconds of thinking before the timeout escalation UI is triggered. |
| onTimeout | () => void | | Called once when elapsed time first reaches timeoutSecs. |
| onRetry | () => void | | Called when the user requests a retry from the timeout state. |
| onEscalate | () => void | | Called when the user chooses to escalate after a thinking timeout. |
| isInterrupted | boolean | | True when reasoning was interrupted before reaching a conclusion. |

See also

Agent Identity

Who is acting and under what authority

Trust Provenance

How the agent arrived at its output

Multi-Agent Coordination

Handoff, conflict, and delegation patterns

Patterns

StreamingResponse, ThinkingState, MCPToolCall