Show streaming as a distinct response lifecycle with receiving, first output, partial, stalled, stopped, failed, continued, and complete states; keep partial and final content visually distinct; preserve recoverable output; and gate risky actions until final checks are done.
Use when
Generated text or structured content can be read or monitored before completion.
Users benefit from low first-token latency, stop control, and visible generation progress.
The product can distinguish partial, stopped, failed, and complete output honestly.
Avoid when
Intermediate chunks may expose unsafe, private, or misleading content.
Users must act only on fully verified output and partial text would increase risk.
The operation is measurable system work that is better shown with a progress bar.
The wait is brief enough that a scoped spinner is clearer.
The UI cannot preserve partial output or explain stream failure.
Problem it prevents
Streaming generated output can reduce perceived wait time, but partial text is not necessarily final, cited, safe, valid, or complete. Without explicit stream states and recovery, users may trust unfinished output, lose partial work after interruption, or be overwhelmed by constant visual and assistive updates.
Pattern anatomy
What a strong implementation has to make clear
User need
The system may emit output deltas, message events, tool-call events, citation events, moderation events, usage metadata, and final completion events at different times.
Pattern promise
Show streaming as a distinct response lifecycle with receiving, first output, partial, stalled, stopped, failed, continued, and complete states; keep partial and final content visually distinct; preserve recoverable output; and gate risky actions until final checks are done.
Required state
Queued or receiving state before first output arrives.
Recovery path
Guard against users acting on a partial answer before the final event changes or qualifies the conclusion.
Access contract
Expose stream milestones such as started, still generating, stopped, failed, citation ready, and complete as status messages.
Quality bar
The difference between expert and weak execution
Strong implementation
Specific, visible, recoverable
A policy assistant shows Answer generating, streams paragraphs into a stable answer region, marks citations pending, exposes Stop generation, then changes to Complete when citations and safety checks finish.
A code assistant streams a function body inside a code block, keeps Copy disabled until the closing fence and final lint status arrive, and shows Continue after a network interruption.
A user sees the first-token state quickly, reads early outline bullets while the answer continues, stops generation after enough detail, and sees the result labelled Partial with Continue and Regenerate options.
A source-grounded answer streams text immediately but keeps citation chips in Pending until retrieval events resolve, then marks which claims are sourced.
Weak implementation
Vague, hidden, hard to recover from
A generated answer appears word by word with no partial label, no stop control, and a Copy button that looks ready before sources arrive.
A streaming JSON answer is displayed as valid data before the closing braces and schema check have completed.
A user copies an early legal recommendation before the final paragraph reverses the conclusion after a tool result arrives.
A stalled event stream leaves a blinking cursor forever and gives no timeout, retry, copy partial, or discard path.
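The streaming JSON failure in this list can be avoided by refusing to label a buffer as data until the stream has closed and the text parses. The sketch below is a minimal illustration of that gate. The names `StructuredStatus` and `classifyStructuredOutput` are hypothetical, and a real product would add a schema check after the parse step.

```typescript
// Hypothetical types and function names, for illustration only.
type StructuredStatus =
  | { kind: "pending"; raw: string } // stream still open: render as plain text
  | { kind: "invalid"; raw: string; reason: string } // stream closed, parse failed
  | { kind: "valid"; raw: string; value: unknown }; // stream closed, parse succeeded

function classifyStructuredOutput(raw: string, streamClosed: boolean): StructuredStatus {
  // A prefix can parse by coincidence, so an open stream is always pending.
  if (!streamClosed) {
    return { kind: "pending", raw };
  }
  try {
    return { kind: "valid", raw, value: JSON.parse(raw) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : "unknown parse error";
    return { kind: "invalid", raw, reason };
  }
}
```

Only the `valid` branch would enable structured export or a "valid data" label. The `invalid` branch keeps the raw text so the partial output is not lost.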
UI guidance
Render streamed output with a visible generation state, partial-answer label, stop control, final-complete state, and clear distinction between text that is still arriving and content that has passed final citation, safety, tool, or format checks.
Use stable containers for partial text, pending citations, tool-call progress, structured chunks, copy controls, and final metadata so new deltas do not shift controls or make early output look finished.
UX guidance
Use a streaming response when showing partial generated output helps users start reading or monitoring work before the model finishes, and when the product can explain that early chunks may still change, be filtered, or lack final sources.
Protect users from acting on unfinished output: preserve partial content on interruption, label stopped or failed streams, and expose continue or retry. When risk requires it, delay copy, publish, apply, citation trust, or structured export until the stream reaches a valid terminal state.
Implementation contract
What the implementation must handle
States
Queued or receiving state before first output arrives.
First output state that confirms generation has started.
Active streaming state with partial-answer label, stop control, and stable output region.
Pending citation, pending tool call, pending moderation, and pending structured-validation states.
Interaction
The UI never presents an active stream as a complete answer until the terminal event or final validation state arrives.
Partial output remains attached to the prompt, source scope, tool state, and stream status that produced it.
Stop generation changes the state to stopped or partial instead of silently converting the text into a finished answer.
Retry clearly states whether it continues from the partial output, restarts the same prompt, or regenerates a new answer.
Accessibility
Expose stream milestones such as started, still generating, stopped, failed, citation ready, and complete as status messages.
Avoid announcing every token or character through a live region.
Keep focus stable while output grows, and provide a keyboard path to the latest generated block, stop control, retry, citations, and final answer.
Respect reduced motion by avoiding distracting cursor animation or rapid visual effects.
Review
Which events prove the response has started, is still partial, has stalled, has stopped, has failed, or is complete?
Can users tell whether text, citations, tool results, and structured output are pending or final?
What actions are safe on partial output, and which require final validation?
What remains available after stop, retry, timeout, offline, rate limit, or moderation hold?
Interactive lab
Inspect the states before you copy the pattern
Inspect a generated answer while it streams
Watch the expected states (first output, active deltas, pending citations, tool progress, moderation hold, stalled stream, stopped partial, continue, failed stream, retry, complete, and copy final) and the failure modes (malformed code, forced auto-scroll, noisy live region, early citation, erased partial, and spinner-only feedback).
State To Inspect
Queued or receiving state before first output arrives.
Keyboard / Access
Tab reaches Stop generation while the stream is active and reaches Continue, Retry, Copy partial, Discard, and Copy final when those actions appear.
Avoid Generating
Showing a blinking cursor with no state, stop control, or elapsed feedback.
Supports accessible stream milestone announcements without forced focus movement.
Full agent/debug reference
Problem Context
The system may emit output deltas, message events, tool-call events, citation events, moderation events, usage metadata, and final completion events at different times.
Text, Markdown, tables, JSON, code, citations, and generated files may be invalid or incomplete until the stream closes cleanly.
Users may want to start reading, stop when enough information arrives, continue after stopping, retry after network failure, copy partial output, or wait for final source verification.
Streaming can happen inside a chat interface, side panel, document editor, code assistant, workflow run, voice transcript, or background job detail surface.
Accessibility, reduced motion, mobile scroll, text selection, virtualized transcripts, and screen reader announcement rate all affect whether streaming is usable.
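Because streamed Markdown can be cut off inside a code block, a renderer needs some way to tolerate an unclosed fence. The sketch below shows one simplified approach: count fence lines and, when the count is odd, append a temporary closing fence for display only. `prepareForRender` and `RenderablePartial` are hypothetical names, and the logic handles only triple-backtick fences, not tilde fences or longer nested fences.

```typescript
// Triple backtick, built with repeat() so this example does not contain a literal fence.
const FENCE = "`".repeat(3);
const FENCE_LINE = new RegExp("^\\s*" + FENCE);

interface RenderablePartial {
  text: string; // text that can be handed to a Markdown renderer
  openFence: boolean; // true while the stream is still inside a code block
}

function prepareForRender(partial: string): RenderablePartial {
  let open = false;
  for (const line of partial.split("\n")) {
    // Each fence line toggles between "inside" and "outside" a code block.
    if (FENCE_LINE.test(line)) {
      open = !open;
    }
  }
  if (!open) {
    return { text: partial, openFence: false };
  }
  // Close the block for display only; the stored partial text stays untouched.
  const separator = partial.endsWith("\n") ? "" : "\n";
  return { text: partial + separator + FENCE, openFence: true };
}
```

The `openFence` flag can drive a "code still arriving" label and keep Copy disabled until the real closing fence arrives.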
Selection Rules
Choose streaming response when partial generated output is useful before the full answer is ready and the product can explain partial versus final state.
Use chat interface when the design problem is the broader conversation transcript, history, turns, and composer rather than one active response lifecycle.
Use progress bar when the system reports measurable operation completion instead of readable generated output.
Use loading spinner when the wait is short and neither partial output nor measurable progress can be shown.
Use feed when users browse finished chronological items rather than watch one answer being generated.
Delay apply, publish, commit, structured export, final citation trust, and high-risk copy actions until the stream reaches a verified terminal state.
Show Stop generation when stopping is technically possible and the partial answer can be preserved or explicitly discarded.
Show Continue or Retry when a stream stops, stalls, times out, or fails while useful partial output remains.
Use a pending source or pending tool state when text arrives before citations, retrieval, function output, or safety checks.
Do not stream raw chunks if they will expose unsafe intermediate content, broken syntax, misleading partial conclusions, or private tool data.
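The rule about showing Continue or Retry when a stream stalls needs a concrete definition of "stalled". One possible sketch, assuming the client records when the request started and when the last delta arrived, is shown below. The thresholds, type names, and label wording are all hypothetical and would be tuned per product.

```typescript
type Liveness = "waiting-for-first-output" | "streaming" | "stalled";

interface StallConfig {
  firstOutputTimeoutMs: number; // assumed limit on the wait before any output
  idleTimeoutMs: number; // assumed limit on the gap between two deltas
}

function classifyLiveness(
  startedAtMs: number,
  lastDeltaAtMs: number | null,
  nowMs: number,
  config: StallConfig
): Liveness {
  if (lastDeltaAtMs === null) {
    // No output yet: compare against the first-output threshold.
    return nowMs - startedAtMs > config.firstOutputTimeoutMs
      ? "stalled"
      : "waiting-for-first-output";
  }
  // Output has started: compare the gap since the last delta.
  return nowMs - lastDeltaAtMs > config.idleTimeoutMs ? "stalled" : "streaming";
}

// Text for the elapsed status shown next to the recovery actions.
function elapsedLabel(sinceMs: number, nowMs: number): string {
  const seconds = Math.floor((nowMs - sinceMs) / 1000);
  return `No new output for ${seconds}s`;
}
```

A periodic timer could call `classifyLiveness` and, on `stalled`, replace the blinking cursor with the elapsed label plus Continue, Retry, Copy partial, and Discard.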
Required States
Queued or receiving state before first output arrives.
First output state that confirms generation has started.
Active streaming state with partial-answer label, stop control, and stable output region.
Pending citation, pending tool call, pending moderation, and pending structured-validation states.
Stalled or slow-stream state with elapsed status and recovery threshold.
Stopped-by-user state that labels partial output and offers continue, retry, copy partial, or discard.
Interrupted, failed, rate-limited, offline, and timeout states that preserve prompt and partial output where safe.
Complete state with final answer, final citations or source status, copy/apply eligibility, and completion metadata.
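The required states above can be modelled as a small reducer that keeps the lifecycle separate from the text itself. This is a minimal sketch under stated assumptions: the event names are hypothetical rather than taken from any provider's API, and pending citation, tool, and moderation states are left out for brevity.

```typescript
type StreamState =
  | "queued" | "first-output" | "streaming" | "stalled"
  | "stopped" | "failed" | "complete";

type StreamEvent =
  | { type: "delta"; text: string }
  | { type: "stall" }
  | { type: "stop" }
  | { type: "error"; message: string }
  | { type: "completed" }
  | { type: "continue" };

interface StreamSnapshot {
  state: StreamState;
  partial: string; // preserved across stop and failure
  error?: string;
}

function initialSnapshot(): StreamSnapshot {
  return { state: "queued", partial: "" };
}

function reduceStream(snap: StreamSnapshot, event: StreamEvent): StreamSnapshot {
  if (event.type === "continue") {
    // Continue resumes only from a stopped or failed stream and keeps the partial text.
    return snap.state === "stopped" || snap.state === "failed"
      ? { state: "streaming", partial: snap.partial }
      : snap;
  }
  // Late events never revive or rewrite a terminal state.
  if (snap.state === "stopped" || snap.state === "failed" || snap.state === "complete") {
    return snap;
  }
  switch (event.type) {
    case "delta":
      return {
        ...snap,
        state: snap.state === "queued" ? "first-output" : "streaming",
        partial: snap.partial + event.text,
      };
    case "stall":
      return { ...snap, state: "stalled" };
    case "stop":
      return { ...snap, state: "stopped" };
    case "error":
      return { ...snap, state: "failed", error: event.message };
    case "completed":
      return { ...snap, state: "complete" };
  }
}
```

Stop and error never clear `partial` and never produce `complete`, so the UI cannot silently present an interrupted answer as finished.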
Interaction Contract
The UI never presents an active stream as a complete answer until the terminal event or final validation state arrives.
Partial output remains attached to the prompt, source scope, tool state, and stream status that produced it.
Stop generation changes the state to stopped or partial instead of silently converting the text into a finished answer.
Retry clearly states whether it continues from the partial output, restarts the same prompt, or regenerates a new answer.
Copy, apply, cite, export, and publish actions indicate whether they operate on partial output or final output.
Streaming updates do not steal focus, break text selection, collapse the composer, or force scroll when users are reading older content.
Screen reader announcements summarize meaningful stream milestones instead of announcing every token.
If safety, citation, tool, or structured-output checks revise or remove streamed content, the UI explains the revision.
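One way to enforce this contract is to derive the visible actions from the stream state and the final checks, instead of toggling buttons ad hoc. The sketch below is illustrative only. The action names and the `Checks` fields are hypothetical, and the choice to offer only a partial copy while checks are pending is an assumed product-risk decision.

```typescript
type OutputState = "streaming" | "stopped" | "failed" | "complete";

type StreamAction =
  | "stop" | "continue" | "retry"
  | "copy-partial" | "copy-final" | "apply" | "discard";

interface Checks {
  citationsResolved: boolean;
  validationPassed: boolean;
}

function availableActions(
  state: OutputState,
  hasPartial: boolean,
  checks: Checks
): StreamAction[] {
  switch (state) {
    case "streaming":
      // While text is arriving, the only offered action is stopping.
      return ["stop"];
    case "stopped":
    case "failed":
      // Recovery actions depend on whether any partial output survived.
      return hasPartial
        ? ["continue", "retry", "copy-partial", "discard"]
        : ["retry"];
    case "complete":
      // Final actions unlock only after citation and validation checks pass.
      return checks.citationsResolved && checks.validationPassed
        ? ["copy-final", "apply"]
        : ["copy-partial"];
  }
}
```

Because copy is split into `copy-partial` and `copy-final`, the control label itself tells users which kind of output they are taking.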
Implementation Checklist
Model stream lifecycle separately from prompt draft, submitted prompt, final response, citation resolution, tool execution, and history state.
Track event types such as response started, output delta, tool call delta, citation pending, moderation hold, error, stopped, completed, and final metadata.
Render partial output in stable blocks that tolerate incomplete Markdown, code fences, tables, links, JSON, and citations.
Add Stop, Continue, Retry, Regenerate, Copy partial, Copy final, Discard, View events, and Report issue controls according to product risk.
Throttle visual updates and live announcements to meaningful chunks, sentence boundaries, or elapsed intervals.
Preserve prompt, partial output, selected text, scroll position, source panel state, and focus through stream failure and route changes where possible.
Gate apply, publish, final copy, download, or structured export until final validation passes when early chunks can be misleading.
Test slow first token, long stream, network interruption, server error event, content-filter hold, citation delay, tool delay, malformed Markdown, malformed JSON, mobile keyboard, reduced motion, and screen reader output.
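The checklist item on throttling updates to sentence boundaries can be sketched as a small buffer that releases only complete sentences. This is a simplified illustration with hypothetical names (`ChunkBuffer`, `pushDelta`, `flushBuffer`). The boundary rule is naive and will also split after abbreviations such as "e.g." when they are followed by a space.

```typescript
interface ChunkBuffer {
  pending: string; // text received but not yet released as a full sentence
}

function pushDelta(
  buffer: ChunkBuffer,
  delta: string
): { buffer: ChunkBuffer; ready: string[] } {
  const combined = buffer.pending + delta;
  const ready: string[] = [];
  // Sentence end: ., ! or ? followed by whitespace. Punctuation at the very
  // end of the buffer stays pending because more characters may follow.
  const boundary = /[.!?](?=\s)/g;
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(combined)) !== null) {
    const end = match.index + 1;
    ready.push(combined.slice(start, end).trim());
    start = end;
  }
  return { buffer: { pending: combined.slice(start) }, ready };
}

// Call when the stream reaches a terminal state to release the remainder.
function flushBuffer(buffer: ChunkBuffer): string[] {
  const rest = buffer.pending.trim();
  return rest.length > 0 ? [rest] : [];
}
```

The visual renderer can still paint every delta if needed. A buffer like this is mainly useful for the slower channels, such as live-region announcements or layout-heavy blocks.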
Common Generated-UI Mistakes
Showing a blinking cursor with no state, stop control, or elapsed feedback.
Letting partial output look finished before the final event arrives.
Copying or applying a streamed answer before citations, tool results, or validation complete.
Erasing partial output after a network failure.
Announcing every token through a live region.
Forcing scroll to the newest token while users are reading or selecting earlier text.
Using streaming to hide long-running tool work that should have visible steps or approval.
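The forced-scroll mistake is usually avoided by following new output only when the reader is already at the bottom and has no active selection. The sketch below expresses that decision as a pure function. The field names mirror the standard DOM element properties `scrollTop`, `clientHeight`, and `scrollHeight`, while the function names and the 24 px tolerance are assumptions.

```typescript
interface ScrollMetrics {
  scrollTop: number; // pixels scrolled from the top of the container
  clientHeight: number; // visible height of the container
  scrollHeight: number; // total height of the scrollable content
}

function isPinnedToBottom(m: ScrollMetrics, tolerancePx: number = 24): boolean {
  const distanceFromBottom = m.scrollHeight - (m.scrollTop + m.clientHeight);
  return distanceFromBottom <= tolerancePx;
}

function shouldAutoScroll(m: ScrollMetrics, hasTextSelection: boolean): boolean {
  // Never move the viewport under a reader who scrolled up or is selecting text.
  return !hasTextSelection && isPinnedToBottom(m);
}
```

When `shouldAutoScroll` returns false, a "Jump to latest" control gives users a deliberate way back to the newest output.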
Critique Questions
Which events prove the response has started, is still partial, has stalled, has stopped, has failed, or is complete?
Can users tell whether text, citations, tool results, and structured output are pending or final?
What actions are safe on partial output, and which require final validation?
What remains available after stop, retry, timeout, offline, rate limit, or moderation hold?
How does streaming behave when users scroll, select text, copy, inspect citations, use mobile keyboards, or rely on screen readers?
Would a spinner, progress bar, chat interface, or workflow trace be clearer than streaming partial text?
Accessibility
Expose stream milestones such as started, still generating, stopped, failed, citation ready, and complete as status messages.
Avoid announcing every token or character through a live region.
Keep focus stable while output grows, and provide a keyboard path to the latest generated block, stop control, retry, citations, and final answer.
Respect reduced motion by avoiding distracting cursor animation or rapid visual effects.
Label partial output, stopped output, failed output, and final output in text.
Ensure streamed Markdown, code, tables, links, citations, and structured chunks remain readable while incomplete.
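The milestone guidance above could be implemented by generating status text only when the milestone changes, with an occasional reminder while generation continues. This is a sketch rather than a definitive implementation: the milestone names, message wording, and 15-second repeat interval are assumptions. The returned string is meant to be written into a polite status region such as an element with `role="status"`.

```typescript
type Milestone =
  | "started" | "still-generating" | "stopped"
  | "failed" | "citations-ready" | "complete";

const MILESTONE_TEXT: Record<Milestone, string> = {
  "started": "Response started",
  "still-generating": "Still generating",
  "stopped": "Generation stopped. Partial answer kept.",
  "failed": "Generation failed. Partial answer kept.",
  "citations-ready": "Citations ready",
  "complete": "Response complete",
};

interface AnnouncerState {
  last: Milestone | null;
  lastAtMs: number;
}

function announceMilestone(
  state: AnnouncerState,
  current: Milestone,
  nowMs: number,
  repeatAfterMs: number = 15000
): { state: AnnouncerState; message: string | null } {
  const changed = state.last !== current;
  // Only the "still generating" reminder is allowed to repeat, and only after the interval.
  const reminderDue =
    current === "still-generating" && nowMs - state.lastAtMs >= repeatAfterMs;
  if (!changed && !reminderDue) {
    return { state, message: null };
  }
  return {
    state: { last: current, lastAtMs: nowMs },
    message: MILESTONE_TEXT[current],
  };
}
```

Tokens never reach this function, so the live region stays quiet between milestones regardless of how fast deltas arrive.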
Keyboard Behavior
Tab reaches Stop generation while the stream is active and reaches Continue, Retry, Copy partial, Discard, and Copy final when those actions appear.
Escape may stop generation only when that shortcut is exposed and does not conflict with closing panels or clearing drafts.
Focus does not jump on every output delta.
When the stream completes, focus remains where the user was or moves to a named completion status according to the task flow.
When a stream fails, focus remains near the partial answer and recovery actions.
Keyboard users can inspect pending and final citations without losing their place in the response.
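The Escape rule above implies a priority order between competing handlers. A hedged sketch of that order follows, assuming the product tracks whether an overlay is open and whether the composer owns Escape for clearing a draft. All names here are hypothetical.

```typescript
interface EscapeContext {
  streamActive: boolean; // a response is currently streaming
  escapeStopsGeneration: boolean; // the shortcut is enabled and shown to users
  overlayOpen: boolean; // a panel, dialog, or menu is open
  composerClearsDraftOnEscape: boolean; // composer is focused and uses Escape for its draft
}

type EscapeOutcome =
  | "close-overlay" | "defer-to-composer" | "stop-generation" | "ignore";

function resolveEscape(ctx: EscapeContext): EscapeOutcome {
  // Closing panels and clearing drafts win, so Escape never has two meanings at once.
  if (ctx.overlayOpen) return "close-overlay";
  if (ctx.composerClearsDraftOnEscape) return "defer-to-composer";
  if (ctx.streamActive && ctx.escapeStopsGeneration) return "stop-generation";
  return "ignore";
}
```

The Stop generation button remains the primary path. The shortcut is an optional accelerator that yields whenever another Escape behaviour is in play.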