| Aspect | AI output audit trail (UI + UX): inspectable record of generated AI output, prompt context, model, sources, tools, user actions, versions, and downstream use | Activity log (UI + UX): searchable and exportable record of system, user, or administrative events | Source grounding display (UI + UX): whole-answer source coverage and grounding evidence | Confidence / uncertainty display (UI + UX): calibrated reliability and uncertainty information for AI or automated predictions before user action | Editable AI output (UI + UX): generated draft with provenance, review, and apply controls | Regenerate / retry (UI + UX): AI response regeneration and same-turn retry control with version and context preservation | Agent progress trace (UI + UX): live execution trace for an AI agent or automation run after work has started | Human approval gate (UI + UX): runtime checkpoint that pauses AI or automation until an eligible human authorizes the next step |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UI guidance | Render an AI output audit trail as an answer-level evidence record that connects prompt snapshot, response snapshot, model version, source snapshot, retrieved context, tool calls, safety events, user actions, approvals, edits, exports, and retention state. | Render activity logs as evidence-oriented records with event time, actor, action, object, source system, scope, result, and technical context such as IP address or location when available. | Render source grounding as an answer-wide evidence panel that separates source scope, searched sources, retrieved sources, used sources, supported claims, partially supported claims, unsupported claims, and unresolved source states. | Render confidence and uncertainty as labelled reliability information with confidence band, reason, input scope, calibration status, review threshold, freshness, and the next safe action. | Render editable AI output as a generated draft with a clear boundary between original generated text, user-edited text, tracked changes, source mapping, citation preservation, and final applied output. | Render regenerate / retry as a response-level control that names whether it will rerun the same prompt, continue after failure, regenerate a new answer version, or retry failed tool and source work. | Render an agent progress trace as a live, ordered run timeline with run ID, plan version, current step, queued steps, active tool or task, elapsed time, last event time, step status, blocked gates, retry state, and final outcome. | Render a human approval gate as a paused automation checkpoint with the proposed action, tool or workflow step, triggering rule, risk level, payload snapshot, requester or agent, approver eligibility, timeout, and explicit approve, reject, edit, cancel, or bypass controls. |
| UX guidance | Use an AI output audit trail when users must investigate, prove, review, dispute, or export how a generated AI output was created and used, or demonstrate that its creation and use were compliant. | Use an activity log when users need to investigate, audit, verify, or troubleshoot actions across accounts, objects, systems, settings, or security boundaries. | Use a source grounding display when users need to judge whether an AI answer is backed by the right body of evidence, not merely open one citation. | Use a confidence / uncertainty display when users need to decide whether an AI prediction, classification, recommendation, extraction, risk assessment, or generated answer is reliable enough to apply. | Use editable AI output when users need to revise generated content after creation while retaining provenance, citations, source coverage, and a deliberate apply or save contract. | Use regenerate / retry when users need another AI attempt for the same submitted request, or need recovery from a failed, stopped, low-quality, stale, blocked, or partially generated response. | Use an agent progress trace when an AI agent or automation has started multi-step work and users need to monitor progress, intervene on stalls or gates, understand partial completion, and know whether the reviewed plan is still being followed. | Use a human approval gate when automation is ready to act but policy, risk, confidence, cost, access, publication, deployment, customer impact, or legal consequence requires a human decision before execution continues. |
| Good UI | A policy answer drawer shows prompt snapshot, model version, response ID, retrieved sources, tool calls, safety filter result, generated text, user edits, approval, copy, export, and retention window. | An organization audit log table shows timestamp, actor, action, target object, app, IP address, result, and a Details drawer with before and after fields. | A policy answer includes a Grounding panel showing 4 sources searched, 3 retrieved, 2 used, 5 supported claims, 1 partially supported claim, and 1 unsupported claim with a Review action. | A claim classifier says Medium confidence, 71 to 78 percent calibrated range, review threshold 80 percent, conflicting account-age signal, and routes the case to manual review. | A policy assistant shows a generated draft answer with citation chips, user-edited spans, tracked change controls, source mapping indicators, and Apply output disabled until unsupported edits are reviewed. | A chat answer shows Regenerate answer, Retry failed sources, and Compare versions with the original prompt, source scope, model, and tool changes visible. | An account research agent trace shows Run A-204, reviewed plan P-18, completed CRM lookup, active policy search, queued draft email, approval gate pending before send, elapsed time, and a View tool details control. | An AI support agent pauses before issuing a refund, shows the proposed amount, customer, policy match, confidence, source grounding, approver role, timeout, Approve refund, Edit amount, Reject, and Stop run controls. |
| Bad UI | A chat transcript shows only the final answer with no prompt, source snapshot, model, tool calls, user actions, or applied output history. | A page titled Activity shows vague entries such as Changed settings with no actor, target, timestamp, or source. | The answer shows a green Grounded badge even though only one citation supports one paragraph. | A generated answer shows 97 percent sure without calibration, threshold, source coverage, or review path. | A final-looking answer becomes editable with no generated-versus-user-edited distinction, no citation preservation state, and no undo to the generated draft. | A Try again button silently changes the prompt, model, sources, and tools, then overwrites the previous answer and citations. | A spinner says Working on it while an agent calls several tools with no step identity, elapsed time, blocked state, or recovery path. | A banner says Human approval needed but does not show the tool call, payload, approver, timeout, or resume consequence. |
| Good UX | A support lead opens a disputed customer reply, sees the exact AI draft, prompt, source article versions, editor changes, approver, sent timestamp, and retention status, then exports the evidence bundle. | An admin filters to failed SSO events, expands one entry, copies the event ID, exports the filtered range, and sees that records older than 180 days require a different archive. | A reviewer opens the grounding panel, sees that the answer used the current policy but not the outdated FAQ, and flags one unsupported claim before publishing. | A reviewer sees low confidence and out-of-distribution input, opens the reason panel, collects the missing invoice, and avoids auto-denying the claim. | A reviewer changes one sentence, sees it marked as user edited, accepts the tracked change, reviews a stale source warning, and applies the output only after unsupported text is resolved. | A user sees answer v1 has stale citations, chooses Regenerate with refreshed sources, compares v1 and v2, then restores v1 because v2 lost a required caveat. | A user watches the active step move from searching policies to drafting the email, opens the blocked permission item, grants access, and sees the run continue from the same step. | A billing lead opens the paused refund gate, sees that the amount is under policy but source grounding is partial, edits the refund to the verified amount, approves, and the agent resumes only that step. |
| Bad UX | A user regenerates an answer and the product overwrites the previous version, leaving no way to prove which output was copied. | A user marks a notification read and the corresponding activity evidence disappears from the only log. | A user trusts a generated answer because the product says Grounded, but the source scope was only web search and did not include internal policy. | Users treat a high-confidence label as proof even though the answer has no source grounding and the claim still needs evidence. | A user edits a generated compliance summary and all citations disappear, leaving no way to know which claims remain source-backed. | A user taps Regenerate and the product removes the original answer, so copied recommendations and review comments no longer have a version reference. | Users cannot tell whether the agent is stuck, waiting for approval, or finished because all states use the same animated progress label. | A human approves a stale agent action from email and the agent applies it to a different customer state. |
| Best fit | A generated AI output can influence compliance, customer communication, security, legal, finance, operations, code, policy, or other high-trust work. | Users need to inspect recorded user, admin, system, security, or integration events. | Users need answer-wide evidence coverage before trusting generated content. | Users must judge whether an AI prediction, classification, recommendation, extraction, risk score, or generated answer is reliable enough to use. | Generated content is expected to be revised before it is copied, saved, sent, published, or applied. | A user needs another AI-generated answer for the same request or a visible recovery path after response failure. | An agent or automation run has started and spans multiple steps, tools, gates, or side effects. | An AI agent, workflow, deployment, or automation is ready to perform a high-impact step and must pause for human authorization. |
| Avoid when | Users only need to read the current answer and inspect citations in the moment. | The goal is only to show a readable milestone history for one case or process. | The system cannot determine source scope, retrieval status, or claim support reliably. | The system cannot estimate uncertainty or calibration honestly. | Users only need to write or revise the request before generation. | The task is a simple non-AI operation retry already covered by the Retry pattern. | Execution has not started and users need to inspect or edit a proposed plan. | The action has already happened and users only need an audit log. |
| Required state | Generated output state with response ID, timestamp, user, conversation or thread ID, and model version. | Default log state with event records, result count, visible timezone, retention window, and permission scope. | Default grounded state with source scope, searched sources, retrieved sources, used sources, and supported-claim count. | High confidence state with calibration scope, reason, and whether direct apply is allowed. | Generated draft state with original generated content, creation time, model or run reference, and source coverage visible. | Initial answer state with prompt snapshot, response version, source scope, model or mode, and available regenerate or retry controls. | Run started state tied to run ID, plan version, objective, and user who started the run. | Paused gate state with proposed action, payload snapshot, reason for gate, and run context. |
| Accessibility burden | Expose output ID, version, timestamp, actor, action, source status, tool status, redaction status, and retention state as text, not color alone. | Use table or structured list semantics so actor, action, object, timestamp, result, and scope are perceivable together. | Expose grounding summary, source scope, status counts, unsupported claims, and source groups as text. | Expose confidence label, uncertainty reason, threshold, freshness, and gated action as text rather than relying on color, position, or animation alone. | Expose generated draft, user edited, tracked change, unsupported edit, unsafe edit, stale source, review required, accepted, rejected, saved, copied, regenerated, and applied states as text. | Expose same prompt, changed context, regenerating, retrying, version created, compare available, cooldown, exhausted, blocked, and restored states as text. | Expose trace status, run ID, current step, elapsed time, blocked state, final outcome, and details availability as text. | Expose gate status, proposed action, target, payload summary, risk, approver rule, timeout, and current run state as text. |
| Common misuse | Calling a chat transcript an audit trail when it does not preserve source, tool, model, action, approval, or version evidence. | Calling a social feed or notification drawer an activity log without event evidence. | Showing a global Grounded badge when only some claims have evidence. | Showing a fake percent or exact decimal for an uncalibrated model score. | Letting users edit a final-looking AI answer without generated-versus-user-edited status. | Using one Try again button for same-prompt retry, prompt edit, tool retry, source refresh, and alternate answer generation. | Using one spinner or vague Thinking label for a multi-step agent run. | Showing Approve without the exact action, payload, target, risk, or resume consequence. |
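The confidence / uncertainty column can be sketched as a small display function. This is a minimal TypeScript sketch, not a prescribed implementation. The `ConfidenceReading` shape, the band cut-offs (0.9 and 0.6), and the rule of gating on the lower bound of the calibrated range are all illustrative assumptions. The sketch shows two points from the table: an uncalibrated score never gets a percent, and a calibrated range below the review threshold routes to manual review.

```typescript
// Hypothetical shape of one model reading; field names are illustrative, not a real API.
interface ConfidenceReading {
  lower: number; // lower bound of the calibrated range, 0..1
  upper: number; // upper bound of the calibrated range, 0..1
  calibrated: boolean; // false when there is no calibration evidence for this score
  reason: string; // plain-language uncertainty reason shown to the user
  reviewThreshold: number; // minimum lower bound required before direct apply
}

type NextSafeAction = "apply" | "manual_review";

interface ConfidenceDisplay {
  label: string; // text label, so meaning never depends on color alone
  rangeText: string | null; // null when no honest range can be shown
  reason: string;
  nextAction: NextSafeAction;
}

// Assumed band cut-offs for illustration; real cut-offs should come from calibration data.
function bandLabel(lower: number): string {
  if (lower >= 0.9) return "High confidence";
  if (lower >= 0.6) return "Medium confidence";
  return "Low confidence";
}

function pct(x: number): number {
  return Math.round(x * 100);
}

function describeConfidence(r: ConfidenceReading): ConfidenceDisplay {
  if (!r.calibrated) {
    // Never print a percent for an uncalibrated score; route to review instead.
    return {
      label: "Confidence not calibrated",
      rangeText: null,
      reason: r.reason,
      nextAction: "manual_review",
    };
  }
  // Gate on the conservative end of the range, not the midpoint.
  const nextAction: NextSafeAction =
    r.lower >= r.reviewThreshold ? "apply" : "manual_review";
  return {
    label: bandLabel(r.lower),
    rangeText:
      `${pct(r.lower)} to ${pct(r.upper)} percent calibrated range, ` +
      `review threshold ${pct(r.reviewThreshold)} percent`,
    reason: r.reason,
    nextAction,
  };
}
```

For a 71 to 78 percent calibrated range and an 80 percent review threshold, as in the claim classifier example, `describeConfidence` returns Medium confidence and routes the case to manual review.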
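The source grounding column calls for a summary computed over the whole answer rather than over one citation. The sketch below assumes the product can already label each claim as supported, partially supported, or unsupported, and can list the source IDs that were searched, retrieved, and used. `GroundingInput`, `requiredScope`, and the headline wording are hypothetical. The headline reports scope gaps first, which covers the case where only web search ran but internal policy was expected.

```typescript
type ClaimSupport = "supported" | "partial" | "unsupported";

// Hypothetical inputs; source IDs and the requiredScope list are illustrative assumptions.
interface GroundingInput {
  requiredScope: string[]; // sources the answer is expected to draw on, e.g. internal policy
  searched: string[];
  retrieved: string[];
  used: string[];
  claims: ClaimSupport[];
}

interface GroundingSummary {
  searched: number;
  retrieved: number;
  used: number;
  supported: number;
  partial: number;
  unsupported: number;
  missingScope: string[];
  headline: string; // text summary; never a bare "Grounded" badge
}

function summarizeGrounding(g: GroundingInput): GroundingSummary {
  const count = (kind: ClaimSupport) => g.claims.filter((c) => c === kind).length;
  const supported = count("supported");
  const partial = count("partial");
  const unsupported = count("unsupported");
  // Required sources that were never searched make the whole answer suspect,
  // even if every claim matches some other source.
  const missingScope = g.requiredScope.filter((s) => !g.searched.includes(s));

  let headline: string;
  if (missingScope.length > 0) {
    headline = `Source scope incomplete: ${missingScope.join(", ")} not searched`;
  } else if (g.claims.length > 0 && supported === g.claims.length) {
    headline = `All ${supported} claims supported`;
  } else {
    headline = `${supported} of ${g.claims.length} claims fully supported`;
  }

  return {
    searched: g.searched.length,
    retrieved: g.retrieved.length,
    used: g.used.length,
    supported,
    partial,
    unsupported,
    missingScope,
    headline,
  };
}
```

For the policy answer in the Good UI row (4 sources searched, 3 retrieved, 2 used, 5 supported claims, 1 partially supported, 1 unsupported), the headline reads "5 of 7 claims fully supported" rather than a green Grounded badge.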
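Regenerate / retry depends on keeping every attempt as its own version. The following is a minimal sketch that assumes an in-memory history. `AnswerHistory`, the attempt kind names, and the compared fields are hypothetical. New attempts are appended and never overwrite earlier ones. Restore only moves the active pointer, and `changedContext` names what differs between two versions so that a Compare versions view can show it as text.

```typescript
// Kinds of attempt named in the table; the string values are illustrative.
type AttemptKind =
  | "initial"
  | "rerun_same_prompt"
  | "continue_after_failure"
  | "new_answer_version"
  | "retry_failed_tools_and_sources";

interface AnswerVersion {
  version: number;
  kind: AttemptKind;
  promptSnapshot: string;
  model: string;
  sourceScope: string[];
  text: string;
}

type ContextChanges = Partial<Pick<AnswerVersion, "model" | "sourceScope">>;

class AnswerHistory {
  private readonly versions: AnswerVersion[] = [];
  private activeIndex: number;

  constructor(prompt: string, model: string, sourceScope: string[], text: string) {
    this.versions.push({ version: 1, kind: "initial", promptSnapshot: prompt, model, sourceScope, text });
    this.activeIndex = 0;
  }

  get active(): AnswerVersion {
    return this.versions[this.activeIndex];
  }

  get all(): readonly AnswerVersion[] {
    return this.versions;
  }

  // Appends a new version; earlier versions and their citations are never overwritten.
  addAttempt(kind: Exclude<AttemptKind, "initial">, text: string, changes: ContextChanges = {}): AnswerVersion {
    const base = this.active;
    const next: AnswerVersion = {
      version: this.versions.length + 1,
      kind,
      promptSnapshot: base.promptSnapshot, // same submitted request; prompt edits are a separate flow
      model: changes.model ?? base.model,
      sourceScope: changes.sourceScope ?? base.sourceScope,
      text,
    };
    this.versions.push(next);
    this.activeIndex = this.versions.length - 1;
    return next;
  }

  // Names what differs between two versions so the change is visible, not silent.
  changedContext(a: number, b: number): string[] {
    const va = this.byVersion(a);
    const vb = this.byVersion(b);
    const changes: string[] = [];
    if (va.model !== vb.model) changes.push("model");
    if (va.sourceScope.join("|") !== vb.sourceScope.join("|")) changes.push("source scope");
    return changes;
  }

  // Restoring moves the active pointer; the newer version stays in the history.
  restore(version: number): void {
    this.activeIndex = this.versions.indexOf(this.byVersion(version));
  }

  private byVersion(version: number): AnswerVersion {
    const found = this.versions.find((v) => v.version === version);
    if (!found) throw new Error(`Unknown answer version ${version}`);
    return found;
  }
}
```

This follows the Good UX example: v2 is created with refreshed sources and compared against v1. v1 is then restored, while v2 remains available as a version reference.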
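The Bad UX example for the human approval gate, a stale approval applied to a changed customer state, suggests checking approver eligibility, timeout, and target freshness at the moment of approval. The sketch below assumes the target exposes a version counter that is captured when the gate opens. `ApprovalGate`, `approveGate`, and the role names are hypothetical.

```typescript
// Hypothetical gate record; fields mirror the table's paused gate state.
interface ApprovalGate {
  runId: string;
  proposedAction: string;
  target: string;
  payload: Record<string, unknown>;
  targetStateVersion: number; // assumed version counter of the target when the gate opened
  approverRole: string;
  expiresAt: number; // epoch milliseconds
}

interface Approver {
  id: string;
  roles: string[];
}

type GateOutcome =
  | { status: "resume"; runId: string; approvedBy: string; payload: Record<string, unknown> }
  | { status: "blocked"; reason: string };

function approveGate(
  gate: ApprovalGate,
  approver: Approver,
  now: number,
  currentTargetVersion: number,
  editedPayload?: Record<string, unknown>,
): GateOutcome {
  if (!approver.roles.includes(gate.approverRole)) {
    return { status: "blocked", reason: `Approver lacks role ${gate.approverRole}` };
  }
  if (now > gate.expiresAt) {
    return { status: "blocked", reason: "Gate timed out; the run must request approval again" };
  }
  // Refuse stale approvals: the target changed after the payload snapshot was taken.
  if (currentTargetVersion !== gate.targetStateVersion) {
    return { status: "blocked", reason: `${gate.target} changed since the gate opened; review again` };
  }
  // Resume only this step, with either the reviewed snapshot or the approver's edit.
  return {
    status: "resume",
    runId: gate.runId,
    approvedBy: approver.id,
    payload: editedPayload ?? gate.payload,
  };
}
```

An edited payload, such as a corrected refund amount, replaces the snapshot only for the step being resumed. The original snapshot stays on the gate record for audit.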
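For the agent progress trace, the Bad UX example describes a single animated label shared by stuck, waiting, and finished runs. The sketch below derives a distinct text status from step states and from the time since the last event. `RunTrace`, the step status names, and the `stallAfterMs` threshold are illustrative assumptions, not a standard trace format.

```typescript
type StepStatus = "queued" | "active" | "blocked" | "waiting_approval" | "done" | "failed";

interface TraceStep {
  label: string;
  status: StepStatus;
}

// Hypothetical run record; IDs and timestamps are illustrative.
interface RunTrace {
  runId: string;
  planVersion: string;
  startedAt: number; // epoch ms
  lastEventAt: number; // epoch ms of the most recent step or tool event
  steps: TraceStep[];
}

function traceStatusText(runTrace: RunTrace, now: number, stallAfterMs: number): string {
  const elapsedS = Math.round((now - runTrace.startedAt) / 1000);
  const prefix = `Run ${runTrace.runId}, plan ${runTrace.planVersion}, ${elapsedS} s elapsed`;
  const first = (s: StepStatus) => runTrace.steps.find((step) => step.status === s);

  // States that need the user are reported before states that only need patience.
  const failed = first("failed");
  if (failed) return `${prefix}: failed at ${failed.label}`;
  const blocked = first("blocked");
  if (blocked) return `${prefix}: blocked at ${blocked.label}`;
  const waiting = first("waiting_approval");
  if (waiting) return `${prefix}: waiting for approval before ${waiting.label}`;
  if (runTrace.steps.length > 0 && runTrace.steps.every((s) => s.status === "done")) {
    return `${prefix}: completed`;
  }

  const active = first("active");
  if (active) {
    const quietMs = now - runTrace.lastEventAt;
    // A long silence is reported as a possible stall instead of a generic spinner.
    return quietMs > stallAfterMs
      ? `${prefix}: no events for ${Math.round(quietMs / 1000)} s at ${active.label}, possibly stalled`
      : `${prefix}: running ${active.label}`;
  }
  return `${prefix}: queued`;
}
```

Because each state produces different text, assistive technology and sighted users both get the distinction without relying on animation.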