| UI or UX | UI + UX - Anti-pattern for uncalibrated or over-exact AI confidence displays | UI + UX - Calibrated reliability and uncertainty display for AI or automated predictions before user action | UI + UX - Read-only scalar value gauge within a known range | UI + UX - Whole-answer source coverage and grounding evidence display | UI + UX - Severe-consequence warning copy before an action | UI + UX - Runtime checkpoint that pauses AI or automation until an eligible human authorizes the next step |
| --- | --- | --- | --- | --- | --- | --- |
| UI guidance | Do not render model uncertainty as exact percentages, decimals, star ratings, gauges, or ranked certainty labels unless the score is calibrated, scoped, fresh, and tied to a decision threshold users can understand. | Render confidence and uncertainty as labelled reliability information with confidence band, reason, input scope, calibration status, review threshold, freshness, and the next safe action. | Show the measured object, current value, unit, minimum and maximum context, and threshold bands close to the gauge so users can interpret the reading without guessing. | Render source grounding as an answer-wide evidence panel that separates source scope, searched sources, retrieved sources, used sources, supported claims, partially supported claims, unsupported claims, and unresolved source states. | Render warning text as a short high-emphasis statement with a warning icon, visible or hidden warning label, and explicit consequence copy placed before the relevant action, declaration, or instruction. | Render a human approval gate as a paused automation checkpoint with the proposed action, tool or workflow step, triggering rule, risk level, payload snapshot, requester or agent, approver eligibility, timeout, and explicit approve, reject, edit, cancel, or bypass controls. |
| UX guidance | Protect users from automation bias by making confidence limits actionable: explain what the number means, where it is valid, what threshold changes the workflow, and what safe action follows. | Use a confidence / uncertainty display when users need to decide whether an AI prediction, classification, recommendation, extraction, risk assessment, or generated answer is reliable enough to apply. | Use a meter when users need to judge the current level of a bounded resource, score, capacity, or risk, not when they are completing a task or choosing a value. | Use a source grounding display when users need to judge whether an AI answer is backed by the right body of evidence, not merely to open one citation. | Use warning text when users must understand a serious consequence before acting or failing to act, such as a fine, loss of access, permanent deletion, eligibility impact, or legal responsibility. | Use a human approval gate when automation is ready to act but policy, risk, confidence, cost, access, publication, deployment, customer impact, or legal consequence requires a human decision before execution continues. |
| Good UI | A claim classifier says Calibration unavailable, explains that the model is outside its monitored scope, hides the 0.873 raw score, and routes the case to reviewer triage. | A claim classifier says Medium confidence, 71 to 78 percent calibrated range, review threshold 80 percent, conflicting account-age signal, and routes the case to manual review. | An account storage card says 86 GB of 100 GB used, marks 70 GB as warning and 85 GB as critical, and labels the current state Critical. | A policy answer includes a Grounding panel showing 4 sources searched, 3 retrieved, 2 used, 5 supported claims, 1 partially supported claim, and 1 unsupported claim with a Review action. | Before Submit declaration, a warning with an exclamation icon says the user may be fined if they provide false information. | An AI support agent pauses before issuing a refund, shows the proposed amount, customer, policy match, confidence, source grounding, approver role, and timeout, and offers Approve refund, Edit amount, Reject, and Stop run controls. |
| Bad UI | A generated legal answer says 97.42 percent confident with no calibration scope, threshold, freshness, evidence, or review path. | A generated answer shows 97 percent sure without calibration, threshold, source coverage, or review path. | A red-to-green bar says 89% with no unit, minimum, maximum, or explanation of whether high is good. | The answer shows a green Grounded badge even though only one citation supports one paragraph. | A red sentence says Important below the submit button after the user has already acted. | A banner says Human approval needed but does not show the tool call, payload, approver, timeout, or resume consequence. |
| Good UX | A reviewer sees that exact confidence is unavailable, opens the uncertainty reason, requests missing evidence, and avoids auto-denying a claim. | A reviewer sees low confidence and out-of-distribution input, opens the reason panel, collects the missing invoice, and avoids auto-denying the claim. | A user sees storage at 86 of 100 GB, understands the account is in the critical band, opens Manage storage, and deletes old exports before uploads are blocked. | A reviewer opens the grounding panel, sees that the answer used the current policy but not the outdated FAQ, and flags one unsupported claim before publishing. | Users see the fine or eligibility consequence before checking the declaration and can pause to verify their answer. | A billing lead opens the paused refund gate, sees that the amount is under policy but source grounding is partial, edits the refund to the verified amount, approves, and the agent resumes only that step. |
| Bad UX | Users trust a confident-looking 98.7 percent label and send an unsupported answer to a customer. | Users treat a high-confidence label as proof even though the answer has no source grounding and the claim still needs evidence. | A user watches a meter animate during upload and waits for it to reach full even though it represents remaining quota, not upload progress. | A user trusts a generated answer because the product says Grounded, but the source scope was only web search and did not include internal policy. | A benefit-loss warning appears only after submission, so users cannot change the decision it warns about. | A human approves a stale agent action from email and the agent applies it to a different customer state. |
| Best fit | An AI product displays model certainty, extraction confidence, recommendation score, classifier probability, generated-answer confidence, retrieval rank, or risk score with precision the system cannot justify. | Users must judge whether an AI prediction, classification, recommendation, extraction, risk score, or generated answer is reliable enough to use. | A current value exists inside a meaningful known range. | Users need answer-wide evidence coverage before trusting generated content. | A user must understand a serious consequence before taking or skipping an action. | An AI agent, workflow, deployment, or automation is ready to perform a high-impact step and must pause for human authorization. |
| Avoid when | The value is a validated, calibrated confidence range shown with scope, threshold, reason, and review behavior; use confidence / uncertainty display instead. | The system cannot estimate uncertainty or calibration honestly. | The value has no meaningful maximum or minimum. | The system cannot determine source scope, retrieval status, or claim support reliably. | The message is a dynamic task status that must be announced when it appears. | The action has already happened and users only need an audit log. |
| Required state | Raw score hidden state where an internal score is withheld from the user-facing decision. | High confidence state with calibration scope, reason, and whether direct apply is allowed. | Normal state inside the acceptable band. | Default grounded state with source scope, searched sources, retrieved sources, used sources, and supported-claim count. | No-warning state where the action has no severe consequence. | Paused gate state with proposed action, payload snapshot, reason for gate, and run context. |
| Accessibility burden | Expose calibration unavailable, stale, out-of-distribution, insufficient evidence, and review-required states as text near the score. | Expose confidence label, uncertainty reason, threshold, freshness, and gated action as text rather than relying on color, position, or animation alone. | Prefer the native meter element where possible because it carries the correct read-only meter semantics. | Expose grounding summary, source scope, status counts, unsupported claims, and source groups as text. | Do not rely on color alone; include visible or programmatic warning wording and a non-color cue such as an icon. | Expose gate status, proposed action, target, payload summary, risk, approver rule, timeout, and current run state as text. |
| Common misuse | Formatting an uncalibrated model score as 97.42 percent sure. | Showing a fake percent or exact decimal for an uncalibrated model score. | Using a meter to show task progress such as upload completion. | Showing a global Grounded badge when only some claims have evidence. | Using warning text for routine hints, explanations, or mild reminders. | Showing Approve without the exact action, payload, target, risk, or resume consequence. |
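
The confidence guidance in the first two columns can be read as a small decision rule: show a calibrated range with its threshold and next action, and withhold the raw score when calibration is missing or stale. The sketch below is one possible shape for that rule, not a prescribed implementation. Every name in it (`Prediction`, `describeConfidence`, `MEDIUM_FLOOR`) is hypothetical, and the 0.5 cutoff between Low and Medium is an assumption made only so the example has three bands.

```typescript
// Hypothetical types for illustration; none of these names come from the table.
type Calibration =
  | { status: "calibrated"; low: number; high: number } // calibrated range on a 0..1 scale
  | { status: "unavailable"; reason: string };

interface Prediction {
  rawScore: number; // internal model score, intentionally never rendered
  calibration: Calibration;
  reviewThreshold: number; // assumed rule: below this, route to manual review
  stale: boolean; // calibration is older than the freshness window
}

interface ConfidenceDisplay {
  label: string;
  detail: string;
  nextAction: "apply" | "manual-review" | "reviewer-triage";
}

const MEDIUM_FLOOR = 0.5; // assumed cutoff between Low and Medium

function toPercent(x: number): number {
  return Math.round(x * 100);
}

function describeConfidence(p: Prediction): ConfidenceDisplay {
  const cal = p.calibration;
  // Without usable calibration, say so in text and withhold the raw score.
  if (cal.status === "unavailable") {
    return { label: "Calibration unavailable", detail: cal.reason, nextAction: "reviewer-triage" };
  }
  if (p.stale) {
    return { label: "Calibration unavailable", detail: "Calibration data is stale", nextAction: "reviewer-triage" };
  }
  // Judge the band by the lower bound so the label never overstates reliability.
  const meetsThreshold = cal.low >= p.reviewThreshold;
  const label = meetsThreshold
    ? "High confidence"
    : cal.low >= MEDIUM_FLOOR
      ? "Medium confidence"
      : "Low confidence";
  const detail =
    `${toPercent(cal.low)} to ${toPercent(cal.high)} percent calibrated range, ` +
    `review threshold ${toPercent(p.reviewThreshold)} percent`;
  return { label, detail, nextAction: meetsThreshold ? "apply" : "manual-review" };
}

// Usage with the values from the Good UI row: a 71 to 78 percent range against an 80 percent threshold.
const claimDisplay = describeConfidence({
  rawScore: 0.873,
  calibration: { status: "calibrated", low: 0.71, high: 0.78 },
  reviewThreshold: 0.8,
  stale: false,
});
```

Under these assumptions `claimDisplay` carries the label Medium confidence and routes to manual review, and the raw score never reaches the rendered text.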
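
The Bad UX entry for the approval gate, where a stale approval is applied to a different customer state, suggests checking the approval against the snapshot the approver actually saw. The following is a minimal sketch of such a check, assuming the gate records a version number for the target record when it pauses. The names `ApprovalGate`, `resolveApproval`, and `targetStateVersion` are hypothetical and do not come from the table.

```typescript
// Hypothetical gate record; field names are illustrative assumptions.
interface ApprovalGate {
  proposedAction: string;
  payloadSnapshot: string; // serialized payload shown to the approver
  targetStateVersion: number; // version of the target record when the run paused
  eligibleRoles: string[];
  expiresAt: number; // timeout as epoch milliseconds
}

type GateDecision =
  | { outcome: "resume"; action: string }
  | { outcome: "blocked"; reason: "not-eligible" | "timed-out" | "stale-state" };

function resolveApproval(
  gate: ApprovalGate,
  approverRole: string,
  currentStateVersion: number,
  now: number,
): GateDecision {
  // Only an eligible approver may release the paused step.
  if (gate.eligibleRoles.indexOf(approverRole) === -1) {
    return { outcome: "blocked", reason: "not-eligible" };
  }
  // An approval that arrives after the timeout does not resume the run.
  if (now > gate.expiresAt) {
    return { outcome: "blocked", reason: "timed-out" };
  }
  // If the target changed after the snapshot, the approver reviewed outdated data.
  if (currentStateVersion !== gate.targetStateVersion) {
    return { outcome: "blocked", reason: "stale-state" };
  }
  return { outcome: "resume", action: gate.proposedAction };
}

// Usage: a refund gate paused at record version 7, approved after the record moved to version 8.
const refundGate: ApprovalGate = {
  proposedAction: "issue-refund",
  payloadSnapshot: '{"amount": 40, "currency": "USD"}',
  targetStateVersion: 7,
  eligibleRoles: ["billing-lead"],
  expiresAt: 1_000_000,
};
const lateApproval = resolveApproval(refundGate, "billing-lead", 8, 500_000);
```

Here `lateApproval` is blocked with the reason `stale-state`, so the interface can ask the approver to review the refreshed payload instead of silently applying the old one.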