AI Behavior Incident Report Template
A syndrome-classified incident form that bridges the language gap between engineering teams and governance stakeholders. Drop it into existing incident workflows as an additional classification layer, without replacing what’s already there.
Why standard incident forms miss the point
Most AI behavior incident forms inherit their structure from software bug reports: what went wrong, severity, steps to reproduce, expected vs. actual. This works well for deterministic failures. It fails for AI behavioral failures because it captures the symptom without identifying the pattern class.
When a user reports that “the AI gave me wrong information,” that could be Plausible Helpfulness (confabulating from overconfidence), Capability Masking (fabricating that it verified the information), Hollow Completions (declaring the task done without validating), or Responsibility Diffusion (blaming the user’s input for its own errors). Each has a different root cause and a different remediation path. Without the syndrome classification, you’re fixing symptoms instead of causes — and the same syndrome recurs in a different form next week.
This template adds two classification fields to whatever incident form you already use: Primary Syndrome and Micro-Failure Tags. Everything else is optional enhancement.
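As a minimal sketch of what "adding two fields" can look like in practice, the Python below attaches the classification to an existing incident record held as a dictionary. The key names `primary_syndrome` and `micro_failure_tags`, the `classify` helper, and the sample record are all hypothetical; only the six syndrome names are taken from the Primary Syndrome field description in this template.

```python
# The six syndrome names are taken from the Primary Syndrome field description.
CORE_SIX = frozenset({
    "Plausible Helpfulness",
    "Built-Not-Connected",
    "Hollow Completions",
    "Capability Masking",
    "Responsibility Diffusion",
    "Surface Compliance",
})


def classify(incident: dict, primary_syndrome: str, tags: list[str]) -> dict:
    """Return a copy of an existing incident record with the two fields added."""
    if primary_syndrome not in CORE_SIX:
        raise ValueError(f"unknown syndrome: {primary_syndrome!r}")
    if not tags:
        raise ValueError("at least one micro-failure tag is required")
    # Hypothetical key names; the original record is left untouched.
    return {
        **incident,
        "primary_syndrome": primary_syndrome,
        "micro_failure_tags": list(tags),
    }


# Illustrative record only; substitute whatever your incident form already stores.
existing = {"id": "BUG-1042", "summary": "Assistant reported a check it never ran"}
classified = classify(existing, "Capability Masking", ["Verification Hallucinations"])
```

Returning a copy rather than mutating the record keeps the classification a layer on top of the existing form, which matches the intent that nothing already in place gets replaced.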
What goes in each field and why
| Field | What to put here | Why it matters |
|---|---|---|
| Incident ID | Sequential identifier: AI-INC-YYYY-MM-DD-NNN | Enables trend analysis across incidents — you can cluster by syndrome over time and see whether remediation is working |
| Severity | CRITICAL / HIGH / MODERATE / LOW, based on user impact and deployment context. For example, Capability Masking in a safety-critical deployment is CRITICAL; the same syndrome in a low-stakes productivity tool is HIGH at most. | Drives response timelines and escalation paths. Severity should reflect real-world consequence, not just how technically interesting the failure is. |
| Primary Syndrome | The most important field. One of the six: Plausible Helpfulness, Built-Not-Connected, Hollow Completions, Capability Masking, Responsibility Diffusion, Surface Compliance. Use the Earliest Decisive Deviation rule: label by the first syndrome that initiated the failure chain, not the last visible symptom. Not sure which syndrome applies? See the Matrix Explorer or the Core Six definitions. | Enables pattern tracking across incidents. Without this field, you have a list of stories. With it, you have quantified syndrome incidence that drives prioritization. |
| Micro-Failure Tags | One or more tags from the syndrome’s tag set. E.g., for Capability Masking: Verification Hallucinations, Phantom Deliverables, Tool Invocation Errors Hidden by Narration. Full tag sets for all six syndromes are in the Core Six reference. | Provides engineering-level specificity within the syndrome. Allows engineers to prioritize which tag cluster to address first in remediation. |
| Trace ID | Reference to the session trace in your logging system where the incident is visible in the model’s raw output. | Grounds the incident in evidence. The syndrome classification is an interpretation — the trace is the ground truth. Reviewers should be able to verify the classification independently. |
| Root Cause Analysis | Technical explanation of why the syndrome manifested in this trace. Not “the model hallucinated” — that’s the symptom. Try: “Completion boundary misalignment — model triggered done-signal on structural features without executing the verification step.” | Distinguishes incidents that look the same on the surface but have different causes — and therefore different fixes. |
| Remediation Plan | Three horizons: Immediate (mitigations within 24 hours), Short-term (targeted fixes within 1 week), Long-term (architectural improvements). Assign a team and a date to each. | Incidents without assigned owners and deadlines don’t get fixed. Three horizons reflect the reality that some mitigations are fast (add a validation gate) and some require architectural work (retrain with different completion signals). |
| Related Incidents | Links to other incidents with the same syndrome or tag set. | Transforms isolated incidents into a pattern record. If Hollow Completions appears five times in one month across different task types, that’s a systemic signal, not random noise. |
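
The fields above lend themselves to simple tooling. The Python sketch below illustrates three of them under stated assumptions: parsing the `AI-INC-YYYY-MM-DD-NNN` identifier (assuming `NNN` is exactly three digits), applying the Earliest Decisive Deviation rule (assuming the failure chain is recorded as a list of syndromes in the order they appeared in the trace), and counting incidents per month and syndrome for trend analysis. The function names and the sample log are hypothetical and invented for illustration.

```python
import re
from collections import Counter
from datetime import date

# Assumes NNN is exactly three digits; loosen the pattern if your scheme differs.
ID_PATTERN = re.compile(r"AI-INC-(\d{4})-(\d{2})-(\d{2})-(\d{3})")


def parse_incident_id(incident_id: str) -> tuple[date, int]:
    """Split an AI-INC-YYYY-MM-DD-NNN identifier into its date and sequence number."""
    match = ID_PATTERN.fullmatch(incident_id)
    if match is None:
        raise ValueError(f"malformed incident ID: {incident_id!r}")
    year, month, day, seq = (int(part) for part in match.groups())
    return date(year, month, day), seq  # date() rejects impossible dates


def primary_syndrome(failure_chain: list[str]) -> str:
    """Earliest Decisive Deviation: label by the first syndrome in the chain."""
    if not failure_chain:
        raise ValueError("failure chain is empty")
    return failure_chain[0]


def monthly_incidence(incidents: list[tuple[str, str]]) -> Counter:
    """Count incidents per (year, month, syndrome) from (incident_id, syndrome) pairs."""
    counts = Counter()
    for incident_id, syndrome in incidents:
        opened, _ = parse_incident_id(incident_id)
        counts[(opened.year, opened.month, syndrome)] += 1
    return counts


# Invented sample data, for illustration only.
log = [
    ("AI-INC-2025-03-04-001", "Hollow Completions"),
    ("AI-INC-2025-03-11-001", "Hollow Completions"),
    ("AI-INC-2025-03-11-002", "Capability Masking"),
    ("AI-INC-2025-04-02-001", "Hollow Completions"),
]
counts = monthly_incidence(log)
print(counts[(2025, 3, "Hollow Completions")])  # prints 2
```

A rising count for one syndrome across months is the kind of systemic signal the Related Incidents field is meant to surface; a falling count after a fix is one rough indication that remediation is working.
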
Copy and adapt for your organization
Adapt field labels, severity scales, and workflow integration points to match your tooling. The two non-negotiable fields are Primary Syndrome and Micro-Failure Tags — preserve those even if you drop everything else.

