S1.2 — Operational Template

AI Behavior Incident Report Template

A syndrome-classified incident form that bridges the language gap between engineering teams and governance stakeholders. Drop it into existing incident workflows as an additional classification layer, without replacing what is already there.

Who uses this
Engineering, operations, and compliance teams filing and reviewing AI behavior incidents. Anyone who currently writes “AI error” tickets without syndrome classification.
What it replaces
Generic bug tickets that describe symptoms without naming the failure pattern — which means the same pattern recurs because no one recognized it as a pattern.
What it enables
Cross-team communication where governance can say “Capability Masking incident” and engineering knows exactly what to look for in the trace — without a translation meeting.

Why standard incident forms miss the point

Most AI behavior incident forms inherit their structure from software bug reports: what went wrong, severity, steps to reproduce, expected vs. actual. This works well for deterministic failures. It fails for AI behavioral failures because it captures the symptom without identifying the pattern class.

When a user reports that “the AI gave me wrong information,” that could be Plausible Helpfulness (confabulating from overconfidence), Capability Masking (fabricating that it verified the information), Hollow Completions (declaring the task done without validating), or Responsibility Diffusion (blaming the user’s input for its own errors). Each has a different root cause and a different remediation path. Without the syndrome classification, you’re fixing symptoms instead of causes — and the same syndrome recurs in a different form next week.

This template adds two classification fields to whatever incident form you already use: Primary Syndrome and Micro-Failure Tags. Every other field is an optional enhancement.

What goes in each field and why

| Field | What to put here | Why it matters |
| --- | --- | --- |
| Incident ID | Sequential identifier: AI-INC-YYYY-MM-DD-NNN | Enables trend analysis across incidents: you can cluster by syndrome over time and see whether remediation is working. |
| Severity | CRITICAL / HIGH / MODERATE / LOW based on user impact and deployment context. Capability Masking in a safety-critical deployment = CRITICAL. Same syndrome in a low-stakes productivity tool = HIGH at most. | Drives response timelines and escalation paths. Severity should reflect real-world consequence, not just how technically interesting the failure is. |
| Primary Syndrome | The most important field. One of the six: Plausible Helpfulness, Built-Not-Connected, Hollow Completions, Capability Masking, Responsibility Diffusion, Surface Compliance. Use the Earliest Decisive Deviation rule: label by the first syndrome that initiated the failure chain, not the last visible symptom. Not sure which syndrome applies? See the Matrix Explorer or the Core Six definitions. | Enables pattern tracking across incidents. Without this field, you have a list of stories. With it, you have quantified syndrome incidence that drives prioritization. |
| Micro-Failure Tags | One or more tags from the syndrome’s tag set. E.g., for Capability Masking: Verification Hallucinations, Phantom Deliverables, Tool Invocation Errors Hidden by Narration. Full tag sets for all six syndromes are in the Core Six reference. | Provides engineering-level specificity within the syndrome. Allows engineers to prioritize which tag cluster to address first in remediation. |
| Trace ID | Reference to the session trace in your logging system where the incident is visible in the model’s raw output. | Grounds the incident in evidence. The syndrome classification is an interpretation; the trace is the ground truth. Reviewers should be able to verify the classification independently. |
| Root Cause Analysis | Technical explanation of why the syndrome manifested in this trace. Not “the model hallucinated”; that’s the symptom. Try: “Completion boundary misalignment: model triggered done-signal on structural features without executing the verification step.” | Distinguishes incidents that look the same on the surface but have different causes, and therefore different fixes. |
| Remediation Plan | Three horizons: Immediate (24h mitigations), Short-term (1 week targeted fixes), Long-term (architectural improvements). Assign a team and a date to each. | Incidents without assigned owners and deadlines don’t get fixed. Three horizons reflect the reality that some mitigations are fast (add a validation gate) and some require architectural work (retrain with different completion signals). |
| Related Incidents | Links to other incidents with the same syndrome or tag set. | Transforms isolated incidents into a pattern record. If Hollow Completions appears five times in one month across different task types, that’s a systemic signal, not random noise. |
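
The Incident ID format and the Earliest Decisive Deviation rule described in the table can be sketched in a few lines of Python. This is a minimal illustration, not part of the template: the `Deviation` type, its `step` field, and all function names are hypothetical, the three-digit zero-padded `NNN` counter is an assumption, and the sketch presumes a reviewer has already annotated the trace with the step at which each syndrome first appears.

```python
import re
from dataclasses import dataclass
from datetime import date

# The six syndrome names listed under the Primary Syndrome field.
CORE_SIX = (
    "Plausible Helpfulness",
    "Built-Not-Connected",
    "Hollow Completions",
    "Capability Masking",
    "Responsibility Diffusion",
    "Surface Compliance",
)

# AI-INC-YYYY-MM-DD-NNN; assumes NNN is a zero-padded three-digit counter.
INCIDENT_ID = re.compile(r"AI-INC-(\d{4})-(\d{2})-(\d{2})-(\d{3})")


def make_incident_id(day: date, sequence: int) -> str:
    """Build an ID from a date and a per-day sequence number."""
    return f"AI-INC-{day:%Y-%m-%d}-{sequence:03d}"


def is_valid_incident_id(incident_id: str) -> bool:
    """Check the shape of the ID and that its date part is a real date."""
    match = INCIDENT_ID.fullmatch(incident_id)
    if match is None:
        return False
    year, month, day_of_month = (int(g) for g in match.groups()[:3])
    try:
        date(year, month, day_of_month)
    except ValueError:
        return False
    return True


@dataclass(frozen=True)
class Deviation:
    """Hypothetical reviewer annotation: a syndrome observed at a trace step."""
    step: int       # position in the trace; lower means earlier
    syndrome: str   # expected to be one of CORE_SIX


def primary_syndrome(deviations: list[Deviation]) -> str:
    """Earliest Decisive Deviation: label by the first syndrome in the chain."""
    if not deviations:
        raise ValueError("at least one annotated deviation is required")
    for deviation in deviations:
        if deviation.syndrome not in CORE_SIX:
            raise ValueError(f"unknown syndrome: {deviation.syndrome!r}")
    return min(deviations, key=lambda d: d.step).syndrome
```

For example, if a trace shows Capability Masking at step 4 and Hollow Completions at step 9, `primary_syndrome` returns Capability Masking, and Hollow Completions would be a candidate for the Secondary Syndrome(s) field.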

Copy and adapt for your organization

Adapt field labels, severity scales, and workflow integration points to match your tooling. The two non-negotiable fields are Primary Syndrome and Micro-Failure Tags — preserve those even if you drop everything else.

## Incident Summary
Incident ID: AI-INC-YYYY-MM-DD-NNN
Date: YYYY-MM-DD HH:MM UTC
Severity: [CRITICAL | HIGH | MODERATE | LOW]
Status: [Under Investigation | Root Cause Identified | Remediated | Closed]

## Classification
Primary Syndrome: [Core Six syndrome name]
Secondary Syndrome(s): [if applicable; use Earliest Decisive Deviation rule]
Micro-Failure Tags:
- [Primary tag from syndrome tag set]
- [Supporting tags]

## Technical Details
Model Version: [version identifier]
Trace ID: [reference to session trace in logging system]
Context Length: [tokens]
Tool Calls Attempted: [count]
Execution Time: [duration]

## Incident Description
[Narrative: what the system claimed, what actually happened, evidence of discrepancy; include direct quotes from model output]

## User Impact
Immediate: [direct consequence to user or downstream system]
Scope: [number of affected users / interactions]
Business Impact: [quantified if possible: cost, time, trust]

## Root Cause Analysis
[Technical explanation of why the syndrome manifested; reference trace ID for evidence. Not just "the model hallucinated": explain which mechanism in which syndrome drove the failure.]

## Remediation Plan
Immediate (24h): [quick mitigations: guardrails, routing changes, etc.]
Short-term (1 week): [targeted fixes: prompt tuning, validation gates, etc.]
Long-term (1 month): [architectural improvements]
Responsible Team: [team name]
Target Resolution: [date]
Follow-up Review: [date]

## Related Incidents
- [Links to incidents with same syndrome or tag cluster for pattern analysis]
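
Because Primary Syndrome and Micro-Failure Tags are the two fields that must survive any adaptation, a team might lint filed reports for them before accepting a ticket. The sketch below is one possible check, assuming the report is stored as plain text with one field per line and tags as `- ` list items under the `Micro-Failure Tags:` label; the function name `classification_problems` is hypothetical.

```python
import re

# The six syndrome names accepted in the Primary Syndrome field.
CORE_SIX = {
    "Plausible Helpfulness",
    "Built-Not-Connected",
    "Hollow Completions",
    "Capability Masking",
    "Responsibility Diffusion",
    "Surface Compliance",
}


def classification_problems(report: str) -> list[str]:
    """Return problems found in the two mandatory classification fields."""
    problems = []

    # The syndrome is whatever follows the label on the same line.
    match = re.search(r"^Primary Syndrome:[ \t]*(.*)$", report, flags=re.MULTILINE)
    syndrome = match.group(1).strip() if match else ""
    if syndrome not in CORE_SIX:
        problems.append(
            f"Primary Syndrome missing or not one of the Core Six: {syndrome!r}"
        )

    # Tags are the "- item" lines directly below the "Micro-Failure Tags:" label.
    block = re.search(
        r"^Micro-Failure Tags:[ \t]*\n((?:[ \t]*-[ \t]+.*\n?)+)",
        report,
        flags=re.MULTILINE,
    )
    tags = []
    if block:
        tags = [line.strip()[1:].strip() for line in block.group(1).splitlines()]
    # Unfilled placeholders such as "[Supporting tags]" do not count as tags.
    tags = [t for t in tags if t and not (t.startswith("[") and t.endswith("]"))]
    if not tags:
        problems.append("Micro-Failure Tags: at least one tag is required")

    return problems
```

An empty list means both fields are filled in. The check deliberately does not validate tag names against a syndrome's tag set, since the full tag sets live in the Core Six reference rather than in this template.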

Plugging into existing workflows

Jira / Linear
Add “AI Syndrome” as a custom field with a dropdown of the six syndrome names. Add “Micro-Failure Tags” as a multi-select label field. Trace ID maps to the existing “link” or “external reference” field. The rest of the template maps naturally to description and comment fields.
ServiceNow / PagerDuty
Create a child record type “AI Behavior Incident” that inherits from your standard incident record. Add Syndrome and Tags as custom attributes. This keeps AI incidents visible in the same dashboards as other incidents while adding the classification layer.
Markdown / Notion / Confluence
Use the template verbatim as a page template. Create a database view filtered by Primary Syndrome — this is your syndrome incidence tracker. Tag pages with the syndrome name and use database filters to find all Capability Masking incidents across a time period.
Spreadsheet triage
Even a simple spreadsheet with columns for Date, Syndrome, Tags, Severity, and Status gives you the pattern-tracking capability. Add a dashboard tab with a COUNTIF per syndrome — you now have a syndrome incidence chart without any tooling investment.
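
The per-syndrome COUNTIF described for spreadsheet triage has a direct equivalent in code. The following is a rough sketch using only the Python standard library; the row layout, the function names, and the recurrence threshold are assumptions for illustration, and the rows are made-up placeholders, not real incident data.

```python
from collections import Counter
from datetime import date

# Placeholder rows mirroring the spreadsheet columns Date, Syndrome,
# Severity, and Status. These are invented examples, not real incidents.
ROWS = [
    {"date": date(2026, 3, 2), "syndrome": "Hollow Completions",
     "severity": "HIGH", "status": "Closed"},
    {"date": date(2026, 3, 9), "syndrome": "Capability Masking",
     "severity": "CRITICAL", "status": "Remediated"},
    {"date": date(2026, 3, 17), "syndrome": "Hollow Completions",
     "severity": "MODERATE", "status": "Under Investigation"},
    {"date": date(2026, 4, 1), "syndrome": "Hollow Completions",
     "severity": "LOW", "status": "Closed"},
]


def syndrome_counts(rows, year: int, month: int) -> Counter:
    """Count incidents per syndrome for one calendar month (the COUNTIF step)."""
    return Counter(
        row["syndrome"]
        for row in rows
        if row["date"].year == year and row["date"].month == month
    )


def recurring(counts: Counter, threshold: int) -> list[str]:
    """List syndromes that reached the threshold; the threshold is a team choice."""
    return sorted(name for name, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    march = syndrome_counts(ROWS, 2026, 3)
    for name, n in march.most_common():
        print(f"{name}: {n}")
    print("Recurring:", recurring(march, threshold=2))
```

A month-by-month count like this is enough to see whether a syndrome keeps recurring after remediation, which is the pattern-tracking capability the spreadsheet approach is meant to provide.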
Template from: “From Micro‑Failure Tags to Defensive Syndromes” — Supplementary Materials S1.2
Ernesto A. Taylor, “From Micro-Failure Tags to Defensive Syndromes,” YIM Project, 2026. Free to use and adapt with attribution (CC BY 4.0).