Contract SLA Terms
Service level agreement language that makes behavioral quality an enforceable obligation — with monitoring requirements, remediation timelines, and service credit schedules that vendors can actually commit to and buyers can actually audit.
Why uptime SLAs don’t protect you
An AI service that is 99.9% available but has a 12% Capability Masking rate is operationally worse for most use cases than a service with 99.5% availability and 0.8% Capability Masking. Uptime measures whether the system is reachable. It says nothing about whether the system is honest, complete, or compliant when it responds. Both metrics matter; most contracts only measure one.
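To make the comparison concrete, the two figures can be combined into one rough number: the share of requests that are both answered and free of Capability Masking. The sketch below is illustrative only. It assumes the masking rate is measured per response and is independent of availability, and the function name `usable_response_rate` is hypothetical, not a term from any contract or standard.

```python
def usable_response_rate(availability: float, masking_rate: float) -> float:
    """Share of requests that are answered AND free of Capability Masking.

    Assumption (illustration only): masking is measured per response and
    is independent of whether the service is reachable.
    """
    return availability * (1.0 - masking_rate)


# Figures taken from the paragraph above.
service_a = usable_response_rate(availability=0.999, masking_rate=0.12)
service_b = usable_response_rate(availability=0.995, masking_rate=0.008)

print(f"Service A: {service_a:.4f}")  # 0.8791
print(f"Service B: {service_b:.4f}")  # 0.9870
```

Under these assumptions, the service with lower uptime delivers a trustworthy response to roughly 98.7% of requests, against roughly 87.9% for the service with better uptime.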
Behavioral SLA terms create three things that uptime-only SLAs lack: a monitoring obligation (the vendor must measure and report, not just react to complaints), a remediation obligation (breach triggers a required response within a defined time window), and an accountability mechanism (persistent violations have contractual consequences, not just technical tickets).
Not all syndromes have the same SLA stakes
| Syndrome | SLA Priority | Why this tier |
|---|---|---|
| Capability Masking | Critical | A system that fabricates completed actions (“I submitted your form,” “I sent the email”) can cause catastrophic real-world misses. No lag on detection; 48-hour remediation plan required. |
| Plausible Helpfulness | High | Confident misinformation delivered at scale erodes trust and creates liability. 5-day remediation plan. |
| Hollow Completions | High | False task completion signals break downstream workflows. 5-day remediation plan. |
| Built-Not-Connected | Medium | Feature delivery failures are serious but typically discoverable in QA before user impact. 10-day remediation plan. |
| Responsibility Diffusion | Medium | Customer experience and support cost impact is significant but not immediately safety-critical. |
| Surface Compliance | High | A system that acknowledges constraints and then violates them can pass linguistic alignment audits while failing in practice, which makes it one of the harder failure modes to detect contractually. For safety-critical deployments, treat it as Critical, because Surface Compliance is precisely the gap between what a system agrees to and what it actually does. For standard deployments, High priority applies because constraint drift is harder to detect than factual errors. |
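For teams that track breaches in tooling, the tiers above could be encoded as a small lookup that returns the priority and the date a remediation plan falls due. This is a minimal sketch, not contract language. It assumes the day counts are calendar days measured from the time of detection; the names `SLA_TIERS` and `remediation_plan_due` are hypothetical. Where the table states no window, the sketch leaves it as `None` rather than guessing.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Priority and remediation-plan window per syndrome, copied from the table.
# None = the table states no window; set it from your own contract.
# Assumption: "5-day" and "10-day" mean calendar days.
SLA_TIERS = {
    "Capability Masking": ("Critical", timedelta(hours=48)),
    "Plausible Helpfulness": ("High", timedelta(days=5)),
    "Hollow Completions": ("High", timedelta(days=5)),
    "Built-Not-Connected": ("Medium", timedelta(days=10)),
    "Responsibility Diffusion": ("Medium", None),
    "Surface Compliance": ("High", None),
}


def remediation_plan_due(
    syndrome: str, detected_at: datetime, safety_critical: bool = False
) -> Tuple[str, Optional[datetime]]:
    """Return (priority, remediation plan due date) for a detected breach."""
    priority, window = SLA_TIERS[syndrome]
    # The table escalates Surface Compliance for safety-critical deployments.
    if syndrome == "Surface Compliance" and safety_critical:
        priority = "Critical"
    due = detected_at + window if window is not None else None
    return priority, due


detected = datetime(2024, 1, 1, 9, 0)
print(remediation_plan_due("Capability Masking", detected))
print(remediation_plan_due("Surface Compliance", detected, safety_critical=True))
```

The escalated Surface Compliance case returns no due date because the table does not say whether the 48-hour Critical window carries over; that is a decision for the contract, not the code.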
Copy and adapt
Insert into the Service Level Agreement section of your AI service contract. Make sure that “Syndrome Incidence Reports” and the evaluation methodology are already defined in the contract's definitions section.

