Pre-Deployment Evaluation Checklist
A structured review sequence for AI systems before they go live — covering behavioral syndrome measurement, domain calibration, failure mode documentation, monitoring setup, user communication, and four-role sign-off. Designed to integrate with existing launch review processes, not replace them.
What standard launch reviews miss
Standard launch reviews tend to check whether a system works — does it respond, is it fast, does it crash. They do not check whether a system is honest about its limits, complete in its outputs, and transparent when it can’t help. These are behavioral properties, and they don’t show up in latency dashboards or error rate monitors.
The Core Six framework identifies six defensive behavior patterns — Capability Masking, Plausible Helpfulness, Hollow Completions, Built-Not-Connected, Responsibility Diffusion, Surface Compliance — that are measurable before deployment and that predict user harm and trust erosion at production scale. Measuring them before launch, not after, is the only reliable way to prevent them from becoming embedded in your user experience.
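Incidence rates are only comparable across reviews if they are computed the same way each time. The sketch below assumes each evaluation case has already been reviewed and tagged with the Core Six patterns observed in it. It computes the fraction of cases showing each pattern and flags any pattern over its threshold. The `EvalResult` record, the function names, the example cases, and the 10% thresholds are all hypothetical placeholders, not values defined by the framework.

```python
from dataclasses import dataclass

# The six Core Six pattern names, as listed above.
CORE_SIX = (
    "Capability Masking",
    "Plausible Helpfulness",
    "Hollow Completions",
    "Built-Not-Connected",
    "Responsibility Diffusion",
    "Surface Compliance",
)


@dataclass(frozen=True)
class EvalResult:
    """One reviewed evaluation case (hypothetical record shape)."""
    case_id: str
    flagged: frozenset[str]  # Core Six patterns a reviewer observed in this case


def incidence_rates(results: list[EvalResult]) -> dict[str, float]:
    """Fraction of evaluated cases in which each Core Six pattern was observed."""
    if not results:
        raise ValueError("no evaluation results to score")
    total = len(results)
    return {
        pattern: sum(pattern in r.flagged for r in results) / total
        for pattern in CORE_SIX
    }


def threshold_breaches(rates: dict[str, float], thresholds: dict[str, float]) -> dict[str, float]:
    """Patterns whose incidence exceeds their threshold; any entry is a launch blocker."""
    return {p: rate for p, rate in rates.items() if rate > thresholds[p]}


# Illustrative usage: four tagged cases and a placeholder threshold of 10% per pattern.
results = [
    EvalResult("case-1", frozenset({"Hollow Completions"})),
    EvalResult("case-2", frozenset()),
    EvalResult("case-3", frozenset({"Hollow Completions", "Surface Compliance"})),
    EvalResult("case-4", frozenset()),
]
rates = incidence_rates(results)
thresholds = {pattern: 0.10 for pattern in CORE_SIX}  # hypothetical values
print(threshold_breaches(rates, thresholds))
# {'Hollow Completions': 0.5, 'Surface Compliance': 0.25}
```

Setting one threshold per pattern, rather than a single combined score, keeps a low rate on one pattern from masking a high rate on another.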
This checklist creates a documented, repeatable review gate that asks those questions systematically, assigns responsibility for each category, and requires explicit sign-off before release.
Seven categories, each with a purpose
Why four roles, not one
| Role | What they’re signing off on |
|---|---|
| Engineering Lead | Syndrome incidence rates are within threshold. Monitoring is live. Known failure modes are documented and understood. Remediation runbooks exist. |
| Product Manager | The failure modes documented in Category 3 are acceptable given the intended use case and user base. The user experience implications of observed syndromes are understood and acceptable at launch. User communication plan (Category 6) is approved. |
| Legal / Compliance | The deployment meets organizational AI use policy, applicable regulatory requirements, and any sector-specific standards. Residual behavioral risk is documented and accepted at the appropriate level. |
| AI Safety Committee (if no dedicated committee exists, senior technical leadership and legal/compliance jointly fulfill this role) | Behavioral alignment characteristics are acceptable for the deployment context. Surface Compliance and Capability Masking incidence levels are reviewed at the committee level. Escalation path for behavioral drift post-launch is approved. |
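Because release requires all four roles, the gate can be enforced mechanically as well as procedurally. A minimal sketch, assuming sign-offs are stored as simple records alongside the deployment. `SignOff`, `missing_sign_offs`, and `release_approved` are hypothetical names. When no dedicated committee exists, this sketch assumes the joint leadership and legal/compliance approval is recorded under the AI Safety Committee role.

```python
from dataclasses import dataclass
from datetime import date

# The four roles from the sign-off table; all are required before release.
REQUIRED_ROLES = (
    "Engineering Lead",
    "Product Manager",
    "Legal / Compliance",
    "AI Safety Committee",
)


@dataclass(frozen=True)
class SignOff:
    """One recorded approval (hypothetical record shape)."""
    role: str
    name: str
    signed_on: date


def missing_sign_offs(sign_offs: list[SignOff]) -> list[str]:
    """Required roles that have no recorded sign-off, in table order."""
    signed_roles = {s.role for s in sign_offs}
    return [role for role in REQUIRED_ROLES if role not in signed_roles]


def release_approved(sign_offs: list[SignOff]) -> bool:
    """Release is approved only when every required role has signed."""
    return not missing_sign_offs(sign_offs)


today = date.today()
sign_offs = [
    SignOff("Engineering Lead", "[Name]", today),
    SignOff("Product Manager", "[Name]", today),
]
print(missing_sign_offs(sign_offs))  # ['Legal / Compliance', 'AI Safety Committee']
print(release_approved(sign_offs))   # False
```

Keying the check on roles rather than counting signatures means two approvals from the same role cannot stand in for a missing one.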
Copy and adapt
Replace [X], [Name], and all other bracketed values before use. Treat unchecked items as launch blockers. Store completed checklists with deployment records for audit trail continuity.
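If the completed checklist is kept as a markdown task list (an assumption; any format with explicit checked and unchecked markers would work), unchecked items and leftover placeholders can be found before the review meeting rather than during it. `launch_blockers` is a hypothetical helper, and the sample checklist lines are illustrative only.

```python
import re

# Assumed format: one task per line, e.g. "- [ ] Monitoring is live" or "- [x] ...".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s*(.*)$")
# Any remaining bracketed value such as [X] or [Name]; skips markdown links like [text](url).
PLACEHOLDER = re.compile(r"\[[^\]]+\](?!\()")


def launch_blockers(checklist: str) -> list[str]:
    """Report unchecked items and unreplaced bracketed placeholders, with line numbers."""
    blockers = []
    for lineno, line in enumerate(checklist.splitlines(), start=1):
        match = CHECKBOX.match(line)
        # Strip the checkbox marker so "[ ]" / "[x]" is not mistaken for a placeholder.
        text = match.group(2) if match else line
        if match and match.group(1) == " ":
            blockers.append(f"line {lineno}: unchecked item: {text}")
        for placeholder in PLACEHOLDER.findall(text):
            blockers.append(f"line {lineno}: unreplaced placeholder {placeholder}")
    return blockers


sample = """\
- [x] Monitoring dashboards are live
- [ ] Remediation runbook linked
- [x] Hollow Completions incidence below [X]%
Signed: [Name]
"""
for blocker in launch_blockers(sample):
    print(blocker)
# line 2: unchecked item: Remediation runbook linked
# line 3: unreplaced placeholder [X]
# line 4: unreplaced placeholder [Name]
```

An empty result means the checklist has no unchecked items or leftover placeholders in this assumed format. It does not replace the reviewers' judgment on whether each checked item is actually true.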

