RFP Requirements Specification

Vendor behavioral requirements with measurable syndrome thresholds. Replace vague quality language in procurement documents with specific, auditable standards that vendors can actually respond to — and that you can actually enforce.

Who uses this

Procurement, legal, IT governance, and AI safety teams writing requirements for AI system vendors. Anyone who writes or reviews RFPs that include AI components.

What changes

Vague clauses like “the system shall be accurate and reliable” become specific: “Capability Masking incidence must be <[X]% on the customer-specific test suite.”

Before inserting

All threshold values are structural placeholders. Derive your organization’s specific thresholds from your own evaluation data before inserting them into any binding document.

Background

Why “ensure AI accuracy” is not a requirement

Standard procurement language for AI systems borrows from software quality requirements: “the system shall be accurate,” “responses shall be helpful and relevant,” “the AI shall perform reliably.” These clauses have no measurable standard, no test methodology, and no enforcement mechanism. A vendor can pass any compliance audit against them because they define nothing.

The Core Six syndromes give procurement teams six specific, measurable behavioral failure modes to specify against. A requirement like “Capability Masking incidence must not exceed [X]% on the customer-specific evaluation suite” is auditable: you can run the test, count the failures, and the vendor either passes or does not.
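As a minimal sketch of what “run the test, count the failures” might look like in practice, the snippet below computes per-syndrome incidence from a list of labeled evaluation traces. The trace format, the function name `syndrome_incidence`, the label strings, and the 2% limit are all hypothetical illustrations, not part of the Core Six methodology.

```python
from collections import Counter

def syndrome_incidence(traces):
    """Return {syndrome: fraction of traces tagged with that syndrome}.

    Assumed trace format (hypothetical): a dict whose "syndromes" key holds
    the set of syndrome labels an evaluator assigned to that response.
    An empty set means the response was clean.
    """
    if not traces:
        raise ValueError("cannot compute incidence on an empty test suite")
    counts = Counter()
    for trace in traces:
        counts.update(set(trace["syndromes"]))
    return {name: n / len(traces) for name, n in counts.items()}

# Hypothetical results: 200 queries, 3 of them tagged Capability Masking.
traces = [{"syndromes": {"capability_masking"}}] * 3 + [{"syndromes": set()}] * 197
incidence = syndrome_incidence(traces)

limit = 0.02  # placeholder only; derive the real value from your own data
passes = incidence.get("capability_masking", 0.0) < limit
print(f"Capability Masking: {incidence['capability_masking']:.1%}, passes={passes}")
```

The point of the sketch is that the requirement reduces to a count and a comparison, which is what makes it auditable by either party.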

Vague (unenforceable)

“The AI system shall be accurate, reliable, and helpful. Responses shall be relevant to user intent. The vendor shall ensure the system performs in accordance with reasonable user expectations.”

Specific (enforceable)

“Capability Masking incidence: <[X]% (mandatory). Plausible Helpfulness incidence: <[X]% (mandatory). Vendor must provide evaluation traces demonstrating compliance on the customer-specific test suite.”

Threshold Logic

Mandatory vs. target thresholds

The template distinguishes two tiers of requirement: Mandatory (must meet all — failure means rejection) and Target (meet at least 2 of 3 — allows trade-offs within a band).

Mandatory thresholds apply to the three syndromes with near-zero tolerance: Capability Masking (the system must not fabricate its own actions), Plausible Helpfulness (the system must not confabulate with confidence), and Hollow Completions (the system must not falsely declare tasks complete). These are the syndromes where any meaningful incidence creates direct user harm or institutional liability.

Target thresholds apply to the three syndromes where some incidence is operationally tolerable and domain-specific: Built-Not-Connected, Responsibility Diffusion, and Surface Compliance. A software development context may accept higher BNC incidence because developers can catch integration failures; a customer service context may accept higher SC incidence for some instruction types but not others. Target thresholds create a zone of acceptable performance rather than a binary pass/fail.

Exception — safety-critical deployments: Surface Compliance should be promoted to Mandatory for any system where constraint violations carry safety, legal, or compliance consequences (healthcare, financial, legal contexts). In those cases, move SC to the Mandatory block and apply near-zero tolerance.
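The two-tier rule, including the safety-critical exception, can be sketched as a small check. This is an illustration under stated assumptions: the label strings, function names, and every numeric value are hypothetical. The text does not say how the “2 of 3” rule changes once Surface Compliance is promoted, so the sketch assumes both remaining targets must then be met; adjust that to your own policy.

```python
MANDATORY = {"capability_masking", "plausible_helpfulness", "hollow_completions"}
TARGET = {"built_not_connected", "responsibility_diffusion", "surface_compliance"}

def tiers_for(safety_critical):
    """Return (mandatory, target) syndrome sets for a deployment."""
    mandatory, target = set(MANDATORY), set(TARGET)
    if safety_critical:
        # Exception: promote Surface Compliance to the Mandatory block.
        target.discard("surface_compliance")
        mandatory.add("surface_compliance")
    return mandatory, target

def meets_thresholds(incidence, limits, safety_critical=False, targets_required=2):
    """True if all mandatory limits hold and enough target limits hold."""
    mandatory, target = tiers_for(safety_critical)

    def under(syndrome):
        return incidence[syndrome] < limits[syndrome]

    mandatory_ok = all(under(s) for s in mandatory)
    targets_met = sum(under(s) for s in target)
    # Assumption: never require more targets than remain after promotion.
    return mandatory_ok and targets_met >= min(targets_required, len(target))

# Placeholder limits and hypothetical measured incidence (fractions, not %).
limits = {s: 0.05 for s in MANDATORY | TARGET}
incidence = {
    "capability_masking": 0.01, "plausible_helpfulness": 0.02,
    "hollow_completions": 0.00, "built_not_connected": 0.08,
    "responsibility_diffusion": 0.03, "surface_compliance": 0.04,
}
print(meets_thresholds(incidence, limits))                        # general deployment
print(meets_thresholds(incidence, limits, safety_critical=True))  # SC promoted
```

With these made-up numbers the same system passes in a general deployment (two of three targets met) but fails in a safety-critical one, because Built-Not-Connected is then one of only two remaining targets.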

Threshold calibration required: The [X]% values in this template are structural placeholders, not empirical data. Before inserting any threshold into a binding contract or RFP: (1) determine your deployment tier and risk context using the Domain Thresholds matrix, (2) apply domain-appropriate multipliers for your use case, and (3) run a pilot evaluation on a representative sample to establish your baseline. Using placeholder values in contracts creates unenforceable requirements that neither party can verify.
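For step (3), a pilot on a small sample gives only an estimate of the baseline, so it may help to look at an interval rather than a single number. The sketch below uses the Wilson score interval, a standard statistical technique that is our choice for illustration and is not prescribed by the template. The function name and the sample counts are hypothetical.

```python
import math

def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    failures: number of pilot queries exhibiting the syndrome
    n:        total pilot queries
    z:        normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0 or not 0 <= failures <= n:
        raise ValueError("need n > 0 and 0 <= failures <= n")
    p = failures / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical pilot: 4 Hollow Completions observed in 300 representative queries.
low, high = wilson_interval(4, 300)
print(f"observed {4 / 300:.2%}, plausible range {low:.2%} to {high:.2%}")
```

One possible use: if the upper end of the range sits well above the threshold you intend to write into the contract, the pilot sample may be too small to support that threshold yet.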

Template

Copy and adapt

Insert into the technical requirements section of your RFP. Pair with the Vendor Deliverables section — requirements without a specified evidence format are impossible to evaluate.

## AI System Behavioral Requirements

The proposed AI system must undergo Core Six Defensive Behavior Syndrome evaluation and meet the following behavioral requirements.

Evaluation methodology: Core Six Syndrome Calibration (Taylor, YIM Project, 2026. doi.org/10.5281/zenodo.19423182)

--- Mandatory Thresholds (must meet ALL) ---

Capability Masking: <[X]%
  Near-zero tolerance. The system must not fabricate evidence of actions it did not perform (phantom deliverables, verification hallucinations, impossible action claims).

Plausible Helpfulness: <[X]%
  The system must not generate confident, well-formed responses that are materially incorrect or ungrounded.

Hollow Completions: <[X]%
  The system must not declare tasks complete when they have not been functionally verified.

--- Target Thresholds (meet at least 2 of 3) ---

Built-Not-Connected: <[X]%
  Integration gap tolerance appropriate to deployment context.

Responsibility Diffusion: <[X]%
  The system should locate errors in its own output before attributing causation to user environment.

Surface Compliance: <[X]%
  The system should follow explicit constraints in generation, not just in acknowledgment.

--- Vendor Deliverables ---

Required with proposal response:
1. Complete syndrome evaluation report (all six syndromes) covering vendor's standard benchmark results

Required within 30 days of contract award:
2. Syndrome evaluation report on customer-specific test suite (customer will provide [N] representative queries)
3. Domain-specific syndrome profiles for our primary use cases: [list your use cases]
4. Version comparison: this model vs. previous version
5. Documented mitigation strategies for any syndrome exceeding mandatory thresholds
6. Proposed monitoring plan for ongoing syndrome tracking

--- Evaluation Dataset Requirements ---

Vendor must evaluate using BOTH:
(a) Vendor's standard benchmark suite
(b) Customer-specific test suite: [N] queries representing real production use cases

Both evaluations must use identical syndrome classification criteria. Results from (a) alone are not sufficient for acceptance.

--- Acceptance Criteria ---

The system passes acceptance testing when:
- All mandatory thresholds are met on BOTH test suites
- At least 2 of 3 target thresholds are met on the customer-specific test suite
- Vendor has provided all required deliverables

Failure on the customer-specific suite is grounds for rejection even if vendor benchmark results meet requirements.
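The acceptance criteria in the template combine results from both suites with a deliverables check. A minimal sketch of that decision follows; the function name `accept` and the input shape (lists of pass/fail flags per suite) are hypothetical conveniences, not a format the template defines.

```python
def accept(vendor_suite, customer_suite, deliverables_complete):
    """Apply the template's acceptance criteria.

    Each suite is a dict (assumed shape):
      {"mandatory_met": [bool, bool, bool], "target_met": [bool, bool, bool]}
    """
    # Mandatory thresholds must hold on BOTH suites.
    mandatory_ok = all(vendor_suite["mandatory_met"]) and all(
        customer_suite["mandatory_met"]
    )
    # Target thresholds are judged on the customer-specific suite only.
    targets_ok = sum(customer_suite["target_met"]) >= 2
    return mandatory_ok and targets_ok and deliverables_complete

# Hypothetical case: clean vendor benchmark, one mandatory miss on customer data.
vendor = {"mandatory_met": [True, True, True], "target_met": [True, True, True]}
customer = {"mandatory_met": [True, False, True], "target_met": [True, True, False]}
print(accept(vendor, customer, deliverables_complete=True))  # rejected
```

The example reflects the template's final clause: strong vendor benchmark results do not offset a failure on the customer-specific suite.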

Template from: “From Micro‑Failure Tags to Defensive Syndromes” — Supplementary Materials S1.4

Ernesto A. Taylor, “From Micro-Failure Tags to Defensive Syndromes,” YIM Project, 2026. Free to use and adapt with attribution (CC BY 4.0).

research@yeahitsme.com