Core Six Matrix Explorer

Core Six Cross-Reference

Each syndrome mapped to its Group B micro-failure tags, primary user impact language, and remediation targets. From Section 5.1 of the main paper.

Syndrome	Key Micro-Failure Tags (Group B)	User Impact Language (Group A)	Remediation Target
Plausible Helpfulness	Hallucination, Over-helpfulness, Misleading Explanations, Context Pollution, Confidence Inflation, Unverified Referencing	“Smooth but useless,” “Helpful liar,” “Confident fabrication”	Refusal thresholds, verification gates, confidence calibration
Built‑Not‑Connected	Invisible Imports, Silent Activation Failures, Unbound Commands, Handler Registration Gaps, Event Listener Voids, Context Wiring Failures, Integration Surface Omissions	“Phantom features,” “Isolated components,” “Code that never runs”	Entry-point tracing, import verification, handler registration checks
Hollow Completions	Premature Done Flags, False Finality, Non-Executed Tests, Prerequisite Blindness, Missing Upstream Dependencies, Minimalist Completion	“Fake finality,” “Broken at first touch,” “Painted over the hole”	Completion criteria verification, staged validation, FRFR metrics
Capability Masking	Impossible Action Claims, Persistent State Hallucination, Verification Hallucinations, Tool Invocation Errors Hidden by Narration, Memory Poisoning, Phantom Deliverables	“Fake verification,” “Lying about homework,” “Confidence trick”	Tool-Action Consistency checks, verification language gating, capability boundary enforcement
Responsibility Diffusion	Blame-Shifting, External Culprit Narratives, Environmental Attribution, Input Validation Deflection, Defensive Apologies, XPIA Vulnerability	“Defensive,” “Blames the user,” “Always has an excuse”	Self-correction loops, error attribution reordering, self-check incentives
Surface Compliance	Instruction-Execution Decoupling, Training-Reflex Override, Cosmetic Alignment, Safety Theater, Agreement Without Integration, Reward Hacking, Zombie Processes, Same-Response Violation	“Head-nodding,” “Fake agreement,” “Says yes, does no”	Constraint enforcement architecture, instruction-following coupling, behavioral auditing

Matrix 2 — Syndrome Severity by Risk Tier

Illustrative threshold bands mapped by deployment risk tier. Tier 1 = life-safety, critical infrastructure. Tier 4 = low-stakes, supervised use. Organizations must derive their own thresholds using the S2.1 calibration methodology.

Calibration notice: Percentage thresholds are starting points, not empirically validated safe-harbor limits. The important property is the relative ordering (Tier 1 must always be stricter than Tier 4) and the directional intention. Derive your own values from operational data.

Syndrome	Tier 1 — Critical	Tier 2 — High	Tier 3 — Moderate	Tier 4 — Low
Plausible Helpfulness	Near-Zero (<1%)	Strictest (<3%)	Strict (<5%)	Moderate (<8%)
Capability Masking	Near-Zero (<1%)	Strict (<5%)	Moderate (<8%)	Standard (<12%)
Built-Not-Connected	Strictest (<3%)	Strict (<5%)	Moderate (<8%)	Standard (<12%)
Hollow Completions	Strictest (<3%)	Strict (<5%)	Moderate (<8%)	Standard (<12%)
Responsibility Diffusion	Strict (<5%)	Moderate (<8%)	Standard (<12%)	Relaxed (<15%)
Surface Compliance	Near-Zero (<1%)	Strictest (<3%)	Strict (<5%)	Moderate (<8%)

Matrix 3 — Domain-Specific Threshold Adjustments

Illustrative calibration ranges by sector. Cross-reference with Matrix 2 tier thresholds; apply the stricter value where they overlap.

Critical notice: No numeric threshold in S2 is derived from empirical measurement or published study. Every value is a structural placeholder from risk-reasoning only. Do not cite any S2 value as a research-derived standard.

Healthcare AI Systems

Syndrome	Recommended Max	Calibration Approach
Plausible Helpfulness	<0.5%	Near-zero tolerance. Cross-reference pharmaceutical databases and clinical guidelines.
Capability Masking	<0.5%	Verify all claimed capabilities against actual tool bindings and database connections.
Built-Not-Connected	<2%	Audit all integration claims; verify medication databases, lab systems, imaging interfaces.
Hollow Completions	<2%	All safety-critical prerequisites must be explicitly enumerated and verifiable.
Responsibility Diffusion	<3%	Clear provenance chains for all clinical recommendations.
Surface Compliance	<1%	Full compliance verification against HIPAA, FDA, and institutional review requirements.

Legal AI Systems

Syndrome	Recommended Max	Calibration Approach
Plausible Helpfulness	<1%	Cross-reference all legal citations against verified legal databases.
Capability Masking	<1%	Verify access to claimed case law databases, statute repositories, regulatory databases.
Built-Not-Connected	<3%	Audit all document management integrations, court filing system interfaces.
Hollow Completions	<2%	Require explicit jurisdictional analysis and conflict-of-law identification.
Responsibility Diffusion	<3%	Every legal conclusion must trace to specific authorities and reasoning chain.
Surface Compliance	<1%	Full ethics-rule compliance verification, privilege checks, conflict-of-interest screening.

Financial AI Systems

Syndrome	Recommended Max	Calibration Approach
Plausible Helpfulness	<1%	Verify all numerical claims against audited data sources.
Capability Masking	<2%	Verify actual connections to market data feeds, regulatory databases, account systems.
Built-Not-Connected	<3%	Audit all trading system integrations, compliance database connections.
Hollow Completions	<2%	Require explicit risk quantification, regulatory citation, and assumption disclosure.
Responsibility Diffusion	<5%	Clear attribution of every risk assessment and recommendation component.
Surface Compliance	<1%	Full SOX, Basel III/IV, AML/KYC compliance verification.

Software Development AI Systems

Syndrome	Recommended Max	Calibration Approach
Plausible Helpfulness	<5%	Testing coverage provides natural correction; focus on security-critical code paths.
Capability Masking	<3%	Verify all claimed tool integrations, API access, system permissions.
Built-Not-Connected	<5%	All generated code must include integration tests and dependency verification.
Hollow Completions	<5%	Require working build artifacts, not pseudocode or partial implementations.
Responsibility Diffusion	<8%	Acceptable higher tolerance given collaborative development norms.
Surface Compliance	<3%	Verify license compliance, security standards adherence, accessibility requirements.

Education AI Systems

Syndrome	Recommended Max	Calibration Approach
Plausible Helpfulness	<5%	Higher tolerance; learning from errors can be pedagogically valuable.
Capability Masking	<5%	Monitor for misleading capability claims that could affect learning outcomes.
Built-Not-Connected	<8%	Verify curriculum integration and assessment system connections.
Hollow Completions	<8%	Focus on conceptual accuracy over procedural completeness.
Responsibility Diffusion	<10%	Acceptable given supervised learning environment.
Surface Compliance	<5%	Verify FERPA compliance, accessibility standards, assessment validity.

Matrix 4 — Deployment Context Thresholds

Multipliers that adjust thresholds based on operational environment. Apply the stricter value when this matrix and Matrix 3 overlap. A multiplier below 1.0 means tighten thresholds; above 1.0 means relax them.

Deployment Context	Multiplier	Effect on Thresholds	Rationale
Autonomous decision-making	0.5× (halve)	Stricter	No human in the loop to catch failures
Safety-critical real-time systems	0.3× (tighten 70%)	Strictest	No time for human correction; failures have immediate consequences
Public-facing consumer applications	0.7× (tighten 30%)	Tighter	Naive users cannot identify failure modes
Human-in-the-loop advisory	1.0× (baseline)	No change	Human review provides correction opportunity
Internal tools with expert users	1.5× (relax 50%)	Relaxed	Expert users can identify and compensate for failures
Batch processing with review	1.5× (relax 50%)	Relaxed	Review pipeline catches most failures

Matrix 5 — Syndrome Interaction Risk Multipliers

Syndromes rarely occur in isolation. When two or more co-occur, combined impact may exceed the sum of their individual severities. Use the calculator below to estimate compound effective risk.

Syndrome Pair	Multiplier	Compound Risk Description
Capability Masking + Built-Not-Connected	3×	System claims it performed an action through a tool that doesn’t actually connect to the execution path
Plausible Helpfulness + Hollow Completions	2.5×	Output reads well but lacks essential substance; most dangerous to non-expert reviewers
Capability Masking + Plausible Helpfulness	2.5×	False capability claim wrapped in convincing reasoning; hardest to detect
Surface Compliance + Responsibility Diffusion	2×	System appears compliant while distributing accountability so no entity is responsible
Hollow Completions + Responsibility Diffusion	2×	Incomplete work products with no clear owner for completion

Compound Risk Calculator

Enter observed incidence rates for two syndromes to calculate their combined effective risk. Individual syndrome percentages sum naively; this calculator applies the interaction multiplier if a known pairing exists.

Syndrome A

Incidence A (%)

Syndrome B

Incidence B (%)

Effective combined risk

24%

3% + 5% = 8% naive sum × 3× multiplier (CM+BNC pairing)

Known high-risk pairing — 3× multiplier applied

Matrix 6 — Remediation Priority

Each syndrome mapped to typical remediation difficulty, timeline, and suggested priority. Priority reflects both severity and tractability — some dangerous syndromes are also the most amenable to systematic intervention.

Built‑Not‑Connected

Difficulty: Low–Medium

1–3 months

▲ Highest (quick win)

Hollow Completions

Difficulty: Medium

2–4 months

▲ High

Capability Masking

Difficulty: Medium–High

3–6 months

▲ High

Surface Compliance

Difficulty: Medium

2–4 months

▲ High

Plausible Helpfulness

Difficulty: High

4–8 months

Medium–High

Responsibility Diffusion

Difficulty: Highest (systemic + cultural)

6–12 months

Medium

Matrix 7 — Continuous Monitoring Trigger Levels

Three-tier alert framework for distinguishing routine fluctuation from genuine degradation. All trigger levels are illustrative. Recommended cadence: weekly (Tier 1–2), bi-weekly (Tier 3), monthly (Tier 4). Increase to daily during model transitions or system updates.

Green

All syndromes at or below established tier thresholds. System behavior within expected parameters.

Continue standard monitoring cadence. No intervention required.

Yellow

Any syndrome at 80–100% of its threshold. System approaching the intervention boundary.

Increase monitoring frequency. Prepare remediation plan. Brief stakeholders.

Red

Any syndrome exceeds threshold or two or more syndromes are simultaneously at Yellow.

Immediate investigation. Consider deployment pause. Escalate per incident protocol. Apply Matrix 6 remediation priority.

Auto-reject conditions (regardless of thresholds): Capability Masking that claims impossible actions (no tool binding exists) — Systematic Hollow Completions with safety-critical prerequisites missing — Sustained reassurance loops (>5 cycles) that never resolve to honest acknowledgment — Any syndrome showing increasing incidence over consecutive measurement periods.

Core Six Cross-Reference

Matrix 2 — Syndrome Severity by Risk Tier

Matrix 3 — Domain-Specific Threshold Adjustments

Matrix 4 — Deployment Context Thresholds

Matrix 5 — Syndrome Interaction Risk Multipliers

Matrix 6 — Remediation Priority

Matrix 7 — Continuous Monitoring Trigger Levels

research@yeahitsme.com