Metrics Methodology & Assumptions

Defining Sonate's platform KPIs (FAR-A, FAR-H, PFI, TIS), evaluation protocol, and statistical reporting conventions. Version: 2025.11.08-r1

📋 Implementation Status

Core trust scoring operational with 6-principle weighted algorithm (compliance_score, guilt_score). Advanced metrics (FAR-A, FAR-H, PFI, TIS) methodology defined, implementation in progress.

KPI Definitions

Clear, auditable, directionally correct metrics for AI trust infrastructure

FAR-A

FAR-A

False Apology Rate (Automated)

Definition

Unnecessary or policy-misfired apologies issued by the system (including automated guardrail apologies) when no breach or harm occurred. Lower is better.

Formula

FAR-A = (Apologies flagged as unnecessary / Total interactions) × 100

Target

< 0.50%

Notes

An apology is "unnecessary" if post-hoc adjudication finds no breach, harm, or policy trigger. Includes both model-initiated and guardrail-initiated apologies.

FAR-H

FAR-H

Factual Assertion Failure (Hallucination) Rate

Definition

Share of factual claims that fail verification against the ground-truth set or authoritative references. Lower is better.

Formula

FAR-H = (Factual claims failing verification / Total factual claims checked) × 100

Target

< 1.00% across general tasks; < 0.25% for regulated domains

Notes

Factual claims are extracted by the claim-miner; verification uses citation checks, structured tests, or oracle datasets.

PFI

PFI

Protected Fairness Index

Definition

Distributional parity across protected attributes (e.g., gender, ethnicity, age) for key outcomes (access, approval, refusal, latency, quality). Higher is better.

Formula

PFI = 1 - Σ(ws × Δs)

Where Δ = 1 - min(pi/max(pj)) for rates, clamped to [0,1]

Target

≥ 0.98 (enterprise), ≥ 0.95 (public)

Notes

Report attribute coverage, sample sizes per group, and uncertainty (see §3).

TIS

TIS

Trust Integrity Score

Definition

Composite integrity of execution and auditability. Higher is better.

Components & Weights

  • • Ledger Verification Pass Rate (0.35)
  • • Receipt Coverage (0.20)
  • • Guardrail Adherence (0.25)
  • • Audit Depth (0.20)

Formula

TIS = Σ(wk × mk)

Target

≥ 0.995

Notes

Publish per-component scores and weights; show receipt examples.

Evaluation Protocol

How we measure and validate our trust metrics

2.1 Sampling

  • • Stratified by Task Complexity: Simple, Standard, Advanced, Critical
  • • Domains: General, Code, Legal-adjacent, Health-adjacent, Finance-adjacent, Safety
  • • Providers: OpenAI, Anthropic, and any added vendor; cross-vendor balance ≥ 15% per provider per report

2.2 Claim Mining (for FAR-H)

  • • Extract factual claims using deterministic patterns + model-assisted span detection
  • • Deduplicate near-identical claims; map to verification methods (oracle set, regex matcher, API-free validator, human adjudication)

2.3 Adjudication

  • • Blind, two-pass where applicable: Reviewer A, Reviewer B, tie-break by Lead
  • • Inter-rater agreement (Gwet AC1) reported per domain

2.4 Guardrail Events (for FAR-A, TIS)

  • • Log trigger → action with policy ID, severity, user impact
  • • An apology is counted as unnecessary if: (a) no policy breach verified, and (b) user intent was permissible within ToS, and (c) the content was non-harmful on re-inspection

2.5 Complexity Normalization

  • • Report KPIs per stratum
  • • Provide Complexity-Normalized KPI:
Mnorm = Σ(αs × Ms)

Where αs is a fixed publishable weight vector (e.g., [0.25, 0.35, 0.30, 0.10]). Publish the weight vector and keep it stable across releases; log any changes in the change-log.

Statistical Reporting

Confidence intervals, rounding rules, and disclosure requirements

  • •Confidence Intervals: 95% Wilson for proportions (FAR-A, FAR-H, components), normal CI for indices (PFI, TIS) via bootstrap (≥ 1,000 resamples)
  • •Rounding Rules: proportions to 2 decimals (e.g., 0.47%), indices to 3 decimals (e.g., 0.997)
  • •Minimum N: Do not publish stratum estimates with (n < 200) interactions or (< 50) claims; mark as NA and include in aggregate via partial pooling only
  • •Disclosure: Always publish N total, per-stratum N, provider mix, and evaluation window

Trust Receipts & Ledger

User-verifiable cryptographic audit trails

Receipt Fields (Public)

  • • ts - Timestamp
  • • actor - Actor identifier
  • • intent - User intent
  • • inputs_hash - Hash of inputs
  • • outputs_hash - Hash of outputs
  • • policy_id - Policy identifier
  • • guardrail_action - Guardrail action taken
  • • ed25519_sig - Ed25519 signature
  • • ledger_height - Ledger position
  • • prev_hash - Previous entry hash

Verification (Client-side)

  1. Recompute inputs_hash/outputs_hash from the payload
  2. Verify Ed25519 signature against the public key
  3. Recompute hash-chain: prev_hash → hash(payload) consistency at ledger_height
  4. Check policy metadata (human-in-loop flags, escalation state)

See a Trust Receipt → Verify

Try our minimalist web verifier (pure JS) to paste a receipt and verify its signature and hash-chain integrity

Open In-Browser Verifier

Guardrails Matrix

Provider-agnostic policy mapping with receipt evidence

PolicyTrigger (examples)ActionEvidence in Receipt
SafetyToxicity/violence threshold crossedApologize + refusepolicy_id, scores, guardrail_action=refuse
PrivacyPII detectedMask + human-approvepolicy_id, pii_spans, human_flag=true
ComplianceRegulated-domain queryEscalate to humanpolicy_id, routing=human
ContinuityLong-running contextSummarize + confirmpolicy_id, continuity=true

Each row links to a sample Trust Receipt demonstrating the evidence captured.

View Sample Receipts

Example Formulas

Ready-to-render component calculations for TIS

Ledger Verification Pass Rate

LVP = (Receipts that pass sig + chain checks / Receipts sampled)

Receipt Coverage

RC = (Actions with receipts / Actions total)

Guardrail Adherence

GA = 1 - (Misfires (false positives + false negatives) / Total guardrail triggers)

Audit Depth

AD = (Artifacts present / Artifacts required)

TIS Composite

TIS = 0.35×LVP + 0.20×RC + 0.25×GA + 0.20×AD

Publishing Conventions

Standards for public reporting and versioning

  • •Evaluation Window: show ISO start/end timestamps in UTC
  • •Versioning: YYYY.MM.DD-rN (e.g., 2025.11.08-r1)
  • •Change-log: link to /changelog and include the diff of weights, sampling, or definitions

Change-log Template

## 2025.11.08-r1
- Initial public methodology.
- Weight vector for complexity normalization set to [0.25, 0.35, 0.30, 0.10].
- Introduced LVP/RC/GA/AD components for TIS.

Page Skeleton

For /metrics-methodology

# Metrics Methodology & Assumptions
> Last updated: 2025-11-08 (version 2025.11.08-r1)
## KPIs
- FAR-A (False Apology Rate)
- FAR-H (Hallucination Rate)
- PFI (Protected Fairness Index)
- TIS (Trust Integrity Score)
## Evaluation Protocol
- Sampling & stratification
- Claim mining & verification
- Adjudication & agreement
- Guardrail event handling
- Complexity normalization
## Statistical Reporting
- CIs, rounding, minimum N, disclosure
## Trust Receipts & Ledger
- Fields, verification steps, example JSON
## Guardrails Matrix
- Policy → action → evidence
## JSON-LD
- SoftwareApplication + TechArticle blocks
## Change-log
- Link to /changelog

JSON-LD Blocks

Structured data for SEO and discoverability

SoftwareApplication

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Sonate — AI Trust Infrastructure",
  "applicationCategory": "SecurityApplication",
  "operatingSystem": "Web",
  "url": "https://yseeku.com/",
  "softwareVersion": "2025.11.08-r1",
  "offers": {
    "@type": "Offer",
    "price": "0",
    "priceCurrency": "USD"
  }
}
</script>

TechArticle

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Sonate Metrics Methodology & Assumptions",
  "datePublished": "2025-11-08",
  "dateModified": "2025-11-08",
  "author": {
    "@type": "Organization",
    "name": "Yseeku"
  },
  "about": ["Trust Receipts","Hash-chained Ledger","AI Guardrails","Fairness","Hallucination"]
}
</script>

Appendix — Terms

Trust Receipt

User-verifiable record of an action, signed (Ed25519), placed into an append-only, hash-chained ledger.

Hash-chained ledger

Each entry references the previous entry's hash; tamper-evident.

DID/VC

Decentralized Identifiers & Verifiable Credentials for actor identity and attestations.