Always‑on Evaluators: Cheap, Continuous Risk Scoring for Reliable AI
Summary: Compact small‑language‑model evaluators run per‑turn to score metrics like toxicity, PII, prompt injection, bias, and jailbreaking. Scores feed a pre‑generation Risk Guard that decides whether to generate, fetch more context, reroute, or abstain (fail‑closed). Result: fewer tokens and calls, lower latency and energy, and auditable behavior.
Why evaluators?
Generative systems fail in the gaps: thin context, adversarial prompts, or unclear objectives. Instead of hoping, we measure. A set of compact evaluators scores each turn and enforces policy before we generate; the evaluators are inexpensive enough to run continuously on CPU/NPU.
What we score
- Prompt injection, PII leakage, tool‑error patterns
- Toxicity, bias, jailbreak attempts
- Optional goal‑completion heuristics for tool‑using agents
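For illustration, the metrics above can be declared as a small registry that the runner iterates over each turn. The metric names mirror the list, but the thresholds, weights, and the `sampled` flag are placeholder assumptions for the sketch, not our production configuration.

```python
# Illustrative per-metric registry for the always-on evaluators.
# Metric names follow the list above; thresholds, weights, and the
# `sampled` flag are placeholder values, not production settings.
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricConfig:
    name: str          # metric identifier
    threshold: float   # scores above this contribute to the risk gate
    weight: float      # relative weight when aggregating into a risk score
    sampled: bool      # heavier metrics can be sampled instead of run every turn

EVALUATOR_METRICS = [
    MetricConfig("prompt_injection", threshold=0.50, weight=1.0, sampled=False),
    MetricConfig("pii_leakage",      threshold=0.40, weight=1.0, sampled=False),
    MetricConfig("tool_error",       threshold=0.60, weight=0.5, sampled=False),
    MetricConfig("toxicity",         threshold=0.50, weight=1.0, sampled=False),
    MetricConfig("bias",             threshold=0.55, weight=0.7, sampled=True),
    MetricConfig("jailbreak",        threshold=0.45, weight=1.0, sampled=False),
    MetricConfig("goal_completion",  threshold=0.30, weight=0.3, sampled=True),
]
```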
How it integrates
- Signals → Risk Guard: evaluator scores are inputs to our pre‑generation gate alongside retrieval support, margin, and entropy.
- Actions: below threshold → generate; above → fetch more context; well above → abstain or reroute (a gate sketch follows this list).
- Evidence: we log evaluation_events.json and an aggregated evaluation_summary.json into the signed evidence bundle (sketched below).
- Runtime: CPU/NPU‑first via our runner; GPU optional. Cheap metrics parallelised; heavier ones sampled.
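To make the Actions item concrete, here is a minimal sketch of the gate: all metrics below threshold → generate; marginal breaches → fetch more context; clear breaches → abstain or reroute, with any missing score treated as a breach so the gate fails closed. The function name, the soft margin, and the action labels are illustrative, not the production Risk Guard, and the retrieval-support, margin, and entropy inputs mentioned in the Signals item are omitted for brevity.

```python
# Minimal pre-generation gate sketch. Names, the soft margin, and the action
# labels are illustrative; retrieval support, margin, and entropy inputs are
# omitted for brevity.
from typing import Dict, Optional

GENERATE = "generate"
FETCH_CONTEXT = "fetch_context"
ABSTAIN_OR_REROUTE = "abstain_or_reroute"

def risk_guard(scores: Dict[str, float],
               thresholds: Dict[str, float],
               soft_margin: float = 0.10) -> str:
    """Map per-metric risk scores to an action. Fail-closed: a metric that is
    expected but missing from `scores` counts as a breach."""
    breaches: Dict[str, Optional[float]] = {}
    for name, limit in thresholds.items():
        score = scores.get(name)
        if score is None or score > limit:
            breaches[name] = score

    if not breaches:
        return GENERATE                      # all metrics below threshold
    if all(score is not None and score <= thresholds[name] + soft_margin
           for name, score in breaches.items()):
        return FETCH_CONTEXT                 # marginal breach: retry with more context
    return ABSTAIN_OR_REROUTE                # clear breach or missing score: fail closed
```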
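The Evidence item maps naturally onto a small per-turn logger. Only the two filenames come from this post; the record fields, the class name, and the aggregation choices are assumptions for the sketch, and signing of the bundle is handled elsewhere.

```python
# Sketch of per-turn evidence logging. Only the two filenames come from the
# text above; record fields and aggregation are illustrative, and signing of
# the bundle happens elsewhere.
import json
import statistics
import time
from pathlib import Path

class EvidenceLogger:
    """Collects per-turn evaluator results and writes the two evidence files
    into the (to-be-signed) bundle directory."""

    def __init__(self, bundle_dir: Path):
        self.bundle_dir = bundle_dir
        self.events = []

    def record(self, turn_id: str, scores: dict, action: str) -> None:
        # One record per turn: scores from the evaluators plus the gate's action.
        self.events.append({"turn_id": turn_id, "ts": time.time(),
                            "scores": scores, "action": action})

    def flush(self) -> None:
        # Per-turn records.
        (self.bundle_dir / "evaluation_events.json").write_text(
            json.dumps(self.events, indent=2), encoding="utf-8")
        # Aggregate view: action counts and mean score per metric.
        actions, per_metric = {}, {}
        for event in self.events:
            actions[event["action"]] = actions.get(event["action"], 0) + 1
            for name, score in event["scores"].items():
                per_metric.setdefault(name, []).append(score)
        summary = {
            "turns": len(self.events),
            "actions": actions,
            "mean_scores": {m: statistics.fmean(v) for m, v in per_metric.items()},
        }
        (self.bundle_dir / "evaluation_summary.json").write_text(
            json.dumps(summary, indent=2), encoding="utf-8")
```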
Controls
- Per‑metric thresholds with a fail‑closed policy
- Segment‑aware calibration (e.g., by product, region)
- Live toggle in the Stability Demo: “Always‑on Evaluators (20 metrics)”
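One way the first two controls might be expressed is a segment-aware threshold table with a fail-closed fallback; the segment keys and numbers below are placeholders, and real thresholds would be calibrated per segment from labelled traffic rather than hand-set.

```python
# Illustrative segment-aware thresholds with a fail-closed fallback.
# Segment keys and numbers are placeholders; production values are
# calibrated per segment, not hand-set.
DEFAULT_THRESHOLDS = {"toxicity": 0.50, "pii_leakage": 0.40, "prompt_injection": 0.50}

SEGMENT_THRESHOLDS = {
    # (product, region) -> overrides merged on top of the defaults
    ("support_chat", "eu"): {"pii_leakage": 0.30},   # stricter PII policy in the EU
    ("search_assist", "us"): {"toxicity": 0.60},
}

def thresholds_for(product: str, region: str) -> dict:
    """Unknown segments fall back to the defaults, so no segment ever runs ungated."""
    merged = dict(DEFAULT_THRESHOLDS)
    merged.update(SEGMENT_THRESHOLDS.get((product, region), {}))
    return merged
```

Combined with the gate sketched earlier, a request would call `risk_guard(scores, thresholds_for(product, region))` before any large-model generation.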
Impact
- Token savings via early abstention and rerouting
- Latency reduction by avoiding unnecessary large‑model calls
- Energy and cost reduction; on‑device friendly
- Auditability through signed evidence
Read next: Evidence‑Efficient AI (73%) · A Billion Queries · Dashboard