Context Engineering: From RAG to Reliable Answers
Auspexi
TL;DR: We add a context layer on top of RAG: hybrid retrieval (BM25 + dense + reranker), context signals (retrieval_margin, support_docs, recency, trust, format), token budget packing, and evidence provenance. The signals feed the Risk Guard, which fetches more context, asks a clarifying question, or abstains before generation.
Why context wins (a short story)
Two identical models answer the same question. One sees a clean, relevant brief with citations; the other sees a noisy dump. The first answers correctly and concisely; the second hallucinates. Same model, different context. Our job is to engineer the brief.
Hybrid Retrieval
- Combine BM25 + dense embeddings and an optional cross‑encoder reranker
- MMR‑style de‑duplication; boosts for recency and source trust (fusion sketch below)
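Concretely, the fusion step can be as small as the sketch below. This is a minimal illustration, assuming hypothetical per‑document BM25 scores, dense similarities, ages, and trust values; the weights and recency half‑life are placeholders, and the cross‑encoder reranker and MMR pass are only noted in a comment, not implemented.

```python
from math import exp
from typing import Dict, List

def hybrid_rank(
    bm25: Dict[str, float],      # doc_id -> BM25 score (hypothetical input)
    dense: Dict[str, float],     # doc_id -> cosine similarity (hypothetical input)
    age_days: Dict[str, float],  # doc_id -> content age in days
    trust: Dict[str, float],     # doc_id -> source trust in [0, 1]
    w_bm25: float = 0.4,
    w_dense: float = 0.6,
    recency_half_life: float = 90.0,
) -> List[str]:
    """Fuse sparse and dense scores, then boost by recency and source trust."""
    def norm(scores: Dict[str, float]) -> Dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    b, d = norm(bm25), norm(dense)
    fused = {}
    for doc_id in set(b) | set(d):
        base = w_bm25 * b.get(doc_id, 0.0) + w_dense * d.get(doc_id, 0.0)
        recency = exp(-age_days.get(doc_id, 0.0) / recency_half_life)
        fused[doc_id] = base * (0.7 + 0.3 * recency) * (0.5 + 0.5 * trust.get(doc_id, 0.5))
    # An optional cross-encoder reranker and MMR-style de-duplication would run
    # on the top-k of this ranking before budget packing.
    return sorted(fused, key=fused.get, reverse=True)
```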
Signals and Policy
- Signals: retrieval_margin, support_docs, recency_score, source_trust, format_health
- Policy: if signals are thin → fetch more, clarify, or abstain; otherwise generate
Signals map into our pre‑generation Risk Guard. High margin + strong support lowers risk; low margin + weak support triggers fetch or abstain. This prevents wasted tokens and bad answers.
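As an illustration, a pre‑generation policy over these signals could look like the sketch below. The signal names match the list above; the thresholds are invented for the example and are not the Risk Guard's actual tuning.

```python
from dataclasses import dataclass

@dataclass
class ContextSignals:
    retrieval_margin: float  # score gap between top hit and runner-up
    support_docs: int        # independent sources backing the candidate answer
    recency_score: float     # 0..1, decays with content age
    source_trust: float      # 0..1, trust in the included sources
    format_health: float     # 0..1, share of tool outputs that parsed cleanly

def pre_generation_policy(s: ContextSignals) -> str:
    """Illustrative thresholds only; a deployed guard would tune these per use case."""
    if s.format_health < 0.5:
        return "fetch"    # malformed tool output: re-fetch before spending tokens
    if s.retrieval_margin < 0.1 and s.support_docs < 2:
        return "abstain"  # thin evidence on both axes: do not answer
    if s.retrieval_margin < 0.1 or s.support_docs < 2:
        return "clarify"  # ambiguous: ask the user or broaden retrieval
    return "generate"
```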
What good context looks like
- Short and scoped: only spans that directly answer the question
- Citation‑ready: include source IDs and hashable excerpts
- Fresh enough: decay old content unless explicitly requested
- Typed tool outputs: strict JSON or tables; no blob dumps (example span below)
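For concreteness, a citation‑ready, typed span might look like the following. The field names and values are hypothetical, not a published schema.

```python
# A hypothetical, citation-ready context span; fields are illustrative only.
context_span = {
    "source_id": "kb:runbook-042",   # stable ID the answer can cite
    "excerpt": "Rotate the API key every 90 days via the admin console.",
    "excerpt_sha256": "3f6c...",     # hash of the exact quoted text
    "published_at": "2025-01-14",    # drives the recency signal
    "source_trust": 0.9,             # editorially assigned trust score
    "score": 0.82,                   # relevance score from hybrid retrieval
    "tokens": 18,                    # used later by budget packing
}
```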
Budget packing
We score spans, then pack them into a token budget so the model sees the best 2–3 pages of truth instead of 30 pages of noise. The remainder is accessible on demand.
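A minimal greedy packer over scored spans is sketched below, assuming each span carries the hypothetical score and tokens fields from the example above; the real budget and scoring will differ.

```python
from typing import Dict, List

def pack_spans(spans: List[Dict], budget_tokens: int = 3000) -> List[Dict]:
    """Greedy packing: highest-scoring spans first, until the token budget is spent."""
    packed, used = [], 0
    for span in sorted(spans, key=lambda s: s["score"], reverse=True):
        if used + span["tokens"] <= budget_tokens:
            packed.append(span)
            used += span["tokens"]
    return packed  # anything not packed stays retrievable on demand
```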
Evidence
We ship context_provenance.json inside evidence zips with per‑query signals and included sources for audit.
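The shape of one provenance record might look like this; the file name comes from the paragraph above, but the fields shown are an assumed schema for illustration only.

```python
import json

# Hypothetical shape of one record in context_provenance.json; the schema
# shipped in a real evidence zip may differ.
record = {
    "query_id": "q-0017",
    "signals": {
        "retrieval_margin": 0.24,
        "support_docs": 3,
        "recency_score": 0.81,
        "source_trust": 0.9,
        "format_health": 1.0,
    },
    "included_sources": ["kb:runbook-042", "kb:faq-007"],
    "decision": "generate",
}
print(json.dumps(record, indent=2))
```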
Measuring improvement
- Retrieval: P@k, nDCG@k, citation‑hit rate (metric sketch below)
- Answer quality: correctness@k, abstain rate at fixed coverage
- Ops: token cost per accepted answer, latency p95
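For reference, P@k and nDCG@k for a single query can be computed as below; relevance labels are assumed to be available per retrieved document, in ranked order.

```python
import math
from typing import List

def precision_at_k(relevances: List[float], k: int) -> float:
    """P@k: share of the top-k retrieved documents that are relevant."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k if k else 0.0

def ndcg_at_k(relevances: List[float], k: int) -> float:
    """nDCG@k for a single query; relevances are listed in ranked order."""
    def dcg(rels: List[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```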
Get Started