Auspexi

Evidence‑Efficient AI: 73% Token and 73% Latency Savings (NYC Taxi Demo)

TL;DR: We reduced tokens and latency by 73% and avoided all large‑model calls on a realistic task using open NYC Taxi anchors. In plain English: we now fetch only what we need, think with smaller pieces, and choose when to answer or ask for more context. This is faster, cheaper, and easier to trust.

What this means for you

What this means for Aethergen

We can operate with service targets for reliability and speed, and we can prove it. The platform composes small specialist pieces with clear guardrails and exports evidence so teams can review decisions without exposing raw data.
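To make the evidence idea concrete, here is a minimal sketch of what an exported record for one routed decision could look like. The field names and hashing choice are illustrative assumptions, not our production schema; the point is that reviewers see the decision and its cost, never the raw data.

```python
import hashlib
import json
from datetime import datetime, timezone

def evidence_record(query: str, route: str, tokens_used: int, latency_ms: float) -> dict:
    """Summarise one routed decision without retaining the raw query text.

    Illustrative sketch: only a hash of the query is stored, so routing and
    cost can be audited without exposing the underlying data.
    """
    return {
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "route": route,                      # e.g. "factual_small_context" or "escalate"
        "tokens_used": tokens_used,
        "latency_ms": latency_ms,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

print(json.dumps(evidence_record("average tip amount in May 2023?", "factual_small_context", 310, 95.0), indent=2))
```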

Tokens reduced: 73%
Latency improvement: 73%
Large‑model calls avoided: 100%
Storage saved (typical): 80–98%

How we did it (simple version)

What we measured

Using the NYC Taxi Open Anchor Pack, we ran 40 queries. Factual questions used a tiny context and were answered immediately; broader summary prompts used a compact context and asked for more only when needed. We logged tokens, latency, and routing actions, and exported an evidence summary.
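A minimal sketch of that routing behaviour, assuming a keyword heuristic and fixed context sizes (both are placeholders for illustration; the production routing criteria are not shown here):

```python
from dataclasses import dataclass

@dataclass
class RoutedAnswer:
    route: str          # which path handled the query
    context_rows: int   # how many anchor rows were supplied as context

def route_query(query: str, anchors: list[dict]) -> RoutedAnswer:
    """Toy router: factual lookups get a tiny context and answer immediately;
    broader summary prompts get a compact context and can ask for more.
    """
    factual_markers = ("how many", "what was", "which day")   # assumed heuristic
    if any(m in query.lower() for m in factual_markers):
        return RoutedAnswer("factual_small_context", context_rows=5)
    if len(anchors) < 50:
        return RoutedAnswer("ask_for_more_context", context_rows=len(anchors))
    return RoutedAnswer("summary_compact_context", context_rows=50)
```

Treating "ask for more context" as a first-class route is what keeps the system cheap by default: it escalates only when the compact context is not enough, rather than always paying for a large call.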

Latest results (10k, 100k, and 1M query scales)

Baseline vs Composed — Tokens (millions)
Queries    Baseline    Composed
10k        1.02        0.29
100k       10.2        2.90
1M         102         29.0

Baseline vs Composed — Latency (millions of ms)
Queries    Baseline    Composed
10k        9.15        2.51
100k       91.5        25.1
1M         915         251
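Reading the table back into the headline numbers: the saving at each scale is 1 - composed/baseline, which comes out close to the 73% figure (small differences are rounding in the charted values).

```python
# (baseline, composed) pairs taken from the table above
tokens_millions = {"10k": (1.02, 0.29), "100k": (10.2, 2.90), "1M": (102, 29.0)}
latency_millions_ms = {"10k": (9.15, 2.51), "100k": (91.5, 25.1), "1M": (915, 251)}

def saving(baseline: float, composed: float) -> float:
    """Fractional reduction relative to the baseline."""
    return 1.0 - composed / baseline

for scale in ("10k", "100k", "1M"):
    print(f"{scale}: tokens -{saving(*tokens_millions[scale]):.1%}, "
          f"latency -{saving(*latency_millions_ms[scale]):.1%}")
# Roughly 72% for tokens and 73% for latency at every scale; rounding in the
# charted values accounts for the small gap from the headline figure.
```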

How storage falls (and why it matters)

Result: it is common to replace multi‑terabyte corpora with a working set that is 80–98% smaller while preserving the ability to answer the same questions, with stronger provenance.
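A back-of-envelope example (the 5 TB corpus size is made up for illustration; only the 80–98% range comes from our results):

```python
def working_set_gb(corpus_tb: float, reduction: float) -> float:
    """Size of the retained working set after shrinking a corpus by `reduction`."""
    return corpus_tb * 1024 * (1.0 - reduction)

# A hypothetical 5 TB corpus at either end of the quoted range:
print(f"80% smaller: {working_set_gb(5, 0.80):,.0f} GB remain")  # 1,024 GB
print(f"98% smaller: {working_set_gb(5, 0.98):,.0f} GB remain")  # ~102 GB
```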

Are we ready for commercial success?

We are ready to run pilots and production trials with clear guardrails. The approach is reliable by design: it is faster and cheaper while preserving the option to abstain. It ships with evidence so procurement, risk, and engineering can verify how results were produced.

What this is not

What you can do next

Questions or pilot requests: /contact