
On‑Device AI: Hybrid Routing & SLOs

TL;DR: Hybrid routing prefers CPU/NPU for gates and re‑rankers, with measurable SLOs: fallback rate, battery budget, and thermal guard. Evidence includes sampled telemetry summaries; raw data stays on device.

What we shipped

Why on‑device now

Two forces drive on‑device AI adoption: (1) the cost and latency benefits of running narrow tasks locally, and (2) stronger privacy and availability guarantees for regulated or intermittently connected environments. A hybrid policy gives you the best of both: CPU/NPU for cheap, fast gates and re‑rankers, and cloud for heavy or unsupported workloads.

Quick start

ondevice:
  enabled: true
  max_fallback_rate: 0.15
  max_battery_mwh: 2.5
  max_temp_delta_c: 6

During evaluation, provide the observed metrics fallback_rate, energy_mwh, and temp_delta_c; thresholds can be tuned per device class.
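A minimal sketch of that check, assuming the quick‑start thresholds above; OnDeviceMetrics and withinSlo are illustrative names, not part of the shipped API.

// Shape of the per-window metrics named above (illustrative field names).
interface OnDeviceMetrics {
  fallback_rate: number;  // fraction of requests promoted to cloud (0..1)
  energy_mwh: number;     // estimated energy for the evaluation window, in mWh
  temp_delta_c: number;   // observed device temperature rise, in °C
}

// SLO thresholds taken from the quick-start config.
const slo = { max_fallback_rate: 0.15, max_battery_mwh: 2.5, max_temp_delta_c: 6 };

// True when every observed metric sits within its SLO.
function withinSlo(m: OnDeviceMetrics): boolean {
  return (
    m.fallback_rate <= slo.max_fallback_rate &&
    m.energy_mwh <= slo.max_battery_mwh &&
    m.temp_delta_c <= slo.max_temp_delta_c
  );
}

// Example: a window that holds all three SLOs.
withinSlo({ fallback_rate: 0.08, energy_mwh: 1.9, temp_delta_c: 4 }); // -> true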

Routing policy

Modes: device‑only, hybrid (default), and cloud‑only. Prefer the NPU where supported; the fallback chain is NPU → GPU → CPU. Exceeding an SLO promotes work to the cloud or reduces on‑device coverage.
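A minimal sketch of that decision under a hybrid‑by‑default policy; the backend detection and the SLO flag are placeholders, and how an SLO breach is signalled in practice is an assumption.

type Backend = "npu" | "gpu" | "cpu" | "cloud";
type Mode = "device-only" | "hybrid" | "cloud-only";

interface RouteInput {
  mode: Mode;
  available: Backend[];  // assumed: locally detected accelerators, e.g. ["gpu", "cpu"]
  sloExceeded: boolean;  // assumed: set by an SLO monitor (fallback, energy, thermal)
}

// Prefer NPU, then GPU, then CPU; an SLO breach promotes to cloud in hybrid mode
// or reduces coverage (skips the on-device step) in device-only mode.
function route(req: RouteInput): Backend | "skip" {
  if (req.mode === "cloud-only") return "cloud";
  if (req.sloExceeded) return req.mode === "hybrid" ? "cloud" : "skip";
  for (const b of ["npu", "gpu", "cpu"] as const) {
    if (req.available.includes(b)) return b;
  }
  return req.mode === "hybrid" ? "cloud" : "skip";
}

// Example: hybrid mode, no NPU present, SLOs healthy -> routes to "gpu".
route({ mode: "hybrid", available: ["gpu", "cpu"], sloExceeded: false });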

SLOs in plain language (with examples)

Calibration and tuning

  1. Collect a small set of local traces (latency, energy, thermal, fallback reasons).
  2. Set initial SLOs per device class (e.g., handset vs gateway).
  3. Run a short shadow window; measure observed fallback and energy (a sketch of this check follows the list).
  4. Iteratively tighten or relax SLOs until they hold with buffer across segments.
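A minimal sketch of steps 3–4, assuming per‑request traces with the fields listed in step 1; how energy is aggregated against the battery budget (per request here) and the 20% headroom buffer are assumptions.

// Hypothetical per-request trace captured during the shadow window.
interface Trace {
  latencyMs: number;
  energyMwh: number;
  tempDeltaC: number;
  fellBack: boolean;  // request was promoted to cloud
}

interface SloCandidate {
  max_fallback_rate: number;
  max_battery_mwh: number;
  max_temp_delta_c: number;
}

function percentile(xs: number[], p: number): number {
  const sorted = [...xs].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
}

// Does the candidate SLO hold on the shadow window, with headroom (buffer < 1)?
function holdsWithBuffer(traces: Trace[], slo: SloCandidate, buffer = 0.8): boolean {
  const fallbackRate = traces.filter(t => t.fellBack).length / traces.length;
  const energyP95 = percentile(traces.map(t => t.energyMwh), 0.95);
  const tempP95 = percentile(traces.map(t => t.tempDeltaC), 0.95);
  return (
    fallbackRate <= slo.max_fallback_rate * buffer &&
    energyP95 <= slo.max_battery_mwh * buffer &&
    tempP95 <= slo.max_temp_delta_c * buffer
  );
}

Step 4 then becomes a loop: adjust the candidate thresholds until the check holds across every device class and segment.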

Telemetry boundaries

We log local, sampled summaries (e.g., P95 latency, energy estimate, fallback reason); raw content stays on the device. Optional differentially private (DP) summaries can sync to the platform for fleet‑level analysis.
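As an illustration of that boundary, a local summary could look like the sketch below; the field names and values are hypothetical, and the point is that only aggregates appear, never prompts, content, or identifiers.

// Hypothetical local, sampled summary: aggregates only, no raw content.
interface TelemetrySummary {
  deviceClass: string;                     // e.g. "handset" or "gateway"
  windowStart: string;                     // ISO timestamp of the sampling window
  sampleRate: number;                      // fraction of requests sampled locally
  p95LatencyMs: number;
  energyEstimateMwh: number;
  fallbackReasons: Record<string, number>; // reason -> count
}

const summary: TelemetrySummary = {
  deviceClass: "handset",
  windowStart: "2025-01-01T00:00:00Z",
  sampleRate: 0.05,
  p95LatencyMs: 42,
  energyEstimateMwh: 1.8,
  fallbackReasons: { thermal: 1, unsupported_op: 2 },
};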

Evidence & privacy

Evidence bundles include device‑class tags and telemetry summaries. Raw data and residuals remain on device; optional DP summaries may sync.
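A sketch of the bundle shape that description implies, reusing the TelemetrySummary type sketched above; the field names are assumptions rather than the shipped schema.

// Hypothetical evidence bundle: device-class tags plus summaries, never raw data.
interface EvidenceBundle {
  deviceClassTags: string[];      // e.g. ["handset", "npu-supported"]
  telemetry: TelemetrySummary[];  // sampled summaries, as sketched above
  dpSummarySynced: boolean;       // true only when optional DP sync is enabled
}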

Integration checklist

Troubleshooting

See also: Resources · FAQ – On‑Device AI & SLOs · Stability Demo · On‑Device AI Playbook