Auspexi

Evidence Bundles & Testing: Trustworthy AI Without Exposing IP

By Gwylym Owen — 18–24 min read

Executive Summary

AethergenPlatform ships evidence bundles with every model and dataset release: signed metrics, configs, seeds, and hashes that let buyers and auditors reproduce claims without revealing proprietary internals. This approach builds trust while protecting IP, and it streamlines adoption in regulated domains such as healthcare and finance.

What We Publish

What We Withhold

Testing Matrix

Signing & CI

Case Study: Healthcare Detector Potential

For healthcare-type customers, we can provide an evidence bundle with operating-point utility (e.g., at 1% FPR: +18% cases found, CI ±2.1%), stability across NA, EU, and APAC (max delta 2.8%), and drift monitors triggered quarterly. Procurement can review the HTML dashboard offline, verify hashes, and establish a 6-month refresh cadence with a rollback SOP. This can reduce adoption time from months to weeks.
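
To make the two headline quantities concrete, here is a minimal Python sketch of how "utility at a fixed FPR" and "max delta across segments" could be computed. It is not AethergenPlatform's implementation: the function names are hypothetical, a case is assumed to be flagged when `score >= threshold`, and "max delta" is assumed to mean the gap between the best and worst segment.

```python
from typing import Dict, Sequence, Tuple


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int],
                  max_fpr: float) -> Tuple[float, float]:
    """Return (threshold, recall) for the lowest threshold whose FPR stays <= max_fpr.

    Assumption: a case is flagged when score >= threshold; labels are 0 or 1.
    """
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    if negatives == 0 or positives == 0:
        raise ValueError("need at least one positive and one negative label")

    best = (float("inf"), 0.0)  # flag nothing: FPR 0, recall 0
    # Candidate thresholds are the distinct scores, highest first.
    for threshold in sorted(set(scores), reverse=True):
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        if fp / negatives > max_fpr:
            break  # lowering the threshold further can only add false positives
        best = (threshold, tp / positives)
    return best


def max_segment_delta(metric_by_segment: Dict[str, float]) -> float:
    """Gap between the best and worst segment (assumed meaning of 'max delta')."""
    values = list(metric_by_segment.values())
    return max(values) - min(values)
```

A reviewer would call `recall_at_fpr(scores, labels, max_fpr=0.01)` once per region and pass the per-region recalls to `max_segment_delta` to see whether the spread stays within the published bound.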

Worked Example: Payments Mule-Ring Detector

For payments-type customers, we can deliver a mule-ring detector bundle with operating-point charts (e.g., at 2,000 alerts/day: +11% true positives, stable on weekends), parameter logs for ring size/reuse, and stability by product (max delta 1.9%). Buyers can reproduce metrics within CI bands using the provided seeds, adopt with a quarterly refresh SLA, and integrate the SBOM into their supply-chain audit.
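
An alert-budget operating point can be illustrated with a short Python sketch: rank items by score, keep only as many as the daily budget allows, and count the confirmed positives among them. The function names are hypothetical, and reading "+11% true positives" as a relative lift over a baseline at the same budget is an assumption, since the baseline is not specified here.

```python
from typing import Sequence


def true_positives_at_budget(scores: Sequence[float], labels: Sequence[int],
                             budget: int) -> int:
    """Count confirmed positives among the `budget` highest-scoring items."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank by score only, highest first; labels (0 or 1) ride along.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:budget])


def relative_lift(candidate_tp: int, baseline_tp: int) -> float:
    """Relative change in true positives versus a baseline at the same budget."""
    if baseline_tp == 0:
        raise ValueError("baseline has no true positives; lift is undefined")
    return (candidate_tp - baseline_tp) / baseline_tp
```

Fixing the budget rather than the threshold keeps the comparison tied to what an operations team can actually review in a day.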

FAQ

How is this different from a slide deck?

It’s reproducible. If you rerun with the same seeds and configs, you get the same metrics within confidence intervals. Unlike a slide deck’s static claims, every number ties back to a verifiable artifact.
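
The role of seeds can be shown with a minimal sketch of a seeded percentile bootstrap in Python. This is an illustration under stated assumptions, not the platform's evaluation code: `bootstrap_ci` is a hypothetical name, the metric is a simple mean, and the seed would in practice come from a file such as `seeds/seeds.txt`.

```python
import random
from typing import Sequence, Tuple


def bootstrap_ci(values: Sequence[float], seed: int, n_resamples: int = 1000,
                 alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) from a percentile bootstrap with a fixed seed."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # local generator, independent of global state
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n items with replacement and record the resample's mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(values) / n, lower, upper
```

Because the generator is seeded and local, two runs with the same inputs and seed return identical intervals. Across different hardware or library versions, small numeric differences can still appear, which is why agreement is judged within the confidence band, not bit for bit.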

What if regulators ask for raw data?

We can provide synthetic corpora with measured fidelity and utility; for restricted cases, data can stay within the customer enclave. We also offer privacy probes to validate compliance.

Can buyers request custom evidence?

Yes, within scope: for example, additional segment stability or robustness tests. The CI pipeline can regenerate the bundle with a new manifest ID.

Glossary

Checklist

Sample Evidence Bundle Index

index.json
├─ metrics/
│  ├─ utility@op.json
│  ├─ stability_by_segment.json
│  ├─ drift_early_warning.json
│  └─ robustness_corruptions.json
├─ plots/
│  ├─ roc_pr_curves.html
│  ├─ operating_point_tradeoffs.html
│  └─ segment_bars.html
├─ configs/
│  ├─ evaluation.yaml
│  └─ thresholds.yaml
├─ seeds/
│  └─ seeds.txt
├─ sbom.json
└─ manifest.json

Operating Point Examples

Audit Workflow

  1. Recompute metrics using provided configs and seeds from the ZIP.
  2. Check confidence-interval (CI) bands and confirm alignment with published values.
  3. Verify SBOM, artifact hashes, and `manifest-hash.txt`; review `.aethergen/change-log.json`.
  4. Record acceptance and attach evidence IDs to change-control.
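
Step 2 of the workflow above can be sketched in Python as a comparison between recomputed metrics and published values with their confidence-interval half-widths. The input shape (`value` and `ci_half_width` keys per metric) and the function name are hypothetical, since the actual schema of the metrics files is not shown here.

```python
from typing import Dict, List


def check_against_published(recomputed: Dict[str, float],
                            published: Dict[str, Dict[str, float]]) -> List[str]:
    """Return names of metrics that are missing or fall outside the published band.

    Assumed shape: published[name] = {"value": ..., "ci_half_width": ...}.
    """
    failures = []
    for name, entry in published.items():
        if name not in recomputed:
            failures.append(name)  # auditor could not reproduce this metric at all
            continue
        if abs(recomputed[name] - entry["value"]) > entry["ci_half_width"]:
            failures.append(name)  # reproduced, but outside the published band
    return failures
```

An empty list means every published metric was reproduced within its band; any returned names are the items to raise before recording acceptance in step 4.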

Procurement Questionnaire

Template: Evidence Manifest

{
  "version": "2025.01",
  "artifacts": {
    "metrics": ["metrics/utility@op.json", "metrics/stability_by_segment.json"],
    "plots": ["plots/roc_pr_curves.html"],
    "configs": ["configs/evaluation.yaml", "configs/thresholds.yaml"],
    "sbom": "sbom.json"
  },
  "hashes": {
    "metrics/utility@op.json": "sha256:abc123.",
    "metrics/stability_by_segment.json": "sha256:def456."
  },
  "seeds": "seeds/seeds.txt"
}
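
A buyer could check the `hashes` section of a manifest like this with a few lines of Python. The sketch below assumes the `sha256:<hex>` format shown in the template and that paths are relative to the bundle root; `verify_manifest` and `sha256_of` are hypothetical helper names, not part of any AethergenPlatform tooling. It checks file digests only and does not verify the bundle's signature.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large artifacts are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bundle_dir: Path, manifest_name: str = "manifest.json") -> Dict[str, bool]:
    """Map each path listed under "hashes" to whether its digest matches."""
    manifest = json.loads((bundle_dir / manifest_name).read_text(encoding="utf-8"))
    results = {}
    for rel_path, expected in manifest["hashes"].items():
        # Split "sha256:<hex>" into algorithm and digest (assumed format).
        algo, _, expected_hex = expected.partition(":")
        target = bundle_dir / rel_path
        if algo != "sha256" or not target.is_file():
            results[rel_path] = False  # unknown algorithm or missing artifact
            continue
        results[rel_path] = sha256_of(target) == expected_hex.lower()
    return results
```

Calling `verify_manifest(Path("bundle"))` on an unpacked bundle returns one boolean per listed artifact. The placeholder digests in the template above are truncated, so they would report `False` until replaced with real values.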

Appendix: Metric Definitions

Security & Privacy Notes

Closing (Comprehensive)

When evidence is part of the product, buyers don’t need persuasion; they need verification. AethergenPlatform turns every release into a verifiable unit of trust: signed, reproducible, and ready for audit.