Financial Crime Labs: Safe Scenario Testing with Synthetic Graphs
By Gwylym Owen — 14–18 min read
Executive Summary
Fraud and anti-money laundering (AML) teams need to test ideas rapidly without risking customer data exposure. AethergenPlatform can generate synthetic transaction graphs that replicate network structure, seasonal flows, and edge-case behaviors while stripping direct identifiers. This enables repeatable experiments, procurement-grade evaluations, and evidence-backed deployments, streamlining financial crime prevention as of September 2025.
Graph Data Model: A Detailed Blueprint
The foundation of our synthetic graphs mirrors real-world financial networks, designed for safe experimentation:
- Nodes: Customers, accounts, cards, merchants, devices, IPs, and branches, each with unique properties.
- Edges: Transfers or payments with attributes like timestamp, amount, channel, MCC (Merchant Category Code), country, and authorization result.
- Node/Edge Features: Tenure, KYC tier, merchant risk band, device fingerprint, and velocity bands to capture behavioral patterns.
- Global Structure: Degree distributions, community structures, and diurnal/seasonal cycles to reflect network dynamics.
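The blueprint above can be sketched as a pair of record types. This is a minimal illustration, not AethergenPlatform's actual schema: the class and field names (`Node`, `Edge`, `features`, and so on) are assumptions chosen to mirror the bullets.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                       # customer, account, card, merchant, device, ip, branch
    features: dict = field(default_factory=dict)  # tenure, KYC tier, risk band, etc.

@dataclass
class Edge:
    src: str
    dst: str
    timestamp: str                  # ISO-8601
    amount: float
    channel: str
    mcc: str                        # Merchant Category Code
    country: str
    auth_result: str

# A customer paying a merchant, with behavioral features on the nodes.
customer = Node("cust_001", "customer", {"tenure_years": 3.5, "kyc_tier": "standard"})
merchant = Node("merch_001", "merchant", {"risk_band": "medium"})
payment = Edge("cust_001", "merch_001", "2025-09-01T10:15:00Z",
               42.50, "card", "5812", "GB", "approved")
```

Global structure (degree distributions, communities, seasonality) then emerges from how many such edges are generated, and between whom.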
Typology Library: Parameterized Scenarios
AethergenPlatform offers a customizable library of financial crime typologies, each with adjustable parameters:
- Structuring (Smurfing): Micro-deposits below reporting thresholds, with configurable window (e.g., 72 hours) and band parameters.
- Mule Rings: Device/IP reuse across identities, with controls for ring size (e.g., 12 nodes) and churn rate (e.g., 0.35).
- Card Testing: Rapid low-value authorizations across MCCs, with burst (e.g., 50 transactions) and cooldown (e.g., 10 minutes) settings.
- Sanctions Evasion Motifs: Indirect paths via neutral hubs, adjustable by path length (e.g., 3-5 hops) and timing (e.g., 24-hour delays).
- First-Party Abuse: Refund/chargeback loops with merchant collusion, parameterized by collusion rate (e.g., 0.2).
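To make the parameterization concrete, here is a hypothetical injector for the structuring typology: it emits a configurable number of deposits, each kept below the reporting threshold, spread across the time window. The function name and output fields are illustrative, not the platform's API.

```python
import random

def inject_structuring(account_id, threshold=1000, window_hours=72,
                       n_deposits=8, seed=42):
    """Emit n deposits below `threshold`, randomly placed in `window_hours`."""
    rng = random.Random(seed)                 # seeded for reproducibility
    deposits = []
    for _ in range(n_deposits):
        deposits.append({
            "account": account_id,
            # 70-98% of the threshold: below the line, but not suspiciously uniform
            "amount": round(rng.uniform(0.70, 0.98) * threshold, 2),
            "offset_hours": round(rng.uniform(0, window_hours), 2),
        })
    return deposits

events = inject_structuring("acct_42")
```

The same pattern generalizes to mule rings or card testing: each typology is a seeded generator whose knobs (ring size, churn, burst, cooldown) are explicit function parameters.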
Evaluation That Sticks in Risk Committees
Our evaluations are designed to win over risk committees with actionable insights:
- Operating Point Utility: Detection rates at fixed alert budgets (e.g., 2,000/day) with confidence intervals for reliability.
- Segment Stability: Consistency across product lines, regions, and lifecycle stages, with quantified deltas.
- Scenario Stress: Performance under parameter sweeps (e.g., varying mule ring size) to test resilience.
- Cost Curves: Incremental cases per analyst-hour and escalation loads to optimize resource allocation.
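Operating point utility at a fixed alert budget can be sketched as follows: take the top-scored transactions up to the budget, count true positives, and bootstrap a confidence interval. This is a simplified stand-in for the platform's evaluation pipeline.

```python
import random

def detections_at_budget(scores, labels, budget):
    """True positives among the `budget` highest-scored items."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])[:budget]
    return sum(label for _, label in ranked)

def bootstrap_ci(scores, labels, budget, n_boot=1000, seed=0):
    """95% bootstrap CI on detections at the fixed budget."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]     # resample with replacement
        stats.append(detections_at_budget([scores[i] for i in idx],
                                          [labels[i] for i in idx], budget))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)]
```

Reporting the interval rather than a point estimate is what makes the number defensible in a risk committee.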
Modeling Baselines: Diverse Approaches
We provide multiple baselines to anchor evaluations:
- Rules: Transparent, governance-friendly baselines (e.g., velocity thresholds) for comparison.
- Graph Features: Metrics like PageRank centrality, motif counts, and community scores to capture network patterns.
- Learned Models: Graph Neural Networks (GNNs) or gradient-boosted trees with calibrated thresholds for advanced detection.
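As an illustration of the graph-feature baseline, here is PageRank centrality via power iteration in pure Python (no graph library assumed). Production systems would use an optimized implementation; the algorithm itself is standard.

```python
def pagerank(adj, damping=0.85, iters=50):
    """adj maps each node to a list of out-neighbours; returns node -> rank."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}   # teleport mass
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:
                for u in nodes:                          # dangling node: spread uniformly
                    new[u] += damping * rank[v] / n
        rank = new
    return rank
```

Nodes that many accounts funnel into (hubs in mule rings, for instance) accumulate rank, which is exactly the signal the feature is meant to capture.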
Evidence Bundle: Comprehensive Proof
Each evaluation produces a signed evidence bundle, tailored for procurement and audit:
- Signed Lineage: Schema versions, recipe hashes, and environment fingerprints.
- Metrics with CIs: Operating point utility, segment stability, and cost curves, all with confidence intervals.
- Per-Typology Logs: Parameter settings and sensitivity results for each scenario.
- Ablations: Insights into which features or typologies drive lift, with quantified impacts.
- Limits and Rules: Intended use, known failure modes, and rollback conditions for safe deployment.
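The signing and verification flow can be sketched with stdlib primitives: hash the canonicalized manifest, then attach an HMAC signature. The platform's actual `KeyManagementService` integration is not shown; this illustrates the verify-before-trust contract that auditors rely on.

```python
import hashlib
import hmac
import json

def sign_bundle(manifest: dict, key: bytes) -> dict:
    """Attach a content hash and HMAC signature to an evidence manifest."""
    payload = json.dumps(manifest, sort_keys=True).encode()  # canonical form
    return {
        "manifest": manifest,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "signature": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }

def verify_bundle(bundle: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    payload = json.dumps(bundle["manifest"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])
```

Any edit to the manifest, however small, changes the payload and fails verification, which is what makes the bundle tamper-evident.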
AethergenPlatform lets you ask hard questions safely: What happens to alert yield if we cut our budget 20%? Which motifs crumble first under drift? Answer with evidence, not anecdotes.
Graph Generation Config: A Practical Example
```yaml
nodes:
  customers: 2000000
  accounts: 3000000
  merchants: 250000
features:
  customer.tenure: log-normal(mean=3.5, sd=1.2)
  merchant.risk_band: categorical([low, medium, high])
edges:
  transaction.amount: log-normal(mean=5.0, sd=1.5)
  transaction.interarrival: mixture(exponential(lambda=0.1), weight=0.7)
seasonality:
  weekly: true
  monthly: true
communities: stochastic_block_model(regions=[NA, EU, APAC], edges=0.05)
typologies:
  mule_ring: {size: 12, reuse: 0.35}
  structuring: {window: 72h, threshold: 1000}
```
Scenario Design: Structured Experimentation
Design experiments to test hypotheses effectively:
- Select Typologies: Choose 3 (e.g., mule rings, structuring, card testing).
- Set Budgets: Define 2 operating budgets (e.g., 1,500 and 2,000 alerts/day).
- Define Success: Target cases/analyst-hour uplift vs. baseline (e.g., +20%).
- Run Sweeps: Adjust parameters (e.g., ring size 10-15) and publish sensitivity curves.
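The sweep step can be expressed as a small harness: vary one parameter, score each setting, and collect the sensitivity curve. `evaluate` here is a labeled stand-in for the real scoring pipeline, and the linear uplift is fabricated purely for illustration.

```python
def sweep(param_name, param_values, evaluate):
    """Run `evaluate` at each parameter value; return the sensitivity curve."""
    return [{param_name: v, "uplift": evaluate(v)} for v in param_values]

# Stand-in evaluator: in practice this runs the full detection pipeline
# against a regenerated graph with the given ring size.
curve = sweep("ring_size", range(10, 16),
              evaluate=lambda size: round(0.02 * size, 3))
```

Publishing the whole curve, rather than one point, shows reviewers where detection degrades rather than asserting that it never does.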
Feature Catalog: Rich Insights
Leverage a broad set of features for detection:
- Behavioral: Velocity bands, burstiness, and merchant diversity scores.
- Graph Motifs: Stars, cycles, and bi-cliques to identify patterns.
- Network Metrics: Community leakage scores and device/IP reuse ratios.
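Two of these features are simple enough to sketch directly: star-motif hubs (accounts with high fan-out) and a device-reuse ratio (devices seen across multiple identities). Field names are illustrative.

```python
from collections import Counter

def star_hubs(edges, k=3):
    """Sources with at least k outgoing edges: candidate star-motif centers."""
    fanout = Counter(src for src, _ in edges)
    return [node for node, degree in fanout.items() if degree >= k]

def device_reuse_ratio(device_to_customers):
    """Fraction of devices linked to more than one distinct customer."""
    reused = sum(1 for custs in device_to_customers.values()
                 if len(set(custs)) > 1)
    return reused / len(device_to_customers)
```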
Thresholding Policy: Adaptive Control
Ensure thresholds align with operational needs:
- Segmentation: Set thresholds by product and region for tailored detection.
- Escalation Caps: Limit escalations to manage workload.
- Elastic Budget: Auto-tune to keep alerts within ±10% of target, adjusting for weekends or peaks.
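One simple way to realize the elastic budget is proportional adjustment: when today's alert count drifts outside the ±10% band, nudge the score threshold up or down. The step size and control rule here are assumptions; a production controller might smooth over several days.

```python
def tune_threshold(threshold, alerts_today, target, tolerance=0.10, step=0.05):
    """Nudge the score threshold to keep alerts within ±tolerance of target."""
    if alerts_today > target * (1 + tolerance):
        return threshold * (1 + step)   # too many alerts: raise the bar
    if alerts_today < target * (1 - tolerance):
        return threshold * (1 - step)   # too few alerts: lower the bar
    return threshold                    # within band: leave it alone
```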
Case Study: Mule Ring Detection at a Mid-Size Bank
Scenario: A mid-size bank tested mule-ring detection with a 2,000 alerts/day budget.
- Utility: Graph-aware baseline detected 31% more actionable rings than rules, with CI [28%, 34%].
- Stability: 1.8% max delta across regions (NA, EU, APAC), within the 3% gate.
- Cost: Reduced duplicate escalations by 18%, saving analyst hours.
- Evidence: Bundle included sensitivity curves for ring size (10-15) and parameter logs, signed for audit.
- Outcome: Procurement approved deployment in 14 days after offline review.
Case Study: Sanctions Evasion at a Global Bank
Scenario: A global bank tested sanctions evasion detection with a 1,500 alerts/day budget.
- Utility: +25% detection lift vs. baseline, CI [22%, 28%].
- Stability: 2.0% max delta across product lines, within gates.
- Scenario Stress: Parameter sweep showed robustness to path length (3-5 hops).
- Evidence: Signed bundle with ablation showing hub centrality’s impact.
- Outcome: Adopted after a 16-day review cycle.
Governance and CI Integration
Our process ensures safety and traceability:
- CI Generation: GitHub Actions builds evidence bundles, signing with `KeyManagementService` and uploading artifacts.
- Fail-Closed Gates: Releases halt if utility or stability fails thresholds.
- Change-Control: `.aethergen/change-log.json` tracks updates, signed per release.
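The fail-closed gate reduces to a predicate the CI job evaluates before promotion. The gate names and thresholds below (+20% uplift, 3% max segment delta, matching the case studies) are illustrative; note that a *missing* metric also blocks the release, which is what "fail-closed" means.

```python
def release_allowed(metrics, min_uplift=0.20, max_delta=0.03):
    """Fail-closed gate: missing or sub-threshold metrics block the release."""
    uplift = metrics.get("uplift", float("-inf"))           # absent => fail
    delta = metrics.get("max_segment_delta", float("inf"))  # absent => fail
    return uplift >= min_uplift and delta <= max_delta
```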
FAQ
Will synthetic data hurt performance on real traffic?
We measure the stability of relative rankings and can de-risk via shadow evaluation before promotion. Synthetic graphs are for safe iteration and procurement evidence, not a production replacement.
Can we export the graphs?
Yes—export as Parquet or Delta with documented schemas; the evidence bundle includes seeds for regeneration.
How do we validate results?
Use the signed manifest and hashes to verify integrity; re-run with seeds and configs for confirmation.
Glossary
- Operating Budget: Fixed number of alerts per time window (e.g., 2,000/day).
- Motif: Small recurring subgraph structure (e.g., cycles, stars).
- Shadow Evaluation: Parallel scoring to test models without affecting production.
Contact Sales →