ETH/BTC Suspicious Pattern Analysis

Submission · DN Institute Market Data Challenge · github.com/mkzung · Max Gorbuk
Dataset: 845 trades · 188 OB snapshots · 2025-09-01 → 2025-09-03 UTC (71.81 h)

Six-detector forensic framework: five primary detectors (D1–D5) plus a peer-corroborated cross-check layer (D6). The primary detectors yield five mutually consistent signals of automated, likely non-organic activity: extreme one-sided buy flow that does not move price, identical-clip burst signatures on both sides of the book, same-second multi-trade clusters, a sell-side schedule confined to US trading hours, and structural liquidity pathology (median spread 89.7 bps, 15% of trades printed outside the contemporaneous bid-ask). D6 adds three independent microstructure cross-checks (frozen-orderbook asymmetry 9.15×, Benford rejection at K-S = 0.063, cron-style buy intervals with a median gap of 318 s and an IQR within ±23 s of it), each of which is inconsistent with organic two-sided market activity.

Contents
  1. Headline numbers
  2. Forensic interpretation
  3. D1 — Buy/Sell Imbalance
  4. D2 — Recurring Trade-Size Signatures
  5. D3 — Pump-and-Dump
  6. D4 — Liquidity Quality
  7. D5 — Burst / TOD / Anchors
  8. D6 — Microstructure cross-checks
  9. Cross-detector triangulation
  10. Reproducibility

Headline numbers

Buy / Sell size ratio: 175,245:1
  99.9994% of size on buy side · count 5.76:1 · 720 buys vs 125 sells
Buy price impact: 0.0 bps
  aggressive buys do not move mid · sells move it −17.8 bps median
Self-trade clip: 0.00026058 ETH
  6× buys + 7× sells · 22h apart · in two ≤2-second bursts · P-null ≈ 0
Burst events ≥5/sec: 9
  max 12 trades in one second on 09-01 16:16:46 (sells)
Hours with 0 sells: 9 / 24
  UTC 2-3, 5-6, 9-10, 19, 22-23 · seller US-session-bound
Median spread: 89.7 bps
  6-18× wider than top-tier ETH/BTC venues (5-15 bps)
Outside-spread trades: 127 / 845
  15.0% printed outside contemporaneous bid-ask · 65 sells + 62 buys · sells over-rep 3.5×
Depth imbalance: +0.72
  top-5 ask-heavy median · paradoxical with 99.9994% buy pressure
Doubling ladder p-value: < 0.0005
  4 explicit 2× pairs in 18 flagged sizes · MC 2000 reps, none reached 4 · null max=2

Forensic interpretation

The dataset is consistent with two distinct automated operators on this venue: a buy-side operator that runs continuously and a sell-side operator confined to the US trading session (see D5).

Three caveats apply: small sample, single venue, and ambiguous unit semantics (the BUY/SELL size disparity may reflect inconsistent unit reporting). All are listed in REPORT.md § Limitations.

D1 Buy/Sell Imbalance

A sustained, time-localized deviation from balanced flow indicates one-sided pressure consistent with wash trading, momentum ignition, or single-actor dominance.

Method. 30-min buckets; rolling 8h z-score on log((buy+1)/(sell+1)), computed for both count and size. The smoothing constant eps=1 was chosen after an audit showed that eps=1e-9 inflated the z-score in the 119 of 143 buckets that contain zero sells (a pure log(buy/eps) artifact).
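The smoothing choice and the rolling z-score can be sketched in a few lines. This is a minimal illustration, not the code in analyze.py: the function names, the `min_periods` parameter, the toy bucket counts, and the choice to score each bucket against the preceding window (excluding itself) are all assumptions.

```python
import math
from statistics import mean, stdev

def log_ratio(buys: float, sells: float, eps: float = 1.0) -> float:
    """Smoothed log buy/sell ratio for one 30-minute bucket."""
    return math.log((buys + eps) / (sells + eps))

def rolling_z(values, window=16, min_periods=4):
    """z-score of each value against the preceding `window` buckets
    (16 x 30 min = 8 h). `min_periods` is a hypothetical warm-up length."""
    out = []
    for i, v in enumerate(values):
        hist = values[max(0, i - window):i]
        if len(hist) < min_periods:
            out.append(None)          # not enough history yet
            continue
        sd = stdev(hist)
        out.append((v - mean(hist)) / sd if sd > 0 else None)
    return out

# Toy (buy count, sell count) buckets: persistent buy dominance, then parity.
buckets = [(6, 0), (5, 1), (7, 0), (6, 1), (5, 0), (6, 0), (7, 1), (5, 5)]
ratios = [log_ratio(b, s) for b, s in buckets]
z = rolling_z(ratios)
print(z[-1])                          # strongly negative: parity is the outlier

# The eps artifact: a zero-sell bucket explodes when eps is tiny.
print(log_ratio(5, 0, eps=1e-9), log_ratio(5, 0, eps=1.0))
```

With a one-sided baseline, the bucket with equal buy and sell counts is the one that scores beyond −3σ, which is the behaviour the flagged-bucket table reports.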

Flagged buckets (all NEGATIVE z-scores — lulls in buy pressure)

Timestamp UTC      Buy / Sell count   z (count)   z (size)
2025-09-01 13:30   5 / 5              −3.10       −2.18
2025-09-02 14:30   3 / 2              −3.44       −2.79
2025-09-03 07:00   6 / 10             −3.64       −2.79

The fact that even count parity (5:5) is a >3σ event tells you how extreme the persistent baseline is. The baseline itself is the anomaly; flagged buckets mark moments the bot pauses or other flow appears.

Price-impact decomposition (Δmid before/after)

Side   n     median Δmid (bps)   median price vs mid (bps)
buy    579   0.0                 +20.2
sell   121   −17.8               −29.1

Aggressive buys that do not move price are inconsistent with genuine taker demand against a thin book — the canonical wash signature.

Figure D1 (imbalance): log buy/sell ratio (count = blue, size = red); rolling z-score with ±3σ markers

D2 Recurring Trade-Size Signatures

Bot/wash activity produces identical-size trades repeated many times. Under a continuous-distribution null with tick rounding, a size that repeats exactly more than K = 2 times is statistically rare.

Method (corrected from brief). The brief proposed shuffling the size array, which produces a degenerate null (shuffling preserves frequencies). Replaced with KDE-on-log(size): scipy.stats.gaussian_kde per side, 1000 replicates, p99 max-count threshold.

Robustness. Threshold = 2.0 stable across bandwidths {Scott, Silverman, 0.1, 0.3, 0.5, 1.0}. Cross-validated with uniform-on-log-range null (P(null ≥ observed) ≈ 0).
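The KDE null described above can be illustrated without SciPy, since drawing from a Gaussian KDE is equivalent to picking an observed point and adding Gaussian noise with the kernel bandwidth. The sketch below is a stdlib stand-in for scipy.stats.gaussian_kde, not the report's implementation; the function name, the 8-decimal tick rounding, and the toy size distribution are assumptions.

```python
import math
import random
from collections import Counter
from statistics import stdev

def kde_null_threshold(sizes, n_rep=1000, decimals=8, bw_factor=None, seed=42):
    """p99 of the max exact-repeat count under a Gaussian KDE on log(size)."""
    rng = random.Random(seed)
    logs = [math.log(s) for s in sizes]
    n = len(logs)
    if bw_factor is None:
        bw_factor = n ** (-1 / 5)            # Scott's rule for 1-D data
    h = bw_factor * stdev(logs)              # kernel bandwidth in log space
    max_counts = []
    for _ in range(n_rep):
        # KDE draw: resample a data point, jitter it by N(0, h^2),
        # map back to size space and round to the tick grid.
        draw = [round(math.exp(rng.choice(logs) + rng.gauss(0.0, h)), decimals)
                for _ in range(n)]
        max_counts.append(max(Counter(draw).values()))
    max_counts.sort()
    return max_counts[int(0.99 * (n_rep - 1))]

# Toy data: 200 log-normal sizes plus six prints of the recurring clip.
toy_rng = random.Random(0)
sizes = [math.exp(toy_rng.gauss(0, 1.5)) for _ in range(200)] + [0.00026058] * 6
threshold = kde_null_threshold(sizes, n_rep=300)
observed = max(Counter(round(s, 8) for s in sizes).values())
print(observed, threshold, observed > threshold)
```

Unlike shuffling, the smoothed null does not preserve the observed frequencies, so an exact clip repeated six times stands out against a threshold of one or two repeats.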

All 13 prints of 0.00026058 ETH (in time order)

Timestamp UTC         Side       Price
2025-09-02 15:40:44   sell × 6   0.039258
2025-09-02 15:40:45   sell × 1   0.039258
2025-09-03 13:43:45   buy × 6    0.039497

Identical tick-rounded clip on both sides of the book within a 22-hour window is the canonical wash-trading fingerprint.

Doubling ladder (Monte Carlo verified)

Among 18 flagged BUY sizes, 7 form 4 explicit 2× pairs (125.28 belongs to two of them): 62.64 ↔ 125.28 ↔ 250.56 (chain), 93.96 ↔ 187.92, 137.808 ↔ 275.616. Under the MC null (random samples of 18 sizes from the BUY distribution, 2000 reps), the expected number of 2× pairs is 0.09 and the maximum observed is 2. No replicate reached 4, so P(null ≥ 4) < 0.0005 (fewer than 1 in 2000).
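A minimal sketch of the pair count and the Monte Carlo p-value follows. The helper names and the toy population are hypothetical, and drawing the 18 sizes without replacement is an assumption; the report does not state how its null samples were drawn. The add-one estimator is used so the reported p-value is never exactly zero.

```python
import math
import random

def count_doubling_pairs(sizes, rel_tol=1e-9):
    """Number of pairs (a, b) of distinct sizes with b = 2a."""
    uniq = sorted(set(sizes))
    return sum(1 for a in uniq for b in uniq
               if math.isclose(b, 2 * a, rel_tol=rel_tol))

def mc_pvalue(observed_pairs, population, k=18, n_rep=2000, seed=42):
    """P(null >= observed) from n_rep random k-subsets of `population`."""
    rng = random.Random(seed)
    hits = sum(count_doubling_pairs(rng.sample(population, k)) >= observed_pairs
               for _ in range(n_rep))
    return (hits + 1) / (n_rep + 1)          # add-one: bounded below by 1/(n_rep+1)

# The seven ladder sizes quoted in the text.
ladder = [62.64, 125.28, 250.56, 93.96, 187.92, 137.808, 275.616]
n_pairs = count_doubling_pairs(ladder)
print(n_pairs)

# Toy stand-in for the BUY size distribution (not the real data).
toy_rng = random.Random(1)
population = [toy_rng.uniform(1.0, 300.0) for _ in range(700)]
p_value = mc_pvalue(n_pairs, population)
print(p_value)
```

With 2000 replicates and no hits, the estimator returns 1/2001, which is the smallest p-value this simulation size can support.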

Figure D2 (signatures): recurrence-count distribution per side (left); flagged-size occurrence timeline (right)

D3 Pump-and-Dump

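Method. A candidate requires a volume spike, a directional move of ≥0.5%, and a reversal of ≥50% within 1h. Both directions are tested (pump-then-dump and dump-then-recovery).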

Findings. 0 candidates met all three criteria. Sensitivity test at vol_z>1.5, move>0.3%, reversal>30% surfaces 2 sub-threshold dump-recovery events (09-01 04:00, 09-03 07:00), both below industry-standard cutoffs.

Price moves in a 5.7% band (0.0388 → 0.0410) over the 72h window. The market does not pump despite 99.9994% one-sided size pressure — corroborates D1's "buys don't move price" reading.

Figure D3 (pump-dump): price over time; no qualifying spike-and-reverse candidates

D4 Liquidity Quality

Spread distribution + top-5 depth imbalance + trade-vs-spread cross-check via merge_asof.

Tolerance audit. OB inter-snapshot intervals: median 18.3 min, mean 22 min, max 2.4 hours. The brief's 5 min tolerance matched only 22.5% of trades. Switched to 30 min, which matches 87.8% of trades and gives a stable ~17% outside-spread rate among matched trades (127 of ~742, i.e. 15.0% of all 845), consistent with longer tolerances.
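The effect of the tolerance on the trade-vs-spread cross-check can be shown with a backward as-of match. The report uses merge_asof; the sketch below reproduces the same idea with the standard-library bisect module, and the dictionary keys (`ts`, `bid`, `ask`, `price`) and toy values are hypothetical.

```python
from bisect import bisect_right

def asof_match(trade_ts, snap_ts, tolerance_s):
    """Index of the latest snapshot at or before trade_ts, if within tolerance."""
    i = bisect_right(snap_ts, trade_ts) - 1
    if i >= 0 and trade_ts - snap_ts[i] <= tolerance_s:
        return i
    return None

def outside_spread_stats(trades, snaps, tolerance_s=30 * 60):
    """Return (matched trades, trades priced outside the matched bid-ask)."""
    snap_ts = [s["ts"] for s in snaps]       # snapshots must be time-sorted
    matched = outside = 0
    for t in trades:
        i = asof_match(t["ts"], snap_ts, tolerance_s)
        if i is None:
            continue                         # no snapshot recent enough
        matched += 1
        if not (snaps[i]["bid"] <= t["price"] <= snaps[i]["ask"]):
            outside += 1
    return matched, outside

def spread_bps(bid, ask):
    """Quoted spread relative to mid, in basis points."""
    return (ask - bid) / ((bid + ask) / 2) * 1e4

# Toy data, timestamps in seconds.
snaps = [{"ts": 0, "bid": 0.0395, "ask": 0.0399},
         {"ts": 1200, "bid": 0.0396, "ask": 0.0400}]
trades = [{"ts": 100, "price": 0.0397},      # inside the first snapshot
          {"ts": 1300, "price": 0.0394},     # below the second snapshot's bid
          {"ts": 2500, "price": 0.0398}]     # 1300 s after the last snapshot
print(outside_spread_stats(trades, snaps, tolerance_s=5 * 60))
print(outside_spread_stats(trades, snaps, tolerance_s=30 * 60))
print(round(spread_bps(0.0395, 0.0399), 1))
```

A short tolerance drops trades that have no recent snapshot, so the match rate and the denominator of the outside-spread rate both depend on it.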

Spread distribution (188 snapshots, bps)

min    p25     median   p75      p95      max
1.83   54.50   89.71    131.67   141.17   145.58

The median sits 6-18× above the 5-15 bps baseline that liquid ETH/BTC venues quote, and even the 25th percentile (54.5 bps) is well above it. This points to either a low-quality venue or systemic stale quoting.

Outside-spread breakdown

127 of 845 trades (15.0%) printed at prices outside the contemporaneous best bid-ask, of which 65 are sells and 62 are buys. Sells are 14.8% of all trades but 51% of outside-spread prints (3.5× over-representation). The most extreme deviations cluster in two coordinated SELL bursts:

Timestamp UTC                  n         Price      Dev from mid (bps)
2025-09-01 20:38:40            4 sells   0.039900   −33.8
2025-09-01 20:42:38–20:42:40   7 sells   0.039809   −56.5

Coordinated sub-bid sell prints in the same one to two seconds are consistent with hidden-iceberg fills, off-book reporting, or stale snapshot publication.

Figure D4 (liquidity): spread histogram (top-left), spread time-series (top-right), depth imbalance (bottom-left), trade-vs-spread cross-check (bottom-right, orange = outside)

D5 Burst Execution / Time-of-Day / Anchor Prices

Microstructure-level signatures expose operator behaviour: same-second bursts, hourly buy-share asymmetry, recurring exact prices.

5a. Burst seconds (≥5 trades in one second)

Timestamp UTC         n    Side   Unique sizes   Note
2025-09-01 16:16:46   12   sell   12 (varied)    largest single burst
2025-09-03 14:10:23   7    buy    7 (varied)
2025-09-03 13:43:45   7    buy    2              0.00026058 cluster (D2)
2025-09-02 15:25:45   7    sell   7 (varied)
2025-09-02 15:40:44   7    sell   2              0.00026058 cluster (D2)
2025-09-02 20:02:23   6    sell   6 (varied)
2025-09-03 07:28:17   6    sell   6 (varied)
2025-09-01 20:38:39   5    sell   5 (varied)     aligns with D4 sub-bid cluster
2025-09-01 17:32:42   5    sell   5 (varied)
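Burst seconds of this kind can be found by grouping prints on the whole UTC second. The sketch below is illustrative only: the tuple layout and the toy trades are assumptions, with the recurring 0.00026058 clip reused from D2 so the low unique-size count is visible.

```python
from collections import defaultdict
from datetime import datetime

def burst_seconds(trades, min_trades=5):
    """Group trades by whole second; return (second, n, sides, unique sizes)
    for every second with at least `min_trades` prints."""
    by_second = defaultdict(list)
    for ts, side, size in trades:
        key = datetime.fromisoformat(ts).replace(microsecond=0)
        by_second[key].append((side, size))
    rows = []
    for sec, prints in sorted(by_second.items()):
        if len(prints) >= min_trades:
            sides = {side for side, _ in prints}
            unique_sizes = len({size for _, size in prints})
            rows.append((sec, len(prints), sides, unique_sizes))
    return rows

# Toy trades: (timestamp, side, size as string to keep exact-match semantics).
trades = (
    [("2025-09-02 15:40:44", "sell", "0.00026058")] * 6
    + [("2025-09-02 15:40:44", "sell", "0.01310000"),
       ("2025-09-02 15:40:45", "sell", "0.00026058"),
       ("2025-09-02 15:41:10", "buy", "1.50000000")]
)
bursts = burst_seconds(trades)
for sec, n, sides, unique_sizes in bursts:
    print(sec, n, sorted(sides), unique_sizes)
```

Keeping sizes as strings avoids floating-point equality issues when counting unique clip sizes.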

5b. Time-of-day asymmetry

Sells occur in only 15 of 24 UTC hours. Hours with zero sells: UTC 2, 3, 5, 6, 9, 10, 19, 22, 23. Activity concentrates in UTC 13–21 (US trading session ≈ 09:00–17:00 EDT, UTC−4). Minimum buy-share is 50.0% at hour 20 (US market close). The seller(s) operate on a US schedule; buyer(s) operate continuously.

5c. Top anchor prices

Top recurring price 0.039870 appears 21 times (2.5% of all trades). Top 5 prices = 7.6%, top 20 = 20.4%. Concentration on specific tick-rounded levels without round-number bias suggests resting-limit-order anchoring.
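The concentration figures are top-k shares of the price histogram. The sketch below reproduces the 2.5% figure from the stated counts (21 prints out of 845); the remaining toy prices are invented filler, and prices are kept as strings so that tick-rounded levels compare exactly.

```python
from collections import Counter

def top_k_share(prices, k):
    """Share of all prints that fall on the k most frequent price levels."""
    counts = Counter(prices)
    return sum(c for _, c in counts.most_common(k)) / len(prices)

# 21 prints at the top anchor plus 824 distinct filler prices (toy data).
prices = ["0.039870"] * 21 + [f"{0.04 + i * 1e-6:.6f}" for i in range(824)]
share = top_k_share(prices, 1)
print(round(100 * share, 1))
```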

Figure D5 (bursts/TOD/anchors): burst seconds (top), hourly buy/sell counts with US-session highlight (middle), top 15 anchor prices (bottom)

D6 Microstructure cross-checks (peer-corroborated)

Three additional forensic angles surfaced from peer review of prior submissions. All three reproduce on this dataset and corroborate the D1–D5 wash-trading interpretation.

6a. Frozen orderbook (one-sided staleness)

Each serialized snapshot is byte-compared against the previous one, by side. 119 of 187 pairs (63.6%) have an identical bid; only 13 of 187 (7.0%) have an identical ask, a 9.15× asymmetry in frozen rates. Longest frozen-bid run: 18 consecutive snapshots (2025-09-02 20:05 UTC → 2025-09-03 02:11 UTC ≈ 6h 6m). Equivalently, the ask changed in 174 of 187 pairs and the bid in only 68. A bid that stays frozen about 9× more often than its paired ask is not consistent with two-sided market making, and it corroborates D4's persistent +0.72 median top-5 depth imbalance.
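The per-side comparison reduces to counting identical consecutive elements and tracking the longest run. A minimal sketch, assuming each side of each snapshot has already been serialized to a string; the function name and toy books are hypothetical.

```python
def frozen_stats(levels):
    """levels: one serialized book side per snapshot, in time order.
    Returns (identical consecutive pairs, total pairs, longest frozen run
    measured in snapshots)."""
    same = [a == b for a, b in zip(levels, levels[1:])]
    longest = run = 0
    for is_same in same:
        run = run + 1 if is_same else 0
        longest = max(longest, run)
    # A run of r identical pairs spans r + 1 snapshots.
    return sum(same), len(same), (longest + 1 if longest else 0)

bids = ["A", "A", "A", "B", "B", "C"]        # toy: mostly frozen
asks = ["a", "b", "c", "d", "d", "e"]        # toy: mostly repriced
bid_same, n_pairs, bid_run = frozen_stats(bids)
ask_same, _, ask_run = frozen_stats(asks)
print(bid_same, ask_same, n_pairs, bid_run, bid_same / ask_same)

# The asymmetry quoted in the text, from its own counts.
print(round(119 / 13, 2))
```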

6b. Benford's Law conformity

K-S test on the first-digit distribution of trade sizes (n=845): K-S = 0.0626 > critical 0.0468 at α=0.05 → reject Benford-conformity. Digits 1+2 combined = 52.3% (vs 47.7% expected) driven by an excess of leading 2's; digits 7+8+9 are under-represented by ~6 pp. Fingerprint of size generation that prefers a narrow magnitude band rather than spanning organic decades.
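A first-digit K-S statistic can be computed as the largest gap between the observed and Benford cumulative distributions over digits 1 to 9. The sketch below is an illustration with toy inputs, not the report's code; the critical value uses the large-sample approximation 1.36/√n, which reproduces the 0.0468 quoted for n = 845 and is generally conservative for discrete data.

```python
import math

def first_digit(x: float) -> int:
    """Leading significant digit of a positive number."""
    return int(f"{x:.15e}"[0])

def benford_ks(values):
    """Return (K-S statistic vs Benford first-digit law, critical value at 5%)."""
    n = len(values)
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    d_max = cum_obs = cum_exp = 0.0
    for d in range(1, 10):
        cum_obs += counts[d - 1] / n
        cum_exp += math.log10(1 + 1 / d)     # Benford probability of digit d
        d_max = max(d_max, abs(cum_obs - cum_exp))
    return d_max, 1.36 / math.sqrt(n)

# Toy inputs: sizes spread evenly over five decades vs a narrow magnitude band.
wide = [10 ** (i / 200) for i in range(1000)]
narrow = [2 + i / 1000 for i in range(500)]
d_wide, crit_wide = benford_ks(wide)
d_narrow, crit_narrow = benford_ks(narrow)
print(d_wide < crit_wide, d_narrow > crit_narrow)

# Values quoted in the text.
print(round(1.36 / math.sqrt(845), 4), round(math.log10(3), 3))
```

Sizes confined to a narrow magnitude band fail the test, while sizes spanning several decades on a log scale pass it, which is the contrast the paragraph above draws.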

6c. Inter-trade interval regularity (Sep-3 14:00+ UTC, buy side)

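n = 95 buys; median gap = 318 s (≈ 5 min 18 s); IQR = 295.25 – 341.0 s (50% of gaps within ±23 s of median); coefficient of variation = 0.69. The tight IQR (≈ ±7% of the median) is consistent with a cron-style scheduler with light jitter, not human-decision-driven order flow. The CV is driven by the gaps outside the interquartile range; for reference, a strictly periodic scheduler would give CV ≈ 0 and memoryless (Poisson) arrivals CV ≈ 1.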
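The three statistics can be computed with the standard library. The toy gaps below are invented to show how a tight interquartile range and a large CV can coexist when a few long pauses are mixed into an otherwise regular schedule; the function name and the use of the inclusive quartile method are assumptions.

```python
from itertools import accumulate
from statistics import mean, pstdev, quantiles

def gap_stats(timestamps_s):
    """Median, IQR and coefficient of variation of inter-trade gaps (seconds)."""
    ts = sorted(timestamps_s)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    q1, q2, q3 = quantiles(gaps, n=4, method="inclusive")
    return {"n_gaps": len(gaps), "median": q2, "iqr": (q1, q3),
            "cv": pstdev(gaps) / mean(gaps)}

# Toy schedule: ~318 s cadence with light jitter, plus two long pauses.
toy_gaps = [300, 310, 318, 325, 340] * 4 + [1500, 2000]
timestamps = [0] + list(accumulate(toy_gaps))
stats = gap_stats(timestamps)
print(stats)
```

The median and quartiles are robust to the two long pauses, while the CV is dominated by them.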

Cross-detector triangulation

Strict per-bin co-occurrence is sparse: only one hourly bin has both D1 and D2 firing, 2025-09-02 14:00 UTC. Its D1 flag (the 14:30 bucket) falls 70 minutes before the SELL burst of seven 0.00026058 ETH prints at 15:40:44–45. This is suggestive of "bot pauses, then runs the other side", but a single bin cannot carry the argument.

The two strongest individual events stand on their own evidence: (1) 12-trade SELL burst at 09-01 16:16:46, and (2) the twin 0.00026058 ETH bursts at 09-02 15:40 / 09-03 13:43. Either alone is a near-impossible event under any IID null over 72h.
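One way to put a rough number on "near-impossible" for the 12-trade burst is a homogeneous Poisson model for sell arrivals. This is an assumption introduced here as a stand-in for an IID null, not a calculation from the report; real order flow is clustered, so the figure should be read as an order-of-magnitude illustration only.

```python
import math

def poisson_tail(lam, k, terms=60):
    """P(N >= k) for N ~ Poisson(lam), summed term by term in log space."""
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(k, k + terms))

seconds = 71.81 * 3600                 # length of the observation window
lam = 125 / seconds                    # average sells per second (125 sells)
p_one_second = poisson_tail(lam, 12)   # >= 12 sells in a given second
# Union bound over every second in the window.
p_any_second = min(1.0, seconds * p_one_second)
print(p_one_second, p_any_second)
```

Even after the union bound over all seconds in the window, the probability stays many orders of magnitude below any conventional significance level.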

Figure (co-occurrence timeline): cross-detector flag timeline (1h grid); red overlay = ≥2 detectors fire simultaneously

Reproducibility

git clone <fork-url>
cd <fork>/mkzung-ethbtc-analysis
make all          # install + pytest + analyze + audit + open this dashboard

# Or step-by-step
pip install -r requirements.txt
python -m pytest tests/ -v        # 46 unit tests
python analyze.py --trades ../eth-btc-trades.csv \
                  --orderbooks ../eth-btc-orderbooks.csv
python audit.py
python calibration.py             # detectors-on-clean-data calibration

REPORT.md · findings.json · audit.txt · notebooks/01_analysis.ipynb

Re-running on a fresh clone produces byte-identical findings.json (KDE null seeded at 42; doubling-ladder MC seeded). Verified.