2026-03-16

2-M16 split-fix and live-defaults milestone

Refreshed the 2-M16 public bundle with the split-fixed run, cleaner live defaults, and a much more stable mixed-rest profile.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

What changed

We replaced the older March 3 2-M16 public bundle with the current March 16 run, built from the combined mixed-movement and auxiliary-rest session:

  • Session: combined_20260315_154309
  • Run: splitfix_mixedresttrain_auxholdout_rw1_seqeq_rflw05

This refresh follows a targeted audit of the mixed-session REST failure mode. The key change was a split policy that trains on all mixed-session REST events instead of withholding several of them during split and calibration. We selected the run variant with REST finger supervision and then froze the default live postprocessing family after a broader ablation study.
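The split change can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the `Event` record, the session tags `"mixed"` and `"aux_rest"`, and the caller-supplied `is_holdout` predicate are all hypothetical. The only behavior taken from the post is that mixed-session REST events always land in training instead of being withheld.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one labelled event in a recording session."""
    event_id: str
    session: str  # hypothetical tags, e.g. "mixed" or "aux_rest"
    label: str    # e.g. "REST" or "OPEN+THUMB"


def split_events(
    events: List[Event],
    is_holdout: Callable[[Event], bool],
    force_mixed_rest_train: bool = True,
) -> Tuple[List[Event], List[Event]]:
    """Assign each event to (train, holdout).

    With force_mixed_rest_train=True, every REST event from the mixed session
    goes to train regardless of is_holdout; all other events follow is_holdout.
    """
    train: List[Event] = []
    holdout: List[Event] = []
    for ev in events:
        mixed_rest = ev.session == "mixed" and ev.label == "REST"
        if force_mixed_rest_train and mixed_rest:
            train.append(ev)      # split-fix: never withhold mixed-session REST
        elif is_holdout(ev):
            holdout.append(ev)
        else:
            train.append(ev)
    return train, holdout
```

With `force_mixed_rest_train=False`, the same predicate can pull mixed-session REST events into the holdout pool, which is the withholding behavior the split fix removed.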

Headline public results

  • Action accuracy on the primary holdout: 85.73%
  • Finger accuracy on non-REST windows: 86.33%
  • Test windows: 2,102
  • Non-REST test windows: 1,888
  • REST TPR on the primary holdout: 52.34%
  • REST F1 on the primary holdout: 0.596
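The headline figures also constrain the REST confusion counts. Assuming every test window is either REST or non-REST (so 2,102 minus 1,888 gives 214 REST windows), the sketch below back-calculates the smallest integer counts that reproduce the published TPR and F1 after rounding. These are derived figures, not published ones, and `implied_rest_counts` is a hypothetical helper.

```python
from typing import Optional, Tuple


def implied_rest_counts(
    test_windows: int, non_rest_windows: int, tpr_pct: float, f1: float
) -> Optional[Tuple[int, int, int, int]]:
    """Back-calculate (rest_windows, TP, FN, FP) from rounded headline metrics.

    Assumes every test window is either REST or non-REST, TPR is reported as a
    percentage to 2 decimals, and F1 to 3 decimals. Returns the smallest counts
    consistent with that rounding, or None if no integer counts match.
    """
    rest = test_windows - non_rest_windows
    tp = next((t for t in range(rest + 1) if round(100 * t / rest, 2) == tpr_pct), None)
    if tp is None:
        return None
    fn = rest - tp
    # F1 = 2TP / (2TP + FP + FN); scan FP upward until the rounded F1 matches.
    fp = next(
        (f for f in range(10 * rest) if round(2 * tp / (2 * tp + f + fn), 3) == f1),
        None,
    )
    if fp is None:
        return None
    return rest, tp, fn, fp


rest, tp, fn, fp = implied_rest_counts(2102, 1888, 52.34, 0.596)
precision = tp / (tp + fp)
print(rest, tp, fn, fp, round(precision, 3))  # prints: 214 112 102 50 0.691
```

On this reading, the holdout misses REST windows (102 false negatives) about twice as often as it over-predicts REST (50 false positives), with an implied REST precision near 0.69.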

Why this run matters

The main failure before this refresh was catastrophic collapse on some mixed-session REST events, where REST windows were being mapped into dominant non-REST pairs such as OPEN+THUMB. That collapse is no longer the defining behavior of the featured run.

The supplementary replay checks for the frozen live defaults are stronger than the raw holdout numbers:

  • Mixed-session replay with live defaults: 94.59% action accuracy, 91.21% joint accuracy, REST F1 0.931
  • Quiet-rest replay with live defaults: 90.37% action and joint accuracy, REST F1 0.949
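The post does not define joint accuracy. One common reading is that a window counts only when both the action and the finger are correct. The sketch below assumes that definition, with the finger scored only on non-REST windows; the `(action, finger)` label tuples and the function name are hypothetical. Under this assumption a REST-only stream gives identical action and joint accuracy, which is consistent with the quiet-rest replay reporting 90.37% for both.

```python
from typing import Optional, Sequence, Tuple

# (action, finger); finger is None on REST windows (assumption).
Label = Tuple[str, Optional[str]]


def action_and_joint_accuracy(
    y_true: Sequence[Label], y_pred: Sequence[Label]
) -> Tuple[float, float]:
    """Return (action accuracy, joint accuracy) over aligned windows.

    Joint accuracy here counts a window as correct when the action matches and,
    for non-REST windows, the finger also matches (an assumed definition).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    action_hits = joint_hits = 0
    for (a_true, f_true), (a_pred, f_pred) in zip(y_true, y_pred):
        if a_true == a_pred:
            action_hits += 1
            if a_true == "REST" or f_true == f_pred:
                joint_hits += 1
    n = len(y_true)
    return action_hits / n, joint_hits / n


# A REST-only stream: joint and action accuracy coincide.
quiet_true = [("REST", None)] * 4
quiet_pred = [("REST", None)] * 3 + [("OPEN", "THUMB")]
print(action_and_joint_accuracy(quiet_true, quiet_pred))  # prints: (0.75, 0.75)
```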

Those replay numbers do not replace the primary holdout, but they matter for deployment because they reflect the exact postprocessed decoding family we intend to validate in shadow mode.

Live defaults now frozen

The frozen live candidate for the next shadow-mode session uses:

  • Postprocess enabled
  • EMA smoothing with a 5-window horizon
  • finger_mode=smooth
  • threshold_action=0.05
  • threshold_finger=0.20
  • Hysteresis disabled
  • actuation_min_prob=0.75, actuation_stability=3, actuation_cooldown_ms=250
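A minimal sketch of how these settings could fit together on the action head, under loudly stated assumptions: the 5-window EMA horizon is mapped to the span convention alpha = 2 / (N + 1); an actuation fires only when the same top label holds a smoothed probability of at least 0.75 for 3 consecutive windows, at least 250 ms have passed since the previous actuation, and the label differs from the currently actuated one. The post does not describe what `threshold_action` and `threshold_finger` gate, so they are stored but unused; the finger head is omitted, and hysteresis is disabled, so it is not implemented. `LiveDefaults` and `ActionDecoder` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LiveDefaults:
    """Frozen live defaults from the list above (field names mirror the post)."""
    postprocess: bool = True
    ema_windows: int = 5
    finger_mode: str = "smooth"      # finger head not modelled in this sketch
    threshold_action: float = 0.05   # stored only: semantics not described in the post
    threshold_finger: float = 0.20   # stored only: semantics not described in the post
    hysteresis: bool = False         # disabled, so no hysteresis logic below
    actuation_min_prob: float = 0.75
    actuation_stability: int = 3
    actuation_cooldown_ms: int = 250


class ActionDecoder:
    """Hypothetical action-head decoder: EMA smoothing plus an actuation gate."""

    def __init__(self, cfg: LiveDefaults) -> None:
        self.cfg = cfg
        # Assumption: an N-window horizon uses the span convention alpha = 2 / (N + 1).
        self.alpha = 2.0 / (cfg.ema_windows + 1)
        self.smoothed: Optional[Dict[str, float]] = None
        self.candidate: Optional[str] = None  # label currently building a streak
        self.streak = 0                       # consecutive confident windows for candidate
        self.active: Optional[str] = None     # last actuated label
        self.last_actuation_ms: Optional[float] = None

    def step(self, probs: Dict[str, float], t_ms: float) -> Optional[str]:
        """Consume one window of action probabilities; return a new actuation or None."""
        if self.smoothed is None or not self.cfg.postprocess:
            self.smoothed = dict(probs)  # first window, or smoothing switched off
        else:
            a = self.alpha
            self.smoothed = {k: a * p + (1 - a) * self.smoothed[k] for k, p in probs.items()}

        label = max(self.smoothed, key=self.smoothed.__getitem__)
        if self.smoothed[label] < self.cfg.actuation_min_prob:
            self.candidate, self.streak = None, 0      # not confident: reset the streak
        elif label == self.candidate:
            self.streak += 1                           # same confident label again
        else:
            self.candidate, self.streak = label, 1     # new confident label

        cooled = (self.last_actuation_ms is None
                  or t_ms - self.last_actuation_ms >= self.cfg.actuation_cooldown_ms)
        if self.streak >= self.cfg.actuation_stability and cooled and self.candidate != self.active:
            self.active, self.last_actuation_ms = self.candidate, t_ms
            return self.active
        return None


dec = ActionDecoder(LiveDefaults())
stream = [{"REST": 0.9, "OPEN": 0.1}] * 3 + [{"REST": 0.0, "OPEN": 1.0}] * 6
events = [dec.step(p, t_ms=50 * i) for i, p in enumerate(stream)]
# events == [None, None, "REST", None, None, None, None, None, "OPEN"]
```

In this toy stream the switch to OPEN is emitted five windows after the raw probabilities change: the EMA needs three windows to lift OPEN above 0.75, and the stability requirement adds two more. That lag is the price of suppressing one-window flickers.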

This family came out of a 2,595-config ablation over thresholds, smoothing, hysteresis, adjacency, and finger-mode settings.
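For readers unfamiliar with this kind of sweep, a generic exhaustive ablation can be sketched as below. The post does not list the grid axes, their values, or the selection metric, so the toy grid (18 configurations, not 2,595) and `toy_score` are hypothetical placeholders; a real sweep would score each configuration on holdout or replay metrics.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple

Config = Dict[str, Any]


def ablate(
    grid: Dict[str, List[Any]], score: Callable[[Config], float]
) -> Tuple[Optional[Config], float, int]:
    """Score every combination in grid; return (best, best_score, n_configs).

    Ties keep the first configuration encountered in itertools.product order.
    """
    keys = list(grid)
    best: Optional[Config] = None
    best_score = float("-inf")
    n = 0
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        s = score(cfg)
        n += 1
        if s > best_score:
            best, best_score = cfg, s
    return best, best_score, n


# Hypothetical axes and a toy score standing in for replay/holdout metrics.
toy_grid = {
    "ema_windows": [1, 3, 5],
    "threshold_finger": [0.10, 0.20, 0.30],
    "hysteresis": [False, True],
}


def toy_score(cfg: Config) -> float:
    return (-abs(cfg["ema_windows"] - 5)
            - abs(cfg["threshold_finger"] - 0.20)
            - int(cfg["hysteresis"]))


best, best_score, n = ablate(toy_grid, toy_score)
```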

Comparison with the previous public 2-M16 bundle

Metric | 2026-03-03 public run | 2026-03-16 public run | Δ
--- | --- | --- | ---
Test action accuracy | 82.95% | 85.73% | +2.78 pp
Test finger accuracy (non-REST windows) | 87.37% | 86.33% | -1.04 pp
Test windows | 2,457 | 2,102 | -355
Test non-REST windows | 1,908 | 1,888 | -20
Train action accuracy | 90.21% | 89.09% | -1.12 pp
Train finger accuracy | 89.70% | 84.54% | -5.16 pp
Train average loss | 0.8028 | 0.7688 | -0.0340

What to watch next

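  • Action accuracy on the public holdout improves on the previous site bundle (+2.78 pp), but REST remains much harder on the dedicated holdout than on replay: REST F1 is 0.596 on the holdout versus 0.931 and 0.949 on the two replay checks.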
  • The next milestone is shadow-mode live validation with synchronized prediction logs and video review, not another large offline sweep.
  • The website now reflects the current bundle and frozen live defaults so the next update can focus on real live-session behavior rather than stale offline artifacts.

Links