What changed
We replaced the older March 3 public 2-M16 bundle with the current March 16 run, built from the combined mixed-movement and auxiliary-rest session:
- Session: combined_20260315_154309
- Run: splitfix_mixedresttrain_auxholdout_rw1_seqeq_rflw05
This refresh follows a targeted audit of the mixed-session REST failure mode. The key change was a split policy that trains on all mixed-session REST events instead of withholding several of them during split and calibration. We kept the run with REST finger supervision and then froze the live default postprocess family after a broader ablation study.
Headline public results
- Action accuracy on the primary holdout: 85.73%
- Finger accuracy on non-REST windows: 86.33%
- Test windows: 2,102
- Non-REST test windows: 1,888
- REST TPR on the primary holdout: 52.34%
- REST F1 on the primary holdout: 0.596
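The REST figures above can be cross-checked with a few lines of arithmetic. The sketch below derives the REST window count from the two test-window totals and backs out the REST precision implied by the reported TPR and F1. The implied precision is an inference from the rounded published numbers, not a reported metric, and the helper name `implied_precision` is hypothetical.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given recall R and F1."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


# Reported on the primary holdout.
test_windows = 2102
non_rest_windows = 1888
rest_tpr = 0.5234
rest_f1 = 0.596

# REST windows are whatever is left after removing non-REST windows.
rest_windows = test_windows - non_rest_windows

# Approximate number of REST windows that were recognized as REST.
rest_hits = round(rest_tpr * rest_windows)

# Inferred, not reported: precision consistent with the TPR and F1 above.
rest_precision = implied_precision(rest_tpr, rest_f1)

print(rest_windows, rest_hits, round(rest_precision, 2))
```

Read this way, roughly half of the REST windows in the holdout are recovered, which is why REST remains the weakest part of the headline results.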
Why this run matters
The main failure before this refresh was catastrophic collapse on some mixed-session REST events, where REST windows were being mapped into dominant non-REST pairs such as OPEN+THUMB. That collapse is no longer the defining behavior of the featured run.
The supplementary replay checks for the frozen live defaults are stronger than the raw holdout numbers:
- Mixed-session replay with live defaults: 94.59% action accuracy, 91.21% joint accuracy, REST F1 0.931
- Quiet-rest replay with live defaults: 90.37% action and joint accuracy, REST F1 0.949
Those replay numbers do not replace the primary holdout, but they matter for deployment because they reflect the exact postprocessed decoding family we intend to validate in shadow mode.
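For readers comparing the action and joint figures, here is a minimal sketch of how the two can diverge. It assumes joint accuracy counts a window as correct only when both the action and the finger label match; that definition, the function name, and the toy labels (other than REST and OPEN+THUMB) are illustrative assumptions, not taken from the evaluation code.

```python
def action_and_joint_accuracy(true_pairs, pred_pairs):
    """Return (action_accuracy, joint_accuracy) over (action, finger) pairs."""
    if not true_pairs or len(true_pairs) != len(pred_pairs):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_pairs)
    # Action is correct when the first element of the pair matches.
    action_hits = sum(t[0] == p[0] for t, p in zip(true_pairs, pred_pairs))
    # Joint is correct only when action and finger both match.
    joint_hits = sum(t == p for t, p in zip(true_pairs, pred_pairs))
    return action_hits / n, joint_hits / n


# Toy labels for illustration only; REST windows carry no finger here.
truth = [("REST", None), ("OPEN", "THUMB"), ("OPEN", "INDEX"), ("CLOSE", "THUMB")]
preds = [("REST", None), ("OPEN", "THUMB"), ("OPEN", "THUMB"), ("OPEN", "THUMB")]

action_acc, joint_acc = action_and_joint_accuracy(truth, preds)
print(action_acc, joint_acc)
```

Under this assumed definition, joint accuracy can never exceed action accuracy, and the two coincide whenever every correctly decoded action also carries the correct finger.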
Live defaults now frozen
The frozen live candidate for the next shadow-mode session uses:
- Postprocess enabled
- EMA smoothing with a 5-window horizon
- finger_mode=smooth, threshold_action=0.05, threshold_finger=0.20
- Hysteresis disabled
- actuation_min_prob=0.75, actuation_stability=3, actuation_cooldown_ms=250
This family came out of a 2,595-config ablation over thresholds, smoothing, hysteresis, adjacency, and finger-mode settings.
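The frozen settings can be read as a small configuration plus two mechanisms: smoothing and an actuation gate. The sketch below is one plausible reading, not the project's implementation. The key names `ema_horizon` and `hysteresis`, the EMA convention alpha = 2 / (horizon + 1), the gate semantics (a run of consecutive windows at or above the minimum probability, followed by a cooldown), and the 100 ms window step in the usage are all assumptions.

```python
# Values copied from the frozen live defaults; key semantics are assumed.
LIVE_DEFAULTS = {
    "ema_horizon": 5,            # hypothetical key name
    "finger_mode": "smooth",
    "threshold_action": 0.05,
    "threshold_finger": 0.20,
    "hysteresis": False,         # hypothetical key name
    "actuation_min_prob": 0.75,
    "actuation_stability": 3,
    "actuation_cooldown_ms": 250,
}


def ema(values, horizon):
    """Exponential moving average, assuming alpha = 2 / (horizon + 1)."""
    alpha = 2.0 / (horizon + 1)
    out, state = [], None
    for v in values:
        state = v if state is None else alpha * v + (1.0 - alpha) * state
        out.append(state)
    return out


class ActuationGate:
    """Fire after `stability` consecutive confident windows, then cool down."""

    def __init__(self, min_prob, stability, cooldown_ms):
        self.min_prob = min_prob
        self.stability = stability
        self.cooldown_ms = cooldown_ms
        self.streak = 0
        self.last_fire_ms = None

    def update(self, prob, t_ms):
        # Count consecutive windows at or above the probability floor.
        self.streak = self.streak + 1 if prob >= self.min_prob else 0
        cooled = (
            self.last_fire_ms is None
            or t_ms - self.last_fire_ms >= self.cooldown_ms
        )
        if self.streak >= self.stability and cooled:
            self.last_fire_ms = t_ms
            self.streak = 0
            return True
        return False


gate = ActuationGate(
    LIVE_DEFAULTS["actuation_min_prob"],
    LIVE_DEFAULTS["actuation_stability"],
    LIVE_DEFAULTS["actuation_cooldown_ms"],
)
probs = [0.2, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]  # toy per-window probabilities
fires = [gate.update(p, i * 100) for i, p in enumerate(probs)]  # 100 ms step assumed
print(fires)
```

In this reading, the stability count suppresses single-window spikes and the cooldown prevents back-to-back actuations, which is the kind of behavior the shadow-mode session is meant to confirm or refute.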
Before vs after: comparison with the previous public 2-M16 bundle
| Metric | 2026-03-03 public run | 2026-03-16 public run | Δ |
|---|---|---|---|
| Test action accuracy | 82.95% | 85.73% | +2.78 pp |
| Test finger accuracy on non-REST windows | 87.37% | 86.33% | -1.04 pp |
| Test windows | 2,457 | 2,102 | -355 |
| Test non-REST windows | 1,908 | 1,888 | -20 |
| Train action accuracy | 90.21% | 89.09% | -1.12 pp |
| Train finger accuracy | 89.70% | 84.54% | -5.16 pp |
| Train avg loss | 0.8028 | 0.7688 | -0.0340 |
What to watch next
- Holdout action accuracy improved over the previous site bundle (+2.78 pp), but REST is still harder on the dedicated holdout (REST F1 0.596) than on replay (REST F1 0.931 and 0.949).
- The next milestone is shadow-mode live validation with synchronized prediction logs and video review, not another large offline sweep.
- The website now reflects the current bundle and frozen live defaults so the next update can focus on real live-session behavior rather than stale offline artifacts.