What changed
We updated the 2-M16 results bundle using the combined session built on 2026-03-02 and a new run generated on 2026-03-03. The public site now shows the latest metrics, figures, and report, along with a status placeholder for the paper page.
Headline metrics (2-M16)
- Action accuracy (test): 82.95%
- Finger accuracy on non-REST windows (test): 87.37%
- Test windows: 2,457
- Non-REST test windows: 1,908
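To make the two headline metrics concrete, here is a minimal sketch of how they could be computed. It assumes that action accuracy is taken over all windows and that finger accuracy is taken only over windows whose true action is not REST. The record layout, the label names, and the masking rule are illustrative assumptions, not the project's actual evaluation code.

```python
# Hypothetical window records: (true_action, pred_action, true_finger, pred_finger).
# Label names and the masking-by-true-label rule are assumptions for illustration.
REST = "REST"

def action_accuracy(windows):
    """Fraction of all windows whose predicted action matches the true action."""
    correct = sum(1 for true_a, pred_a, _, _ in windows if true_a == pred_a)
    return correct / len(windows)

def finger_accuracy_non_rest(windows):
    """Finger accuracy restricted to windows whose true action is not REST."""
    active = [w for w in windows if w[0] != REST]
    correct = sum(1 for _, _, true_f, pred_f in active if true_f == pred_f)
    return correct / len(active)

# Toy data only; these are not project results.
windows = [
    ("REST", "REST", None, None),
    ("FLEX", "FLEX", "index", "index"),
    ("FLEX", "REST", "thumb", "index"),
    ("EXTEND", "EXTEND", "thumb", "thumb"),
]

print(f"action accuracy: {action_accuracy(windows):.2%}")
print(f"finger accuracy (non-REST): {finger_accuracy_non_rest(windows):.2%}")
```

Under this reading, the two metrics have different denominators (2,457 and 1,908 test windows respectively), so they are not directly comparable to each other.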
Before vs after (Feb 26 vs Mar 3)
| Metric | 2026-02-26 | 2026-03-03 | Δ |
|---|---|---|---|
| Test action accuracy | 85.88% | 82.95% | -2.93 pp |
| Test finger accuracy on non-REST windows | 87.24% | 87.37% | +0.13 pp |
| Test windows | 2,026 | 2,457 | +431 |
| Test non-REST windows | 1,857 | 1,908 | +51 |
| Train action accuracy | 88.01% | 90.21% | +2.20 pp |
| Train finger accuracy | 88.30% | 89.70% | +1.40 pp |
| Train avg loss | 0.8820 | 0.8028 | -0.0792 |
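The Δ column can be re-derived from the table itself. The sketch below recomputes the percentage-point deltas and also derives the number of REST test windows in each run (test windows minus non-REST test windows). All figures are copied from the table; the dictionary keys and helper names are made up for this example.

```python
# Figures copied from the comparison table (accuracies in percent).
before = {"test_action": 85.88, "test_finger": 87.24,
          "train_action": 88.01, "train_finger": 88.30}
after = {"test_action": 82.95, "test_finger": 87.37,
         "train_action": 90.21, "train_finger": 89.70}

# Window counts from the same table.
counts = {
    "2026-02-26": {"test": 2026, "non_rest": 1857},
    "2026-03-03": {"test": 2457, "non_rest": 1908},
}

def delta_pp(old, new):
    """Difference between two percentages, in percentage points."""
    return round(new - old, 2)

def rest_summary(run):
    """Return (REST window count, REST share of the test set)."""
    rest = run["test"] - run["non_rest"]
    return rest, rest / run["test"]

deltas = {name: delta_pp(before[name], after[name]) for name in before}
rest_by_run = {date: rest_summary(run) for date, run in counts.items()}

for name, d in deltas.items():
    print(f"{name}: {d:+.2f} pp")
for date, (rest, share) in rest_by_run.items():
    print(f"{date}: {rest} REST test windows ({share:.1%} of test set)")
```

One thing this surfaces: REST test windows grow from 169 to 549 (roughly 8% to 22% of the test set), so the two runs are evaluated on differently composed test sets. Whether that explains any of the action-accuracy change is not established here.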
Train/test context:
- Action accuracy (train): 90.21%
- Finger accuracy (train): 89.70%
- Train average loss: 0.8028
- Action train-test gap: 7.27 pp
- Finger train-test gap: 2.33 pp
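The gaps are train accuracy minus test accuracy, in percentage points. As a quick check, the sketch below recomputes them for both runs from the rounded figures reported in this post; the variable names are illustrative.

```python
# Rounded accuracies (percent) as reported in this post.
runs = {
    "2026-02-26": {"train_action": 88.01, "test_action": 85.88,
                   "train_finger": 88.30, "test_finger": 87.24},
    "2026-03-03": {"train_action": 90.21, "test_action": 82.95,
                   "train_finger": 89.70, "test_finger": 87.37},
}

def gap_pp(train, test):
    """Train-test gap in percentage points."""
    return round(train - test, 2)

gaps = {
    date: {
        "action": gap_pp(r["train_action"], r["test_action"]),
        "finger": gap_pp(r["train_finger"], r["test_finger"]),
    }
    for date, r in runs.items()
}

for date, g in gaps.items():
    print(f"{date}: action gap {g['action']:.2f} pp, finger gap {g['finger']:.2f} pp")
```

From the rounded figures, the March 3 action gap comes out as 7.26 pp rather than the listed 7.27 pp. A 0.01 pp difference is consistent with the listed gap having been computed from unrounded accuracies, though that is an assumption. Note also that the finger gap compares train finger accuracy with test finger accuracy on non-REST windows, as the metrics are labeled above.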
Notes
- Test action accuracy dropped 2.93 pp relative to the February 26 run (85.88% to 82.95%), while test finger accuracy on non-REST windows held steady (87.24% to 87.37%, +0.13 pp).
- The train-test gap for action accuracy widened from 2.13 pp on February 26 (88.01% train, 85.88% test) to 7.27 pp in this combined-session run, so we should keep tightening split controls and validating leakage safeguards.
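As one illustration of what a split control and a leakage check can look like, the sketch below assigns whole groups (for example, recording sessions) to either train or test and then verifies that no group appears on both sides. The `session_id` field, the function names, and the 20% test fraction are hypothetical; this is not the project's actual split logic.

```python
import random

def split_by_group(windows, group_key, test_fraction=0.2, seed=0):
    """Assign whole groups to train or test so no group is split across both."""
    groups = sorted({w[group_key] for w in windows})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [w for w in windows if w[group_key] not in test_groups]
    test = [w for w in windows if w[group_key] in test_groups]
    return train, test

def leaked_groups(train, test, group_key):
    """Return the groups that appear in both splits (empty set means no leakage)."""
    return {w[group_key] for w in train} & {w[group_key] for w in test}

# Toy data: 50 windows spread evenly over 5 hypothetical sessions.
windows = [{"session_id": f"s{i % 5}", "window_index": i} for i in range(50)]

train, test = split_by_group(windows, "session_id")
overlap = leaked_groups(train, test, "session_id")

print(f"train windows: {len(train)}, test windows: {len(test)}")
print(f"groups in both splits: {sorted(overlap)}")
```

Splitting at the group level matters most when windows from the same recording are correlated or overlap in time, because a per-window random split can then place near-duplicate windows on both sides. The right grouping key (session, subject, recording day) depends on how the combined session was assembled.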