2026-03-03

2-M16 combined-session refresh

Published a new 2-M16 combined-session run with updated figures, report, and paper-page status artifacts.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

What changed

We updated the 2-M16 results bundle using the combined session built on 2026-03-02 and a new run generated on 2026-03-03. The public site now reflects the latest metrics, figures, report, and paper-page status placeholder.

Headline metrics (2-M16)

  • Action accuracy (test): 82.95%
  • Finger accuracy on non-REST windows (test): 87.37%
  • Test windows: 2,457
  • Non-REST test windows: 1,908

Before vs after (Feb 26 vs Mar 3)

Metric                                        2026-02-26   2026-03-03   Δ
Test action accuracy                          85.88%       82.95%       -2.93 pp
Test finger accuracy on non-REST windows      87.24%       87.37%       +0.13 pp
Test windows                                  2,026        2,457        +431
Test non-REST windows                         1,857        1,908        +51
Train action accuracy                         88.01%       90.21%       +2.20 pp
Train finger accuracy                         88.30%       89.70%       +1.40 pp
Train avg loss                                0.8820       0.8028       -0.0792

Train/test context:

  • Action accuracy (train): 90.21%
  • Finger accuracy (train): 89.70%
  • Train average loss: 0.8028
  • Action train-test gap: 7.27 pp
  • Finger train-test gap: 2.33 pp
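
The deltas and train-test gaps above are plain percentage-point differences. The sketch below recomputes them from the published, rounded figures; the dictionary layout and the helper names (`gap_pp`, `delta_pp`) are illustrative assumptions, not part of the project's tooling. Recomputing from rounded values gives 7.26 pp for the March 3 action gap, so the published 7.27 pp was presumably derived from unrounded accuracies.

```python
# Published (rounded) accuracies in percent, copied from the tables above.
# The dictionary layout and key names are assumptions made for illustration.
RUNS = {
    "2026-02-26": {"train_action": 88.01, "test_action": 85.88,
                   "train_finger": 88.30, "test_finger": 87.24},
    "2026-03-03": {"train_action": 90.21, "test_action": 82.95,
                   "train_finger": 89.70, "test_finger": 87.37},
}


def gap_pp(run, task):
    """Train-test gap in percentage points for task 'action' or 'finger'."""
    return round(run[f"train_{task}"] - run[f"test_{task}"], 2)


def delta_pp(before, after, key):
    """Change in one metric between two runs, in percentage points."""
    return round(after[key] - before[key], 2)


if __name__ == "__main__":
    feb, mar = RUNS["2026-02-26"], RUNS["2026-03-03"]
    for task in ("action", "finger"):
        print(f"{task} gap: {gap_pp(feb, task):.2f} pp -> {gap_pp(mar, task):.2f} pp")
    print(f"test action delta: {delta_pp(feb, mar, 'test_action'):+.2f} pp")
```

Working in percentage points rather than relative percent keeps these numbers directly comparable with the Δ column in the before/after table.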

Notes

  • Test action accuracy dropped 2.93 pp relative to the February 26 run (85.88% to 82.95%), while test finger accuracy on non-REST windows held steady (+0.13 pp).
  • The train-test gap for action accuracy widened from 2.13 pp on February 26 (88.01% train vs. 85.88% test) to 7.27 pp in this combined-session run, so we should keep tightening split controls and validating leakage safeguards.
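
One simple leakage safeguard is to confirm that no recording group lands on both sides of the split. The sketch below is a minimal illustration, assuming each window carries a group label such as a session or recording ID; the function name `shared_groups` and the example IDs are hypothetical and do not describe how this project's splits are actually built.

```python
def shared_groups(train_groups, test_groups):
    """Return the group IDs that appear in both splits, sorted.

    An empty list means no group (e.g. session or recording) contributes
    windows to both train and test.
    """
    return sorted(set(train_groups) & set(test_groups))


if __name__ == "__main__":
    # Hypothetical per-window group labels, for illustration only.
    train = ["session_a", "session_a", "session_b"]
    test = ["session_b", "session_c"]
    overlap = shared_groups(train, test)
    if overlap:
        print("Groups present in both splits:", overlap)
    else:
        print("No group overlap between train and test.")
```

A check like this catches only group-level overlap. Overlapping windows cut from the same recording would need a separate check on window boundaries.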

Links