2026-02-26

2-M16 tuning refresh

We published a newly tuned 2-M16 model, compared it against the 2026-02-24 version, and expanded the report's diagnostics.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

What changed

We reran tuning for the 2-M16 session captured on 2026-02-16, producing a new model run on 2026-02-26. The new run uses updated loss weighting and evaluation settings (see below), and the report now exposes additional diagnostics beyond accuracy.

  • Training weights: loss_action_weight=2.0, rest_weight=3.0, uniform finger weights.
  • Split/eval settings: split_mode=group_trial, test_size=0.2, seed=42, thresholds 0.75 for action and finger, no smoothing/hysteresis.
  • Report now includes expanded metrics (F1 variants, REST TPR/FPR/precision, and overall finger accuracy).
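
As a rough illustration of what a trial-grouped split with test_size=0.2 and seed=42 can look like, the sketch below keeps every window of a trial on the same side of the split. This is an assumption about what split_mode=group_trial means, not the project's actual code; the function name `group_trial_split` and the sample `trial_ids` are hypothetical.

```python
import random


def group_trial_split(trial_ids, test_size=0.2, seed=42):
    """Split window indices so each trial lands wholly in train or in test.

    trial_ids holds one trial identifier per window (hypothetical layout).
    """
    unique_trials = sorted(set(trial_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_trials)

    # Hold out a fraction of trials (not of windows) for testing.
    n_test = max(1, round(len(unique_trials) * test_size))
    held_out = set(unique_trials[:n_test])

    train_idx = [i for i, t in enumerate(trial_ids) if t not in held_out]
    test_idx = [i for i, t in enumerate(trial_ids) if t in held_out]
    return train_idx, test_idx


# Hypothetical data: 10 trials with 5 windows each.
trial_ids = [trial for trial in range(10) for _ in range(5)]
train_idx, test_idx = group_trial_split(trial_ids)

train_trials = {trial_ids[i] for i in train_idx}
test_trials = {trial_ids[i] for i in test_idx}
print(len(train_idx), len(test_idx), train_trials.isdisjoint(test_trials))
```

Splitting by trial rather than by window avoids placing near-identical neighbouring windows from one trial in both train and test, which would inflate test accuracy.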

Before vs after (2-M16)

Metric                                    Before (2026-02-24)   After (2026-02-26)   Δ
Test action accuracy                      89.71%                85.88%               -3.82 pp
Test finger accuracy on non-REST windows  90.38%                87.24%               -3.14 pp
Train action accuracy                     91.19%                88.01%               -3.19 pp
Train finger accuracy                     89.41%                88.30%               -1.11 pp
Train avg loss                            0.5020                0.8820               +0.3801
Test windows                              2,040                 2,026                -14
Test non-REST windows                     1,871                 1,857                -14
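
The Δ column is a plain difference in percentage points (after minus before). The short sketch below recomputes it from the rounded percentages in the table; the helper name `delta_pp` is hypothetical. Two rows come out 0.01 pp away from the published Δ, which would be consistent with the published deltas having been computed from unrounded values, though that is an assumption.

```python
# Rounded accuracies from the table, as (before, after) in percent.
rows = {
    "Test action accuracy": (89.71, 85.88),
    "Test finger accuracy on non-REST windows": (90.38, 87.24),
    "Train action accuracy": (91.19, 88.01),
    "Train finger accuracy": (89.41, 88.30),
}


def delta_pp(before, after):
    """Difference in percentage points, rounded to two decimals."""
    return round(after - before, 2)


deltas = {name: delta_pp(before, after) for name, (before, after) in rows.items()}

for name, value in deltas.items():
    print(f"{name}: {value:+.2f} pp")
```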

Expanded diagnostics (new report)

  • Action F1 (macro/weighted): 0.867 / 0.860.
  • Finger F1 (non-REST macro/weighted): 0.730 / 0.873.
  • Finger accuracy (overall): 79.96%.
  • Finger F1 (overall macro/weighted): 0.696 / 0.767.
  • REST: TPR 80.47%, FPR 0.05%, Precision 99.27%, F1 0.889.
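
The REST figures can be cross-checked against each other. F1 is the harmonic mean of precision and recall (TPR), and the window counts from the before/after table imply a REST window count. The sketch below is a consistency check built only from the published, rounded numbers; the confusion counts it derives are inferred, not taken from the report.

```python
# Published REST metrics (rounded) and test window counts for the new run.
tpr = 0.8047          # recall on REST windows
precision = 0.9927
test_windows = 2026
non_rest_windows = 1857

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * tpr / (precision + tpr)

# Inferred counts (assumption: REST windows = all test windows minus non-REST).
rest_windows = test_windows - non_rest_windows
true_positives = round(tpr * rest_windows)
false_positives = round(true_positives / precision - true_positives)
fpr = false_positives / non_rest_windows

print(f"REST F1: {f1:.3f}")
print(f"REST windows: {rest_windows}, TP: {true_positives}, FP: {false_positives}")
print(f"Implied FPR: {fpr * 100:.2f}%")
```

Under these assumptions the derived F1 and FPR agree with the published 0.889 and 0.05%.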

Notes

  • The tuning refresh reduced test action accuracy (89.71% to 85.88%) and test finger accuracy on non-REST windows (90.38% to 87.24%) relative to the 2026-02-24 run, so this configuration is currently a step back in headline accuracy.
  • We will keep iterating on loss weighting and threshold calibration to regain accuracy while preserving the REST control seen in this run (FPR 0.05%, precision 99.27%) and overall stability.

Links