2026-03-19

2-M16 winning model refresh

Published the March 19 2-M16 bundle with cleaned training data, updated applicability handling, refreshed replay metrics, and a new public figure/report bundle.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

What changed

The public 2-m16 bundle now points at the March 19 winning deployment candidate:

  • Session: combined_20260319_081200_pruned_rest_events_0_1_2
  • Run: 20260319_075520

This update replaces the March 18 public bundle with the current winning-model snapshot used by the deployment config and replay tooling.

Architecture changes now reflected on-site

  • The public model still uses active-finger decoding, so committed OPEN/CLOSE predictions always carry a real finger.
  • A new finger-applicability head now models whether a finger label is meaningful on each window, rather than forcing the active-finger head to double as a REST detector.
  • The deployment threshold is now tuned and published as threshold_applicability = 0.4.
  • The public site now surfaces applicability false-positive / false-negative rates and the deployment pair invariant, not just offline action and finger accuracy.
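The decision logic implied by the points above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function and head names are assumptions, and only the published threshold value (0.4) comes from this post. The applicability head gates whether a finger label is meaningful; below the threshold the window commits as REST with NONE, and above it the committed action always carries a real finger, which is what makes the pair invariants hold by construction.

```python
# Hypothetical sketch of the committed-decision logic described above.
# Function and variable names are illustrative assumptions; only the
# threshold value is taken from the published deployment config.

THRESHOLD_APPLICABILITY = 0.4  # published deployment threshold

def commit_decision(action_probs, finger_probs, applicability_prob):
    """Combine the action head, active-finger head, and applicability head
    into one committed (action, finger) pair for a single window.

    action_probs / finger_probs: dicts mapping class name -> probability.
    applicability_prob: scalar output of the finger-applicability head.
    """
    if applicability_prob < THRESHOLD_APPLICABILITY:
        # Finger label is not meaningful on this window: commit REST / NONE.
        return ("REST", "NONE")
    action = max(action_probs, key=action_probs.get)
    if action == "REST":
        # REST never pairs with an active finger.
        return ("REST", "NONE")
    # Committed OPEN/CLOSE always carries a real finger.
    finger = max(finger_probs, key=finger_probs.get)
    return (action, finger)
```

Structuring the commit step this way means the two zero-tolerance pair invariants cannot be violated by any single decision, regardless of how the individual heads are calibrated.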

Updated headline metrics

  • Test action accuracy: 89.79%
  • Test finger accuracy on non-REST windows: 87.01%
  • Primary holdout joint accuracy: 84.66%
  • Primary holdout joint accuracy on non-REST windows: 82.55%
  • Primary holdout REST TPR / precision: 98.37% / 80.11%
  • Action ECE / finger ECE on non-REST: 2.32% / 2.73%
  • Holdout applicability FP on true REST: 18.57%
  • Holdout applicability FN on true non-REST: 2.26%
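The two applicability error rates in the list above are directional: a false positive is the applicability head firing on a true REST window, and a false negative is it failing to fire on a true non-REST window. A small sketch of how such rates could be computed (names and signature are assumptions, not the project's tooling):

```python
# Illustrative computation of the two applicability error rates reported
# above. The function name, argument layout, and default threshold source
# are assumptions for this sketch.

def applicability_error_rates(true_actions, applicability_probs, threshold=0.4):
    """true_actions: true label per window ("REST" or an active action).
    applicability_probs: applicability-head output per window.
    Returns (fp_rate_on_true_rest, fn_rate_on_true_nonrest)."""
    # FP: applicability fired even though the window is truly REST.
    fired_on_rest = [p >= threshold
                     for a, p in zip(true_actions, applicability_probs)
                     if a == "REST"]
    # FN: applicability did not fire on a truly active window.
    missed_nonrest = [p < threshold
                      for a, p in zip(true_actions, applicability_probs)
                      if a != "REST"]
    fp_rate = sum(fired_on_rest) / len(fired_on_rest) if fired_on_rest else 0.0
    fn_rate = sum(missed_nonrest) / len(missed_nonrest) if missed_nonrest else 0.0
    return fp_rate, fn_rate
```

Note the asymmetry in the published numbers: the model tolerates a fairly high FP rate on true REST (18.57%) in exchange for a very low FN rate on true non-REST (2.26%), i.e. it rarely drops a genuinely active window.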

Most importantly, the published deployment invariants are now clean:

  • Committed non-REST + NONE rate: 0.00%
  • Committed REST + active-finger rate: 0.00%
  • Sent non-REST + NONE rate: 0.00%
  • Sent REST + active-finger rate: 0.00%
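A sketch of the pair-invariant check behind those four zeroes, under assumed names: each committed or sent decision is an (action, finger) pair, and the two forbidden combinations are a non-REST action with NONE and a REST action with an active finger.

```python
# Hedged sketch of the zero-tolerance pair-invariant check described above.
# The function name and pair representation are assumptions for illustration.

def pair_invariant_rates(pairs):
    """pairs: iterable of committed or sent (action, finger) decisions.
    Returns (nonrest_none_rate, rest_active_rate) as fractions of all pairs;
    both must be exactly 0.0 for the invariant to hold."""
    pairs = list(pairs)
    if not pairs:
        return (0.0, 0.0)
    nonrest_none = sum(1 for a, f in pairs if a != "REST" and f == "NONE")
    rest_active = sum(1 for a, f in pairs if a == "REST" and f != "NONE")
    return (nonrest_none / len(pairs), rest_active / len(pairs))
```

Run over both the committed and the sent decision streams, this yields the four published 0.00% rates when no forbidden pair appears.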

Updated replay ladder

  • Cleaned deployment replay: 86.64% committed joint accuracy, 93.32% would-send precision, 0.12% false REST actuation
  • Legacy combined replay: 82.98% committed joint accuracy, 89.62% would-send precision, 1.71% false REST actuation
  • March 17 realism replay: 71.96% committed joint accuracy, 62.50% would-send precision, 0.09% false REST actuation

The realism replay remains conservative because applicability recall is still weak on that session, but the zero-tolerance pair invariant holds across all published replay bundles.

Site-wide changes

  • The featured run card, results overview, and run detail page now reference the March 19 winning deployment candidate.
  • The public figure set was refreshed with the new confusion matrices, calibration figures, and report HTML.
  • Applicability diagnostics are now shown directly in the results UI and in the public metrics bundle.
  • The older topomap block is no longer part of the featured 2-m16 bundle because regenerated topomap assets were not part of this March 19 winning-model snapshot.

Links