2026-03-18

2-M16 deployment breakthrough

Published the first public 2-M16 bundle for March 18, with all-session training, active-finger decoding, pseudo-live replay, topomap evidence, and the corrected Step 7 speed-control path.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

What changed

This release marks a substantive architecture and evaluation update for 2-M16, not just another offline metric refresh.

The public 2-M16 bundle is now built around the March 18 deployment candidate:

  • Session: combined_20260317_211414
  • Run: 20260318_042115

The current recommendation for live use is no longer a narrower split-fixed mixed-rest run. The main checkpoint is now trained on all currently available 2-M16 sessions, because those sessions are all the data we have and the smaller holdout-only model was too weak to justify shipping as the default live candidate.

Why this is a breakthrough

  • The active-finger head removes the old OPEN/CLOSE + NONE failure mode from non-REST decoding.
  • The site now publishes pseudo-live replay metrics that reuse the Step 7 decision path offline, so the public bundle can report expected control behavior instead of only raw classifier accuracy.
  • Step 7 now preserves the uncertainty-aware speed scalar all the way through the command shaper, so modulate_actuation_speed is honored end-to-end.
  • The public run bundle now includes alpha-band topomaps so readers can see the signal structure behind the decoding story rather than only confusion matrices.
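To make the Step 7 change concrete: the post says `modulate_actuation_speed` is now honored end-to-end, but does not show its shape. Here is a minimal sketch of what an uncertainty-aware speed scalar flowing through a command shaper could look like. The linear confidence-to-speed mapping, the `floor`/`ceiling` parameters, and the `shape_command` helper are all my assumptions, not the project's actual implementation.

```python
def modulate_actuation_speed(confidence: float,
                             floor: float = 0.2,
                             ceiling: float = 1.0) -> float:
    """Map classifier confidence in [0, 1] to a speed scalar.

    Hypothetical mapping: low-confidence predictions drive the hand
    slowly; high-confidence predictions approach full speed.
    """
    confidence = min(max(confidence, 0.0), 1.0)
    return floor + (ceiling - floor) * confidence


def shape_command(action: str, confidence: float, base_speed: float) -> dict:
    """Command shaper that preserves the speed scalar end-to-end,
    instead of silently discarding it before actuation."""
    scalar = modulate_actuation_speed(confidence)
    return {"action": action, "speed": base_speed * scalar}
```

The point of the fix described above is simply that the scalar survives all the way to the final command, so a hesitant prediction produces a slower motion rather than a full-speed one.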

Headline public results

  • Primary holdout action accuracy: 78.39%
  • Primary holdout finger accuracy on non-REST windows: 82.03%
  • Primary holdout joint accuracy: 75.19%
  • Primary holdout REST TPR: 57.11%
  • Repeated-split action accuracy mean / std: 82.23% / 4.35%
  • Repeated-split joint accuracy mean / std: 78.66% / 3.77%

These are still offline evaluation numbers, not a live-control guarantee, but they are the most appropriate public baseline for the current deployment candidate.
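For readers unfamiliar with the "joint accuracy" figure above, this sketch shows the definition I am assuming: a window only counts as correct when both the action label and the finger label match. This is an illustrative reading, not the project's verified scoring code.

```python
def joint_accuracy(pred_actions, pred_fingers, true_actions, true_fingers):
    """Fraction of windows where BOTH action and finger are correct.

    Assumed definition of "joint accuracy": a window is correct only
    if its predicted action and predicted finger both match the truth.
    """
    correct = sum(
        pa == ta and pf == tf
        for pa, pf, ta, tf in zip(pred_actions, pred_fingers,
                                  true_actions, true_fingers)
    )
    return correct / len(true_actions)
```

Under this definition, joint accuracy is necessarily no higher than either head's individual accuracy, which matches the ordering of the published numbers.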

Replay benchmarks now published

The old site mostly stopped at holdout metrics. This update adds a fuller evaluation ladder:

  • Auxiliary quiet-rest replay: 96.03% action accuracy, REST F1 0.980
  • All-session chronological replay before actuation gating: 86.40% action, 82.70% joint
  • Pseudo-live replay on the combined corpus: 84.77% committed joint accuracy, 91.53% would-send precision on non-REST windows
  • Pseudo-live replay on the unseen March 17 mixed session: 70.44% committed action accuracy, 70.32% committed joint accuracy, 66.67% would-send precision

That unseen March 17 pseudo-live benchmark is currently the most informative public realism check because it is closer to the deployed decision path than a raw holdout split.
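A minimal sketch of how the "would-send precision" figures above could be computed. The definition here is my assumed reading of the metric (of all committed non-REST decisions, the share whose label matches the ground-truth window), not the project's actual replay code.

```python
def would_send_precision(committed, truth, rest_label="REST"):
    """Precision over windows where the replay would have sent a command.

    Assumed definition: among committed decisions that are not REST
    (i.e. commands that would actually be sent to the hand), the
    fraction whose label matches the ground-truth window label.
    """
    sent = [(p, t) for p, t in zip(committed, truth) if p != rest_label]
    if not sent:
        return 0.0  # nothing would have been sent
    return sum(p == t for p, t in sent) / len(sent)
```

This is why a replay metric can diverge sharply from holdout accuracy: it only scores the windows where the deployed decision path would have acted.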

Validation benchmark vs deployment model

We also kept a smaller realism benchmark for configuration validation:

  • Train on 2-M16_20260216_150056_01
  • Add 2-M16_20260315_145838_01 as quiet-rest auxiliary training only
  • Replay on 2-M16_20260317_190134
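The three-part recipe above can be written down as a small config. The session IDs are taken verbatim from this post; the config schema itself is hypothetical and just makes the train/auxiliary/replay split explicit.

```python
# Hypothetical config sketch of the validation benchmark described above.
# Session IDs come from the post; the schema and key names are assumptions.
VALIDATION_BENCHMARK = {
    "train_sessions": ["2-M16_20260216_150056_01"],
    # Quiet-rest session used as auxiliary training data only:
    "aux_quiet_rest_sessions": ["2-M16_20260315_145838_01"],
    "replay_session": "2-M16_20260317_190134",
}

# By contrast, the recommended deployment checkpoint trains on all
# currently available subject sessions and is evaluated via pseudo-live replay.
DEPLOYMENT_MODEL = {
    "train_sessions": "all_available",
    "replay_session": None,
}
```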

That validation-only model performs much worse than the all-session deployment model:

  • Committed action accuracy: 39.05%
  • Committed joint accuracy: 36.31%
  • Would-send precision on non-REST windows: 4.55%

That is why the website now distinguishes validation benchmarks from the recommended deployment checkpoint. With the current data volume, the deployment model should use all currently available subject sessions.

What the topomaps add

The new alpha-band topomaps help explain the decoding behavior:

  • REST alpha is strongly dominated by TP10 versus TP9, so absolute maps mostly reflect scale.
  • Rest-relative action maps show OPEN and CLOSE both dominated by decreased TP10 power with smaller TP9 increases.
  • Finger-level variation is concentrated on TP10 and then TP9, while AF7 and AF8 move much less.
  • Split-halves maps show drift, but not catastrophic collapse, which supports the all-session training strategy.

These topomaps do not replace the decoder metrics, but they do make the current model story more interpretable.
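The "rest-relative" maps referenced above can be sketched as a per-channel normalization. The four-channel montage (TP9, AF7, AF8, TP10) matches the channels named in this post; the `(action - rest) / rest` formula is my assumption about how the rest-relative values are derived.

```python
CHANNELS = ["TP9", "AF7", "AF8", "TP10"]


def rest_relative_map(action_power, rest_power):
    """Per-channel rest-relative alpha power: (action - rest) / rest.

    Both inputs are dicts of mean alpha-band power per channel.
    Negative values indicate alpha suppression relative to REST,
    the pattern the post reports on TP10 for OPEN and CLOSE.
    """
    return {ch: (action_power[ch] - rest_power[ch]) / rest_power[ch]
            for ch in CHANNELS}
```

Normalizing by the REST baseline is what keeps the large TP10-versus-TP9 scale difference from dominating the maps, which is the point of the first topomap bullet above.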

Legacy comparison note

The published 1-M16 bundle remains on the website as a historical reference point. It was produced with an earlier model and legacy evaluation methods, so its numbers should not be treated as directly comparable to the current 2-M16 deployment candidate.

What changed in live control

Two practical live-control issues were resolved in this cycle:

  1. The active-finger decoding change removes non-REST NONE leakage from the published benchmark family.
  2. Step 7 speed modulation is now actually applied end-to-end, so action uncertainty can influence final hand speed instead of being silently discarded.

What to watch next

  • The next hard gate is real live validation with synchronized prediction logs and video review.
  • The public site now has the scaffolding to report those outcomes cleanly: holdout metrics, replay metrics, pseudo-live behavior, and signal evidence all live in one place.
  • If the next live session holds up, the website is now structured to show a true control milestone instead of just another offline experiment.

Links