2026-05-06

2-M16 per-finger actuation metrics

Updated the featured 2-M16 public bundle around the March 19 checkpoint with per-finger command shaping, current replay metrics, and clearer reporting that separates accuracy from throughput.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

What changed

The featured public bundle still uses the 20260319_075520 checkpoint. The new part is the deployment path around it.

The old report treated the 250 ms cooldown as a global command block, so a command to any one finger held off commands to every finger. That made the replay look artificially slow and made the 10.57% would-send recall easy to misread as "10% accuracy." It was not accuracy. It was the share of true movement windows that passed a conservative send gate.

The current path uses the intended behavior:

  • A finger holds its last command briefly after it receives one.
  • A new command for that same finger can be suppressed during the hold window to avoid chatter.
  • Other fingers can still actuate during that interval when their own commands pass the gate.
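
The difference between the two behaviors can be sketched in a few lines of Python. This is a minimal illustration, not the project's command shaper: the class names, the `offer` method, the finger labels, and the reuse of 250 ms as the per-finger hold length are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

HOLD_S = 0.250  # assumed value; the post only states 250 ms for the old cooldown


@dataclass
class PerFingerShaper:
    """Hypothetical shaper: each finger has its own hold window."""
    hold_s: float = HOLD_S
    last_sent: Dict[str, float] = field(default_factory=dict)

    def offer(self, finger: str, t: float) -> bool:
        """Return True if a gate-passing command for `finger` at time `t` is sent."""
        prev = self.last_sent.get(finger)
        if prev is not None and t - prev < self.hold_s:
            return False  # same finger is still holding: suppress to avoid chatter
        self.last_sent[finger] = t
        return True


@dataclass
class GlobalCooldownShaper:
    """Hypothetical shaper: one cooldown shared by all fingers (the old reading)."""
    cooldown_s: float = HOLD_S
    last_sent: Optional[float] = None

    def offer(self, finger: str, t: float) -> bool:
        if self.last_sent is not None and t - self.last_sent < self.cooldown_s:
            return False  # any recent command blocks every finger
        self.last_sent = t
        return True


# Made-up stream of gate-passing commands: (finger, time in seconds).
commands = [("index", 0.00), ("middle", 0.05), ("index", 0.10), ("index", 0.30)]

per_finger = PerFingerShaper()
global_gate = GlobalCooldownShaper()
per_finger_sent = [per_finger.offer(f, t) for f, t in commands]
global_sent = [global_gate.offer(f, t) for f, t in commands]

print(per_finger_sent)  # [True, True, False, True]
print(global_sent)      # [True, False, False, True]
```

In this toy stream the middle-finger command at 0.05 s goes through under the per-finger hold but is dropped under the global cooldown, which is the kind of lost command that made the old replay look slow.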

Current cleaned-corpus replay

Metric | Current value | How to read it
Would-send precision on non-rest command windows | 95.37% | Reliability of commands that the gate actually sends
Window-level send coverage on true non-rest windows | 31.56% | Throughput coverage, not classification accuracy
Event hit rate | 91.11% | 543 of 596 movement events received at least one command
First-hit latency | 0.083 s median / 0.377 s p95 | Time to the first command inside a movement event
False REST actuation | 0.25% | 6 commands over 2,404 true REST windows
Committed joint action+finger accuracy | 86.42% | Replay accuracy after command shaping
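
The two rows that publish raw counts can be checked by hand. The short Python snippet below only redoes that arithmetic; the helper name `pct` is ours, and the other rows cannot be rechecked this way because their counts are not given here.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals, matching the table's precision."""
    return round(100.0 * numerator / denominator, 2)


event_hit_rate = pct(543, 596)        # movement events with at least one command
false_rest_actuation = pct(6, 2404)   # commands sent during true REST windows

print(event_hit_rate)        # 91.11
print(false_rest_actuation)  # 0.25
```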

The model's held-out decoding numbers remain much higher than the old 10.57% because they answer a different question. The held-out split measures classifier correctness on windows. The would-send coverage metric, which the old report labeled would-send recall, measures how often the deployment gate chooses to issue a command.
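
A toy example shows how the two questions diverge. The windows below are invented, and the tuple layout `(true_label, predicted_label, sent)` and the exact metric definitions are assumptions for illustration, not the evaluation suite's actual schema.

```python
# Each window: (true_label, predicted_label, sent). All values are made up.
windows = [
    ("index_flex", "index_flex", True),
    ("index_flex", "index_flex", False),
    ("index_flex", "index_flex", False),
    ("middle_flex", "middle_flex", True),
    ("middle_flex", "middle_flex", False),
    ("middle_flex", "index_flex", False),
    ("REST", "REST", False),
    ("REST", "REST", False),
]

non_rest = [w for w in windows if w[0] != "REST"]  # true movement windows
sent = [w for w in windows if w[2]]                # windows where a command was issued

# Classifier correctness: is the predicted label right, regardless of the gate?
classifier_accuracy = sum(t == p for t, p, _ in windows) / len(windows)

# Send coverage: what share of true movement windows produced a command?
send_coverage = sum(1 for _, _, s in non_rest if s) / len(non_rest)

# Send precision: of the commands that were issued, how many were correct?
send_precision = sum(t == p for t, p, _ in sent) / len(sent)

print(classifier_accuracy)      # 0.875
print(round(send_coverage, 4))  # 0.3333
print(send_precision)           # 1.0
```

Here the classifier is right on 7 of 8 windows and every sent command is correct, yet only 2 of 6 movement windows produce a command. Low coverage on its own says the gate is selective, not that the decoder is wrong.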

Reporting policy

The site now treats pseudo-live replay as a deployment report, not a single accuracy number.

For public robot-hand claims, the useful headline set is:

  • command precision
  • false REST actuation
  • event hit rate
  • first-hit latency
  • committed joint action+finger accuracy
  • window-level send coverage, explicitly labeled as throughput
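
The two event-level entries in this list can be computed along the following lines. This is a sketch under stated assumptions: the event tuples are invented, the helper names are ours, and the nearest-rank percentile is one common convention that may differ from the one the evaluation suite uses.

```python
import math
from statistics import median

# Hypothetical movement events: (start_s, end_s, command_times_s). Values are made up.
events = [
    (0.0, 1.0, [0.08, 0.40]),
    (2.0, 3.0, [2.10]),
    (4.0, 5.0, []),          # missed event: no command arrived
    (6.0, 7.0, [6.30]),
]


def first_hit_latency(start, end, command_times):
    """Delay from event start to the first command inside the event, or None."""
    hits = [t - start for t in command_times if start <= t <= end]
    return min(hits) if hits else None


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


latencies = [
    lat for lat in (first_hit_latency(*event) for event in events) if lat is not None
]

event_hit_rate = len(latencies) / len(events)   # events with at least one command
median_latency = median(latencies)
p95_latency = nearest_rank_percentile(latencies, 95)

print(event_hit_rate)  # 0.75
```

Latency is computed only over events that were hit, so it should always be read next to the event hit rate: a gate that misses many events can still post a fast median.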

This makes the tradeoff clearer. A very conservative gate can have high precision and low false actuation while still feeling sluggish. A useful actuation path needs both reliability and responsiveness.

Verification

The 2-M16 evaluation suite was rerun for the featured checkpoint and produced the current public figures. The source repo also passes the updated command-shaper tests for the per-finger hold behavior, along with the full Python test suite.

The April 24 rollback audit remains on the site as historical model-selection context. Its old global-gate pseudo-live numbers should no longer be treated as the current deployment headline.