What changed
The featured public bundle still uses the 20260319_075520 checkpoint. The new part is the deployment path around it.
The old report treated the 250 ms cooldown as a global command block. That made the replay look artificially slow and made the 10.57% would-send recall easy to misread as "10% accuracy." It was not accuracy. It was the share of true movement windows that passed a conservative send gate, the quantity reported below as window-level send coverage.
The current path uses the intended behavior:
- A finger holds its last command briefly after it receives one.
- A new command for that same finger can be suppressed during the hold window to avoid chatter.
- Other fingers can still actuate during that interval when their own commands pass the gate.
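
The difference between the two readings of the cooldown can be sketched in a few lines. This is a minimal illustration, not the repo's command shaper: the class names, the `offer` method, and the finger labels are hypothetical, and it assumes the per-finger hold window has the same 250 ms length as the old cooldown.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

HOLD_S = 0.250  # assumption: hold window equals the 250 ms cooldown


@dataclass
class PerFingerHold:
    """Current reading: a sent command only blocks repeats for the same finger."""
    hold_s: float = HOLD_S
    last_sent: Dict[str, float] = field(default_factory=dict)  # finger -> send time

    def offer(self, finger: str, t: float) -> bool:
        """Return True if a gate-approved command for `finger` at time `t` (s) is sent."""
        prev = self.last_sent.get(finger)
        if prev is not None and t - prev < self.hold_s:
            return False  # same finger is still inside its hold window
        self.last_sent[finger] = t
        return True


@dataclass
class GlobalCooldown:
    """Old reading: any sent command blocks every finger for the cooldown."""
    hold_s: float = HOLD_S
    last_sent: Optional[float] = None

    def offer(self, finger: str, t: float) -> bool:
        if self.last_sent is not None and t - self.last_sent < self.hold_s:
            return False  # blocked regardless of which finger asked
        self.last_sent = t
        return True


# Hypothetical gate-approved commands: (finger, time in seconds)
events = [("index", 0.00), ("middle", 0.10), ("index", 0.20), ("index", 0.30)]

per_finger = PerFingerHold()
global_gate = GlobalCooldown()
per_finger_sent = [per_finger.offer(f, t) for f, t in events]
global_sent = [global_gate.offer(f, t) for f, t in events]

print(per_finger_sent)  # [True, True, False, True]
print(global_sent)      # [True, False, False, True]
```

In this toy sequence the middle-finger command at 0.10 s goes through under the per-finger hold but is dropped under the global block, which is the kind of loss that made the old replay look slow.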
Current cleaned-corpus replay
| Metric | Current value | How to read it |
|---|---|---|
| Would-send precision on non-rest command windows | 95.37% | Reliability of commands that the gate actually sends |
| Window-level send coverage on true non-rest windows | 31.56% | Throughput coverage, not classification accuracy |
| Event hit rate | 91.11% | 543 of 596 movement events received at least one command |
| First-hit latency | 0.083 s median / 0.377 s p95 | Time to the first command inside a movement event |
| False REST actuation | 0.25% | 6 commands over 2,404 true REST windows |
| Committed joint action+finger accuracy | 86.42% | Replay accuracy after command shaping |
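
The two rows that state their raw counts can be checked by hand. The helper name `pct` is only for illustration; the counts are the ones given in the table.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals, as the table reports it."""
    return round(100.0 * numerator / denominator, 2)


event_hit_rate = pct(543, 596)        # movement events with at least one command
false_rest_actuation = pct(6, 2404)   # commands sent during true REST windows

print(event_hit_rate)        # 91.11
print(false_rest_actuation)  # 0.25
```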
The model's held-out decoding numbers remain much higher than the old 10.57% because they answer a different question. The held-out split measures how often the classifier labels a window correctly. The would-send coverage metric measures how often the deployment gate actually issues a command on a true movement window.
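
A toy example shows how the two questions can give very different answers on the same windows. The metric definitions below are assumptions chosen to match the table's descriptions, not the evaluation suite's code, and the labels and send flags are made up for illustration.

```python
REST = "REST"


def ratio(hits: int, total: int):
    """Return hits / total, or None when the denominator is empty."""
    return hits / total if total else None


def replay_metrics(true_labels, predicted, sent):
    """Window-level accuracy, send coverage, and would-send precision (assumed definitions)."""
    n = len(true_labels)
    # Classifier correctness: does the predicted label match the true label?
    accuracy = ratio(sum(p == y for p, y in zip(predicted, true_labels)), n)
    # Coverage: share of true non-rest windows on which the gate sent a command.
    non_rest = [i for i in range(n) if true_labels[i] != REST]
    coverage = ratio(sum(1 for i in non_rest if sent[i]), len(non_rest))
    # Precision: share of sent non-rest commands whose label was correct.
    commands = [i for i in range(n) if sent[i] and predicted[i] != REST]
    precision = ratio(sum(1 for i in commands if predicted[i] == true_labels[i]),
                      len(commands))
    return {"accuracy": accuracy, "coverage": coverage, "precision": precision}


# Toy windows: the classifier is right every time, but the gate sends only once.
true_labels = ["FLEX", "FLEX", "FLEX", "FLEX", REST, REST]
predicted = ["FLEX", "FLEX", "FLEX", "FLEX", REST, REST]
sent = [True, False, False, False, False, False]

metrics = replay_metrics(true_labels, predicted, sent)
print(metrics)  # {'accuracy': 1.0, 'coverage': 0.25, 'precision': 1.0}
```

Here a classifier that is correct on every window still shows 25% coverage, because coverage counts sends, not correct labels.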
Reporting policy
The site now treats pseudo-live replay as a deployment report, not a single accuracy number.
For public robot-hand claims, the useful headline set is:
- command precision
- false REST actuation
- event hit rate
- first-hit latency
- committed joint action+finger accuracy
- window-level send coverage, explicitly labeled as throughput
This makes the tradeoff clearer. A very conservative gate can have high precision and low false actuation while still feeling sluggish. A useful actuation path needs both reliability and responsiveness.
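
The tradeoff can be illustrated with a confidence-threshold sweep. This sketch assumes the send gate is a simple threshold on a per-window confidence score, which the report does not state, and it uses a simplified precision (a send counts as correct when it lands on a true movement window). The scores are invented toy values.

```python
def sweep(confidences, is_movement, thresholds):
    """Return (threshold, precision, coverage) for each candidate gate threshold."""
    n_move = sum(is_movement)
    rows = []
    for th in thresholds:
        sent = [c >= th for c in confidences]
        n_sent = sum(sent)
        true_sent = sum(1 for s, m in zip(sent, is_movement) if s and m)
        precision = true_sent / n_sent if n_sent else None
        coverage = true_sent / n_move if n_move else None
        rows.append((th, precision, coverage))
    return rows


# Toy windows: five true movement windows and two rest windows.
confidences = [0.95, 0.90, 0.80, 0.70, 0.60, 0.85, 0.55]
is_movement = [True, True, True, True, True, False, False]

rows = sweep(confidences, is_movement, thresholds=[0.5, 0.9])
for th, precision, coverage in rows:
    print(th, precision, coverage)
```

With the loose threshold every movement window is covered but two rest windows also trigger sends. With the strict threshold no rest window triggers a send, yet only two of five movement windows get a command.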
Verification
The 2-M16 evaluation suite was rerun for the featured checkpoint and produced the current public figures. The source repo also passes the updated command-shaper tests for the per-finger hold behavior, along with the full Python test suite.
The April 24 rollback audit remains on the site as historical model-selection context. Its old global-gate pseudo-live numbers should no longer be treated as the current deployment headline.