2026-04-24

2-M16 model-selection rollback

Restored the March 19 deployment checkpoint as the public 2-M16 model after comparing where it wins and loses against the April 3 reference run on holdout, event-level, replay, and pseudo-live safety metrics.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

Decision

The public website should display 20260319_075520 as the main 2-M16 deployment model again.

The April 3 run, 20260403_grouptrial_rest050, remains a useful offline benchmark: it wins on the standard holdout and event-level scores. It should not be the displayed live-control model, however, because its pseudo-live actuation metrics are worse.

Model Roles

| Model | Wins | Loses | Justification |
|-------|------|-------|---------------|
| 20260319_075520 | Would-send precision, false REST actuation, REST true-positive rate, action calibration, cleaned pseudo-live committed joint | Offline holdout action/joint accuracy, event-level action/joint accuracy, would-send recall | Use this as the public live-control model because avoiding false actuation matters more than maximizing offline recall for a deployed robot-hand claim. |
| 20260403_grouptrial_rest050 | Offline holdout action/joint/finger accuracy, event-level action/joint/finger accuracy, would-send recall | Would-send precision, false REST actuation, REST true-positive rate, action calibration | Keep this as an offline research benchmark because it proves training can improve decoding, but it needs safer gating before it should control the public deployment story. |

Ranking

| Metric | March 19 | April 3 | Winner |
|--------|----------|---------|--------|
| Holdout action accuracy | 89.79% | 91.83% | April 3 |
| Holdout joint accuracy | 84.66% | 86.66% | April 3 |
| Holdout non-REST finger accuracy | 85.96% eval / 87.01% model card | 88.11% | April 3 |
| Event-level action accuracy | 92.56% | 95.87% | April 3 |
| Event-level joint accuracy | 87.60% | 93.39% | April 3 |
| Event-level non-REST finger accuracy | 90.68% | 94.92% | April 3 |
| Holdout REST TPR | 98.37% | 94.79% | March 19 |
| Action ECE (lower is better) | 2.32% | 3.98% | March 19 |
| Cleaned pseudo-live committed joint | 86.64% | 86.04% | March 19 |
| Cleaned pseudo-live would-send precision | 93.32% | 80.06% | March 19 |
| Cleaned pseudo-live would-send recall | 10.57% | 36.49% | April 3 |
| Cleaned pseudo-live false REST actuation | 0.12% | 6.74% | March 19 |

Diagnosis

The April 3 run is not simply "worse." It is more aggressive. It sends more true movement windows, which improves would-send recall from 10.57% to 36.49%, but that comes with a large precision and REST-safety regression.

The precision drop is 13.26 percentage points:

  • March 19: 93.32% would-send precision
  • April 3: 80.06% would-send precision

The REST-safety regression is larger in practical terms:

  • March 19: 0.12% false REST actuation on the cleaned pseudo-live corpus
  • April 3: 6.74% false REST actuation on the same corpus
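The size of both regressions follows directly from the published figures. A quick arithmetic sketch (the dictionary keys are illustrative names, not the real metrics schema):

```python
# Published cleaned pseudo-live figures from the comparison above (percent).
march19 = {"would_send_precision": 93.32, "false_rest_actuation": 0.12}
april3 = {"would_send_precision": 80.06, "false_rest_actuation": 6.74}

# Precision drop in percentage points.
precision_drop_pp = round(march19["would_send_precision"]
                          - april3["would_send_precision"], 2)

# False REST actuation grows by a factor of roughly 56 on the same corpus.
false_rest_ratio = april3["false_rest_actuation"] / march19["false_rest_actuation"]

print(precision_drop_pp)        # 13.26
print(round(false_rest_ratio))  # 56
```

A ~56x increase in false REST actuation is why the REST-safety regression is called the larger problem in practical terms, even though the raw percentage-point gap is smaller than the precision drop.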

Two things explain the regression:

  • The April 3 model is less conservative around REST. Holdout REST TPR drops from 98.37% to 94.79%.
  • The April 3 public replay used the raw-gated Step 7 path with postprocessing disabled, while the March 19 displayed metric set uses the tuned deployment family with EMA smoothing, low action/finger thresholds, applicability gating, stability, and cooldown.
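The tuned deployment family in the second bullet can be sketched as a small stateful gate. This is a hypothetical reconstruction for illustration only: the class name, parameter values, and the omission of applicability gating and per-finger thresholds are assumptions, not the real deployment code.

```python
from collections import deque

class DeploymentGate:
    """Illustrative sketch of a tuned gating path: EMA smoothing of the
    action probability, a probability threshold, a stability window over
    recent predictions, and a post-send cooldown."""

    def __init__(self, ema_alpha=0.3, action_thresh=0.6, stability_n=3, cooldown=5):
        self.ema_alpha = ema_alpha
        self.action_thresh = action_thresh
        self.stability_n = stability_n
        self.cooldown = cooldown
        self.ema = None                      # smoothed action probability
        self.recent = deque(maxlen=stability_n)
        self.cooldown_left = 0

    def step(self, action_prob, predicted_action):
        # EMA smoothing of the raw per-window action probability.
        self.ema = action_prob if self.ema is None else (
            self.ema_alpha * action_prob + (1 - self.ema_alpha) * self.ema)
        self.recent.append(predicted_action)

        if self.cooldown_left > 0:
            self.cooldown_left -= 1
            return None  # suppressed: still cooling down after a send

        # Require the same prediction across the whole stability window.
        stable = (len(self.recent) == self.stability_n
                  and len(set(self.recent)) == 1)
        if stable and self.ema >= self.action_thresh:
            self.cooldown_left = self.cooldown
            return predicted_action  # committed: would-send
        return None  # gated: stay at REST
```

Running the April 3 replay through a raw-gated path with no smoothing, stability, or cooldown would let single noisy windows through, which is consistent with its higher false REST actuation.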

Public Model Policy

For the public model on alphahand.org, deployment safety ranks ahead of offline split accuracy. A replacement for March 19 should beat it on:

  • cleaned-corpus would-send precision
  • cleaned-corpus false REST actuation
  • holdout REST TPR
  • zero invalid committed and sent action-finger pairs
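The promotion criteria above can be expressed as a single check. The metric key names below are illustrative, not the real metrics schema, and the comparisons assume "beat" means strictly better:

```python
def passes_promotion_gate(candidate: dict, baseline: dict) -> bool:
    """Return True if a candidate run beats the baseline on every
    safety criterion listed above (keys are illustrative names)."""
    return (candidate["would_send_precision"] > baseline["would_send_precision"]
            and candidate["false_rest_actuation"] < baseline["false_rest_actuation"]
            and candidate["holdout_rest_tpr"] > baseline["holdout_rest_tpr"]
            and candidate["invalid_action_finger_pairs"] == 0)

# The April 3 run fails this gate against the March 19 figures above.
march19 = {"would_send_precision": 93.32, "false_rest_actuation": 0.12,
           "holdout_rest_tpr": 98.37, "invalid_action_finger_pairs": 0}
april3 = {"would_send_precision": 80.06, "false_rest_actuation": 6.74,
          "holdout_rest_tpr": 94.79, "invalid_action_finger_pairs": 0}
print(passes_promotion_gate(april3, march19))  # False
```

Note the gate is one-sided by design: a candidate can lose offline holdout accuracy and still be promoted, matching the policy that deployment safety ranks ahead of offline split accuracy.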

The April 3 run remains useful for contributors because it shows where offline model training improved. The next target is to keep those offline gains while restoring March 19-level actuation precision and REST safety.

Website Changes

  • Restored the displayed 2-M16 metrics to 20260319_075520.
  • Added per-event accuracy for the March 19 holdout.
  • Kept the April 3 comparison in the public metrics JSON as a model-selection audit.
  • Reframed the April 3 reference-bundle update as historical context rather than the featured model claim.