Evidence context

Deep dive

This page records the broader tuning campaign, model-selection logic, frozen runtime defaults, and dated evidence trail behind the current public bundle.

Featured run

2-M16 · March 19, 2026

Current featured bundle using the March 19 checkpoint with the May 2026 per-finger actuation path.

Published corpus

2 runs across 2 subjects

4,953 total held-out windows are public on the site today.

Current holdout

84.66%

Rest TPR 98.37% with applicability diagnostics carried alongside the headline metric.

Replay path

86.42% committed joint

Pseudo-live replay stays part of selection because deployment behavior matters more than one split score.


Search and selection scale

The March 19 checkpoint now ships with per-finger actuation defaults.

The trained checkpoint is still 20260319_075520. The public deployment bundle has been updated around it: same-finger holds prevent command chatter, other fingers can still actuate immediately, and the public metrics now report throughput separately from accuracy.

95.37%

Would-send precision

Among non-rest windows that passed the current per-finger actuation gate, 95.37% carried the correct movement command.

0.25%

False rest actuation

Only 6 of 2,404 true REST replay windows produced a movement command under the current defaults.

91.11%

Event hit rate

The replay hit 543 of 596 movement events at least once, which is a better summary of responsiveness than window-level send coverage.

2,595 configs

Postprocess ablation

The March 16, 2026 website update documents a 2,595-config ablation over thresholds, smoothing, hysteresis, adjacency, and finger-mode settings.
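The replay rates above are simple ratios, so the quoted counts can be checked directly. The sketch below is illustrative only: `rate_pct` and `would_send_precision` are hypothetical helper names, the toy windows are made up, and the choice of denominator for would-send precision (true non-REST windows whose command passed the gate) is an assumption drawn from the wording on this page, not from the project code.

```python
def rate_pct(hits: int, total: int) -> float:
    """Return hits / total as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * hits / total, 2)


def would_send_precision(windows) -> float:
    """windows: iterable of (true_command, sent_command); sent_command is None if gated out.

    Assumption: the denominator is true non-REST windows whose command passed the gate.
    """
    sent = [(true, cmd) for true, cmd in windows if true != "REST" and cmd is not None]
    correct = sum(1 for true, cmd in sent if true == cmd)
    return rate_pct(correct, len(sent))


# Counts quoted on this page.
false_rest_actuation = rate_pct(6, 2404)  # true REST windows that still produced a command
event_hit_rate = rate_pct(543, 596)       # movement events hit at least once

# Hypothetical toy windows, only to exercise would_send_precision.
toy = [("OPEN_INDEX", "OPEN_INDEX"), ("OPEN_INDEX", None),
       ("CLOSE_THUMB", "CLOSE_RING"), ("REST", None)]
toy_precision = would_send_precision(toy)

print(false_rest_actuation, event_hit_rate, toy_precision)
```

The two count-based rates round to the 0.25% and 91.11% figures shown in the cards. The page does not publish the counts behind the 95.37% precision figure, so that metric is only exercised on toy data here.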

Selection logic

Why this model won

The current report separates two model roles: April 3 remains a stronger historical offline checkpoint, while March 19 stays public because the deployment replay is stronger on precision, event hits, latency, and rest-period actuation risk.

Metric	March 19	April 3	Selection interpretation
Holdout action accuracy	89.79%	91.83% · offline winner	April 3 improves fixed-split action-head accuracy, so it remains the stronger offline checkpoint for action-state separation.
Holdout joint accuracy	84.66%	86.66% · offline winner	April 3 also improves paired action-plus-finger correctness on the holdout, which is useful for model-development tracking before actuation risk is considered.
Event-level joint accuracy	87.60%	93.39% · offline winner	April 3 stays ahead after windows are majority-voted within each event, showing that the offline gain is not just single-window variance.
Holdout rest TPR	98.37% · safety winner	94.79%	March 19 preserves more true rest windows at the action head, reducing the chance that idle periods enter downstream actuation gates.
Pseudo-live would-send precision	95.37% · current path	80.06% · April 24 audit	The current March 19 bundle is substantially more reliable among gated command windows. The April 3 number is retained as historical rollback context, not a same-actuation-path comparison.
Pseudo-live send coverage	31.56% · throughput	36.49% · April 24 audit	This is the share of true movement windows that pass the send gate, so it measures command throughput rather than classification accuracy.
False rest actuation	0.25% · current path	6.74% · April 24 audit	The current path produced 6 movement commands over 2,404 true REST windows. That rest-period failure mode remains the main safety constraint for the public robot-hand claim.
Pseudo-live committed joint	86.42% · current path	86.04% · April 24 audit	After smoothing, applicability checks, stability, and per-finger hold logic are applied, March 19 keeps the deployment score near its held-out decoding ceiling.
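The event-level row in the table relies on majority voting over the windows of each event. A minimal sketch of that idea follows, assuming each window carries an event id and a joint (action, finger) label; `event_level_accuracy` and the toy data are hypothetical and are not taken from the project code.

```python
from collections import Counter, defaultdict


def event_level_accuracy(event_ids, true_labels, pred_labels) -> float:
    """Majority-vote the predicted label within each event, then score per event."""
    votes = defaultdict(Counter)
    truth = {}
    for event, true, pred in zip(event_ids, true_labels, pred_labels):
        votes[event][pred] += 1
        truth[event] = true  # assumes every window in an event shares one true label
    # Ties resolve to the label seen first within the event (Counter ordering).
    correct = sum(
        1 for event, counts in votes.items()
        if counts.most_common(1)[0][0] == truth[event]
    )
    return correct / len(votes)


# Hypothetical toy data: two events, three windows each, one wrong window per event.
events = ["e1", "e1", "e1", "e2", "e2", "e2"]
true = [("OPEN", "INDEX")] * 3 + [("CLOSE", "THUMB")] * 3
pred = [("OPEN", "INDEX"), ("OPEN", "INDEX"), ("OPEN", "MIDDLE"),
        ("CLOSE", "THUMB"), ("CLOSE", "RING"), ("CLOSE", "THUMB")]

window_acc = sum(t == p for t, p in zip(true, pred)) / len(true)
event_acc = event_level_accuracy(events, true, pred)
print(window_acc, event_acc)
```

In the toy data both events are recovered by the vote even though a third of the windows are wrong, which is why event-level scores can sit above window-level scores for the same checkpoint.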

Frozen stack

Training recipe and runtime defaults

The public bundle includes more than model weights: architecture choice, split policy, preprocessing, auxiliary quiet-rest support, and the replay defaults used for pseudo-live evaluation.

Training stack

Architecture

CNNLSTMFingerActionNet

The March 19 checkpoint combines action decoding, active-finger decoding, and a dedicated finger-applicability head.

Optimization

60 epochs · batch 64 · lr 0.001 · seed 43

These values come from the restored March 19 deployment metrics bundle.

Split policy

group_trial · test_size 0.2 · calibration_size 0.1

The holdout bundle stays tied to a fixed split while calibration is separated from the main train/test partition.
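As a rough illustration of a grouped split, the sketch below assigns whole trials to train, calibration, or test so that no trial is shared between partitions. Several details are assumptions: that `group_trial` means grouping by trial id, that both fractions apply to the number of trials rather than windows, and that the training seed is reused for the split. `group_trial_split` is a hypothetical name.

```python
import random


def group_trial_split(trial_ids, test_size=0.2, calibration_size=0.1, seed=43):
    """Return window indices per partition, keeping each trial in exactly one partition."""
    trials = sorted(set(trial_ids))
    rng = random.Random(seed)
    rng.shuffle(trials)

    n_test = round(len(trials) * test_size)
    n_cal = round(len(trials) * calibration_size)
    test_trials = set(trials[:n_test])
    cal_trials = set(trials[n_test:n_test + n_cal])

    split = {"train": [], "calibration": [], "test": []}
    for idx, trial in enumerate(trial_ids):
        if trial in test_trials:
            split["test"].append(idx)
        elif trial in cal_trials:
            split["calibration"].append(idx)
        else:
            split["train"].append(idx)
    return split


# Hypothetical corpus: 10 trials with 5 windows each.
trial_ids = [trial for trial in range(10) for _ in range(5)]
split = group_trial_split(trial_ids)
print({name: len(idx) for name, idx in split.items()})
```

Splitting by trial rather than by window matters because the replay windows overlap heavily in time; a per-window split would leak near-duplicate samples into the test set.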

Input + preprocessing

64 x 4 windows · center_detrend

Per-window centering and detrending are frozen into the reference run's preprocessing and normalizer config.
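One plausible reading of `center_detrend` is that each channel of a 64 x 4 window has its mean and its least-squares linear trend removed. The sketch below implements that reading in plain Python; it is an assumption about what the step does, not the project's preprocessing code, and the frozen normalizer mentioned above is not modeled.

```python
def center_detrend(window):
    """Remove the per-channel mean and least-squares linear trend from one window.

    window: list of n_samples rows, each a list of n_channels floats (e.g. 64 x 4).
    """
    n = len(window)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    n_channels = len(window[0])
    t_mean = (n - 1) / 2
    t_var = sum((t - t_mean) ** 2 for t in range(n))

    out = [[0.0] * n_channels for _ in range(n)]
    for ch in range(n_channels):
        x = [row[ch] for row in window]
        x_mean = sum(x) / n
        slope = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / t_var
        for t, v in enumerate(x):
            out[t][ch] = v - x_mean - slope * (t - t_mean)
    return out


# Hypothetical window: a constant offset plus a different linear drift per channel.
window = [[5.0 + 0.3 * (ch + 1) * t for ch in range(4)] for t in range(64)]
cleaned = center_detrend(window)
print(max(abs(v) for row in cleaned for v in row))
```

A pure offset-plus-drift window collapses to approximately zero, which is the intended effect: slow electrode drift is removed per window before the model sees the signal.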

Sampler

cleaned March 19 corpus

The deployment run uses the cleaned combined corpus with the problematic rest events pruned from the February 16 session.

Rest support

1,059 auxiliary quiet-rest windows

The auxiliary quiet-rest session is used for train-only support while the core split contributes 11,388 windows and the public test split contributes 2,301 windows.

Replay and runtime stack

Frozen live defaults

EMA · action 0.05 · finger 0.20 · applicability 0.40

The March 19 displayed metrics use the tuned postprocess family instead of the April 3 raw-gated replay path.
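For orientation, an exponential moving average over per-window probabilities can be sketched as below. The page lists the three EMA values but not their convention, so treating each value as the weight on the newest window is an assumption; under the opposite convention the roles of `alpha` and `1 - alpha` would swap. `ema_smooth` and `EMA_ALPHA` are hypothetical names.

```python
# Values from the frozen live defaults; their interpretation as new-sample weights is assumed.
EMA_ALPHA = {"action": 0.05, "finger": 0.20, "applicability": 0.40}


def ema_smooth(prob_seq, alpha):
    """Smooth a sequence of probability vectors with an exponential moving average."""
    smoothed = []
    state = None
    for probs in prob_seq:
        if state is None:
            state = list(probs)  # seed the average with the first window
        else:
            state = [alpha * p + (1 - alpha) * s for p, s in zip(probs, state)]
        smoothed.append(state)
    return smoothed


# Hypothetical two-class finger probabilities that flip abruptly between windows.
raw = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
finger_smoothed = ema_smooth(raw, EMA_ALPHA["finger"])
print(finger_smoothed)
```

Under this reading a smaller value means heavier smoothing, so the action head would change most slowly and the applicability head most quickly.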

Actuation gates

min_prob 0.2 · stability 3 · same-finger hold 250 ms

The hold suppresses rapid flips on the same finger while allowing other fingers to actuate during that interval.
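The gate behavior described here can be sketched as a small state machine. This is an illustrative reading, not the shipped actuation code: it assumes `stability 3` means three consecutive windows with the same command, that `min_prob` applies to the smoothed command probability, and that the 250 ms hold is measured from the last command sent to that finger. `PerFingerGate` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class PerFingerGate:
    min_prob: float = 0.2
    stability: int = 3
    hold_s: float = 0.250
    _last_sent: dict = field(default_factory=dict)  # finger -> time of last command
    _streak_cmd: object = None
    _streak_len: int = 0

    def step(self, t, action, finger, prob):
        """Return (action, finger) if a command would be sent at time t, else None."""
        cmd = (action, finger)
        if action == "REST" or prob < self.min_prob:
            self._streak_cmd, self._streak_len = None, 0
            return None
        if cmd == self._streak_cmd:
            self._streak_len += 1
        else:
            self._streak_cmd, self._streak_len = cmd, 1
        if self._streak_len < self.stability:
            return None  # not stable for enough consecutive windows yet
        last = self._last_sent.get(finger)
        if last is not None and t - last < self.hold_s:
            return None  # same-finger hold still active
        self._last_sent[finger] = t
        return cmd


gate = PerFingerGate()
# Four INDEX windows at a 0.05 s hop: the third sends, the fourth is held.
index_out = [gate.step(0.05 * i, "OPEN", "INDEX", 0.9) for i in range(4)]
# THUMB windows arrive while the INDEX hold is still active and can still send.
thumb_out = [gate.step(0.20 + 0.05 * i, "CLOSE", "THUMB", 0.9) for i in range(3)]
print(index_out, thumb_out)
```

Keeping the hold state per finger is what separates this from a global cooldown: a repeated INDEX command is suppressed, while a THUMB command inside the same interval goes through once it is stable.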

Replay cadence

0.25 s windows · 0.05 s hop

The pseudo-live replay runs the same checkpoint through the saved inference and actuation path at a replay cadence close to live use.
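To make the cadence concrete, the sketch below lists window start indices for a short recording. It assumes a 256 Hz sample rate, which is what a 64-sample window lasting 0.25 s implies, and it rounds each hop to the nearest sample because 0.05 s is not a whole number of samples at that rate. The rounding rule and the name `window_starts` are assumptions.

```python
def window_starts(n_samples, fs=256.0, win_s=0.25, hop_s=0.05):
    """Return (window_length_in_samples, start indices) for a sliding replay."""
    win = round(win_s * fs)
    starts = []
    k = 0
    while True:
        start = round(k * hop_s * fs)  # nearest-sample hop; an assumed convention
        if start + win > n_samples:
            break
        starts.append(start)
        k += 1
    return win, starts


win, starts = window_starts(n_samples=256)  # one second of signal
print(win, len(starts), starts[:4])
```

With these settings consecutive windows overlap by about 80%, so neighboring replay windows are strongly correlated and should not be treated as independent samples.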

Cleaned replay accuracy

86.42% committed joint

This is the committed action-plus-finger score on the cleaned pseudo-live corpus; first-send timing is reported separately.

Would-send onset

0.083 s median · 0.377 s p95

Event-relative first-send timing is tracked alongside the precision and rest-safety metrics used for model selection.
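A minimal sketch of how event-relative first-send latency could be summarized is shown below. The event representation, the decision to skip missed events, and the linear-interpolation percentile are all assumptions; the page does not state which percentile method produced the 0.377 s p95. `first_send_latencies` and `percentile` are hypothetical helpers.

```python
import math
from statistics import median


def percentile(values, q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def first_send_latencies(events, send_times):
    """events: (onset_s, offset_s) pairs; send_times: command times in seconds."""
    latencies = []
    for onset, offset in events:
        hits = [t for t in send_times if onset <= t <= offset]
        if hits:  # events with no send are misses and carry no latency
            latencies.append(min(hits) - onset)
    return latencies


# Hypothetical events and send times; the third event is never hit.
events = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
sends = [0.10, 0.40, 2.05, 2.50]
lat = first_send_latencies(events, sends)
print(median(lat), percentile(lat, 95))
```

Because missed events contribute no latency, onset timing has to be read next to the event hit rate: a fast median means little if many events are never hit.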

Replay footprint

12,447 cleaned windows

The cleaned deployment replay uses the combined March 19 corpus, giving a broader check than the 2,301-window saved test split.

Dated trail

How the campaign evolved

The February 26, 2026 update documents the earlier large-scale weight and hyperparameter campaign: 100+ trained variants, a 30+ hour sweep, and a largest logged block of 90 runs.
The March 16, 2026 update documents the later deployment-facing postprocess ablation that froze the live default family after 2,595 evaluated configurations.
The March 18, 2026 update widened the selection criteria from holdout accuracy alone to include full-session replay, pseudo-live behavior, and the end-to-end Step 7 control path.
The May 6, 2026 replay update keeps March 19 as the displayed checkpoint and replaces the old global cooldown interpretation with per-finger actuation metrics.

Electrode ablation

Which Muse 2 channels carry the 2-M16 signal?

This exploratory ablation retrained the same 2-M16 recipe after changing which Muse electrodes were available to the model. The purpose is to estimate subject-specific channel importance, not to claim anatomical localization or deployment readiness. Action accuracy uses the action head directly; finger accuracy decodes the active-finger head on true non-REST windows and is not yet deployment-gated.

Study branch

ablation/electrode-channel-importance

Exploratory subject-specific channel ablation for 2-M16.

Train/test config

Same 2-M16 recipe

Each subset used the featured derived dataset, split policy, model recipe, and deterministic test evaluation.

Primary sweep

Seed 43 · 60 epochs

All, single-channel, and leave-one-out subsets were retrained.

Temporal-pair add-on

TP9+TP10

AF7 and AF8 were removed together, leaving only temporal electrodes.

Full montage

90.96% action · 86.26% finger

Fresh retrain using TP9+AF7+AF8+TP10.

TP9+TP10 only

88.31% action · 83.90% finger

Only 2.65 pp action loss and 2.36 pp finger loss versus the fresh full-montage retrain.

Frontal-drop finding

REST did not decrease

Dropping AF7 or AF8 mainly reduced OPEN accuracy, not REST accuracy.

Evidence boundary

Single seed

This is retraining-ablation evidence, not deployment-gated replay or anatomical localization.

Frontal-channel interpretation

Dropping a frontal electrode did not reduce REST accuracy. With AF7 removed, REST increased by 0.65 pp, OPEN decreased by 2.39 pp, and CLOSE decreased by 0.56 pp. With AF8 removed, REST increased by 2.28 pp, OPEN decreased by 3.37 pp, and CLOSE increased by 1.03 pp. The small action decrease from frontal-channel removal is therefore driven mainly by OPEN, not REST, and is not a uniform action decline.

The TP9+TP10-only temporal pair preserves most of the full-montage retrain: 88.31% action accuracy and 83.90% non-REST finger accuracy, only 2.65 pp and 2.36 pp below the fresh full-montage baseline.

Leave-one-out importance

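Drops are percentage-point losses relative to the fresh full-montage retrain. Larger positive drops indicate stronger dependence on the omitted electrode; a negative drop means the score improved without it.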

Omitted	Kept channels	Action drop	Finger drop
TP10	TP9+AF7+AF8	12.08 pp	9.13 pp
TP9	AF7+AF8+TP10	11.73 pp	13.29 pp
AF7	TP9+AF8+TP10	1.13 pp	-0.40 pp
AF8	TP9+AF7+TP10	0.56 pp	-1.30 pp
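The drops in this table are differences between the full-montage retrain and each leave-one-out run. The sketch below recomputes them from the overall accuracies listed in the per-action and per-finger tables further down this page; the dictionary layout and variable names are illustrative.

```python
# Overall accuracies (percent) copied from the per-action and per-finger tables on this page.
FULL = {"action": 90.96, "finger": 86.26}
LEAVE_ONE_OUT = {
    "TP10": {"action": 78.88, "finger": 77.13},
    "TP9": {"action": 79.23, "finger": 72.97},
    "AF7": {"action": 89.83, "finger": 86.66},
    "AF8": {"action": 90.40, "finger": 87.56},
}

# Drop = full-montage accuracy minus leave-one-out accuracy, in percentage points.
drops = {
    omitted: {head: round(FULL[head] - acc[head], 2) for head in FULL}
    for omitted, acc in LEAVE_ONE_OUT.items()
}

rank_by_action = sorted(drops, key=lambda ch: drops[ch]["action"], reverse=True)
rank_by_finger = sorted(drops, key=lambda ch: drops[ch]["finger"], reverse=True)

for omitted, d in drops.items():
    print(f"{omitted}: action {d['action']:+.2f} pp, finger {d['finger']:+.2f} pp")
print(rank_by_action, rank_by_finger)
```

The two rankings differ at the top: TP10 has the largest action drop while TP9 has the largest finger drop, and both frontal electrodes sit far below either temporal electrode.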

Single-channel sufficiency

TP9 and TP10 each retain far more signal than either frontal electrode alone.

Electrode	Action acc	Finger acc
TP9	73.01%	72.22%
TP10	72.66%	67.80%
AF7	48.89%	35.46%
AF8	48.33%	36.16%

Class-level test windows

Every ablation row uses the same held-out split and the same test-window counts.

Action labels

Three-class action head: REST, OPEN, and CLOSE.

2,301 total

REST	307
OPEN	921
CLOSE	1,073

Finger labels

True non-REST finger windows.

1,994 total

THUMB	466
INDEX	327
MIDDLE	443
RING	365
PINKY	393

Per-action accuracy and deltas

Deltas are percentage-point changes relative to the full-montage retrain.

Run	Overall	REST	OPEN	CLOSE	REST delta	OPEN delta	CLOSE delta
All (TP9+AF7+AF8+TP10)	90.96%	97.72%	94.68%	85.83%	+0.00 pp	+0.00 pp	+0.00 pp
TP9 only	73.01%	70.68%	82.19%	65.80%	-27.04 pp	-12.49 pp	-20.04 pp
AF7 only	48.89%	40.39%	47.56%	52.47%	-57.33 pp	-47.12 pp	-33.36 pp
AF8 only	48.33%	29.97%	41.15%	59.74%	-67.75 pp	-53.53 pp	-26.10 pp
TP10 only	72.66%	100.00%	83.28%	55.73%	+2.28 pp	-11.40 pp	-30.10 pp
Drop TP9	79.23%	100.00%	67.75%	83.13%	+2.28 pp	-26.93 pp	-2.70 pp
Drop AF7	89.83%	98.37%	92.29%	85.27%	+0.65 pp	-2.39 pp	-0.56 pp
Drop AF8	90.40%	100.00%	91.31%	86.86%	+2.28 pp	-3.37 pp	+1.03 pp
Drop TP10	78.88%	97.72%	75.57%	76.33%	+0.00 pp	-19.11 pp	-9.51 pp
TP9+TP10 only	88.31%	100.00%	90.34%	83.22%	+2.28 pp	-4.34 pp	-2.61 pp
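Because every row uses the same test-window counts, the Overall column should equal the count-weighted average of the per-class accuracies. The sketch below checks that for two rows using the class counts listed above; it is a consistency check on the published numbers, not part of the project's evaluation code, and small mismatches on other rows can arise because the per-class values are already rounded.

```python
# Test-window counts per action class, from the class-level section of this page.
COUNTS = {"REST": 307, "OPEN": 921, "CLOSE": 1073}

# Per-class accuracies (percent) copied from the per-action table.
ROWS = {
    "All (TP9+AF7+AF8+TP10)": {"REST": 97.72, "OPEN": 94.68, "CLOSE": 85.83},
    "TP9+TP10 only": {"REST": 100.00, "OPEN": 90.34, "CLOSE": 83.22},
}


def weighted_overall(per_class_acc, counts):
    """Count-weighted average of per-class accuracies, in percent."""
    total = sum(counts.values())
    return sum(per_class_acc[c] * counts[c] for c in counts) / total


overall = {name: weighted_overall(acc, COUNTS) for name, acc in ROWS.items()}
for name, value in overall.items():
    print(f"{name}: {value:.2f}%")
```

Both rows reproduce the published Overall values (90.96% and 88.31%) to within rounding. The weighting also shows why overall accuracy can fall while REST accuracy rises: REST contributes only 307 of the 2,301 windows.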

Per-finger accuracy

Finger metrics are computed on true non-REST windows from the active-finger head.

Run	Overall	THUMB	INDEX	MIDDLE	RING	PINKY
All (TP9+AF7+AF8+TP10)	86.26%	100.00%	96.33%	85.10%	58.08%	89.06%
TP9 only	72.22%	100.00%	80.73%	57.79%	47.67%	71.25%
AF7 only	35.46%	61.59%	11.01%	21.44%	7.12%	66.92%
AF8 only	36.16%	47.85%	0.92%	39.73%	8.49%	73.28%
TP10 only	67.80%	86.48%	67.58%	56.88%	54.52%	70.48%
Drop TP9	72.97%	75.32%	76.45%	83.75%	39.73%	86.01%
Drop AF7	86.66%	100.00%	88.38%	81.94%	70.68%	89.57%
Drop AF8	87.56%	100.00%	93.88%	83.30%	76.99%	82.19%
Drop TP10	77.13%	98.71%	73.09%	78.56%	39.18%	88.55%
TP9+TP10 only	83.90%	99.57%	81.04%	74.72%	70.41%	90.59%

Per-finger deltas versus full montage

Positive values mean that subset improved over the fresh full-montage retrain for that finger class.

Run	Overall	THUMB	INDEX	MIDDLE	RING	PINKY
All (TP9+AF7+AF8+TP10)	+0.00 pp	+0.00 pp	+0.00 pp	+0.00 pp	+0.00 pp	+0.00 pp
TP9 only	-14.04 pp	+0.00 pp	-15.60 pp	-27.31 pp	-10.41 pp	-17.81 pp
AF7 only	-50.80 pp	-38.41 pp	-85.32 pp	-63.66 pp	-50.96 pp	-22.14 pp
AF8 only	-50.10 pp	-52.15 pp	-95.41 pp	-45.37 pp	-49.59 pp	-15.78 pp
TP10 only	-18.46 pp	-13.52 pp	-28.75 pp	-28.22 pp	-3.56 pp	-18.58 pp
Drop TP9	-13.29 pp	-24.68 pp	-19.88 pp	-1.35 pp	-18.36 pp	-3.05 pp
Drop AF7	+0.40 pp	+0.00 pp	-7.95 pp	-3.16 pp	+12.60 pp	+0.51 pp
Drop AF8	+1.30 pp	+0.00 pp	-2.45 pp	-1.81 pp	+18.90 pp	-6.87 pp
Drop TP10	-9.13 pp	-1.29 pp	-23.24 pp	-6.55 pp	-18.90 pp	-0.51 pp
TP9+TP10 only	-2.36 pp	-0.43 pp	-15.29 pp	-10.38 pp	+12.33 pp	+1.53 pp

Finger-level interpretation

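Removing TP9 causes a larger overall finger loss (-13.29 pp) than removing TP10 (-9.13 pp), suggesting TP9 is the stronger temporal contributor to finger decoding in this split. The action head shows no comparable asymmetry: the leave-one-out action drops are close (12.08 pp for TP10 and 11.73 pp for TP9). Removing AF7 or AF8 does not cause a broad finger collapse: the aggregate finger score improves slightly in both frontal-drop runs (+0.40 pp and +1.30 pp).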

Evidence boundary

AF7 and AF8 may still help specific digit boundaries, especially INDEX and MIDDLE, while adding noise or conflicting information for RING in this subject-specific split. The next step is to repeat the sweep across seeds and run reduced montages through deployment-consistent replay before making command-path claims.

Recommended material

Keep reading

Results overview

Current headline metrics, figures, caveats, and published runs.

Visualization gallery

The dedicated page for interactive PCA and UMAP latent-space views.

Featured run detail

The bundle-level audit page for the current winning checkpoint.

How it works

Capture, validation, windowing, training, evaluation, and deployment gates.

Research repo

Use the public code and artifacts to reproduce, test, or extend the project.

Per-finger actuation update

May 6, 2026 update replacing the old global cooldown interpretation with current throughput and event-hit metrics.

Rollback audit

April 24, 2026 model-selection audit comparing March 19 and April 3 with concrete rankings.

Live-inference diagnosis

April 29, 2026 follow-up documenting the April 28 same-sitting replay failure and the added preflight requirements.

Winning-model update

March 19, 2026 refresh for the restored featured deployment candidate.

Deployment breakthrough update

March 18, 2026 post that expands the selection story to replay and pseudo-live behavior.

Split-fix and live-defaults update

March 16, 2026 post documenting the 2,595-config postprocess ablation.

Older tuning update

February 26, 2026 tuning post documenting 100+ model variants, a 30+ hour sweep, and the 90-run / 33.3-hour logged block.

HTML report

Full report bundle for the current public run.

Metrics JSON

Sanitized metrics bundle for the current public run.

Earlier checkpoint

Subject 1-M16 — Run 2026-03-05

The 1-M16 bundle is still useful because it shows a second subject, an earlier tuning cycle, and the kind of metrics that were available before the broader March 2026 evaluation push. It is not the model-selection path that produced the current featured 2-M16 run.

2026-02-27 refresh

Published tuning checkpoint

The website's February 27, 2026 update documents a tuned 1-M16 rerun and compares it against the February 21 baseline.

Historical baseline

Earlier methods stack

Later March updates explicitly warn that 1-M16 used earlier methods and should not be treated as a direct comparison to the current 2-M16 deployment stack.

60 epochs

Published training budget

The public bundle preserves the core training budget: 60 epochs, batch size 64, learning rate 0.001, and seed 45.
