Evidence context

Deep dive

This page records the broader tuning campaign, model-selection logic, frozen runtime defaults, and dated evidence trail behind the current public bundle.

Featured run

2-M16 · March 19, 2026

Current featured bundle using the March 19 checkpoint with the May 2026 per-finger actuation path.

Published corpus

2 runs across 2 subjects

4,953 total held-out windows are public on the site today.

Current holdout

84.66%

Rest TPR 98.37% with applicability diagnostics carried alongside the headline metric.

Replay path

86.42% committed joint

Pseudo-live replay stays part of selection because deployment behavior matters more than one split score.

Search and selection scale

The March 19 checkpoint now ships with per-finger actuation defaults.

The trained checkpoint is still 20260319_075520. The public deployment bundle has been updated around it: same-finger holds prevent command chatter, other fingers can still actuate immediately, and the public metrics now report throughput separately from accuracy.

95.37%

Would-send precision

Among non-rest windows that passed the current per-finger actuation gate, 95.37% carried the correct movement command.

0.25%

False rest actuation

Only 6 of 2,404 true REST replay windows produced a movement command under the current defaults.

91.11%

Event hit rate

The replay hit 543 of 596 movement events at least once, which is a better responsiveness summary than window-level send coverage.
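The two count-based figures above can be reproduced directly from the published counts. This is a minimal arithmetic sketch; the helper name rate_percent is hypothetical and not part of the public bundle.

```python
def rate_percent(hits: int, total: int) -> float:
    """Return hits / total as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * hits / total, 2)


# Counts quoted on this page.
false_rest_actuation = rate_percent(6, 2404)   # movement commands on true REST windows
event_hit_rate = rate_percent(543, 596)        # movement events hit at least once

print(false_rest_actuation, event_hit_rate)
```

Would-send precision is not recomputed here because the page reports only the percentage, not the underlying counts.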

2,595 configs

Postprocess ablation

The March 16, 2026 website update documents a 2,595-config ablation over thresholds, smoothing, hysteresis, adjacency, and finger-mode settings.

Selection logic

Why this model won

The current report separates two model roles: April 3 remains a stronger historical offline checkpoint, while March 19 stays public because the deployment replay is stronger on precision, event hits, latency, and rest-period actuation risk.

Holdout action accuracy

March 19

89.79%

April 3

91.83% · offline winner

April 3 improves fixed-split action-head accuracy, so it remains the stronger offline checkpoint for action-state separation.

Holdout joint accuracy

March 19

84.66%

April 3

86.66% · offline winner

April 3 also improves paired action-plus-finger correctness on the holdout, which is useful for model-development tracking before actuation risk is considered.

Event-level joint accuracy

March 19

87.60%

April 3

93.39% · offline winner

April 3 stays ahead after windows are majority-voted within each event, showing that the offline gain is not just single-window variance.
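One way to read "majority voting windows by event" is sketched below: each event collects the predictions of its windows, the most common prediction becomes the event's label, and accuracy is counted once per event. The function name, the label strings, and the tie-breaking rule are assumptions for illustration; the page does not specify how ties are resolved.

```python
from collections import Counter, defaultdict


def event_level_accuracy(event_ids, true_labels, predicted_labels):
    """Majority-vote window predictions within each event, then score per event.

    Assumes every window of an event shares the same true label.
    Ties go to the label seen first within the event (an assumption).
    """
    votes = defaultdict(list)
    truth = {}
    for event_id, true, pred in zip(event_ids, true_labels, predicted_labels):
        votes[event_id].append(pred)
        truth[event_id] = true
    if not votes:
        raise ValueError("no windows supplied")
    correct = 0
    for event_id, preds in votes.items():
        winner = Counter(preds).most_common(1)[0][0]
        correct += int(winner == truth[event_id])
    return correct / len(votes)


# Toy data: two events with three windows each (illustrative only).
events = ["e1", "e1", "e1", "e2", "e2", "e2"]
truth = ["OPEN:INDEX"] * 3 + ["CLOSE:THUMB"] * 3
preds = ["OPEN:INDEX", "OPEN:RING", "OPEN:INDEX",
         "CLOSE:RING", "CLOSE:RING", "CLOSE:THUMB"]
accuracy = event_level_accuracy(events, truth, preds)
```

In the toy data the first event is recovered despite one wrong window, while the second is lost because two of its three windows agree on the wrong finger.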

Holdout rest TPR

March 19

98.37% · safety winner

April 3

94.79%

March 19 preserves more true rest windows at the action head, reducing the chance that idle periods enter downstream actuation gates.

Pseudo-live would-send precision

March 19

95.37% · current path

April 3

80.06% · April 24 audit

The current March 19 bundle is substantially more reliable among gated command windows. The April 3 number is retained as historical rollback context, not a same-actuation-path comparison.

Pseudo-live send coverage

March 19

31.56% · throughput

April 3

36.49% · April 24 audit

This is the share of true movement windows that pass the send gate, so it measures command throughput rather than classification accuracy.
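The difference between would-send precision and send coverage can be made concrete with a small sketch. The Window record and its field names are hypothetical, and the sketch assumes that "non-rest windows" means windows whose true label is not REST.

```python
from dataclasses import dataclass


@dataclass
class Window:
    true_label: str   # e.g. "REST" or "OPEN:INDEX" (hypothetical label format)
    command: str      # decoded command for this window
    sent: bool        # True if the window passed the actuation gate


def would_send_precision(windows):
    """Share of gated true-movement windows whose command matches the truth."""
    gated = [w for w in windows if w.sent and w.true_label != "REST"]
    if not gated:
        return float("nan")
    return sum(w.command == w.true_label for w in gated) / len(gated)


def send_coverage(windows):
    """Share of true-movement windows that passed the gate, right or wrong."""
    movement = [w for w in windows if w.true_label != "REST"]
    if not movement:
        return float("nan")
    return sum(w.sent for w in movement) / len(movement)


# Toy replay (illustrative only).
replay = [
    Window("OPEN:INDEX", "OPEN:INDEX", True),
    Window("OPEN:INDEX", "OPEN:INDEX", False),
    Window("CLOSE:THUMB", "CLOSE:RING", True),
    Window("CLOSE:THUMB", "CLOSE:THUMB", False),
    Window("REST", "REST", False),
]
precision = would_send_precision(replay)
coverage = send_coverage(replay)
```

A stricter gate can raise precision while lowering coverage, which is why the two numbers are reported separately.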

False rest actuation

March 19

0.25% · current path

April 3

6.74% · April 24 audit

The current path produced 6 movement commands over 2,404 true REST windows. That rest-period failure mode remains the main safety constraint for the public robot-hand claim.

Pseudo-live committed joint

March 19

86.42% · current path

April 3

86.04% · April 24 audit

After smoothing, applicability checks, stability, and per-finger hold logic are applied, March 19 keeps the deployment score near its held-out decoding ceiling.

Frozen stack

Training recipe and runtime defaults

The public bundle includes more than model weights: architecture choice, split policy, preprocessing, auxiliary quiet-rest support, and the replay defaults used for pseudo-live evaluation.

Training stack

Architecture

CNNLSTMFingerActionNet

The March 19 checkpoint combines action decoding, active-finger decoding, and a dedicated finger-applicability head.

Optimization

60 epochs · batch 64 · lr 0.001 · seed 43

These values come from the restored March 19 deployment metrics bundle.

Split policy

group_trial · test_size 0.2 · calibration_size 0.1

The holdout bundle stays tied to a fixed split while calibration is separated from the main train/test partition.
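A grouped split of this kind can be sketched as follows. The sketch assumes that group_trial means all windows from one trial land in the same partition, and that test_size and calibration_size are fractions of the trial count; neither detail is confirmed by the page. The function name and the reuse of seed 43 for the split are illustrative.

```python
import random


def group_split(trial_ids, test_size=0.2, calibration_size=0.1, seed=43):
    """Assign each window to 'train', 'calibration', or 'test' by its trial.

    Whole trials are shuffled and partitioned, so no trial spans two sets.
    """
    groups = sorted(set(trial_ids))
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = round(len(groups) * test_size)
    n_cal = round(len(groups) * calibration_size)
    test_groups = set(groups[:n_test])
    cal_groups = set(groups[n_test:n_test + n_cal])
    assignment = []
    for trial in trial_ids:
        if trial in test_groups:
            assignment.append("test")
        elif trial in cal_groups:
            assignment.append("calibration")
        else:
            assignment.append("train")
    return assignment


# Toy corpus: 20 trials with 3 windows each (illustrative only).
trial_ids = [f"trial_{i // 3}" for i in range(60)]
parts = group_split(trial_ids)
```

Because the fractions apply to trials rather than windows, the window-level proportions only match 0.2 and 0.1 when trials are equally long.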

Input + preprocessing

64 x 4 windows · center_detrend

Per-window centering and detrending are frozen into the reference run's preprocessing and normalizer config.
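A minimal sketch of per-window centering and detrending is shown below. It assumes that center_detrend subtracts each channel's mean and removes a least-squares linear trend; the page names the step but does not define it, so the exact operation in the frozen config may differ.

```python
def center_detrend(window):
    """Center and linearly detrend each channel of one window.

    window: list of samples, each a list of per-channel values (e.g. 64 x 4).
    Returns a new list of the same shape.
    """
    n = len(window)
    n_channels = len(window[0])
    t_mean = (n - 1) / 2
    t_var = sum((t - t_mean) ** 2 for t in range(n))
    out = [[0.0] * n_channels for _ in range(n)]
    for c in range(n_channels):
        column = [row[c] for row in window]
        mean = sum(column) / n
        # Least-squares slope of the channel against the sample index.
        slope = 0.0
        if t_var > 0:
            slope = sum((t - t_mean) * (x - mean)
                        for t, x in enumerate(column)) / t_var
        for t, x in enumerate(column):
            out[t][c] = x - mean - slope * (t - t_mean)
    return out


# A 64 x 4 window made of pure ramps and constants should flatten to zero.
ramp_window = [[2.0 * t + 5.0, -1.0 * t, 3.0, 0.5 * t] for t in range(64)]
flattened = center_detrend(ramp_window)
```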

Sampler

cleaned March 19 corpus

The deployment run uses the cleaned combined corpus with the problematic rest events pruned from the February 16 session.

Rest support

1,059 auxiliary quiet-rest windows

The auxiliary quiet-rest session is used for train-only support while the core split contributes 11,388 windows and the public test split contributes 2,301 windows.

Replay and runtime stack

Frozen live defaults

EMA · action 0.05 · finger 0.20 · applicability 0.40

The March 19 displayed metrics use the tuned postprocess family instead of the April 3 raw-gated replay path.
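The EMA defaults can be illustrated with a generic exponential moving average over per-window probability vectors. The sketch assumes each listed value is the weight given to the newest window; the page does not state whether the coefficient weights the new sample or the previous state, so treat the direction as an assumption.

```python
# Values quoted on this page; their interpretation as new-sample weights is assumed.
EMA_ALPHA = {"action": 0.05, "finger": 0.20, "applicability": 0.40}


def ema_smooth(probabilities, alpha):
    """Smooth a sequence of probability vectors with an exponential moving average.

    The first vector seeds the state; later vectors are blended in with weight alpha.
    """
    smoothed = []
    state = None
    for probs in probabilities:
        if state is None:
            state = list(probs)
        else:
            state = [alpha * new + (1.0 - alpha) * old
                     for new, old in zip(probs, state)]
        smoothed.append(list(state))
    return smoothed


# Two-class toy stream that flips abruptly (illustrative only).
stream = [[1.0, 0.0], [0.0, 1.0]]
finger_smoothed = ema_smooth(stream, EMA_ALPHA["finger"])
```

Under this reading, a smaller coefficient means a head reacts more slowly to a sudden change in its raw output.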

Actuation gates

min_prob 0.2 · stability 3 · same-finger hold 250 ms

The hold suppresses rapid flips on the same finger while allowing other fingers to actuate during that interval.
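The gate behavior described above can be sketched as a small state machine. The class name, method signature, and exact semantics are assumptions: stability is read as the number of consecutive identical decisions required, and the hold as a minimum interval between sends on the same finger. The public bundle's implementation may differ.

```python
class PerFingerGate:
    """Hypothetical send gate: probability floor, stability count, per-finger hold."""

    def __init__(self, min_prob=0.2, stability=3, hold_ms=250):
        self.min_prob = min_prob
        self.stability = stability
        self.hold_ms = hold_ms
        self._streak_key = None
        self._streak = 0
        self._last_sent_ms = {}  # finger -> time of last sent command

    def step(self, t_ms, action, finger, prob):
        """Return True if a command would be sent for this window."""
        if action == "REST" or prob < self.min_prob:
            self._streak_key, self._streak = None, 0
            return False
        key = (action, finger)
        if key == self._streak_key:
            self._streak += 1
        else:
            self._streak_key, self._streak = key, 1
        if self._streak < self.stability:
            return False
        last = self._last_sent_ms.get(finger)
        if last is not None and t_ms - last < self.hold_ms:
            return False  # same finger is still inside its hold interval
        self._last_sent_ms[finger] = t_ms
        return True


# Toy stream at a 50 ms hop (illustrative only).
gate = PerFingerGate()
stream = (
    [(t, "OPEN", "INDEX", 0.9) for t in (0, 50, 100, 150)]
    + [(t, "CLOSE", "INDEX", 0.9) for t in (200, 250, 300)]
    + [(t, "OPEN", "THUMB", 0.9) for t in (350, 400, 450)]
)
decisions = [gate.step(*window) for window in stream]
```

In the toy stream the index finger sends once at 100 ms, the quick flip to CLOSE on the same finger is held back at 300 ms, and the thumb sends at 450 ms without waiting on the index hold.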

Replay cadence

0.25 s windows · 0.05 s hop

The pseudo-live replay runs the same checkpoint through the saved inference and actuation path at a replay cadence close to live use.

Cleaned replay accuracy

86.42% committed joint

This is the committed action-plus-finger score on the cleaned pseudo-live corpus; first-send timing is reported separately.

Would-send onset

0.083 s median · 0.377 s p95

Event-relative first-send timing is tracked alongside the precision and rest-safety metrics used for model selection.

Replay footprint

12,447 cleaned windows

The cleaned deployment replay uses the combined March 19 corpus, giving a broader check than the 2,301-window saved test split.

Dated trail

How the campaign evolved

  • The February 26, 2026 update documents the earlier large-scale weight and hyperparameter campaign: 100+ trained variants, a 30+ hour sweep, and a largest logged block of 90 runs.
  • The March 16, 2026 update documents the later deployment-facing postprocess ablation that froze the live default family after 2,595 evaluated configurations.
  • The March 18, 2026 update widened the selection criteria from holdout accuracy alone to include full-session replay, pseudo-live behavior, and the end-to-end Step 7 control path.
  • The May 6, 2026 replay update keeps March 19 as the displayed checkpoint and replaces the old global cooldown interpretation with per-finger actuation metrics.

Electrode ablation

Which Muse 2 channels carry the 2-M16 signal?

This exploratory ablation retrained the same 2-M16 recipe after changing which Muse electrodes were available to the model. The purpose is to estimate subject-specific channel importance, not to claim anatomical localization or deployment readiness. Action accuracy uses the action head directly; finger accuracy decodes the active-finger head on true non-REST windows and is not yet deployment-gated.

Study branch

ablation/electrode-channel-importance

Exploratory subject-specific channel ablation for 2-M16.

Train/test config

Same 2-M16 recipe

Each subset used the featured derived dataset, split policy, model recipe, and deterministic test evaluation.

Primary sweep

Seed 43 · 60 epochs

All, single-channel, and leave-one-out subsets were retrained.

Temporal-pair add-on

TP9+TP10

AF7 and AF8 were removed together, leaving only temporal electrodes.

Full montage

90.96% action · 86.26% finger

Fresh retrain using TP9+AF7+AF8+TP10.

TP9+TP10 only

88.31% action · 83.90% finger

Only 2.65 pp action loss and 2.36 pp finger loss versus the fresh full-montage retrain.

Frontal-drop finding

REST did not decrease

Dropping AF7 or AF8 mainly reduced OPEN accuracy, not REST accuracy.

Evidence boundary

Single seed

This is retraining-ablation evidence, not deployment-gated replay or anatomical localization.

Frontal-channel interpretation

Dropping a frontal electrode did not reduce REST accuracy. With AF7 removed, REST increased by 0.65 pp, OPEN decreased by 2.39 pp, and CLOSE decreased by 0.56 pp. With AF8 removed, REST increased by 2.28 pp, OPEN decreased by 3.37 pp, and CLOSE increased by 1.03 pp. The small action decrease from frontal-channel removal is therefore driven mainly by OPEN, not REST, and is not a uniform action decline.

The TP9+TP10-only temporal pair preserves most of the full-montage retrain: 88.31% action accuracy and 83.90% non-REST finger accuracy, only 2.65 pp and 2.36 pp below the fresh full-montage baseline.

Leave-one-out importance

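Each drop is the full-montage accuracy minus the accuracy with that electrode omitted, so larger positive values indicate stronger dependence on the omitted electrode and negative values mean the reduced montage scored higher.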

Omitted | Kept channels | Action drop | Finger drop
TP10    | TP9+AF7+AF8   | 12.08 pp    | 9.13 pp
TP9     | AF7+AF8+TP10  | 11.73 pp    | 13.29 pp
AF7     | TP9+AF8+TP10  | 1.13 pp     | -0.40 pp
AF8     | TP9+AF7+TP10  | 0.56 pp     | -1.30 pp
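The leave-one-out drops follow from the overall accuracies reported in the per-action and per-finger tables on this page. The sketch below recomputes them; the dictionary layout and function name are illustrative only.

```python
# Overall accuracies (percent) quoted on this page.
FULL_MONTAGE = {"action": 90.96, "finger": 86.26}
LEAVE_ONE_OUT = {
    "TP10": {"action": 78.88, "finger": 77.13},
    "TP9": {"action": 79.23, "finger": 72.97},
    "AF7": {"action": 89.83, "finger": 86.66},
    "AF8": {"action": 90.40, "finger": 87.56},
}


def drops(full, subset):
    """Percentage-point drop per head: full-montage accuracy minus subset accuracy."""
    return {head: round(full[head] - subset[head], 2) for head in full}


importance = {omitted: drops(FULL_MONTAGE, scores)
              for omitted, scores in LEAVE_ONE_OUT.items()}

for omitted, d in importance.items():
    print(f"{omitted}: action {d['action']:+.2f} pp, finger {d['finger']:+.2f} pp")
```

A negative value, as for the finger head when AF7 or AF8 is omitted, means the reduced montage scored slightly higher than the full one in this single-seed run.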

Single-channel sufficiency

TP9 and TP10 each retain far more signal than either frontal electrode alone.

Electrode | Action acc | Finger acc
TP9       | 73.01%     | 72.22%
TP10      | 72.66%     | 67.80%
AF7       | 48.89%     | 35.46%
AF8       | 48.33%     | 36.16%

Class-level test windows

Every ablation row uses the same held-out split and the same test-window counts.

Action labels

REST, OPEN, and CLOSE windows scored by the action head.

2,301 total

REST  | 307
OPEN  | 921
CLOSE | 1,073

Finger labels

True non-REST finger windows.

1,994 total

THUMB  | 466
INDEX  | 327
MIDDLE | 443
RING   | 365
PINKY  | 393

Per-action accuracy and deltas

Deltas are percentage-point changes relative to the full-montage retrain.

Run                    | Overall | REST    | OPEN   | CLOSE  | REST delta | OPEN delta | CLOSE delta
All (TP9+AF7+AF8+TP10) | 90.96%  | 97.72%  | 94.68% | 85.83% | +0.00 pp   | +0.00 pp   | +0.00 pp
TP9 only               | 73.01%  | 70.68%  | 82.19% | 65.80% | -27.04 pp  | -12.49 pp  | -20.04 pp
AF7 only               | 48.89%  | 40.39%  | 47.56% | 52.47% | -57.33 pp  | -47.12 pp  | -33.36 pp
AF8 only               | 48.33%  | 29.97%  | 41.15% | 59.74% | -67.75 pp  | -53.53 pp  | -26.10 pp
TP10 only              | 72.66%  | 100.00% | 83.28% | 55.73% | +2.28 pp   | -11.40 pp  | -30.10 pp
Drop TP9               | 79.23%  | 100.00% | 67.75% | 83.13% | +2.28 pp   | -26.93 pp  | -2.70 pp
Drop AF7               | 89.83%  | 98.37%  | 92.29% | 85.27% | +0.65 pp   | -2.39 pp   | -0.56 pp
Drop AF8               | 90.40%  | 100.00% | 91.31% | 86.86% | +2.28 pp   | -3.37 pp   | +1.03 pp
Drop TP10              | 78.88%  | 97.72%  | 75.57% | 76.33% | +0.00 pp   | -19.11 pp  | -9.51 pp
TP9+TP10 only          | 88.31%  | 100.00% | 90.34% | 83.22% | +2.28 pp   | -4.34 pp   | -2.61 pp
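As a consistency check, the overall action accuracy in each row should equal the per-class accuracies weighted by the class-level test-window counts listed above (307 REST, 921 OPEN, 1,073 CLOSE). The sketch below verifies this for two rows; the function name is illustrative, and small rounding differences are expected because the published per-class values are already rounded.

```python
# Test-window counts and per-class accuracies (percent) quoted on this page.
COUNTS = {"REST": 307, "OPEN": 921, "CLOSE": 1073}
FULL_MONTAGE = {"REST": 97.72, "OPEN": 94.68, "CLOSE": 85.83}
DROP_AF7 = {"REST": 98.37, "OPEN": 92.29, "CLOSE": 85.27}


def weighted_overall(per_class_accuracy, counts):
    """Count-weighted mean of per-class accuracies, in percent."""
    total = sum(counts.values())
    return sum(per_class_accuracy[c] * counts[c] for c in counts) / total


full_overall = weighted_overall(FULL_MONTAGE, COUNTS)
drop_af7_overall = weighted_overall(DROP_AF7, COUNTS)
print(round(full_overall, 2), round(drop_af7_overall, 2))
```

Because CLOSE and OPEN together account for 1,994 of the 2,301 windows, the overall figure moves far more with those two classes than with REST.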

Per-finger accuracy

Finger metrics are computed on true non-REST windows from the active-finger head.

Run                    | Overall | THUMB   | INDEX  | MIDDLE | RING   | PINKY
All (TP9+AF7+AF8+TP10) | 86.26%  | 100.00% | 96.33% | 85.10% | 58.08% | 89.06%
TP9 only               | 72.22%  | 100.00% | 80.73% | 57.79% | 47.67% | 71.25%
AF7 only               | 35.46%  | 61.59%  | 11.01% | 21.44% | 7.12%  | 66.92%
AF8 only               | 36.16%  | 47.85%  | 0.92%  | 39.73% | 8.49%  | 73.28%
TP10 only              | 67.80%  | 86.48%  | 67.58% | 56.88% | 54.52% | 70.48%
Drop TP9               | 72.97%  | 75.32%  | 76.45% | 83.75% | 39.73% | 86.01%
Drop AF7               | 86.66%  | 100.00% | 88.38% | 81.94% | 70.68% | 89.57%
Drop AF8               | 87.56%  | 100.00% | 93.88% | 83.30% | 76.99% | 82.19%
Drop TP10              | 77.13%  | 98.71%  | 73.09% | 78.56% | 39.18% | 88.55%
TP9+TP10 only          | 83.90%  | 99.57%  | 81.04% | 74.72% | 70.41% | 90.59%

Per-finger deltas versus full montage

Positive values mean that subset improved over the fresh full-montage retrain for that finger class.

Run                    | Overall   | THUMB     | INDEX     | MIDDLE    | RING      | PINKY
All (TP9+AF7+AF8+TP10) | +0.00 pp  | +0.00 pp  | +0.00 pp  | +0.00 pp  | +0.00 pp  | +0.00 pp
TP9 only               | -14.04 pp | +0.00 pp  | -15.60 pp | -27.31 pp | -10.41 pp | -17.81 pp
AF7 only               | -50.80 pp | -38.41 pp | -85.32 pp | -63.66 pp | -50.96 pp | -22.14 pp
AF8 only               | -50.10 pp | -52.15 pp | -95.41 pp | -45.37 pp | -49.59 pp | -15.78 pp
TP10 only              | -18.46 pp | -13.52 pp | -28.75 pp | -28.22 pp | -3.56 pp  | -18.58 pp
Drop TP9               | -13.29 pp | -24.68 pp | -19.88 pp | -1.35 pp  | -18.36 pp | -3.05 pp
Drop AF7               | +0.40 pp  | +0.00 pp  | -7.95 pp  | -3.16 pp  | +12.60 pp | +0.51 pp
Drop AF8               | +1.30 pp  | +0.00 pp  | -2.45 pp  | -1.81 pp  | +18.90 pp | -6.87 pp
Drop TP10              | -9.13 pp  | -1.29 pp  | -23.24 pp | -6.55 pp  | -18.90 pp | -0.51 pp
TP9+TP10 only          | -2.36 pp  | -0.43 pp  | -15.29 pp | -10.38 pp | +12.33 pp | +1.53 pp

Finger-level interpretation

Removing TP9 causes a larger overall finger loss (13.29 pp) than removing TP10 (9.13 pp), suggesting TP9 is the stronger temporal contributor to finger decoding in this split; the action-head drops are closer and slightly favor TP10 (12.08 pp versus 11.73 pp for TP9). Removing AF7 or AF8 does not cause a broad finger collapse: the aggregate finger score slightly improves in both frontal-drop runs.

Evidence boundary

AF7 and AF8 may still help specific digit boundaries, especially INDEX and MIDDLE, while adding noise or conflicting information for RING in this subject-specific split. The next step is to repeat the sweep across seeds and run reduced montages through deployment-consistent replay before making command-path claims.

Recommended material

Keep reading

Earlier checkpoint

Subject 1-M16 — Run 2026-03-05

The 1-M16 bundle is still useful because it shows a second subject, an earlier tuning cycle, and the kind of metrics that were available before the broader March 2026 evaluation push. It is not the model-selection path that produced the current featured 2-M16 run.

2026-02-27 refresh

Published tuning checkpoint

The website's February 27, 2026 update documents a tuned 1-M16 rerun and compares it against the February 21 baseline.

Historical baseline

Earlier methods stack

Later March updates explicitly warn that 1-M16 used earlier methods and should not be treated as a direct comparison to the current 2-M16 deployment stack.

60 epochs

Published training budget

The public bundle preserves the core training budget: 60 epochs, batch size 64, learning rate 0.001, and seed 45.