Results

Featured run

Subject 2-M16 — Run 2026-03-19

Current source: combined_20260319_081200_pruned_rest_events_0_1_2 · 20260319_075520

Action accuracy: 89.79%

Finger accuracy on non-REST windows: 87.01%

Current featured public bundle with March 19 metrics, figures, and replay summaries.

How metrics are labeled

These sections are kept separate so that saved-split metrics, holdout diagnostics, and replay summaries are not conflated.

Saved test split

89.79% / 87.01%

Action and finger accuracy

Action accuracy is reported on all test windows; finger accuracy is reported on non-REST windows.

2,301 test windows; 1,994 non-REST
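The two-level reporting above can be sketched in a few lines. The toy labels and array names below are illustrative, not taken from the project code:

```python
import numpy as np

# Illustrative toy labels; the real saved split has 2,301 test windows.
true_action = np.array(["REST", "OPEN", "CLOSE", "OPEN", "REST", "CLOSE"])
pred_action = np.array(["REST", "OPEN", "CLOSE", "CLOSE", "REST", "CLOSE"])
true_finger = np.array(["NONE", "index", "thumb", "index", "NONE", "thumb"])
pred_finger = np.array(["NONE", "index", "thumb", "middle", "NONE", "thumb"])

# Action accuracy is reported on ALL test windows, REST included.
action_acc = (pred_action == true_action).mean()

# Finger accuracy is reported only on windows whose true action is non-REST.
non_rest = true_action != "REST"
finger_acc = (pred_finger[non_rest] == true_finger[non_rest]).mean()

print(f"action {action_acc:.2%} on {len(true_action)} windows, "
      f"finger {finger_acc:.2%} on {non_rest.sum()} non-REST windows")
```

Keeping the denominators distinct in this way is what makes the 89.79% and 87.01% figures non-comparable at a glance: they are computed over different window sets.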

Primary holdout

84.66%

Joint accuracy with REST and applicability diagnostics

Adds REST and applicability diagnostics beyond the saved test split.

Deployment pair invariant passed; 2.26% applicability false-negative (FN) rate on true non-REST windows

Quiet-rest replay

97.26%

Auxiliary quiet-REST check

Separate replay used to characterize REST behavior.

4.53% applicability false-positive (FP) rate on true REST windows
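The two applicability error rates quoted in the holdout and quiet-rest cards can be reconstructed as follows, assuming the gate emits one applicable/not-applicable flag per window (the flag arrays here are hypothetical stand-ins):

```python
import numpy as np

# Hypothetical per-window flags; "applicable" means the gate believes a
# finger command applies, i.e. the window is non-REST.
true_non_rest   = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
pred_applicable = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)

# FN rate on true non-REST: fraction of real movement windows the gate drops.
fn_rate = (~pred_applicable[true_non_rest]).mean()

# FP rate on true REST: fraction of rest windows the gate wrongly passes.
fp_rate = pred_applicable[~true_non_rest].mean()

print(f"applicability FN {fn_rate:.2%}, FP {fp_rate:.2%}")
```

Note the asymmetric denominators: the FN rate is conditioned on true non-REST windows, the FP rate on true REST windows.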

Chronological replay

84.30%

Core full-session replay

Replay across the two core movement sessions on a longer chronological trace.

95.99% REST true-positive rate (TPR); 18.07% applicability FP rate on true REST windows

Pseudo-live replay

86.64%

Saved Step 7 decision path

Replay through the saved decision path, including would-send precision and false REST actuation.

0.12% false REST actuation; 93.32% would-send precision
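A minimal sketch of the two decision-path metrics above, under two inferred definitions: "would-send precision" as the fraction of correct predictions among windows the path would actually send, and "false REST actuation" as the fraction of true REST windows on which a command would still be sent. Both readings are reconstructed from the metric labels, not confirmed project code:

```python
import numpy as np

# Toy decision-path trace; flag names are illustrative stand-ins.
would_send = np.array([1, 1, 0, 1, 0, 1], dtype=bool)  # path decided to send
correct    = np.array([1, 1, 0, 0, 1, 1], dtype=bool)  # prediction matched label
true_rest  = np.array([0, 0, 1, 0, 1, 0], dtype=bool)

# Would-send precision: of commands that would actually be sent,
# how many carried the correct prediction.
precision = correct[would_send].mean()

# False REST actuation: fraction of true REST windows where a
# command would still be sent.
false_rest_actuation = would_send[true_rest].mean()

print(f"would-send precision {precision:.2%}, "
      f"false REST actuation {false_rest_actuation:.2%}")
```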

Harder replay session

71.96%

March 17 realism session

A harder pseudo-live check that exposes where applicability recall is still weak.

52.98% applicability FN on true non-REST

Tip of the iceberg

The public March 19 bundle is the tip of a much larger tuning and validation iceberg.

The featured 2-M16 run was not chosen on a single accuracy number. It emerged from repeated retraining, postprocess ablations, holdout audits, chronological replay, and pseudo-live replay until the deployment pair invariants stayed clean while the broader replay ladder remained competitive.

2,595 configs

Postprocess ablation

The March 16, 2026 website update documents a 2,595-config ablation over thresholds, smoothing, hysteresis, adjacency, and finger-mode settings.

96 retained sweep runs

Archived Step 2 + Step 3 cycle

The preserved `logs/sweep/` CSVs retain 96 completed training-plus-evaluation runs from the broader 2-M16 tuning cycle.

100+ model variants

Documented in the older February 26 update

The February 26, 2026 2-M16 tuning update states that 100+ model variants were trained across full-dataset, non-REST event-gated, and REST event-gated regimes.

30+ hours

Continuous sweep time

That same February 26, 2026 update describes a 30+ hour sweep and highlights a 90-run block that spanned about 33.3 hours from February 25, 2026 07:49 to February 26, 2026 17:07.

How the featured run was chosen

  • The March 19 checkpoint replaced the March 18 deployment candidate after the cleaned training corpus, explicit finger-applicability head, and refreshed replay bundle all aligned better than the previous public snapshot.
  • Selection favored the combination of strong holdout metrics, stronger replay behavior on the cleaned deployment corpus, and zero committed or sent invalid action-finger pairs across the published holdout and replay bundles.
  • The model was chosen because it behaved coherently across saved split metrics, chronological replay, and pseudo-live replay, not because it won on one leaderboard number.
  • The harder March 17 realism replay is still conservative on applicability recall, but it remains part of the public selection story because it shows where the deployment stack is still weak.

Training snapshot

Architecture

CNNLSTMFingerActionNet

The March 19 winning run combines an action head, an active-finger head, and a dedicated finger-applicability head.
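The three-head layout can be illustrated with a toy forward pass. The real trunk in CNNLSTMFingerActionNet is a CNN+LSTM; a single linear layer stands in here purely to show the head arrangement, and every layer size below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(43)

def forward(x, params):
    """Shared trunk feeding three heads: action, active finger, applicability.

    The real CNNLSTMFingerActionNet uses a CNN+LSTM trunk; a single
    linear layer stands in here just to show the three-head layout.
    """
    h = np.tanh(x @ params["trunk_w"])            # shared features
    action_logits = h @ params["action_w"]        # REST / OPEN / CLOSE
    finger_logits = h @ params["finger_w"]        # per-finger scores
    applicability_logit = h @ params["applic_w"]  # scalar gate
    return action_logits, finger_logits, applicability_logit

# One 64-sample x 4-channel window, flattened; sizes are illustrative.
x = rng.standard_normal(64 * 4)
params = {
    "trunk_w":  rng.standard_normal((256, 32)) * 0.05,
    "action_w": rng.standard_normal((32, 3)) * 0.05,
    "finger_w": rng.standard_normal((32, 5)) * 0.05,
    "applic_w": rng.standard_normal(32) * 0.05,
}
a, f, g = forward(x, params)
print(a.shape, f.shape, np.shape(g))  # (3,) (5,) ()
```

Separating the applicability head from the finger head is what lets the deployment gate abstain on a window without distorting the finger classifier's own scores.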

Optimization

60 epochs · batch 64 · lr 0.001 · seed 43

These values come from the winning run's training config and match the published March 19 metrics bundle.
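For reference, a hypothetical capture of these hyperparameters as a run config; the key names are illustrative, not the project's actual schema:

```python
# Hypothetical run-config capture of the published hyperparameters;
# key names are illustrative, not the project's actual schema.
train_config = {
    "epochs": 60,
    "batch_size": 64,
    "learning_rate": 1e-3,
    "seed": 43,
}
print(train_config)
```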

Split policy

group_trial · test_size 0.2 · calibration_size 0.1

The holdout bundle stays tied to a fixed split while calibration is separated from the main train/test partition.
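A group-aware split like this can be sketched as follows. The exact project implementation may differ, but the invariant is the same: no trial's windows straddle partitions, which prevents near-duplicate windows from leaking between train and test:

```python
import numpy as np

def group_trial_split(groups, test_size=0.2, calibration_size=0.1, seed=43):
    """Assign window indices to train/cal/test so no trial straddles splits.

    Mirrors the published policy (group_trial, test_size 0.2,
    calibration_size 0.1); the project's exact implementation may differ.
    """
    rng = np.random.default_rng(seed)
    trials = np.unique(groups)
    rng.shuffle(trials)
    n_test = int(round(test_size * len(trials)))
    n_cal = int(round(calibration_size * len(trials)))
    test_trials = set(trials[:n_test])
    cal_trials = set(trials[n_test:n_test + n_cal])
    return np.array([
        "test" if g in test_trials else "cal" if g in cal_trials else "train"
        for g in groups
    ])

groups = np.repeat(np.arange(10), 5)  # 10 trials, 5 windows each
split = group_trial_split(groups)
# Every trial lands wholly in one partition:
assert all(len(set(split[groups == t])) == 1 for t in np.unique(groups))
```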

Input + preprocessing

64 x 4 windows · center_detrend

Per-window centering and detrending are frozen into the winning run's preprocessing and normalizer config.
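One plausible reading of center_detrend, sketched per channel with NumPy; the project's frozen preprocessing may differ in detail:

```python
import numpy as np

def center_detrend(window):
    """Per-window centering plus linear detrending, applied per channel.

    `window` is samples x channels (64 x 4 for Muse 2); this is an
    illustrative sketch, not the project's exact preprocessing code.
    """
    t = np.arange(window.shape[0])
    out = np.empty_like(window, dtype=float)
    for ch in range(window.shape[1]):
        x = window[:, ch] - window[:, ch].mean()  # remove DC offset
        slope, intercept = np.polyfit(t, x, 1)    # fit linear drift
        out[:, ch] = x - (slope * t + intercept)  # subtract the trend
    return out

# A pure linear ramp plus offset detrends to ~zero on every channel.
w = np.linspace(0, 1, 64)[:, None] * np.array([1.0, 2.0, -1.0, 0.5]) + 5.0
clean = center_detrend(w)
print(np.allclose(clean, 0.0))  # True
```

Freezing this step into the normalizer config matters because the same transform must run at replay and deployment time, not just during training.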

Interactive latent space

Curated PCA views

Each point is one EEG window projected from the learned latent representation into a three-component PCA view. PCA is shown here because it preserves a linear comparison across the full dataset, the training split, and the held-out test split.

The train/test pair stays colored by true finger so the geometry is directly comparable across splits. Correctness and deployment-gating views are available on the visualization page.
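The projection itself is a plain linear map, which is why the full, train, and test views stay directly comparable. Sketched here with random stand-in latents (the real views project the model's learned representation):

```python
import numpy as np

def pca_project(latents, n_components=3):
    """Project latent vectors onto their top principal components.

    A linear view like this keeps splits directly comparable; the
    latents below are random stand-ins for the model's own.
    """
    centered = latents - latents.mean(axis=0)
    # SVD of the centered data gives principal directions in vt,
    # ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
latents = rng.standard_normal((500, 32))
coords = pca_project(latents)
print(coords.shape)  # (500, 3)
```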

PCA · Full dataset

Full-dataset PCA colored by true finger

Each point is one EEG window, projected into three principal components and colored by the labeled finger.

Separated regions suggest structured learned organization, while overlap marks similar or harder windows.

Train/test comparison at a glance

Full dataset: 12,447 windows. Train split: 10,146 windows. Held-out test split: 2,301 windows. Scan the pair below to check whether the held-out geometry still resembles the training structure under the same color coding.
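As a quick consistency check, the quoted counts add up exactly:

```python
# Published split sizes: train + held-out test covers the full set.
full, train, test = 12_447, 10_146, 2_301
assert train + test == full
print(f"test fraction {test / full:.1%}")  # 18.5%
```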

Split comparison

Same projection family, same label colors, different split membership.

PCA · Train split

Train-split PCA colored by true finger

This restricts the PCA view to the 10,146 training windows while keeping the same true-finger coloring used in the full-dataset view.

Keeping train and test on the same coloring makes split-to-split geometry easier to compare directly.

PCA · Test split

Test-split PCA colored by true finger

This shows the 2,301 held-out test windows only, using the same true-finger coloring so the class structure can be compared against the training split.

If the held-out view preserves similar neighborhoods rather than collapsing, the representation is carrying structure beyond the fitting set.

Figures

These figures show error structure and confidence behavior for the featured run.

Action confusion matrix

REST, OPEN, and CLOSE confusion for the featured bundle.

Finger confusion matrix (non-REST)

Finger-level confusion after REST windows are removed from the task.

Calibration

Confidence versus observed accuracy for the featured bundle.

Confidence and uncertainty scatter

Confidence spread for the featured bundle.

Topomaps and signal evidence

These figures are signal-evidence context. They are not substitutes for the holdout and replay metrics above.

Action alpha rest-delta topomap

Rest-relative alpha maps for REST, OPEN, and CLOSE in the March 19 winning session. OPEN and CLOSE both show the dominant TP10 decrease and smaller TP9 increase that characterize the current 2-M16 action story.

Finger alpha rest-delta topomap with NONE reference

Finger-level rest-delta maps, including NONE as the explicit REST reference. The strongest variation remains concentrated on TP10 and TP9, which helps explain why lateral Muse 2 channels carry most of the finger-separation load.

Published runs

The featured bundle is shown in context. The 1-M16 bundle is an earlier baseline and should not be treated as a direct comparison to the current 2-M16 run.

Run | Role | Action accuracy | Finger accuracy (non-REST) | Test windows
2-m16 (March 19, 2026) | Featured run | 89.79% | 87.01% | 2,301
1-m16-500 (March 5, 2026) | Historical baseline | 83.94% | 80.61% | 2,652