From tip to iceberg

Deep dive

The main results page stays compact on purpose. This page is where the deeper process lives: the broader tuning campaign, the way the model was selected, the runtime defaults that were frozen for replay, and the dated trail that shows how the current bundle displaced earlier candidates.

Featured run

2-M16 · March 19, 2026

Current featured public bundle with March 19 metrics, figures, and replay summaries.

Published corpus

2 runs across 2 subjects

4,953 total held-out windows are public on the site today.

Current holdout

84.66%

REST TPR 98.37% with applicability diagnostics carried alongside the headline metric.

Replay path

86.64% committed joint accuracy

Pseudo-live replay stays part of selection because deployment behavior matters more than one split score.

Search and selection scale

The public March 19 bundle is the tip of a much larger tuning and validation iceberg.

The featured 2-M16 run was not chosen on a single accuracy number. It emerged from repeated retraining, postprocess ablations, holdout audits, chronological replay, and pseudo-live replay until the deployment pair invariants stayed clean while the broader replay ladder remained competitive.

2,595 configs

Postprocess ablation

The March 16, 2026 website update documents a 2,595-config ablation over thresholds, smoothing, hysteresis, adjacency, and finger-mode settings.
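An ablation of this shape is typically enumerated as a cartesian grid over the knob families. The sketch below illustrates that pattern with hypothetical value lists; the real 2,595-config grid and its values are not reproduced here.

```python
from itertools import product

# Hypothetical value lists; the actual swept values are not published here.
grid = {
    "action_threshold": [0.05, 0.10, 0.20],
    "smoothing_window": [1, 3, 5],
    "hysteresis": [0, 1, 2],
    "adjacency": [False, True],
    "finger_mode": ["joint", "gated", "independent"],
}

# Every combination of knob values becomes one candidate postprocess config.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 3 * 3 * 3 * 2 * 3 = 162 configs in this toy grid
```

Each config is then replayed against the same held-out predictions, which is what makes a sweep of this size tractable without retraining.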

96 retained sweep runs

Archived Step 2 + Step 3 cycle

The preserved `logs/sweep/` CSVs retain 96 completed training-plus-evaluation runs from the broader 2-M16 tuning cycle.

100+ model variants

Documented in the older Feb 26 update

The February 26, 2026 2-M16 tuning update states that 100+ model variants were trained across full-dataset, non-REST event-gated, and REST event-gated regimes.

30+ hours

Continuous sweep time

That same February 26, 2026 update describes a 30+ hour sweep and highlights a 90-run block that spanned about 33.3 hours from February 25, 2026 07:49 to February 26, 2026 17:07.

Selection logic

Why this model won

  • The March 19 checkpoint replaced the March 18 deployment candidate after the cleaned training corpus, explicit finger-applicability head, and refreshed replay bundle all aligned better than the previous public snapshot.
  • Selection favored the combination of strong holdout metrics, stronger replay behavior on the cleaned deployment corpus, and zero committed or sent invalid action-finger pairs across the published holdout and replay bundles.
  • The model was chosen because it behaved coherently across saved split metrics, chronological replay, and pseudo-live replay, not because it won on one leaderboard number.
  • The harder March 17 realism replay is still conservative on applicability recall, but it remains part of the public selection story because it shows where the deployment stack is still weak.

Evidence ladder

What had to stay coherent

Saved test split

89.79%

87.01% finger on non-REST

2,301 held-out windows

Primary holdout

84.66%

98.37% REST TPR

2.26% applicability FN on true non-REST

Pseudo-live replay

86.64%

93.32% would-send precision

0.12% false REST actuation

Harder realism replay

71.96%

9.72% committed finger on non-REST

52.98% applicability FN on true non-REST

Frozen stack

Training recipe and runtime defaults

The winning run is not just a weights file. It includes the architecture choice, split policy, preprocessing, auxiliary quiet-REST support, and the specific replay defaults that were carried into pseudo-live evaluation.

Training stack

Architecture

CNNLSTMFingerActionNet

The March 19 winning run combines an action head, an active-finger head, and a dedicated finger-applicability head.

Optimization

60 epochs · batch 64 · lr 0.001 · seed 43

These values come from the winning run's training config and match the published March 19 metrics bundle.

Split policy

group_trial · test_size 0.2 · calibration_size 0.1

The holdout bundle stays tied to a fixed split while calibration is separated from the main train/test partition.
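One way to read `group_trial` is that whole trials, never individual windows, are assigned to one side of the split, so windows from the same trial can never leak across partitions. A minimal sketch under that assumption (the function name and trial-ID layout are hypothetical):

```python
import numpy as np

def group_trial_split(trial_ids, test_size=0.2, calibration_size=0.1, seed=0):
    """Assign whole trials (never individual windows) to train/calibration/test."""
    rng = np.random.default_rng(seed)
    trials = np.unique(trial_ids)
    rng.shuffle(trials)
    n_test = int(round(test_size * len(trials)))
    n_cal = int(round(calibration_size * len(trials)))
    test_trials = trials[:n_test]
    cal_trials = trials[n_test:n_test + n_cal]
    # Label every window by the partition its parent trial landed in.
    return np.where(
        np.isin(trial_ids, test_trials), "test",
        np.where(np.isin(trial_ids, cal_trials), "calibration", "train"),
    )

trial_ids = np.repeat(np.arange(10), 5)   # 10 toy trials, 5 windows each
split = group_trial_split(trial_ids)
# Every window from a given trial lands in exactly one partition.
```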

Input + preprocessing

64 x 4 windows · center_detrend

Per-window centering and detrending are frozen into the winning run's preprocessing and normalizer config.
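Interpreting `center_detrend` as per-channel mean removal followed by a least-squares linear detrend (an assumption; the exact frozen transform may differ), a 64 x 4 window can be processed as:

```python
import numpy as np

def center_detrend(window):
    """Per-window sketch: remove the mean, then a fitted linear trend,
    independently for each of the 4 channels of a 64 x 4 window."""
    w = window - window.mean(axis=0)               # center each channel
    t = np.arange(w.shape[0], dtype=float)
    t -= t.mean()
    slope = (t[:, None] * w).sum(axis=0) / (t ** 2).sum()
    return w - t[:, None] * slope                  # subtract fitted line

window = np.random.default_rng(0).normal(size=(64, 4))
out = center_detrend(window)
# After the transform, each channel has ~zero mean and ~zero linear trend.
```

Freezing this into the normalizer config matters because the live path must apply the identical transform at inference time.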

Sampler

core_event_equalized

Training equalizes the core REST-event mass while still keeping the auxiliary quiet-rest corpus train-only.

REST support

1,059 auxiliary quiet-rest windows

The auxiliary quiet-rest session is used for train-only support while the core split contributes 11,388 windows and the public test split contributes 2,301 windows.

Replay and runtime stack

Frozen live defaults

EMA smoothing (5) · action 0.05 · finger 0.20 · applicability 0.40

The March 19 bundle freezes the same deployment-facing thresholds reflected in the replay artifacts, report HTML, and website.
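A minimal sketch of how these frozen defaults could be applied at inference time, assuming the smoothing parameter 5 is an EMA span and that each head is gated on its own threshold; the probability streams below are invented:

```python
import numpy as np

def ema(probs, span=5):
    """Exponential moving average over per-window probabilities (span ~ 5)."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(probs))
    out[0] = probs[0]
    for i in range(1, len(probs)):
        out[i] = alpha * probs[i] + (1 - alpha) * out[i - 1]
    return out

# Invented per-window head probabilities for one short stream.
action_p = ema(np.array([0.02, 0.04, 0.30, 0.70, 0.90]))
finger_p = ema(np.array([0.10, 0.15, 0.40, 0.60, 0.80]))
applic_p = ema(np.array([0.20, 0.30, 0.50, 0.70, 0.90]))

# Gate the latest smoothed value of each head on its frozen threshold.
commit = bool(action_p[-1] >= 0.05 and finger_p[-1] >= 0.20 and applic_p[-1] >= 0.40)
```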

Actuation gates

min_prob 0.2 · stability 3 · cooldown 250 ms

These are the saved Step 7 decision defaults for the current deployment candidate.
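These gates can be read as a small state machine: a class must clear `min_prob` for `stability` consecutive windows, and sends are rate-limited by the cooldown. A sketch under that reading (the function and input stream are hypothetical, with the hop assumed to be 50 ms):

```python
def actuation_gate(decisions, min_prob=0.2, stability=3, cooldown_ms=250, hop_ms=50):
    """Send a command only when one class clears min_prob for `stability`
    consecutive windows and the cooldown since the last send has elapsed."""
    sent, streak, last_cls, last_sent_ms = [], 0, None, -cooldown_ms
    for i, (cls, prob) in enumerate(decisions):
        now_ms = i * hop_ms
        if prob >= min_prob and cls == last_cls:
            streak += 1                               # same confident class again
        else:
            streak = 1 if prob >= min_prob else 0     # restart or drop the streak
            last_cls = cls if prob >= min_prob else None
        if streak >= stability and now_ms - last_sent_ms >= cooldown_ms:
            sent.append((now_ms, cls))
            last_sent_ms = now_ms
            streak = 0
    return sent

# Invented ("class", prob) stream, one entry per 50 ms replay hop.
stream = [("index", 0.1), ("index", 0.5), ("index", 0.6), ("index", 0.7),
          ("index", 0.8), ("index", 0.9), ("index", 0.9), ("index", 0.9)]
commands = actuation_gate(stream)   # only one send; the cooldown suppresses the rest
```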

Replay cadence

0.25 s windows · 0.05 s hop · 10 MC passes

The pseudo-live replay runs the same checkpoint through the saved inference and actuation path at a replay cadence close to live use.
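In replay terms, that cadence is a 0.25 s sliding window advanced by 0.05 s, with each window scored by averaging 10 stochastic passes. A toy sketch, assuming a 256 Hz sample rate (which makes a 0.25 s window 64 samples, matching the 64 x 4 input) and a placeholder model in place of the real checkpoint:

```python
import numpy as np

def replay_windows(signal, fs=256, win_s=0.25, hop_s=0.05):
    """Slide a 0.25 s window over the session with a 0.05 s hop."""
    win, hop = int(win_s * fs), int(hop_s * fs)   # 64-sample window; hop truncates to 12 samples here
    for start in range(0, len(signal) - win + 1, hop):
        yield signal[start:start + win]

def mc_score(window, n_passes=10, seed=0):
    """Stand-in for 10 MC passes: average several stochastic forward passes."""
    rng = np.random.default_rng(seed)
    return float(np.mean([0.5 + 0.01 * rng.standard_normal() for _ in range(n_passes)]))

signal = np.zeros(10 * 256)                        # 10 s toy session at the assumed rate
scores = [mc_score(w) for w in replay_windows(signal)]
```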

Replay latency

127.21 ms mean · 127.29 ms p95

The current cleaned-corpus pseudo-live replay logs stable prediction latency across 12,447 windows.

Would-send onset

0.083 s median · 0.317 s p95

These onset figures come from the current cleaned-corpus pseudo-live replay and are exposed in the public benchmark ladder.

Replay footprint

12,447 windows over 3,046.15 s

The cleaned deployment replay is long enough to expose transition behavior, actuation suppression reasons, and latency stability rather than only short held-out windows.

Dated trail

How the campaign evolved

  • The February 26, 2026 update documents the earlier large-scale weight and hyperparameter campaign: 100+ trained variants, a 30+ hour sweep, and its largest logged block of 90 runs.
  • The March 16, 2026 update documents the later deployment-facing postprocess ablation that froze the live default family after 2,595 evaluated configurations.
  • The March 18, 2026 update widened the selection criteria from holdout accuracy alone to include full-session replay, pseudo-live behavior, and the end-to-end Step 7 control path.
  • The March 19, 2026 update finalized the current winning bundle by moving to the cleaned corpus and publishing applicability diagnostics directly alongside the deployment pair invariant.

Recommended material

Keep reading

Earlier checkpoint

Subject 1-M16 — Run 2026-03-05

The 1-M16 bundle is still useful because it shows a second subject, an earlier tuning cycle, and the kind of metrics that were available before the broader March 2026 evaluation push. It is not the model-selection path that produced the current featured 2-M16 run.

2026-02-27 refresh

Published tuning checkpoint

The website's February 27, 2026 update documents a tuned 1-M16 rerun and compares it against the February 21 baseline.

Historical baseline

Earlier methods stack

Later March updates explicitly warn that 1-M16 used earlier methods and should not be treated as a direct comparison to the current 2-M16 deployment stack.

60 epochs

Published training budget

The public bundle preserves the core training budget: 60 epochs, batch size 64, learning rate 0.001, and seed 45.