2026-04-29

2-M16 live inference diagnosis after the April 28 recording

A corrected same-sitting replay showed that the latest live-control failure was not just an event-label problem: the deployed model classified every corrected movement window as REST, and the live input distribution showed low-amplitude temporal channels.

Historical note: archived update posts preserve the figures published at that time. For the current verified run bundles, use the results page.

Overview

After the April 28, 2026 2-M16 recording, we found two separate issues:

  1. The event file for the new session had an invalid first mark that became REST-like data.
  2. Even after repairing those events, the live-control model still failed to recognize the intended movements from that sitting.

The corrected event sequence is now the intended protocol:

  • thumb close
  • thumb open
  • index close
  • index open
  • middle close
  • middle open
  • ring close
  • ring open
  • pinky close
  • pinky open

That repair was necessary, but it did not explain the live-control failure by itself.

Key Finding

The repaired session was replayed through the live-inference path as if it were streaming. On the 291 corrected movement windows, the deployed March 19 model chose REST as the raw top action for every window.

  Raw action chosen by the model   Corrected movement windows
  REST                             291 of 291
  OPEN                             0 of 291
  CLOSE                            0 of 291

This means the failure occurred upstream of the robot-hand actuation logic. Cooldowns, stability checks, and actuation thresholds could not recover the movement because the model's action head was already treating the signal as REST.
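The same-sitting replay described above can be sketched as a loop that feeds each corrected movement window to the model and counts the raw top action before any actuation logic runs. `predict_action` here is a hypothetical stand-in for the deployed model's action head, not the actual interface.

```python
# Sketch of the pseudo-live replay over corrected movement windows.
# predict_action(window) is a hypothetical stand-in that returns the raw top
# action ("REST", "OPEN", or "CLOSE") before cooldowns or thresholds apply.

def replay_movement_windows(windows, predict_action):
    """Count the model's raw top action across the given windows."""
    counts = {"REST": 0, "OPEN": 0, "CLOSE": 0}
    for window in windows:
        counts[predict_action(window)] += 1
    return counts

# A model stuck at REST puts all 291 movement windows in one bucket.
stuck_model = lambda window: "REST"
print(replay_movement_windows(range(291), stuck_model))
# {'REST': 291, 'OPEN': 0, 'CLOSE': 0}
```

Counting the raw action, rather than the post-threshold actuation, is what localizes the failure to the action head itself.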

Same-Sitting Replay Results

The full corrected sitting contains 2,226 replay windows:

  Window source           Count   Share
  REST / non-event time   1,935   86.93%
  OPEN events             145     6.51%
  CLOSE events            146     6.56%
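The shares in the table follow directly from the window counts; a quick sanity check confirms that the 291 movement windows and the 2,226-window total are consistent:

```python
# Sanity check of the window-share table: each share is the source count
# divided by the 2,226-window total of the corrected sitting.

window_counts = {"REST / non-event time": 1935, "OPEN events": 145, "CLOSE events": 146}
total = sum(window_counts.values())  # 2226
movement = window_counts["OPEN events"] + window_counts["CLOSE events"]  # 291

shares = {src: round(100 * n / total, 2) for src, n in window_counts.items()}
print(total, movement, shares)
# 2226 291 {'REST / non-event time': 86.93, 'OPEN events': 6.51, 'CLOSE events': 6.56}
```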

Several model variants were tested. None should replace the current deployment model.

  • March 19 deployment (current public baseline): movement recall 0.00%, movement precision 0.00%, false REST actuation 0.155%. Safe, but missed every corrected movement event.
  • April 4 archived (earlier conservative reference): movement recall 0.00%, movement precision 0.00%, false REST actuation 0.052%. Same failure pattern as the deployed model.
  • March + April adaptation (old corpus plus April 28 session): movement recall 0.00%, movement precision n/a, false REST actuation 0.000%. Offline metrics improved, but live-style movement recall did not.
  • April-only aggressive (April 28 session, lower REST weight): movement recall 17.18%, movement precision 15.72%, false REST actuation 12.92%. Movement appeared, but false actuation was far too high.
  • April-only conservative (April 28 session, moderate REST weight): movement recall 11.34%, movement precision 22.45%, false REST actuation 5.37%. Still unsafe and still missed most movement.

The important result is not simply that one model underperformed. Conservative models stayed at REST, while session-only tuning produced too much false actuation. That combination points to a data-quality and robustness problem rather than a simple threshold adjustment.
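The three metrics used above can be reconstructed from per-window labels. The definitions below are an assumption about what the report computes, not its actual formulas: movement recall is the fraction of movement windows that produced a non-REST output, movement precision is the fraction of non-REST outputs that fell on movement windows, and false REST actuation is the fraction of rest windows that produced a non-REST output.

```python
# Hypothetical reconstruction of the replay metrics from parallel lists of
# predicted and true window labels ("REST", "OPEN", or "CLOSE").
# These definitions are assumed; the report's exact formulas may differ.

def replay_metrics(pred, truth):
    """Return (movement recall, movement precision, false REST actuation)."""
    move_hits = sum(p != "REST" and t != "REST" for p, t in zip(pred, truth))
    move_total = sum(t != "REST" for t in truth)
    act_total = sum(p != "REST" for p in pred)
    rest_false = sum(p != "REST" and t == "REST" for p, t in zip(pred, truth))
    rest_total = sum(t == "REST" for t in truth)
    recall = move_hits / move_total if move_total else 0.0
    precision = move_hits / act_total if act_total else float("nan")  # "n/a" row
    false_rest = rest_false / rest_total if rest_total else 0.0
    return recall, precision, false_rest
```

Under these definitions, a model that always outputs REST lands at 0% recall, undefined precision, and 0% false actuation, matching the pattern of the conservative rows above.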

Signal Quality Evidence

The latest live input distribution was compared to the offline reference distribution used by the deployed model. The report now flags the run as shifted_low_amplitude.

  Muse channel   Live RMS ratio vs reference   Interpretation
  TP9            0.460                         Much quieter than expected
  AF7            0.894                         Slightly quieter
  AF8            1.016                         In range
  TP10           0.718                         Quieter than expected

Two channels were materially quieter than expected, with TP9 especially low. This is consistent with the model preferring REST during intended movement. The likely causes are headset contact, placement, or session-to-session amplitude shift.
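The per-channel check can be sketched as an RMS ratio between the live run and the offline reference. The function names, the 0.6 low-amplitude threshold, and the use of the minimum ratio are assumptions for illustration; only the `shifted_low_amplitude` flag name comes from the report.

```python
# Sketch of the per-channel RMS-ratio check. Signals are plain lists of
# samples per Muse channel; the 0.6 threshold is a placeholder assumption.
import math

def rms(samples):
    """Root-mean-square amplitude of a sample list."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def rms_ratios(live, reference):
    """Map each channel name to live RMS divided by reference RMS."""
    return {ch: rms(live[ch]) / rms(reference[ch]) for ch in reference}

def flag_run(ratios, low=0.6):
    """Flag the run when any single channel is materially quieter."""
    return "shifted_low_amplitude" if min(ratios.values()) < low else "ok"
```

Checking the minimum per-channel ratio, rather than an all-channel average, is what lets a single quiet channel like TP9 trip the flag even when other channels are in range.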

We also checked whether the channels had been silently reordered. Replaying with every channel permutation did not recover usable movement recall, so the evidence does not support channel order as the main cause.
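The permutation check can be sketched as an exhaustive search over the 24 orderings of the four Muse channels. `evaluate_recall` is a hypothetical stand-in for running the pseudo-live replay with a given channel order.

```python
# Sketch of the channel-permutation check: try every reordering of the four
# Muse channels and keep the best movement recall. evaluate_recall(order) is
# a hypothetical stand-in for a pseudo-live replay with that channel order.
from itertools import permutations

CHANNELS = ("TP9", "AF7", "AF8", "TP10")

def best_permutation(evaluate_recall):
    """Return (best recall, channel order) over all 24 orderings."""
    return max((evaluate_recall(order), order) for order in permutations(CHANNELS))
```

If even the best of the 24 orderings stays near zero recall, a channel-order bug is unlikely to be the explanation, which is the conclusion reached above.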

Engineering Fixes

Several safeguards were added during this investigation:

  • Step 1 now rejects invalid open/close marks when no finger is selected, instead of allowing them to become REST-like events.
  • The UI launch path now gives clearer warnings around keyboard capture and actuation configuration.
  • Live distribution analysis now catches channel-local low-amplitude problems, not only broad signal collapse.
  • Live preflight output now includes per-channel RMS ratios for faster diagnosis.
  • Pseudo-live replay and archived-model evaluation now handle model-local temperature files more reliably.
  • Dataset merging now tolerates expected scalar metadata differences.

These changes make future failures easier to detect and prevent, but they do not make the April 28 low-count session sufficient for a new deployment model.

Decision

No newly trained April 28 model is being promoted.

The current March 19 deployment model remains the conservative baseline. This investigation adds a stricter next requirement: before live robot-hand control, the system needs a short same-sitting calibration and preflight check that confirms both signal quality and movement recall.

The next model should only be accepted if it passes both conditions:

  • It sends meaningful non-REST actuation during same-sitting pseudo-live replay.
  • It keeps false REST actuation low during rest-by-exclusion replay.
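The two acceptance conditions can be expressed as a simple gate. The 10% recall floor and 1% false-actuation ceiling below are placeholder thresholds for illustration, not decided values.

```python
# Sketch of the two-condition acceptance gate for promoting a new model.
# The numeric thresholds are placeholder assumptions, not decided values.

def accept_model(movement_recall, false_rest_actuation,
                 min_recall=0.10, max_false_rest=0.01):
    """Promote only if replay shows real movement AND low false actuation."""
    return movement_recall >= min_recall and false_rest_actuation <= max_false_rest
```

Under these placeholder thresholds, the April-only aggressive candidate (17.18% recall, 12.92% false actuation) fails the second condition, and the March 19 baseline (0% recall) fails the first, consistent with neither being promoted.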

This is a useful negative result. It separates an event-recording bug from a deeper live-signal robustness problem, and it gives a concrete path for the next training session.