Optimizing MediaPipe for the assessment of hand trajectories using a touchscreen shape-tracing task


MediaPipe is an artificial intelligence-based system that offers a markerless, lightweight approach to motion capture. Yet its optimal pipeline and its accuracy against a known standard for investigations of upper-limb movement are unknown. We aimed to (1) determine optimal post-processing parameters for assessing hand/arm movements via MediaPipe, and (2) evaluate MediaPipe against a known standard. Participants (N = 4) performed one block (10 trials) of a touchscreen-based shape-tracing task. Trials were recorded with a camera (GoPro Hero8, 30 fps), and the videos were cropped to 2 s clips (one trial per clip). For aim 1, two clips were processed via MediaPipe to obtain x-y coordinate data of hand trajectories. The following post-processing steps were applied: MediaPipe- and touchscreen-generated coordinates were resampled using spline interpolation (250 data points; original reference frames) and normalised. Following Procrustes transformation, root mean squared error (RMSE; primary outcome measure) was calculated between the coordinates generated by MediaPipe and those generated by the touchscreen computer. RMSE decreased with post-processing (raw RMSE = 103.3 ± 7.71 px; post-processed RMSE = 1.3 ± 0.15 px; d > 1). For aim 2, we applied our pipeline to 25 clips and conducted an equivalence test between the coordinates generated by MediaPipe and those generated by the touchscreen computer. Preliminary findings indicate that accuracy differed between MediaPipe and the touchscreen computer, but that the true difference lay between 0 and 2 px (t(24) = -31.0, p < .001; mean RMSE = 1.2 px, 90% CI [1.19, 1.27]). This work identifies key post-processing parameters for applying MediaPipe to the evaluation of upper-limb movements. Future work will quantify the extent to which MediaPipe differs from a known standard across the full dataset, informing applications of MediaPipe in investigations of upper-limb movement.
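
The post-processing and equivalence-testing steps described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the spline type, the normalisation, the Procrustes variant, or the exact form of the equivalence test. The sketch assumes a parametric interpolating cubic spline, a Procrustes fit (translation, rotation or reflection, uniform scale) of the MediaPipe trajectory onto the touchscreen trajectory so that RMSE stays in touchscreen pixels, and a one-sample two one-sided tests (TOST) procedure with bounds of 0 and 2 px. All function names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import splev, splprep
from scipy.linalg import orthogonal_procrustes


def resample_spline(xy, n_points=250):
    """Resample an (N, 2) trajectory to n_points with an interpolating spline."""
    xy = np.asarray(xy, dtype=float)
    # splprep rejects consecutive duplicate samples, so drop them first.
    keep = np.r_[True, np.any(np.diff(xy, axis=0) != 0, axis=1)]
    xy = xy[keep]
    tck, _ = splprep([xy[:, 0], xy[:, 1]], s=0)  # s=0 interpolates the samples
    x_new, y_new = splev(np.linspace(0.0, 1.0, n_points), tck)
    return np.column_stack([x_new, y_new])


def align_to_reference(src, ref):
    """Procrustes-fit src onto ref (assumed variant: similarity transform)."""
    src_c = src - src.mean(axis=0)
    ref_c = ref - ref.mean(axis=0)
    rotation, sigma = orthogonal_procrustes(src_c, ref_c)
    scale = sigma / np.sum(src_c ** 2)  # least-squares optimal uniform scale
    return scale * src_c @ rotation + ref.mean(axis=0)


def rmse(a, b):
    """Root mean squared Euclidean distance between matched points."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def trajectory_rmse(mediapipe_xy, touchscreen_xy, n_points=250):
    """Resample both trajectories, align MediaPipe to touchscreen, return RMSE."""
    mp_rs = resample_spline(mediapipe_xy, n_points)
    ts_rs = resample_spline(touchscreen_xy, n_points)
    return rmse(align_to_reference(mp_rs, ts_rs), ts_rs)


def tost_one_sample(values, low=0.0, high=2.0):
    """One-sample TOST: a small p-value suggests the mean lies within (low, high)."""
    p_low = stats.ttest_1samp(values, low, alternative="greater").pvalue
    p_high = stats.ttest_1samp(values, high, alternative="less").pvalue
    return max(p_low, p_high)
```

In use, `trajectory_rmse` would be called once per clip with the MediaPipe and touchscreen coordinate arrays, and the resulting per-clip RMSE values would be passed to `tost_one_sample`. Because the MediaPipe trajectory is fitted onto the touchscreen trajectory, the RMSE is expressed in the touchscreen's pixel units; a symmetric Procrustes standardisation (for example `scipy.spatial.procrustes`) would instead yield a unitless disparity.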