PianoMotion10M:
Dataset and Benchmark for Hand Motion Generation in Piano Performance

¹Zhejiang University, ²Hangzhou Dianzi University
ganqijun@zju.edu.cn

Fig. 1. Overview of our framework. We collect videos of professional piano performances from the Internet and process them to construct a large-scale dataset, PianoMotion10M, which comprises piano music, MIDI files and hand motions. Building upon this dataset, we establish a benchmark for generating hand motions from piano music.

Abstract

Artificial intelligence techniques for education have recently received increasing attention, yet designing effective musical instrument instruction systems remains an open problem. Although key presses can be derived directly from sheet music, the transitional movements between key presses require more extensive guidance in piano performance. In this work, we construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing. To this end, we collect an annotated dataset, PianoMotion10M, consisting of 116 hours of piano playing videos from a bird's-eye view with 10 million annotated hand poses. We also introduce a powerful baseline model that generates hand motions from piano audio through a position predictor and a position-guided gesture generator. Furthermore, we design a series of evaluation metrics to assess the performance of the baseline model, including motion similarity, smoothness, positional accuracy of the left and right hands, and overall fidelity of the movement distribution. Although piano key presses corresponding to music scores or audio are already accessible, PianoMotion10M aims to provide guidance on piano fingering for instructional purposes.
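The abstract names positional accuracy and smoothness among the evaluation metrics but does not define them on this page. As a rough illustration only, the sketch below shows one common way such quantities can be computed. It assumes hand positions are stored as NumPy arrays of shape (frames, 3) and hand poses as arrays of shape (frames, joints, 3). The function names `position_error` and `smoothness` are hypothetical and are not the benchmark's official implementation.

```python
import numpy as np


def position_error(pred, target):
    """Mean Euclidean distance between predicted and reference hand positions.

    Assumed layout (not taken from the paper): arrays of shape (frames, 3).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Per-frame distance, then average over all frames.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def smoothness(motion):
    """Mean magnitude of the second finite difference along the time axis.

    Assumed layout (not taken from the paper): array of shape
    (frames, joints, 3). Lower values indicate smoother motion.
    """
    motion = np.asarray(motion, dtype=float)
    # The second-order difference over frames approximates acceleration.
    accel = np.diff(motion, n=2, axis=0)
    return float(np.linalg.norm(accel, axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(100, 21, 3))  # 21 joints is an assumption
    noisy = reference + 0.01 * rng.normal(size=reference.shape)
    # Compare the wrist (joint 0, by assumption) trajectories of both sequences.
    print("position error:", position_error(noisy[:, 0], reference[:, 0]))
    print("smoothness (reference):", smoothness(reference))
    print("smoothness (noisy):", smoothness(noisy))
```

Left and right hands could be scored separately by calling `position_error` once per hand. A distribution-level fidelity score would need a learned feature extractor and is outside the scope of this sketch.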

Table 1. Comparison between different hand and motion datasets. The proposed PianoMotion10M dataset consists of piano music with corresponding hand poses for hand motion generation. Existing hand-image datasets are listed in the first four rows, and music-motion datasets are presented in the subsequent four rows for reference.

Generated Hand Motion of our Baseline

Copyright

All datasets and benchmarks on this page are copyrighted by us and published under the CC BY-NC 4.0 International License. This means that you must attribute the work in the manner specified by the authors and may not use it for commercial purposes.