We present High-resolution guitar transcription, a method for training a guitar transcription model with excellent performance on real-world recordings. We use a domain adaptation approach to train a model, based on the "High-resolution piano transcription" model of Kong et al., on a small dataset of high-quality solo guitar transcriptions.
Building on the work of Maman and Bermano, we align existing guitar transcriptions to the activations of the piano transcription model. We then use these aligned transcriptions to train a new model, which transcribes the entire GuitarSet in a zero-shot setting with state-of-the-art accuracy.
Using our alignment method, we match a transcribed score to the corresponding audio recording with high accuracy.
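As a rough illustration of the idea (not the exact implementation), the sketch below warps a frame-level onset roll derived from the score onto onset activations produced by a pre-trained transcription model. The use of librosa's DTW, the function name, and the hop size are all assumptions made for this example; the actual method also refines individual note onsets after the coarse alignment.

```python
import librosa

def align_score_to_activations(score_roll, activations, hop_seconds=0.01):
    """Warp score frames onto activation frames with dynamic time warping.

    score_roll:   (n_pitches, n_score_frames) binary onset roll built from the score.
    activations:  (n_pitches, n_audio_frames) onset activations from a pre-trained model.
    Returns a dict mapping score frame index -> estimated time in the audio (seconds).
    """
    # Cumulative cost matrix and warping path between the two frame sequences.
    _, wp = librosa.sequence.dtw(X=score_roll, Y=activations, metric='euclidean')
    score_to_time = {}
    for score_frame, audio_frame in wp[::-1]:  # the path is returned end-to-start
        # Keep the earliest audio frame matched to each score frame.
        score_to_time.setdefault(int(score_frame), audio_frame * hop_seconds)
    return score_to_time
```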
Here we show a piece from our training data. The original audio is from "Johnny Smith - Autumn Nocturne" and the video shows the aligned transcription. Note how the fine-alignment process recovers the micro-timing variations of chord onsets, even though these notes are notated at the same time instant in the original score.
The source data was obtained from the professional transcriber François Leduc. The GuitarPro files are commercially available from his website and, for ease of reproduction, are listed as follows:
Because the training data covers diverse recording conditions, our model can transcribe different types of guitar. In all of the following examples, the original audio can be heard on the left channel, while the transcribed audio (synthesised as piano) can be heard on the right channel.
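For readers who want to assemble similar comparison clips, the following is a minimal sketch (not the exact pipeline used for these examples); it assumes the transcription has been exported as a MIDI file and that pretty_midi's FluidSynth rendering is available with a suitable soundfont.

```python
import numpy as np
import librosa
import pretty_midi
import soundfile as sf

def make_comparison_clip(audio_path, midi_path, out_path, sr=44100):
    """Write a stereo file: original recording left, synthesised transcription right."""
    original, _ = librosa.load(audio_path, sr=sr, mono=True)
    synth = pretty_midi.PrettyMIDI(midi_path).fluidsynth(fs=sr)  # requires a soundfont
    # Zero-pad the shorter signal so both channels have the same length.
    n = max(len(original), len(synth))
    original = np.pad(original, (0, n - len(original)))
    synth = np.pad(synth, (0, n - len(synth)))
    stereo = np.stack([original, synth], axis=1)  # (n_samples, 2)
    sf.write(out_path, stereo, sr)
```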
We also include some examples of our model applied to settings that are very different from the training data (out-of-distribution).
As mentioned in the paper, we identified two alignments in GuitarSet where all the notes appeared to be aligned to the second note onset instead of the first. This introduces a constant delay in the annotations, rendering them inaccurate; a sketch of how such a delay can be compensated follows the list. The files are:
04_BN3-154-E_comp (off by +0.409s)
04_Jazz1-200-B_comp (off by +0.309s)
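A minimal sketch of how such a constant delay could be compensated (a hypothetical helper, not necessarily the processing applied for our reported results), assuming the note annotations have already been read out of the GuitarSet JAMS files as (onset, offset, pitch) tuples:

```python
# Constant delays (in seconds) observed in the two affected GuitarSet annotations.
ANNOTATION_DELAYS = {
    "04_BN3-154-E_comp": 0.409,
    "04_Jazz1-200-B_comp": 0.309,
}

def correct_annotation(notes, track_id):
    """Shift every note earlier by the known constant delay for this track.

    notes: list of (onset_seconds, offset_seconds, midi_pitch) tuples.
    """
    delay = ANNOTATION_DELAYS.get(track_id, 0.0)
    return [(max(0.0, onset - delay), max(0.0, offset - delay), pitch)
            for onset, offset, pitch in notes]
```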
This work was supported by the UKRI Centre for Doctoral Training in Artificial Intelligence and Music. XR is supported by UK Research and Innovation [grant number EP/S022694/1].
MIDI visualisations were created using ffmpeg, sfizz, Salamander Piano samples and MIDI Visualizer.
Automatic Music Transcription is a large field, with much excellent work in recent years; we highlight the works most relevant to ours below.
High-Resolution Piano Transcription with Pedals by Regressing Onset and Offset Times introduces the piano transcription model on which we build. Its authors demonstrate that the model is robust to misaligned labels thanks to an onset/offset time regression technique inspired by YOLOv3.
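Our paraphrase of that idea, as a hedged sketch (the exact target definition and hyperparameters in the original paper may differ): rather than a hard 0/1 onset label per frame, frames near an onset receive a value that encodes how far the frame centre lies from the precise onset time, so the training signal degrades gracefully when labels are slightly misaligned.

```python
import numpy as np

def onset_regression_targets(onset_times, n_frames, hop_seconds=0.01, neighbourhood=5):
    """Soft onset targets for one pitch: a linear decay around each true onset.

    Frames within `neighbourhood` frames of an onset get a target that falls off
    linearly with the distance between the frame centre and the onset time;
    all other frames stay at zero.
    """
    targets = np.zeros(n_frames)
    frame_centres = np.arange(n_frames) * hop_seconds
    for onset in onset_times:
        centre = int(round(onset / hop_seconds))
        lo = max(0, centre - neighbourhood)
        hi = min(n_frames, centre + neighbourhood + 1)
        for i in range(lo, hi):
            decay = 1.0 - abs(frame_centres[i] - onset) / (neighbourhood * hop_seconds)
            targets[i] = max(targets[i], max(0.0, decay))
    return targets
```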
Unaligned Supervision For Automatic Music Transcription in The Wild provided inspiration for our alignment approach, in which a score is aligned to the activations of an existing transcription model.
Our work is specific to the guitar, but audio-to-MIDI solutions are available for several other instruments.
TBC