Convolutional generative adversarial network, via transfer learning, for traditional Scottish music generation

Marchetti, Francesco and Wilson, Callum and Powell, Cheyenne and Minisci, Edmondo and Riccardi, Annalisa; Romero, Juan and Martins, Tiago and Rodríguez-Fernández, Nereida, eds. (2021) Convolutional generative adversarial network, via transfer learning, for traditional Scottish music generation. In: Artificial Intelligence in Music, Sound, Art and Design. Lecture Notes in Computer Science. Springer, ESP, pp. 187-202. ISBN 9783030729141

Text. Filename: Marchetti_etal_EvoMUSART_2021_Convolutional_generative_adversarial_network_via_transfer_learning.pdf
Accepted Author Manuscript



The concept of a Binary Multi-track Sequential Generative Adversarial Network (BinaryMuseGAN) for music generation has been applied and tested on various types of music. However, it has yet to be tested on more specific genres such as traditional Scottish music, for which extensive collections are not readily available. Exploring the capabilities of a Transfer Learning (TL) approach on this type of music is therefore an interesting challenge for the methodology. A curated set of Scottish melodies in MIDI format was preprocessed to obtain the same number of tracks used in the BinaryMuseGAN model, converted into pianoroll format, and then used as the training set to fine-tune a pretrained model generated from the Lakh MIDI dataset. The results were compared with those obtained by training the same GAN model from scratch on the Scottish music dataset alone. Results are presented in terms of the variation and average performance achieved at different epochs for five performance metrics: three adopted from the Lakh dataset (qualified note rate, polyphonicity, tonal distance) and two custom-defined to highlight characteristics of Scottish music (dotted rhythm and pentatonic note). These results show that the TL method is more effective, converging stably and closely to the reference metric values of the original dataset in fewer epochs.
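As a rough illustration of how two of the adopted metrics might be computed on a binary pianoroll, the following is a minimal sketch and not the authors' implementation. It assumes a pianoroll stored as a list of time steps, each a list of 0/1 pitch flags. The function names, the `threshold` and `min_length` parameters, and their default values are assumptions chosen for illustration; the paper's exact definitions may differ.

```python
# Minimal sketch under stated assumptions; names and defaults are hypothetical.
# A pianoroll is a list of time steps, each a list of 0/1 flags (one per pitch).


def polyphonicity(pianoroll, threshold=2):
    """Fraction of time steps with at least `threshold` active pitches.

    The threshold value is an assumption, not taken from the paper.
    """
    if not pianoroll:
        return 0.0
    hits = sum(
        1 for step in pianoroll
        if sum(1 for flag in step if flag) >= threshold
    )
    return hits / len(pianoroll)


def note_lengths(pianoroll):
    """Lengths, in time steps, of every run of consecutive active steps per pitch.

    A binary pianoroll carries no onset information, so repeated notes on the
    same pitch with no gap between them are counted as one long note.
    """
    lengths = []
    if not pianoroll:
        return lengths
    for pitch in range(len(pianoroll[0])):
        run = 0
        for step in pianoroll:
            if step[pitch]:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:  # note still sounding at the end of the roll
            lengths.append(run)
    return lengths


def qualified_note_rate(pianoroll, min_length=3):
    """Fraction of notes lasting at least `min_length` time steps.

    The minimum length is an assumption, not taken from the paper.
    """
    lengths = note_lengths(pianoroll)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= min_length) / len(lengths)
```

Metrics of this kind can be evaluated on generated samples at each epoch and compared against the same quantities computed on the training data, which is the sort of comparison the abstract describes.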