Open Access BASE2021

VQCPC-GAN: VARIABLE-LENGTH ADVERSARIAL AUDIO SYNTHESIS USING VECTOR-QUANTIZED CONTRASTIVE PREDICTIVE CODING

Abstract

Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image data". However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio by exploiting Vector-Quantized Contrastive Predictive Coding (VQCPC). A sequence of VQCPC tokens extracted from real audio data serves as conditional input to a GAN architecture, providing step-wise time-dependent features of the generated content. The input noise z (characteristic of adversarial architectures) remains fixed over time, ensuring temporal consistency of global features. We evaluate the proposed model by comparing a diverse set of metrics against various strong baselines. Results show that, although the baselines score best, VQCPC-GAN achieves comparable performance even when generating variable-length audio. Numerous sound examples are provided on the accompanying website (sonycslparis.github.io/vqcpc-gan.io), and we release the code for reproducibility (github.com/SonyCSLParis/vqcpc-gan).

Index Terms: Generative Adversarial Networks, Audio Synthesis, Vector-Quantized Contrastive Predictive Coding

* Nistal received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068.
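The conditioning scheme described in the abstract (a token sequence that varies per time step, combined with a noise vector that stays fixed over time) can be illustrated with a minimal NumPy sketch. This is not the released VQCPC-GAN code: the function name `build_generator_input`, the one-hot encoding of tokens, the concatenation layout, and all sizes are assumptions made purely for illustration.

```python
import numpy as np

def build_generator_input(tokens, z, num_codes):
    """Hypothetical helper: combine per-step tokens with a time-invariant noise vector.

    tokens:    integer sequence of length T, values in [0, num_codes)
    z:         float array of shape (D,), sampled once per generated sound
    returns:   array of shape (T, D + num_codes)
    """
    tokens = np.asarray(tokens)
    # One-hot encode each token: local, time-dependent conditioning.
    one_hot = np.eye(num_codes, dtype=z.dtype)[tokens]   # (T, num_codes)
    # Repeat the same z at every step: global, time-invariant conditioning.
    z_tiled = np.tile(z, (tokens.shape[0], 1))           # (T, D)
    return np.concatenate([z_tiled, one_hot], axis=1)

rng = np.random.default_rng(0)
z = rng.standard_normal(8)  # assumed noise dimension of 8

# The same z paired with token sequences of different lengths
# yields generator inputs of different durations.
short = build_generator_input([3, 3, 1, 0], z, num_codes=4)
long = build_generator_input([3, 3, 1, 0, 2, 2, 1], z, num_codes=4)
print(short.shape, long.shape)
```

In this sketch the output length is set only by the number of tokens, while the noise columns are identical in every row, which mirrors the abstract's split between step-wise features and global features.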
