Artificial Synaesthesia: An exploration of machine learning image synthesis for soundscape audio visualisation
Within soundscape research, audio visualisations are generally targeted towards scientific audiences rather than produced with artistic intent. Machine learning is utilised in this field to process large amounts of data efficiently, recognise patterns, and classify audio. Although machine learning is effective for these purposes, it also offers image synthesis capabilities that have not yet been exploited in audio visualisation production. This research aims to answer the question, ‘How might machine learning image synthesis be used to visualise soundscape audio?’.
Through an iterative design process, a pipeline was developed to generate visualisations of audio using Pix2Pix (Isola et al., 2016), a conditional generative adversarial network. Audio features are extracted, converted into simple grid images, and fed into a trained machine learning model, yielding a new visual interpretation of the audio in the form of images and videos. The video design outputs communicate change in the audio through the interrelated transformation of the colour, shape, detail, and size of flower-like figures. These outputs aim to draw attention to the value of soundscapes by visually demonstrating their unique qualities. According to the available literature, the method developed has not been previously documented, and it marks an exciting exploration of machine learning image-to-image translation as a creative tool for audio visualisation.
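The first stage of the pipeline described above, mapping audio features onto a simple grid image, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the function name, the choice of features (RMS energy and spectral centroid), and the grid size are all assumptions introduced here for demonstration.

```python
import numpy as np

def audio_to_feature_grid(samples, sr=22050, grid=(8, 8), frame_len=1024):
    """Map a mono audio signal to a small per-cell feature grid.

    Hypothetical stand-in for the pipeline's feature-extraction stage:
    each grid cell encodes two features of one time slice of the audio,
    RMS energy and a normalised spectral centroid. A grid like this
    could then serve as the conditioning input to a Pix2Pix model.
    """
    n_cells = grid[0] * grid[1]
    # One time slice of the signal per grid cell.
    slices = np.array_split(np.asarray(samples, dtype=float), n_cells)
    cells = []
    for s in slices:
        # RMS energy: overall loudness of the slice.
        rms = np.sqrt(np.mean(s ** 2))
        # Spectral centroid: "brightness" of the slice's spectrum.
        spec = np.abs(np.fft.rfft(s, n=frame_len))
        freqs = np.fft.rfftfreq(frame_len, d=1 / sr)
        centroid = (freqs * spec).sum() / (spec.sum() + 1e-9)
        cells.append((rms, centroid / (sr / 2)))  # normalise to Nyquist
    cells = np.asarray(cells)
    # Normalise energy into [0, 1] and reshape into the 2-channel grid.
    cells[:, 0] /= cells[:, 0].max() + 1e-9
    return cells.reshape(grid[0], grid[1], 2)

# Example: a 440 Hz tone with a rising amplitude envelope.
sr = 22050
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t) * np.linspace(0.1, 1.0, t.size)
grid = audio_to_feature_grid(audio, sr=sr)
print(grid.shape)  # (8, 8, 2): per-cell (energy, normalised centroid)
```

Rendering each cell's feature pair as a coloured block would produce the kind of simple grid image that, once paired with target photographs during training, a Pix2Pix model can learn to translate into richer imagery.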