From pixels to sentiment: fine-tuning CNNs for visual sentiment prediction
Visual multimedia have become an inseparable part of our digital social lives, and they often capture moments tied with deep affections. Automated visual sentiment analysis tools can provide a means of extracting the rich feelings and latent dispositions embedded in these media. In this work, we explore how Convolutional Neural Networks (CNNs), a now de facto computational machine learning tool particularly in the area of Computer Vision, can be specifically applied to the task of visual sentiment prediction. We accomplish this through fine-tuning experiments using a state-of-the-art CNN and via rigorous architecture analysis, we present several modifications that lead to accuracy improvements over prior art on a dataset of images from a popular social media platform. We additionally present visualizations of local patterns that the network learned to associate with image sentiment for insight into how visual positivity (or negativity) is perceived by the model. ; This work has been developed in the framework of the BigGraph TEC2013-43935-R project, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF). It has been supported by the Severo Ochoa Program's SEV2015-0493 grant awarded by the Spanish Government, the TIN2015-65316 project by the Spanish Ministerio de Economía y Competitividad and contracts 2014-SGR-1051 by Generalitat de Catalunya. The Image Processing Group at the UPC is a SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and X used in this work and the support of BSC/UPC NVIDIA GPU Center of Excellence. ; Peer Reviewed ; Postprint (author's final draft)