author:"Giró Nieto, Xavier" | Pollux - Fachinformationsdienst Politikwissenschaft

Open Access · #1 · 2017

From pixels to sentiment: fine-tuning CNNs for visual sentiment prediction

Campos Camunez, Victor; Jou, Brendan; Giró Nieto, Xavier

Visual multimedia have become an inseparable part of our digital social lives, and they often capture moments tied to deep affections. Automated visual sentiment analysis tools can provide a means of extracting the rich feelings and latent dispositions embedded in these media. In this work, we explore how Convolutional Neural Networks (CNNs), a now de facto computational machine learning tool particularly in the area of Computer Vision, can be specifically applied to the task of visual sentiment prediction. We accomplish this through fine-tuning experiments using a state-of-the-art CNN and, via rigorous architecture analysis, present several modifications that lead to accuracy improvements over prior art on a dataset of images from a popular social media platform. We additionally present visualizations of local patterns that the network learned to associate with image sentiment for insight into how visual positivity (or negativity) is perceived by the model. ; This work has been developed in the framework of the BigGraph TEC2013-43935-R project, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF). It has been supported by the Severo Ochoa Program's SEV2015-0493 grant awarded by the Spanish Government, the TIN2015-65316 project by the Spanish Ministerio de Economía y Competitividad and contract 2014-SGR-1051 by the Generalitat de Catalunya. The Image Processing Group at the UPC is an SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and X used in this work and the support of the BSC/UPC NVIDIA GPU Center of Excellence. ; Peer Reviewed ; Postprint (author's final draft)

Access (Open Access)

BASE

Open Access · #3 · 2020

Mask-guided sample selection for semi-supervised instance segmentation

Bellver Bueno, Míriam; Salvador Aguilera, Amaia; Torres Viñals, Jordi; Giró Nieto, Xavier

The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-020-09235-4 ; Image segmentation methods are usually trained with pixel-level annotations, which require significant human effort to collect. Weakly-supervised pipelines are the most common solution to address this constraint because they are trained with lower forms of supervision, such as bounding boxes or scribbles. Semi-supervised methods, which leverage a large amount of unlabeled data and a limited number of strongly-labeled samples, are another option. In this second setup, samples to be strongly annotated can be selected randomly or with an active learning mechanism that chooses the ones that will maximize the model performance. In this work, we propose a sample selection approach to decide which samples to annotate for semi-supervised instance segmentation. Our method consists of first predicting pseudo-masks for the unlabeled pool of samples, together with a score predicting the quality of each mask. This score is an estimate of the Intersection Over Union (IoU) of the segment with the ground truth mask. We study which samples should be annotated based on the quality score, leading to an improved performance for semi-supervised instance segmentation with low annotation budgets. ; This work was partially supported by the Spanish Ministry of Economy and Competitivity under contract TIN2012-34557, by the BSC-CNS Severo Ochoa program (SEV-2011-00067), and under contracts TEC2013-43935-R and TEC2016-75976-R. It has also been supported by grants 2014-SGR-1051 and 2014-SGR-1421 by the Government of Catalonia, and the European Regional Development Fund (ERDF). We would also like to acknowledge the valuable discussions with Victor Campos. ; Peer Reviewed ; Postprint (published version)
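
The quantity that the quality score estimates, and a score-based selection under an annotation budget, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `mask_iou` and `select_for_annotation` are hypothetical, and the choice to annotate the lowest-scoring samples first is an assumption, since the abstract does not state which ranking works best.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask given as rows of 0/1 values


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union of two binary masks of identical shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention (an assumption, not from the source): two empty masks match perfectly.
    return inter / union if union else 1.0


def select_for_annotation(
    scores: Sequence[float], budget: int, lowest_first: bool = True
) -> List[int]:
    """Return indices of up to `budget` samples ranked by predicted quality score.

    `lowest_first=True` picks the samples whose pseudo-masks are predicted to be
    worst; this ranking direction is an illustrative assumption.
    """
    order = sorted(
        range(len(scores)), key=lambda i: scores[i], reverse=not lowest_first
    )
    return order[:budget]
```

In practice the scores would come from a model that predicts IoU without access to the ground truth; `mask_iou` only shows what that prediction is meant to approximate.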

Open Access · #4 · 2018

Skip RNN: learning to skip state updates in recurrent neural networks

Campos Camunez, Victor; Jou, Brendan; Giró Nieto, Xavier; Torres Viñals, Jordi; Chang, Shih-Fu

Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges like slow inference, vanishing gradients and difficulty in capturing long term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/ ; This work was partially supported by the Spanish Ministry of Economy and Competitivity and the European Regional Development Fund (ERDF) under contracts TEC2016-75976-R and TIN2015-65316-P, by the BSC-CNS Severo Ochoa program SEV-2015-0493, and grant 2014-SGR-1051 by the Catalan Government. Víctor Campos was supported by Obra Social "la Caixa" through La Caixa-Severo Ochoa International Doctoral Fellowship program. We would also like to thank the technical support team at the Barcelona Supercomputing Center. ; Postprint (published version)
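
The idea of skipping state updates can be sketched with a binary gate that either applies the recurrent cell or copies the previous state. This is a minimal scalar sketch under stated assumptions: the gating rule (a sigmoid-driven update probability that accumulates over skipped steps and is thresholded at 0.5), the function name `skip_rnn_scan` and the parameters `w_p` and `b_p` are illustrative and are not specified in the abstract; training of the gate and the budget constraint are not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple


def skip_rnn_scan(
    inputs: Sequence[float],
    cell: Callable[[float, float], float],
    w_p: float,
    b_p: float,
    state: float = 0.0,
) -> Tuple[float, List[int]]:
    """Run a scalar recurrent cell, skipping updates when the gate is closed.

    Returns the final state and the list of gate values (1 = updated, 0 = skipped).
    """
    u_tilde = 1.0  # update probability; starting at 1 forces an update at step one
    gates: List[int] = []
    for x in inputs:
        u = 1 if u_tilde >= 0.5 else 0  # binarize the update probability
        if u:
            state = cell(state, x)  # normal RNN update
        # else: the previous state is copied unchanged (the update is skipped)
        delta = 1.0 / (1.0 + math.exp(-(w_p * state + b_p)))
        if u:
            u_tilde = delta  # reset the probability after an update
        else:
            u_tilde += min(delta, 1.0 - u_tilde)  # accumulate while skipping
        gates.append(u)
    return state, gates
```

With a gate increment below 0.5, the sketch alternates between updating and skipping, so only part of the sequence passes through `cell`.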

Open Access · #5 · 2017

Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster

Campos Camunez, Victor; Sastre, Francesc; Yagües, Maurici; Bellver, Míriam; Giró Nieto, Xavier; Torres Viñals, Jordi

Deep learning algorithms base their success on building high learning capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained. In this work, we explore how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two different points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the final accuracy of the models is studied. ; This work is partially supported by the Spanish Ministry of Economy and Competitivity under contract TIN2012-34557, by the BSC-CNS Severo Ochoa program (SEV-2011-00067), by the SGR programmes (2014-SGR-1051 and 2014-SGR-1421) of the Catalan Government and by the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economia y Competitividad and the European Regional Development Fund (ERDF). We would also like to thank the technical support team at the Barcelona Supercomputing Center (BSC), especially Carlos Tripiana. ; Peer Reviewed ; Postprint (published version)

Open Access · #7 · 2022

PixInWav: Residual steganography for hiding pixels in audio

Geleta Geleta, Margarita; Puntí Álvarez, Cristina; McGuinness, Kevin; Pons Puig, Jordi; Canton Ferrer, Cristian; Giró Nieto, Xavier

Steganography comprises the mechanics of hiding data in a host medium that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. Among our results, we find that the residual steganography setup we propose allows an encoding of the hidden image that is independent from the host audio without compromising quality. Accordingly, while previous works require both host and hidden signals to hide a signal, PixInWav can encode images offline, which can later be hidden, in a residual fashion, into any audio signal. ; Work partially supported by the European Union through the Erasmus+ student mobility program, Science Foundation Ireland (SFI) under grant numbers SFI/15/SIRG/3283 and SFI/12/RC/2289 P2, and the Spanish Research Agency (AEI) under project PID2020-117142GB-I00 of the call MCIN/AEI/10.13039/501100011033. ; Peer Reviewed ; Postprint (author's final draft)

Open Access · #8 · 2019

RVOS: end-to-end recurrent network for video object segmentation

Ventura Royo, Carles; Bellver, Míriam; Girbau Xalabarder, Andreu; Salvador Aguilera, Amaia; Marqués Acosta, Fernando; Giró Nieto, Xavier

Multiple object video object segmentation is a challenging task, especially for the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence on two different domains: (i) the spatial, which allows discovering the different object instances within a frame, and (ii) the temporal, which allows keeping the coherence of the segmented objects along time. We train RVOS for zero-shot video object segmentation and are the first ones to report quantitative results for DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS for one-shot video object segmentation by using the masks obtained in previous time steps as inputs to be processed by the recurrent module. Our model reaches comparable results to state-of-the-art techniques in the YouTube-VOS benchmark and outperforms all previous video object segmentation methods not using online learning in the DAVIS-2017 benchmark. Moreover, our model achieves faster inference runtimes than previous methods, reaching 44ms/frame on a P100 GPU. ; This research was supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (TIN2015-66951-C2-2-R, TIN2015-65316-P & TEC2016-75976-R), the BSC-CNS Severo Ochoa SEV-2015-0493 and LaCaixa-Severo Ochoa International Doctoral Fellowship programs, the 2017 SGR 1414 and the Industrial Doctorates 2017-DI-064 & 2017-DI-028 from the Government of Catalonia ; Peer Reviewed ; Postprint (published version)

Open Access · #9 · 2021

How2Sign: A large-scale multimodal dataset for continuous American sign language

Cardoso Duarte, Amanda; Palaskar, Shruti; Ventura Ripol, Lucas; Ghadiyaram, Deepti; DeHaan, Kenneth; Metze, Florian; Torres Viñals, Jordi; Giró Nieto, Xavier

One of the factors that have hindered progress in the areas of sign language recognition, translation, and production is the absence of large annotated datasets. Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was further recorded in the Panoptic studio enabling detailed 3D pose estimation. To evaluate the potential of How2Sign for real-world impact, we conduct a study with ASL signers and show that synthesized videos using our dataset can indeed be understood. The study further gives insights on challenges that computer vision should address in order to make progress in this field. Dataset website: http://how2sign.github.io/ ; This work received funding from Facebook through gifts to CMU and UPC; through projects TEC2016-75976-R, TIN2015-65316-P, SEV-2015-0493 and PID2019-107255GB-C22 of the Spanish Government and 2017-SGR-1414 of Generalitat de Catalunya. This work used XSEDE's "Bridges" system at the Pittsburgh Supercomputing Center (NSF award ACI-1445606). Amanda Duarte has received support from la Caixa Foundation (ID 100010434) under the fellowship code LCF/BQ/IN18/11660029. Shruti Palaskar was supported by the Facebook Fellowship program. ; Peer Reviewed ; Sustainable Development Goals::10 - Reduced Inequalities ; Sustainable Development Goals::4 - Quality Education::4.5 - By 2030, eliminate gender disparities in education and ensure equal access for vulnerable people, including persons with disabilities, indigenous peoples and children in vulnerable situations, to all levels of education and vocational training ; Sustainable Development Goals::10 - Reduced Inequalities::10.2 - By 2030, empower and promote the social, economic and ...

Open Access · #10 · 2021

H3D-Net: Few-shot high-fidelity 3D head reconstruction

Ramon Maldonado, Eduard; Triginer Garcés, Gil; Escurt i Gelabert, Janna; Pumarola Peris, Albert; García Giráldez, Jaime; Giró Nieto, Xavier; Moreno-Noguer, Francesc

Recent learning approaches that implicitly represent surface geometry using coordinate-based neural representations have shown impressive results in the problem of multi-view 3D reconstruction. The effectiveness of these techniques is, however, subject to the availability of a large number (several tens) of input views of the scene, and computationally demanding optimizations. In this paper, we tackle these limitations for the specific problem of few-shot full 3D head reconstruction, by endowing coordinate-based representations with a probabilistic shape prior that enables faster convergence and better generalization when using few input images (down to three). First, we learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations. At test time, we jointly overfit two coordinate-based neural networks to the scene, one modeling the geometry and another estimating the surface radiance, using implicit differentiable rendering. We devise a two-stage optimization strategy in which the learned prior is used to initialize and constrain the geometry during an initial optimization phase. Then, the prior is unfrozen and fine-tuned to the scene. By doing this, we achieve high-fidelity head reconstructions, including hair and shoulders, and with a high level of detail that consistently outperforms both state-of-the-art 3D Morphable Models methods in the few-shot scenario, and non-parametric methods when large sets of views are available. ; This work has been partially funded by the Spanish government with the projects MoHuCo PID2020-120049RB-I00, DeeLight PID2020-117142GB-I00 and Maria de Maeztu Seal of Excellence MDM-2016-0656, and by the Government of Catalonia under 2017 DI 028. ; Peer Reviewed ; Postprint (author's final draft)

Open Access · #11 · 2020

Enhancing online knowledge graph population with semantic knowledge

Fernández Cañellas, Dèlia; Rimmek, Joan Marco; Espadaler Rodés, Joan; Garolera Huguet, Blai; Barja Romero, Adrià; Codina, Marc; Sastre Rienitz, Marc; Giró Nieto, Xavier; Riveiro, Juan Carlos; Bou Balust, Elisenda

Knowledge Graphs (KG) are becoming essential to organize, represent and store the world's knowledge, but they still rely heavily on human-curated structured data. Information Extraction (IE) tasks, like disambiguating entities and relations from unstructured text, are key to automate KG population. However, Natural Language Processing (NLP) methods alone cannot guarantee the validity of the facts extracted and may introduce erroneous information into the KG. This work presents an end-to-end system that combines Semantic Knowledge and Validation techniques with NLP methods, to provide KG population of novel facts from clustered news events. The contributions of this paper are two-fold: First, we present a novel method for including entity-type knowledge into a Relation Extraction model, improving F1-Score over the baseline with TACRED and TypeRE datasets. Second, we increase the precision by adding data validation on top of the Relation Extraction method. These two contributions are combined in an industrial pipeline for automatic KG population over aggregated news, demonstrating increased data validity when performing online learning from unstructured web data. Finally, the TypeRE and AggregatedNewsRE datasets built to benchmark these results are also published to foster future research in this field. ; This work was partially supported by the Government of Catalonia under the industrial doctorate 2017 DI 011. ; Peer Reviewed ; Postprint (author's final draft)
