What is it about?
Real-world applications typically involve interaction between multiple modalities (e.g., video, speech, text). To process such multimodal information automatically and use it in end applications, Multimodal Representation Learning (MRL) has emerged as an active area of research. In practice, however, the data acquired from different sources are typically noisy; in extreme cases, noise of large magnitude can completely alter the semantics of the data, leading to inconsistencies in the parallel multimodal data. In this paper, we propose a novel method for multimodal representation learning in noisy environments via the generalized product-of-experts technique. In the proposed method, we train a separate network for each modality to assess the credibility of the information coming from that modality; the contribution of each modality is then dynamically varied while estimating the joint distribution.
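To make the idea concrete, here is a minimal numerical sketch of generalized product-of-experts (gPoE) fusion for Gaussian experts. It assumes each modality contributes a Gaussian estimate together with a credibility weight; in the paper those weights come from learned per-modality networks, whereas here they are simply passed in as numbers, and the function name and signature are illustrative, not the paper's actual API.

```python
import numpy as np

def gpoe_fusion(means, variances, credibilities):
    """Fuse per-modality Gaussian 'experts' with a generalized
    product of experts (gPoE).

    Each modality i contributes a Gaussian N(mu_i, sigma_i^2) and a
    credibility weight alpha_i >= 0 (in the paper, alpha_i is predicted
    by a network trained for that modality; here it is given directly).
    A noisy modality gets a small alpha and is down-weighted in the
    joint estimate. Returns (mean, variance) of the fused Gaussian.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    alphas = np.asarray(credibilities, dtype=float)
    # Each weighted expert contributes precision alpha_i / sigma_i^2,
    # so the fused precision is the sum of weighted precisions.
    precisions = alphas / variances
    fused_var = 1.0 / precisions.sum()
    # Fused mean is the precision-weighted average of the expert means.
    fused_mean = fused_var * (precisions * means).sum()
    return fused_mean, fused_var
```

With equal credibilities this reduces to an ordinary product of experts; setting a modality's credibility to zero removes its influence entirely, which is the behavior one wants when that modality's input is pure noise.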
Why is it important?
This work addresses an important problem in the AI domain. For an AI system to be seamless in use, it should be able to make use of information coming from different modalities. This work is a step towards that goal: we propose a method for reliably combining information coming from different modalities in real-world settings.
Read the Original
This page is a summary of: Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments, November 2022, ACM (Association for Computing Machinery). DOI: 10.1145/3536221.3556596.