What is it about?
Learning the right representations from complex input data is the key ability of successful machine learning (ML) models. Unlike computer vision and natural language processing, which each target a single, well-defined modality, network ML problems involve a mixture of data modalities (numerical measurements, textual logs, host identifiers, etc.). Yet, instead of exploiting this abundance, practitioners tend to rely on a subset of features, reducing the problem to a single modality for the sake of simplicity. In this paper, we advocate for exploiting all the modalities naturally present in network data. As a first step, we observe that network data systematically exhibits a bi-modal mixture of quantities (e.g., measurements) and entities (e.g., IP addresses, names, etc.). We propose to systematically leverage language models to learn entity representations whenever meaningful sequences of such entities have been historically observed. On two use cases, we show that such entity encodings benefit and naturally augment classic quantity-based models.
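To make the idea concrete, here is a minimal sketch of what learning entity representations with a language model and fusing them with quantity features could look like. It assumes a word2vec-style skip-gram model (via gensim) as the language model and a random forest as the downstream quantity-based model; the entity sequences, measurements, labels, and model choices are illustrative placeholders, not the exact setup of the paper.

```python
# Minimal sketch (illustrative, not the paper's exact pipeline):
# learn entity embeddings from historically observed entity sequences with a
# word2vec-style skip-gram model, then concatenate them with classic quantity
# features and feed a standard classifier.
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier

# 1) Entity modality: sequences of entities (e.g., hostnames/IPs contacted by a
#    client), treated as "sentences" so the model can learn co-occurrence structure.
entity_sequences = [
    ["dns.local", "10.0.0.7", "cdn.example.net"],
    ["10.0.0.7", "mail.example.net", "dns.local"],
    ["cdn.example.net", "10.0.0.9", "dns.local"],
]
lm = Word2Vec(sentences=entity_sequences, vector_size=16, window=3,
              min_count=1, sg=1, epochs=50, seed=0)

def embed(entity):
    """Look up the learned representation of an entity (zeros if unseen)."""
    return lm.wv[entity] if entity in lm.wv else np.zeros(lm.wv.vector_size)

# 2) Quantity modality: numeric measurements per sample
#    (e.g., bytes, packets, duration); random placeholders here.
rng = np.random.default_rng(0)
quantities = rng.random((3, 3))

# 3) Fuse both modalities: concatenate the entity embedding with the quantity
#    features, then train a classic quantity-based model on the result.
samples = ["dns.local", "mail.example.net", "10.0.0.9"]
X = np.hstack([np.vstack([embed(e) for e in samples]), quantities])
y = np.array([0, 1, 0])  # illustrative labels for a downstream task
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X))
```

Treating historically observed entity sequences as "sentences" lets the embedding capture their co-occurrence structure, which the downstream model then consumes alongside the numeric measurements instead of discarding the entity modality.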
Featured Image
Photo by Jordan Harrison on Unsplash
Why is it important?
Learning good representations from complex input data is the key ability of biological and artificial intelligent agents alike. In both, each data modality has its own best type of representation (e.g., edge and shape detectors for images) and its own best learning strategy (e.g., CNNs or ViTs for vision). If future networks are to be autonomously driven by AI, then huge amounts of hybrid data will be continuously exposed to AI models for further actions and various downstream tasks. In such a future, our problem would become the most important one in networking: how to build multi-modal learning strategies that can extract the best features from such a mixture of quantities and entities.
Perspectives
Read the Original
This page is a summary of: Towards a systematic multi-modal representation learning for network data, November 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3563766.3564108.