What is it about?

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information contained in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%–10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With our any-to-language molecular translation strategy, the model has the potential to perform additional downstream tasks, such as compound name recognition and chemical reaction prediction.


Why is it important?

Molecular science covers a broad spectrum of fields that study the structures, properties, and interactions of molecules. It is an interdisciplinary field that draws on chemistry, physics, biology, and computer science. Molecular science is pivotal in drug discovery applications, such as target identification and validation, structure-based drug design, and side-effect prediction. However, most existing methods of discovering new molecules or tweaking existing ones are time-consuming, expensive, and prone to failure [1]. More recently, computational methods have shown significant advantages in molecule generation and optimization [2]. These techniques enable rapid identification and refinement of potential drug candidates. However, such methods are still constrained by substantial computational demands.

Perspectives

• We propose a multimodal large molecular model for graph, image, and text inputs.
• We propose an innovative modality fusion mechanism with cross-attention.
• Our any-to-language strategy can fully exploit current large language models.
• We achieve excellent performance in molecular generation and property prediction.
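To give an intuition for the cross-attention fusion mechanism mentioned above, the sketch below shows how a set of query tokens can attend over features from another modality (e.g. a graph or image encoder) to pull them into a shared latent space. This is a minimal illustrative example in NumPy, not the actual GIT-Former implementation; all names, shapes, and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, features, d_k):
    # queries:  (n_q, d_k)  learnable query tokens (language side)
    # features: (n_f, d_k)  encoder output from another modality (graph/image)
    # Scaled dot-product attention: queries attend over modality features.
    scores = queries @ features.T / np.sqrt(d_k)   # (n_q, n_f)
    weights = softmax(scores, axis=-1)             # rows sum to 1
    return weights @ features                      # (n_q, d_k) fused tokens

rng = np.random.default_rng(0)
d = 16
query_tokens = rng.normal(size=(4, d))    # hypothetical query tokens
graph_feats = rng.normal(size=(10, d))    # hypothetical graph-encoder output
fused = cross_attention(query_tokens, graph_feats, d)
print(fused.shape)  # (4, 16)
```

Because the output always has the query tokens' shape regardless of how many modality features come in, the same mechanism can fuse graph, image, and text inputs into one fixed-size latent representation.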

Zhixiang Ren
Peng Cheng Laboratory

Read the Original

This page is a summary of: GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text, Computers in Biology and Medicine, March 2024, Elsevier,
DOI: 10.1016/j.compbiomed.2024.108073.
