What is it about?

The precise prediction of molecular properties is essential for advances in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential for Molecular Property Prediction (MPP), improving both accuracy and insight into molecular structures. Yet two critical questions arise. Does integrating domain knowledge improve the accuracy of molecular property prediction? And does multi-modal data fusion yield more precise results than methods that rely on a single data source? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods across a range of benchmarks. We find that integrating molecular information significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured as reductions in Root Mean Square Error (RMSE), reach up to 4.0%, while classification improvements, measured by the area under the receiver operating characteristic curve (ROC-AUC), reach up to 1.7%. We also find that augmenting 2D graphs with 3D information improves classification performance (ROC-AUC) by up to 13.2%, and that enriching 2D graphs with 1D SMILES improves multi-modal learning performance on regression tasks by up to 9.1%. These two consolidated insights offer crucial guidance for future advances in drug discovery.
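The improvements above are quoted as percentages of RMSE (regression, lower is better) and ROC-AUC (classification, higher is better). The sketch below shows how these metrics and a percentage change can be computed in plain Python. It assumes the percentages are relative changes with respect to a baseline model; the survey may define them differently (for example as absolute differences or averages over benchmarks). The helper names are hypothetical, and the numbers in the usage lines are illustrative, not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive scores above a
    random negative (ties count as half a win). Labels are 0 or 1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_rmse_reduction(baseline, enhanced):
    """Percentage reduction in RMSE relative to the baseline (assumed definition)."""
    return (baseline - enhanced) / baseline * 100.0


def relative_auc_gain(baseline, enhanced):
    """Percentage increase in ROC-AUC relative to the baseline (assumed definition)."""
    return (enhanced - baseline) / baseline * 100.0


# Illustrative usage with made-up values
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(f"{relative_rmse_reduction(1.00, 0.96):.1f}%")  # 4.0%
```

A pairwise ROC-AUC like this is quadratic in the number of samples; for real benchmarks a library implementation such as scikit-learn's `roc_auc_score` would normally be used instead.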


Why is it important?

Domain knowledge enhancement:
Feature engineering: by using chemical domain knowledge to design relevant features (e.g., functional groups, ring systems, and molecular fragments), machine learning models can better capture the relationships between molecular structures and properties.
Substructure analysis: breaking molecules down into meaningful substructures enables targeted analysis of the specific chemical features that contribute to a particular property.
Expert interpretation: chemists can interpret model predictions and identify the key molecular features driving a predicted property, which facilitates further optimization.
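The feature-engineering idea above can be illustrated with a small sketch that turns a molecule into a handful of knowledge-based descriptors. It uses a hypothetical toy graph representation (heavy-atom symbols plus bonds with their orders, hydrogens implicit, aromatic rings written in Kekulé form) instead of a cheminformatics toolkit such as RDKit. The three descriptors (ring count, carbonyl count, hydroxyl count) are illustrative choices, not the features used by the surveyed methods.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Hypothetical minimal molecular graph: heavy atoms and (i, j, bond_order) bonds."""
    atoms: list
    bonds: list = field(default_factory=list)


def count_components(n, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return len({find(x) for x in range(n)})


def ring_count(mol):
    """Cyclomatic number E - V + C, i.e. the number of independent rings."""
    edges = [(i, j) for i, j, _ in mol.bonds]
    return len(edges) - len(mol.atoms) + count_components(len(mol.atoms), edges)


def carbonyl_count(mol):
    """Count C=O double bonds."""
    return sum(
        1 for i, j, order in mol.bonds
        if order == 2 and {mol.atoms[i], mol.atoms[j]} == {"C", "O"}
    )


def hydroxyl_count(mol):
    """Count O atoms bonded to exactly one carbon by a single bond (implicit H)."""
    count = 0
    for idx, symbol in enumerate(mol.atoms):
        if symbol != "O":
            continue
        incident = [b for b in mol.bonds if idx in (b[0], b[1])]
        if len(incident) == 1:
            i, j, order = incident[0]
            other = j if i == idx else i
            if order == 1 and mol.atoms[other] == "C":
                count += 1
    return count


def featurize(mol):
    """Knowledge-based feature vector: [rings, carbonyls, hydroxyls]."""
    return [ring_count(mol), carbonyl_count(mol), hydroxyl_count(mol)]


# Phenol: a benzene ring (Kekulé form) with one OH group
phenol = MolGraph(
    ["C"] * 6 + ["O"],
    [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1), (0, 6, 1)],
)
print(featurize(phenol))  # [1, 0, 1]
```

Such hand-designed descriptors would typically be concatenated with, or used to guide, a learned molecular representation; the exact way domain knowledge is injected varies between the methods the survey covers.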


Zhixiang Ren
Peng Cheng Laboratory

Read the Original

This page is a summary of: Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey, Big Data Mining and Analytics, September 2024, Tsinghua University Press.
DOI: 10.26599/bdma.2024.9020028
