What is it about?

Materials that are designed to perform a specific function are called functional materials. They can be useful, for example, as energy storage devices or new medicines. New functional materials can be discovered with machine learning models: by studying well-known materials and their properties, these models learn to predict new functional molecules. But first, each molecule has to be represented as a string of letters, numbers, and special characters, so that the model knows which molecule it is working with. In chemistry, this is done with SMILES (simplified molecular input line entry system). In SMILES, letters represent the atoms in a chain, brackets denote branches off the main chain, and paired numbers mark rings. If a bracket or number is missing or misplaced, however, the SMILES string no longer describes a valid molecule. This study presents a new way of representing molecules, called SELFIES (self-referencing embedded strings). SELFIES encodes branches and rings by their lengths, which makes every SELFIES string a valid molecule. The authors show that SELFIES offers a higher storage capacity than SMILES and that it can be applied to any machine learning model.
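
As a rough illustration of the two representations, the short sketch below uses the authors' open-source selfies Python package (installable with "pip install selfies") to translate a SMILES string into SELFIES and back. The example molecule and the printed output are illustrative assumptions, not results from the paper.

    # Minimal sketch: converting between SMILES and SELFIES with the
    # open-source selfies package (assumed installed via "pip install selfies").
    import selfies as sf

    smiles = "C1=CC=CC=C1"               # benzene as SMILES; the paired "1"s mark the ring
    selfies_str = sf.encoder(smiles)     # translate SMILES -> SELFIES symbols such as [C][=C]...
    roundtrip = sf.decoder(selfies_str)  # translate the SELFIES string back into SMILES

    print("SMILES: ", smiles)
    print("SELFIES:", selfies_str)
    print("Decoded:", roundtrip)

Because every sequence of SELFIES symbols corresponds to some molecule, the decoding step cannot produce an invalid structure.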


Why is it important?

Not every combination of characters in SMILES represents a valid molecule. A machine learning model that outputs its results as SMILES strings can therefore produce incorrect molecules. In contrast, all SELFIES strings represent valid molecules. SELFIES is also versatile and can be used with different machine learning models. In the authors' experiments, SELFIES produced twice as many diverse molecules as SMILES, and it makes the model outputs easier to interpret.
KEY TAKEAWAY: SELFIES offers a robust way to represent molecules. It is easy to interpret and to use in different models to predict new functional materials.
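
To make the robustness claim concrete, here is a minimal sketch, assuming the open-source selfies package and the RDKit toolkit (neither is named in this summary): it decodes random sequences of SELFIES symbols and, for comparison, parses random SMILES-like character strings. The symbol sets, string lengths, and sample size are arbitrary illustrative choices, not the paper's actual experiment.

    # Sketch: random SELFIES symbol strings decode to valid molecules,
    # while random SMILES-like character strings usually do not.
    # Assumes "pip install selfies rdkit" (an illustrative setup, not from the paper).
    import random

    import selfies as sf
    from rdkit import Chem, RDLogger

    RDLogger.DisableLog("rdApp.*")  # silence RDKit's parse warnings for invalid SMILES
    random.seed(0)

    selfies_symbols = list(sf.get_semantic_robust_alphabet())  # basic SELFIES symbols
    smiles_chars = list("CNOFcno()=#123")                      # a crude SMILES character set

    def random_selfies(length=15):
        return "".join(random.choices(selfies_symbols, k=length))

    def random_smiles(length=15):
        return "".join(random.choices(smiles_chars, k=length))

    n = 1000
    valid_from_selfies = sum(
        Chem.MolFromSmiles(sf.decoder(random_selfies())) is not None for _ in range(n)
    )
    valid_from_smiles = sum(
        Chem.MolFromSmiles(random_smiles()) is not None for _ in range(n)
    )

    print(f"valid molecules from random SELFIES: {valid_from_selfies}/{n}")  # expect all of them
    print(f"valid molecules from random SMILES:  {valid_from_smiles}/{n}")   # expect very few

On a typical run, essentially every random SELFIES string decodes to a parseable molecule, while only a small fraction of the random SMILES strings do.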

Read the Original

This page is a summary of: Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation, Machine Learning: Science and Technology, October 2020, IOP Publishing.
DOI: 10.1088/2632-2153/aba947.
