What is it about?

This work applies large language models (LLMs) to a problem in molecular biology: protein phase transitions, with a focus on aggregation, a key mechanism in age-related diseases. We compare the performance of the LLM with that of a biophysics-based model and define a "transition score" that quantifies the propensity of a protein to undergo a phase transition.

Why is it important?

Our modeling approach enables the prioritization of protein condensates, and their related genes, that are associated with age-related diseases. We demonstrate that, of the two main proteins involved in Alzheimer's disease (AD), one is more prone to form aggregates and hence might be more associated with the early-to-middle stages of the disease. This protein is significantly down-regulated in AD cases compared to controls, suggesting a natural defense mechanism and pointing to potential drug targets for slowing disease progression. We also highlight the use of the LLM in evaluating how sequence variants affect aggregation, an operation useful for protein design. The proposed modeling approach demonstrates the usefulness of fine-tuning an LLM for downstream tasks where only small datasets are available.

Perspectives

Based on the results of this paper, we anticipate that the use of LLMs will increase in the fields of biophysics and molecular biology.

Mor Frank

Read the Original

This page is a summary of: Leveraging a large language model to predict protein phase transition: A physical, multiscale, and interpretable approach, Proceedings of the National Academy of Sciences, August 2024,
DOI: 10.1073/pnas.2320510121.
