What is it about?
Transformers are the model architecture behind large language models (LLMs), which have powered the rapid development of generative AI in recent years. We are interested in why and how LLMs sometimes solve novel tasks and generate novel text, even when they are not explicitly trained to do so. We focus on how Transformers learn to compose basic functions in order to solve complicated tasks, using the example sketched below.
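To give a flavor of the mechanism named in the paper's title, the short sketch below is a hypothetical illustration (not code from the paper) of the in-context pattern completion that an "induction head" performs: find the previous occurrence of the current token and copy the token that followed it. The function name induction_target and the example prompt are ours, for illustration only.

# Hypothetical illustration of the induction-head task (not code from the paper).
# Given a prompt that repeats an earlier token, the target is the token that
# followed that earlier occurrence -- a simple "match and copy" rule that can
# apply even to token pairs never seen during training.
def induction_target(tokens):
    """Return the token that followed the most recent earlier occurrence of the last token."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # no earlier occurrence to copy from

prompt = ["the", "cat", "sat", "on", "a", "mat", "the", "cat"]
print(induction_target(prompt))  # -> "sat"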
Why is it important?
Despite the empirical advances of LLMs, we lack scientific insight into why they appear intelligent. Classical statistical analyses fall short of providing an explanation, because current LLMs can solve certain tasks even when they are not explicitly trained to do so. Our paper provides a detailed analysis, probing the model and taking comprehensive measurements, to offer a new explanation.
Perspectives
I think LLMs represent a new paradigm in data science: they not only introduce powerful tools and methods but also bring intellectual excitement. Unlike classical statistical generalization, which focuses on training models for specific tasks, LLMs exhibit a new form of generalization: they learn latent rules and apply them across diverse tasks.
Yiqiao Zhong
University of Wisconsin-Madison
Read the Original
This page is a summary of: Out-of-distribution generalization via composition: A lens through induction heads in Transformers, Proceedings of the National Academy of Sciences, February 2025, DOI: 10.1073/pnas.2417182122.