What is it about?
Transformers are the model architecture behind large language models (LLMs), which have powered the rapid development of generative AI in recent years. We are interested in why and how LLMs sometimes solve novel tasks and generate novel text, even when they are not explicitly trained to do so. We focus on how Transformers learn to compose basic functions in order to solve complicated tasks, using the example sketched below.
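To give a flavor of the mechanism named in the paper's title, the short sketch below is a hypothetical illustration (not code from the paper) of the in-context pattern completion that an "induction head" performs: find the previous occurrence of the current token and copy the token that followed it. The function name induction_target and the example prompt are ours, for illustration only.

# Hypothetical illustration of the induction-head task (not code from the paper).
# Given a prompt that repeats an earlier token, the target is the token that
# followed that earlier occurrence -- a simple "match and copy" rule that can
# apply even to token pairs never seen during training.
def induction_target(tokens):
    """Return the token that followed the most recent earlier occurrence of the last token."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # no earlier occurrence to copy from

prompt = ["the", "cat", "sat", "on", "a", "mat", "the", "cat"]
print(induction_target(prompt))  # -> "sat"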
Why is it important?
Despite the empirical advances of LLMs, we lack scientific insight into why they appear intelligent. Classical statistical analyses fall short of providing an explanation, because current LLMs can solve certain tasks even when they are not explicitly trained to do so. Our paper provides a detailed analysis, probing the model and taking comprehensive measurements, to offer a new explanation.
Perspectives
I think LLMs represent a new paradigm in data science: they not only introduce powerful tools and methods but also bring intellectual excitement. Unlike classical statistical generalization, which focuses on training models for specific tasks, LLMs exhibit a new form of generalization: they learn latent rules and apply them across diverse tasks.
Yiqiao Zhong
University of Wisconsin-Madison
Read the Original
This page is a summary of: Out-of-distribution generalization via composition: A lens through induction heads in Transformers, Proceedings of the National Academy of Sciences, February 2025, DOI: 10.1073/pnas.2417182122.