What is it about?

Measurement is regarded as the hallmark of modern science because it enabled the successes of the physical sciences. Statistics (e.g., psychometrics) is considered psychology's primary means for measuring the phenomena of the human mind, given that these are not directly observable in others. However, physical measurement also targets phenomena that are not directly observable, or not observable accurately enough (e.g., mass, temperature, density, electrical conductivity). Moreover, measurement was established long before statistics was invented. Therefore, statistics is neither measurement nor necessary for measurement.

* Statistics is not measurement

This article shows that measurement and statistics involve different scientific activities that are designed to obtain different kinds of information for different purposes. Statistics deals with structural relations in data regardless of what these data represent. Statistical modelling (e.g., in psychometrics) is useful for pragmatic purposes, such as discriminating well and consistently between cases in ways considered important (e.g., social relevance, relations to future outcomes). Measurement, by contrast, establishes traceable relations between the phenomena studied and the data representing information about them. Traceability allows for justified inferences from the results obtained (estimates) back to the quantities to be measured in the study phenomena (measurands). These fundamental differences are masked by the peculiarities of human language. Language is an essential means for studying the human mind and thus for psychological research.

* Confusing language with reality

Language allows us to refer to anything that we can think of, even in its absence. We can all readily understand what is being described in questionnaires. The inbuilt semantics of our language, however, misleads many into mistaking descriptions of the study phenomena (e.g., in rating scales) for the phenomena described themselves (e.g., thoughts, feelings), that is, into confusing the word with the thing, language with reality. In studies using verbal scales, this often means that judgements of verbal statements are mistaken for measurements of the phenomena described. Hence, there is a gap between psychologists' statistical models, on the one side, and the quantities to be determined in the study phenomena, on the other. Bridging this gap requires measurement.

* Justified inferences from findings to study phenomena

The article explains the basic principles of measurement, such as data generation traceability, numerical traceability and calibration. It shows how these principles guide the development of alternative methods in psychology that generate data that can be traced back to the phenomena to be studied, and thus yield valid information about individuals. Importantly, to systematically anchor an analytical model in the empirical study phenomena, data generation must be based on individuals' spontaneous and unrestricted responses. Standardised rating scales, by contrast, have broad meanings and are therefore interpreted and used differently by different raters. Because of this, ratings cannot be traced back to the empirical phenomena that raters wanted to express in them.

* Crucial for real-world research

This has long been recognised in real-world research where justified inferences from empirical data to the raters' reality are of utmost importance, notably in clinical research. Patients' interpretation of their own health problems is known to vary over time. With more experience, patients understand their symptoms better, weigh their importance differently and consider different standards of comparison. Such response shifts mean that changes in patients' self-ratings often do not reflect actual changes in their health problems. This challenges the reliability, validity and utility of standardised scales for evidence-based evaluations of clinical theories, treatments and therapies.

* AI for psychological research: a caveat

Analysing individuals' unrestricted verbal responses is no longer time-consuming. Nowadays, artificial language systems (e.g., NLP, LLMs) can be used efficiently for tasks such as summarising responses, filtering out key words and counting their frequencies. By contrast, using AI to generate or edit rating scales and constructs, as is increasingly done, cannot establish traceable relations to the study phenomena. Standardising descriptions merely builds on the inbuilt semantics of language and therefore leads to confusing language with reality, and thus judgements of descriptions with measurements of the phenomena described.
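
As a rough illustration of the key-word counting mentioned above, the following Python sketch tallies word frequencies across a few free-text responses. The responses, the stop-word list and the function name `keyword_counts` are all invented for this example and are not taken from the article. It only shows how little effort such a tally takes; it is not a method proposed by the author.

```python
import re
from collections import Counter

# Hypothetical free-text responses, invented for illustration only.
responses = [
    "I feel tired and anxious before work.",
    "Tired most mornings, but calmer in the evening.",
    "Anxious about sleep; tired all day.",
]

# Small illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"i", "and", "but", "in", "the", "all", "most", "about", "before"}


def keyword_counts(texts, stop_words=STOP_WORDS):
    """Count how often each non-stop-word occurs across all responses."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


counts = keyword_counts(responses)
# Show the most frequent key words with their counts.
print(counts.most_common(3))
```

Such a tally describes the words that individuals actually used. In line with the caveat above, it should not by itself be read as a measurement of the phenomena those words describe.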

Why is it important?

Measurement is not just any activity that generates numerical data; it involves defined, traceable processes that justify the high public trust placed in it. Psychology's 'measurement' jargon alludes to this scientific authority of genuine measurement. This misleads scientists and practitioners alike into assuming that statistical results can be attributed to the individuals studied, even though the processes necessary for justifying such attributions have been established neither empirically nor theoretically.

Perspectives

Statistics is useful for pragmatic purposes and in its own right. But it should not be mistaken for measurement. Statistics neither analyses nor establishes empirical relations to the phenomena to be measured. That's the task of measurement.

Dr Jana Uher
University of Greenwich

Read the Original

This page is a summary of: Statistics is not measurement: The inbuilt semantics of psychometric scales and language-based models obscures crucial epistemic differences, Frontiers in Psychology, June 2025, Frontiers,
DOI: 10.3389/fpsyg.2025.1534270.
