What is it about?

According to Stevens’s classification of measurement, continuous data are measured on either a ratio or an interval scale. The relationship between two continuous variables is usually assumed to be linear and is estimated with the Pearson correlation coefficient, whose inference assumes that the two variables are jointly normally distributed. When researchers use conventional statistics (t-tests or analysis of variance) or factor analysis of correlation matrices to study gender or race differences, the data are assumed to be continuous and normally distributed. If continuous data are discretized, they become ordinal, so discretization is widely regarded as a downgrading of measurement. However, discretization has an analytic advantage: it reveals interactive relationships between the discretized variables and naturally categorical variables such as gender and race. Such category-level interaction information is not available on a ratio or interval scale, but it is useful to researchers in some applications. In the present study, Wechsler intelligence and memory scores were discretized, and the interactive relationships among the discretized Wechsler scores were examined by gender and race. Unlike previous studies, we estimated associations between categories and used correlations to aid their interpretation; the results showed distinct gender and racial/ethnic group differences in the correlational patterns.
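A minimal Python sketch of the discretization step, assuming the z-score cut-points the author recommends (z of -1 or less is "low", above 1 is "high", everything in between is "medium", so a score of exactly -1 counts as low and the categories do not overlap) and assuming the population standard deviation. The function names `discretize` and `crosstab` and the toy scores are hypothetical, not taken from the study. The point is that once scores are discretized, cross-tabulating them against a categorical variable such as gender yields a table of counts.

```python
from collections import Counter
from statistics import mean, pstdev


def discretize(scores):
    """Map raw scores to 'low' / 'medium' / 'high' by z-score.

    Assumed cut-points: z <= -1 -> low, -1 < z <= 1 -> medium, z > 1 -> high.
    Uses the population standard deviation (an assumption for this sketch).
    """
    m, s = mean(scores), pstdev(scores)
    labels = []
    for x in scores:
        z = (x - m) / s
        if z <= -1:
            labels.append("low")
        elif z <= 1:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


def crosstab(groups, levels, order=("low", "medium", "high")):
    """Count how often each group falls into each level (a contingency table)."""
    counts = Counter(zip(groups, levels))
    return {g: [counts[(g, lev)] for lev in order] for g in sorted(set(groups))}


# Hypothetical toy data, not from the study.
scores = [70, 85, 100, 100, 115, 130, 95, 105]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
levels = discretize(scores)
print(crosstab(groups, levels))  # -> {'F': [1, 3, 0], 'M': [0, 3, 1]}
```

Each row of the resulting table is a group and each column a performance level; tables like this are the counts that correspondence analysis takes as input.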

Why is it important?

Unlike previous studies of human intelligence, our study showed distinctive gender differences. In addition, there were clear differences in cognitive performance levels (low, medium, high) across racial/ethnic groups (Blacks, Hispanics, and Whites). These clear differentiations in the gender × race interactions emerged because correspondence analysis was applied to the Wechsler intelligence and memory data.

Perspectives

Ordinary parametric approaches, such as analysis of (co)variance (univariate or multivariate), regression (univariate or multivariate), and factor analysis of inter-variable correlations, require normally distributed data or, in regression, independence of predictors. Correspondence analysis (CA), which can be considered a multivariate method, requires none of these assumptions, because it analyzes counts rather than the actual measurements. Because its input is a table of counts, CA is closely related to the chi-square test. Unlike the chi-square test, however, CA can analyze multiple related categorical variables (stacked column-wise, as in the present study). In other words, independence between categories is not required, because CA does not test statistical significance either between categories or between categorical variables; CA is an exploratory method.

Since CA is designed for two-way and multiway contingency tables, continuous data must first be discretized (as in our study), and this should be done carefully. We usually recommend using z-scores and z-score units. We classified a z-score of -1 or less as "low" performance, a z-score greater than -1 and up to 1 as "medium", and a z-score above 1 as "high"; however, researchers can discretize continuous data according to their own preferences.

We see much potential for CA in future social science research, because it could open paths that conventional analyses have never explored.
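A minimal NumPy sketch of simple correspondence analysis, assuming the standard textbook formulation (singular value decomposition of the standardized residuals of a contingency table); it is not necessarily the exact procedure or software used in the study. The function names and the counts are hypothetical. The sketch also makes the link to the chi-square test concrete: the total inertia (the sum of squared singular values) equals the Pearson chi-square statistic divided by the table total. For several discretized scores, the group-by-level tables could be placed side by side (stacked column-wise) and passed in as one table.

```python
import numpy as np


def correspondence_analysis(N):
    """Simple CA of a two-way contingency table N (rows x columns of counts).

    Returns principal row coordinates, principal column coordinates, and the
    principal inertias (squared singular values of the standardized residuals).
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]     # principal row coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]  # principal column coordinates
    return rows, cols, sv ** 2


def chi_square(N):
    """Pearson chi-square statistic for independence of rows and columns of N."""
    N = np.asarray(N, dtype=float)
    E = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()  # expected counts
    return ((N - E) ** 2 / E).sum()


# Hypothetical counts: rows = groups, columns = low / medium / high.
N = [[20, 50, 10],
     [10, 60, 20],
     [30, 40, 10]]
rows, cols, inertias = correspondence_analysis(N)
print(inertias.sum(), chi_square(N) / np.sum(N))  # equal up to rounding
```

The first two columns of `rows` and `cols` are the coordinates one would typically plot on a two-dimensional CA map. The last inertia is essentially zero because a table with I rows and J columns has at most min(I, J) - 1 non-trivial dimensions. No significance test is involved, which matches the exploratory character of CA described above.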

Dr. Se-Kang Kim
Fordham University

Read the Original

This page is a summary of: Gaining from discretization of continuous data: The correspondence analysis biplot approach, Behavior Research Methods, November 2018, Springer Science + Business Media.
DOI: 10.3758/s13428-018-1161-1.