No Measurement in Education Research — James E. Christensen

The Claim

Measurement is one of the most frequently invoked concepts in educational research. Researchers speak of measuring student achievement, measuring learning outcomes, measuring teacher effectiveness, measuring the impact of interventions. International assessments such as PISA and TIMSS, and national ones such as the United States' NAEP, are described as measuring educational quality across systems and nations. The entire apparatus of standardised testing is framed as an instrument of measurement.

James E. Christensen's argument is that this usage is systematically mistaken. Educational research does not measure, in the precise scientific sense of that term. What it does — and what it does with considerable sophistication — is enumerate, rate, and rank. These are legitimate and valuable activities. But calling them measurement obscures important limitations and generates misleading inferences about the precision and generalisability of educational research findings.

What Measurement Actually Requires

In the natural sciences, measurement has a precise technical meaning. To measure a quantity is to determine its magnitude on a ratio scale — a scale that has a true zero point (representing the complete absence of the quantity) and equal intervals throughout, such that one can legitimately say that one value is twice another, or that the difference between two values is equal to the difference between two other values.

Length, mass, time, temperature (on the Kelvin scale), electrical charge, and force are measurable quantities in this sense. When a physicist measures the length of a rod as 2.4 metres, the claim has precise mathematical content: the rod is 2.4 times as long as the unit of length, the metre, to within the precision of the measuring instrument, and arithmetic operations on this value yield results that are themselves meaningful.
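Stevens (1946) classified scales by the transformations that leave their meaningful statements unchanged: a ratio scale tolerates only a change of unit, while an interval scale also tolerates a shift of its zero point. The short Python sketch below is an illustration added here, not part of Christensen's argument, and its function names are hypothetical. It checks that a ratio of lengths survives conversion from metres to feet, whereas a ratio of Celsius temperatures does not survive conversion to Fahrenheit or Kelvin.

```python
def ratio(x: float, y: float) -> float:
    """Return x / y, i.e. the 'x is so many times y' statement."""
    return x / y


def metres_to_feet(m: float) -> float:
    # Change of unit only (x -> a*x): the zero point stays where it was.
    return m / 0.3048


def celsius_to_fahrenheit(c: float) -> float:
    # Change of unit AND of zero point (x -> a*x + b).
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    # Shift to a scale whose zero is absolute zero.
    return c + 273.15


# Lengths: "2.4 m is twice 1.2 m" holds whatever unit of length is used.
length_ratio_m = ratio(2.4, 1.2)
length_ratio_ft = ratio(metres_to_feet(2.4), metres_to_feet(1.2))

# Temperatures: "20 degrees C is twice 10 degrees C" is an artefact of the Celsius zero.
temp_ratio_c = ratio(20.0, 10.0)
temp_ratio_f = ratio(celsius_to_fahrenheit(20.0), celsius_to_fahrenheit(10.0))
temp_ratio_k = ratio(celsius_to_kelvin(20.0), celsius_to_kelvin(10.0))

print(f"lengths: {length_ratio_m:.3f} (m) vs {length_ratio_ft:.3f} (ft)")
print(f"temperatures: {temp_ratio_c:.3f} (C), {temp_ratio_f:.3f} (F), {temp_ratio_k:.3f} (K)")
```

Only the Kelvin ratio, computed from a true zero, supports a "so many times as much" statement about temperature; the Celsius "2" and the Fahrenheit "1.36" describe the scales, not the quantity.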

The question Christensen poses is: do the quantities that educational researchers purport to measure have these properties? Do student achievement, learning, intelligence, motivation, engagement, or teaching effectiveness have true zero points? Are the intervals on the scales used to assess them equal throughout? Can one legitimately say that a student who scores 80 on a test has twice the achievement of a student who scores 40?

The answer, on examination, is no. The constructs that educational researchers study do not have natural ratio-scale properties. There is no true zero of learning — a student who knows nothing about a subject still has a cognitive state, and the claim that their learning of that subject is literally zero is not well-defined. The intervals on standardised test scales are not genuinely equal — the difference between a score of 40 and 50 does not represent the same quantity of learning as the difference between 70 and 80. And it is not meaningful to say that one student's achievement is twice another's.
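These claims about test scores can be probed with a simple check: apply a different order-preserving rescaling to the same scores and see which statements survive. The sketch below assumes a hypothetical 100-item test and uses the log-odds (logit) of the proportion correct as the alternative scale. The logit is chosen only as one familiar example of such a rescaling, not as the correct metric for learning. Both scales order the four scores identically, yet the two 10-point raw gaps become unequal, and the raw ratio 80/40 = 2 disappears (the rescaled "ratio" is even negative).

```python
import math

N_ITEMS = 100  # hypothetical test length


def logit(score: int, n_items: int = N_ITEMS) -> float:
    """Log-odds of the proportion correct; one of many order-preserving rescalings."""
    p = score / n_items
    return math.log(p / (1 - p))


raw = [40, 50, 70, 80]
rescaled = [logit(s) for s in raw]

# Same order on both scales: the rescaling carries identical ordinal information.
same_order = sorted(range(4), key=lambda i: raw[i]) == sorted(range(4), key=lambda i: rescaled[i])

# Equal raw gaps (10 points each) are unequal after rescaling.
gap_40_50 = rescaled[1] - rescaled[0]
gap_70_80 = rescaled[3] - rescaled[2]

# "80 is twice 40" holds only under the raw coding.
raw_ratio = raw[3] / raw[0]
rescaled_ratio = rescaled[3] / rescaled[0]

print(f"same order: {same_order}")
print(f"gaps: raw 10 and 10, rescaled {gap_40_50:.3f} and {gap_70_80:.3f}")
print(f"80 vs 40: raw ratio {raw_ratio}, rescaled ratio {rescaled_ratio:.3f}")
```

Nothing in the raw counts favours one of these codings over the other, which is the sense in which equal intervals and ratio statements are not warranted by the scores themselves.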

What Educational Research Actually Does

If educational research does not measure, what does it do? Christensen identifies three activities that are routinely mislabelled as measurement:

Enumeration

Enumeration is counting: determining how many instances of a category are present. Counting the number of students who pass or fail an examination, the number of correct answers on a test, the number of students in a school who read at or above a specified level — these are enumerations. They are precise and informative, but they are not measurements of a continuous quantity. A count of correct answers tells you how many items a student answered correctly; it does not tell you the magnitude of their knowledge or their learning.

Much of what passes for educational measurement is enumeration. Standardised tests produce counts of correct responses; these counts are then treated as if they were measurements of underlying quantities of knowledge, ability, or achievement. The statistical manipulation of these counts — computing means, standard deviations, effect sizes, and regression coefficients — creates an appearance of measurement precision that the underlying data do not actually possess.
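The gap between a count and a magnitude shows up even in a toy example. The Python sketch below uses a hypothetical five-item answer key and two hypothetical students; it is an illustration, not a procedure taken from the source. Each student's number correct is an exact enumeration, and both obtain the same count of 3, yet they answered different items correctly and have only item 3 in common.

```python
# Hypothetical answer key for a five-item test.
ANSWER_KEY = ("B", "D", "A", "C", "A")

# Hypothetical responses from two students.
responses = {
    "student_1": ["B", "D", "A", "A", "B"],
    "student_2": ["C", "A", "A", "C", "A"],
}


def correct_items(answers: list[str], key: tuple[str, ...] = ANSWER_KEY) -> set[int]:
    """Return the 1-based numbers of the items answered correctly."""
    return {i for i, (given, expected) in enumerate(zip(answers, key), start=1) if given == expected}


for name, answers in responses.items():
    items = correct_items(answers)
    # len(items) is the enumeration a test score reports; the set shows what it hides.
    print(f"{name}: {len(items)} correct, items {sorted(items)}")
```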

Rating

Rating is the assignment of values on an ordinal scale — a scale that orders items from less to more but does not have equal intervals or a true zero. Likert scales, which ask respondents to rate their agreement or satisfaction on a five-point or seven-point scale, are the most familiar example. Rating scales for teacher quality, school climate, student engagement, and similar constructs are ordinal: a rating of 4 is higher than a rating of 3, but the difference between 4 and 3 is not necessarily equal to the difference between 3 and 2.

The operations that presuppose equal intervals, such as addition, subtraction, and the computation of means and standard deviations, are not strictly valid for ordinal data. The practice of computing means and standard deviations from Likert-scale data, which is ubiquitous in educational research, implicitly treats ordinal ratings as if they were interval measurements. This is a systematic logical error, however widespread the practice.
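A common way to see why means of ordinal ratings are fragile is to recode the categories in a different, equally order-preserving way. In the Python sketch below, the responses and the "stretched" coding (which places "strongly agree" at 10 instead of 5) are hypothetical. Under the usual 1 to 5 coding, group A has the higher mean; under the stretched coding, group B does. The medians, which depend only on order, put group A ahead under both codings, which is why Stevens (1946) listed the median, but not the mean, among the statistics permissible for ordinal scales.

```python
from statistics import mean, median

# Hypothetical responses to one five-point agreement item
# (1 = strongly disagree ... 5 = strongly agree).
group_a = [3, 3, 3, 3, 3]
group_b = [1, 1, 1, 5, 5]

# Two codings that both preserve the order of the five categories.
STANDARD = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
STRETCHED = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}  # hypothetical alternative spacing


def recode(responses: list[int], coding: dict[int, int]) -> list[int]:
    """Replace each category label with its numeric code."""
    return [coding[r] for r in responses]


def summarise(coding: dict[int, int]) -> dict[str, float]:
    """Means and medians of both groups under one coding."""
    a, b = recode(group_a, coding), recode(group_b, coding)
    return {"mean_a": mean(a), "mean_b": mean(b), "median_a": median(a), "median_b": median(b)}


for label, coding in [("standard", STANDARD), ("stretched", STRETCHED)]:
    print(label, summarise(coding))
```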

Ranking

Ranking is the ordering of items from highest to lowest, without any implication about the magnitude of the differences between them. League tables of school performance, rankings of students by test score, and ordered lists of nations by educational attainment are rankings. They tell you who is first, second, and third; they do not tell you how large the differences between them are, or whether those differences are meaningful.

Rankings are frequently presented and interpreted as if they were measurements. A nation ranked first in reading achievement is described as if its achievement were measurably superior to that of the nations ranked second and third, but the ranking itself provides no information about the magnitude of the differences. Two nations whose average scores differ by a single scale point may be effectively identical in their students' reading abilities yet occupy different ranks; two others in adjacent ranks may be genuinely and substantially different. The ranking obscures this information.
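The information a league table discards can be made explicit by listing the score gap behind each pair of adjacent ranks. In the Python sketch below, the four systems and their average scores are hypothetical, and the helper names are illustrative. Every adjacent pair is exactly one rank apart, but the underlying gaps are 1, 39, and 1 scale points.

```python
# Hypothetical average scale scores for four unnamed education systems.
scores = {"System A": 521, "System B": 520, "System C": 481, "System D": 480}


def rank(scores: dict[str, int]) -> list[str]:
    """Order systems from highest to lowest score (a league table)."""
    return sorted(scores, key=scores.get, reverse=True)


def adjacent_gaps(scores: dict[str, int]) -> list[tuple[str, str, int]]:
    """Score difference behind each pair of adjacent ranks."""
    order = rank(scores)
    return [(upper, lower, scores[upper] - scores[lower]) for upper, lower in zip(order, order[1:])]


for position, name in enumerate(rank(scores), start=1):
    print(position, name)

for upper, lower, gap in adjacent_gaps(scores):
    # Every adjacent pair is exactly one rank apart, whatever the gap.
    print(f"{upper} vs {lower}: 1 rank apart, {gap} scale points apart")
```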

Why the Distinction Matters

The distinction between measurement and enumeration-rating-ranking is not merely a semantic point. It has substantive implications for how educational research findings are interpreted and used.

When educational researchers and policymakers treat ordinal ratings and enumerations as if they were ratio-scale measurements, they draw inferences that the data cannot support. They compute effect sizes and claim that an intervention produced a "measurable improvement" of a specified magnitude. They compare national averages and claim that one country's educational system is producing a quantifiably higher level of learning than another. They perform cost-benefit analyses that assign monetary values to educational outcomes, as if those outcomes had been measured on a scale with known units.
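Whether a "measurable improvement" has a fixed size can be probed by recomputing an effect size after an order-preserving rescaling of the same data. The Python sketch below computes Cohen's d (the difference in means divided by the pooled sample standard deviation) for hypothetical scores on a hypothetical 20-item test, first on the raw counts and then on the log-odds of the proportion correct. Neither scale is claimed to be the right one; the point is that the raw counts do not settle which to use, and the two give different effect sizes (3.5 versus roughly 2.7).

```python
import math
from statistics import mean, variance

N_ITEMS = 20  # hypothetical test length

# Hypothetical raw scores (number correct out of 20).
control = [8, 10, 12]
treated = [15, 17, 19]


def cohens_d(x: list[float], y: list[float]) -> float:
    """Standardised mean difference (y minus x) using the pooled sample variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled_var)


def logit(score: int, n_items: int = N_ITEMS) -> float:
    """Log-odds of the proportion correct (an order-preserving rescaling)."""
    p = score / n_items
    return math.log(p / (1 - p))


d_raw = cohens_d(control, treated)
d_logit = cohens_d([logit(s) for s in control], [logit(s) for s in treated])
print(f"d on raw counts: {d_raw:.2f}; d on logit scale: {d_logit:.2f}")
```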

These inferences exceed what the data actually show. The result is a systematic overconfidence in the precision and generalisability of educational research findings — and a corresponding underinvestment in the more qualitative, interpretive, and contextual forms of inquiry that are better suited to the actual nature of educational phenomena.

Implications for Educology

From an educological perspective, the distinction between measurement and enumeration-rating-ranking reflects a deeper point about the nature of educational phenomena. Education is a system of human transactions involving intentional activity, social relationships, and the development of knowing. These phenomena are not physical quantities; they do not have magnitudes in the way that length and mass do. They can be described, classified, compared, and evaluated — but they cannot, in the strict sense, be measured.

This does not mean that quantitative methods are inappropriate in educational research. Enumeration, rating, and ranking are valuable tools. But their value depends on using them for what they actually are — tools for counting, ordering, and comparing — rather than pretending that they yield measurements in the natural-scientific sense.

Christensen's argument is ultimately a call for methodological honesty: for educational researchers to describe what they are doing accurately, to make inferences that are warranted by their methods, and to acknowledge the limitations of quantitative approaches to the study of inherently human phenomena. A more honest methodological vocabulary would not diminish the value of educational research — it would enhance it, by making clear what the research can and cannot tell us.

Bibliography

Christensen, J. E. (2021). There Is No Measurement in Research about Education. jamesechristensen.com.

Stevens, S. S. (1946). On the Theory of Scales of Measurement. Science, 103(2684), 677–680.

Michell, J. (1999). Measurement in Psychology: A Critical History of a Methodological Concept. Cambridge University Press.

Luce, R. D., & Tukey, J. W. (1964). Simultaneous Conjoint Measurement: A New Type of Fundamental Measurement. Journal of Mathematical Psychology, 1(1), 1–27.

Christensen, J. E. (2013). Education, Knowledge and Educology. Lulu Press.
