Here is a very raw and unfinished “trying to wrap my head around some rather confusing issues” post. I have been thinking about levels of equivalence or invariance in cross-cultural measurement. I have been a wee bit unhappy with a couple of conceptual problems in the framework, but particularly the most general or abstract level of ‘functional equivalence’ has intrigued me for a while. Traditionally, it is more of a philosophical or theoretical statement of the similarity of functions of a psychological construct in different cultural groups. In other words, a particular behaviour serves the same functions in two or more cultural contexts.
I have been following some of the discussions on IOnet and the posts by Paul Barrett as well as the more biologically oriented personality literature. Following a few of these leads, I recently started reading some more conceptual papers on the philosophy of measurement in psychology. More specifically, I just finished reading Joel Michell’s ‘Quantitative science and the definition of measurement in psychology’ and Michael D. Maraun’s ‘Measurement as a Normative Practice’. These papers are superbly well written (as far as you can say that about these kinds of papers) and express quite a few of my growing concerns about psychological research in very clear terms. I started off wondering about functional equivalence, but have much bigger issues to chew on now.
Michell’s main logical argument is as follows (from his very concise reply to a number of commentaries, p. 401):
Premise 1. All measurement is of quantitative attributes.
Premise 2. Quantitative attributes are distinguished from non-quantitative attributes by the possession of additive structure.
Premise 3. The issue of whether or not any attribute possesses additive structure is an empirical one.
Conclusion 1. The issue of whether or not any attribute is measurable is an empirical one.
Premise 4. With respect to any empirical hypothesis, the scientific task is to test it relative to the evidence.
Premise 5. Quantitative psychologists have hypothesized that some psychological attributes are measurable.
Final thesis. The scientific task for quantitative psychologists is to test the hypothesis that their hypothesized attributes are measurable (i.e. that they possess additive structure).
The major task for psychology, then, is to actually show that anything we study has a quantitative structure. Much of his review takes to task the legacy of Fechner and especially Stevens (for those of you who ever suffered through some advanced methods classes… these names should be painfully familiar). It was an eye opener to see the larger context and the re-interpretation of stuff that I just took for granted as a student and never really questioned later on in my professional life. Fechner’s legacy, leading to a so-called quantitative imperative (e.g., Spearman, Cattell, Thorndike), was challenged in the early to mid-parts of the last century (the so-called Ferguson Committee), but Stevens became the most successful defender of this empiricist tradition. He argued in a representational theory of measurement that measurement is the numerical representation of empirical relations. There is a ‘kind of isomorphism between (1) empirical relations among objects and events and (2) the properties of…’ numerical systems (Stevens, 1951, p. 1). From this starting point he developed his theory of the four possible types of measurement scales (nominal, ordinal, interval and ratio) (Michell, p. 370). This is the foundation of any scale development in psychology.
In a second argument beautifully laid out by Michell, it then becomes clear that these numerical representations, due to their assumed isomorphic relations, both define the relations represented and represent them. Given this operationism, ‘any rule for assigning numerals to objects or events could be taken as providing a numerical representation of at least the equivalence relation operationally defined by the rule itself.’ (Michell, p. 371).
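A toy illustration of why the scale type matters (made-up numbers, not an example from Michell or Stevens): if scores carry only ordinal information, any order-preserving rescaling is an equally legitimate numerical assignment, and a comparison of group means need not survive it. The two groups and the square-root rescaling below are assumptions chosen purely for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical raw scores for two groups (made-up numbers).
group_a = [0, 10]
group_b = [4, 4]


def rescale(scores):
    # sqrt is strictly increasing on non-negative numbers, so it keeps the
    # rank order of every score: an admissible change of an ordinal scale.
    return [sqrt(s) for s in scores]


# On the raw numbers, group A has the higher mean (5 vs 4).
raw_a_higher = mean(group_a) > mean(group_b)

# After the order-preserving rescaling, group B has the higher mean
# (roughly 1.58 vs 2.0), so the conclusion has flipped.
rescaled_a_higher = mean(rescale(group_a)) > mean(rescale(group_b))

print(raw_a_higher, rescaled_a_higher)  # True False
```

Nothing here says which of the two numerical assignments is the right one. That is exactly the point: unless the attribute itself has more than ordinal structure, the choice is a matter of convention.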
And this loop is where we are stuck. We take a few items or questions, administer them to a bunch of people, factor analyze them to get a simple structure and voila… we have measured depression, anxiety, dominance, identity… you name it. Or take implicit measures… you take a number of stimuli with no inherent coherent meaning and present them to individuals to measure their accuracy or reaction speed or whatever you want. Take the score and you have some measure of implicit bias, cognitive interference, etc. There is no relation anymore between the empirical reality and its numerical representation as scores. The question of whether the phenomenon of interest can be quantified has disappeared.
How does the DSM V fit in here? Well, it could be seen as just the latest installment of the same confusion. We don’t know what exactly we are measuring (see for example this article on grief as a case in point).
The issue is that we need to test whether psychological constructs can actually be quantified. As simple or complex as that. As much as I agree, I can’t stop scratching my head and wondering how the heck we are going to do that. How would you be able to examine whether any psychological construct (which is basically just an idea in our beautiful minds that we try to use and build some kind of professional convention around) is actually quantifiable or not? The responses by a number of eminent psychometricians to this challenge suggest that nobody was able to come up with an example showing that this has worked in a wider context within mainstream psychology.
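To make ‘testing for additive structure’ a bit more concrete, here is a minimal sketch of one candidate check. It is not taken from the two papers discussed here: it uses the double cancellation condition from additive conjoint measurement, which is a necessary (not sufficient) condition for an additive representation of a two-factor table. The function name and the toy tables are hypothetical, and the sketch only shows the logic of such a test on made-up numbers. It says nothing about whether real psychological data would pass.

```python
from itertools import product


def double_cancellation_violations(table):
    """Return index tuples (a, b, c, x, y, z) that violate double cancellation.

    table[i][j] is the observed level of the attribute at row-factor level i
    and column-factor level j. The condition checked, for all rows a, b, c
    and columns x, y, z (repeats allowed), is:
        if table[a][y] >= table[b][x] and table[b][z] >= table[c][y]
        then table[a][z] >= table[c][x]
    """
    n_rows = len(table)
    n_cols = len(table[0])
    violations = []
    for a, b, c in product(range(n_rows), repeat=3):
        for x, y, z in product(range(n_cols), repeat=3):
            premise = (table[a][y] >= table[b][x]
                       and table[b][z] >= table[c][y])
            if premise and not table[a][z] >= table[c][x]:
                violations.append((a, b, c, x, y, z))
    return violations


# A table built as row effect + column effect is additive by construction.
additive = [[r + c for c in (0, 1, 3)] for r in (0, 2, 5)]

# A crossover pattern: the ordering of rows reverses between columns.
crossover = [[0, 1], [1, 0]]

print(len(double_cancellation_violations(additive)))       # 0
print(len(double_cancellation_violations(crossover)) > 0)  # True
```

With real data the cells would be noisy, so a raw count of violations would have to be weighed against what sampling error alone could produce. The sketch ignores that entirely.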
Enter the second paper. Approaching the problem using Wittgenstein’s philosophy of measurement as normative practice (comparing it to the logical structure of language), Maraun argues that measurement needs to be rule-based or normative. You need to start with a definition that then leads to a specific set of rules or norms of how to measure the particular phenomenon just defined. The definition and the set of rules are the most basic form of expression. There is nothing simpler or more basic than this. Once these norms are established, any other person should be able to arrive at a similar result which, even if based on a different metric, should still be convertible (e.g., from meters to feet). In psychology, in contrast, we have no rules. We have a test or an experiment that is being conducted, and the results are examined against another set of empirical observations to claim that the results are valid. According to the practice of measurement in physics, empirically based arguments are not relevant for claiming that something has been measured. Measuring a number of items that factor together and then correlating the result with some other instrument similarly derived does not mean that anything meaningful has been measured. Observing some kind of empirical pattern in an experiment does not constitute measurement, even if it is then validated against a different set of empirical observations. The issue is that the concept is not defined precisely enough to lead to a set of rules that govern its measurement.
There are a number of other points in that paper around validity, nomological networks, covariance structure and the like. Again, I keep scratching my head. These guys have got a point… but how to get out of it? Maraun is very pessimistic. He argues:
Simply put, measurement requires a formalization which does not seem well suited to what Wittgenstein calls the ‘messy’ grammars of psychological concepts, grammars that evolved in an organic fashion through the ‘grafting of language onto natural (“animal”) behaviour’ (Baker & Hacker, 1982). One aspect of this mismatch arises from the flexibility in the grounds of instantiation of many psychological concepts, the property that Baker and Hacker (1982) call an open-circumstance relativity (see also Gergen, Hepburn, & Comer Fisher, 1986, for a similar point). Take, for example, the concept dominance. Given the appropriate background conditions, practically any ‘raw’ behaviour could instantiate the concept. Hence, Joe’s standing with his back to Sue could, in certain instances, be correctly conceptualized as a dominant action. On the other hand, Bob’s ordering of someone to get off the phone is not a dominant action if closer scrutiny reveals the motivation for his behaviour to be a medical emergency which necessitated an immediate call for an ambulance. The possibility for the broadening of background conditions to defeat the application of a psychological concept is known as the defeasibility of criteria (Baker & Hacker, 1982). Together, open-circumstance relativity and the defeasibility of criteria suggest that psychological concepts are simply not organized around finite sets of behaviours which jointly provide necessary and sufficient conditions for their instantiation (Baker & Hacker, 1982). Yet, this is precisely the kind of formalization required if a concept is to play a role in measurement. (p. 457-458).
Maybe what we are studying is just the social construction of meanings of psychological concepts as expressed in the heads of individuals? Is this a feasible reconciliation? From a researcher’s perspective it might be a worthwhile endeavor (think of discourse analysts embracing factor analysis… the thought is actually quite amusing). However, this approach renders our search for a) latent variables and b) measurement invariance completely meaningless.
The reading continues. Some random thoughts at 1am while I am writing these notes:
a) The search for quantitative latent constructs in psychology probably should (?) or could (?) start from basic biological principles. In essence, we assume that there is something ‘latent’ out there if we use EFA or CFA or any of the typical covariance structure tests. If there are biological mechanisms that lead to certain psychological phenomena, we can study the biological principles and their interaction with the social environment that lead to psychological realities. Then we could get around the quantification problem. Problem… what biological principles and at what level of specificity?
b) The use of covariance analyses provides simple structures of language concerning folk concepts. This may be useful and meaningful for understanding how people in a specific context interpret items or questions. It is probably more of a sociological analysis of meaning conventions than a psychological analysis. This could be useful or interesting for research purposes, but it is not quite how we commonly understand or interpret the results when we are using these kinds of techniques.
Or am I missing something? How can this measurement paradox be tackled?