I've been thinking and talking quite a bit about the often-ignored practice of empirically validating the experimental manipulations that psychologists use, so I decided to blog about it here. Am I qualified to do so? Not to the extent that this topic deserves. In graduate school, I took an excellent Personality Psychology course with Dr. Suzanne Segerstrom, who made construct validation accessible and a priority to understand. I've started to do my own validation research, examining the potential validity of the Taylor Aggression Paradigm, and I regularly supplement my own ignorance by collaborating with clinical and personality psychologists who are more expert in this area. With that in mind, once more unto the breach.
Experimental psychologists (like myself) often develop experimental manipulations intended to cause changes in individuals' thoughts, feelings, and behavior. However, these manipulations are often considered 'valid' based on a few criteria:
-face validity (does it appear to manipulate what it intends to manipulate?)
-manipulation checks
-whether it achieved its predicted outcome
Let me give you an example. My lab often uses an essay-feedback paradigm to provoke participants into acting aggressively. In this manipulation, participants get very nasty or very nice feedback on an essay they just wrote. We say that this is valid because:
-it looks like a valid way to increase aggression (e.g., participants fume and scoff when they get harsh feedback [see Jon Stewart below for a depiction of such responses])
-it increases scores on a manipulation check questionnaire, in which participants self-report how provoked they felt by the feedback
-it increases aggressive behavior in the lab
But is that really enough to say I've validated my experimental manipulation? I don't think so. Here's why:
1. We can't always trust our intuitions about face validity. Just because something appears to have certain properties doesn't mean it always does. Our personal biases make us see things inaccurately, and we may see that a manipulation is face valid because we really, really want it to be.
2. Manipulation checks are a good idea when it comes to validating experimental manipulations. However, the manipulation check measures are often not validated themselves, beyond appearing to measure the construct that they measure (see previous statement about issues with face validity).
3. Saying an experimental manipulation is valid because it had the intended effect on your outcome is the same as saying 'it works because it worked' and is tautological. There are many other reasons it could've had the desired effect that have nothing to do with your hypotheses (e.g., the deceptive elements of your manipulation may have been laughably transparent, putting participants in a humorous mood state).
So, what can we do about this? When I tear down, I also like to build up, so I make some suggestions below that came from conversations with and readings of personality and clinical psychologists who have dedicated many decades to psychometric validation. I was implicitly aware of these issues over my short career, but slow to act when it came to applying better validation techniques.
A few things we can do to promote the validation of experimental manipulations:
1. Use (and create!) validated manipulation checks and outcome measures.
A great deal of psychometric validation work has focused on personality trait questionnaires and clinical assessment tools. However, there is a dearth of validation work being done on questionnaires that measure state-level processes and on behavioral measures. We need to spend more time developing, systematizing, and validating these measures as a foundation on which to build validated experimental manipulations.
Caveat: The inimitable Dr. Sanjay Srivastava pointed out that this creates a problematic loop, in which we need to validate state measures, which would require validating them with validated experimental manipulations, which would require the use of validated state measures to validate the manipulations, so on and so on.
2. Identify the nomological network around your manipulation.
Lee Cronbach and Paul Meehl coined the term 'nomological network' to represent the constellation of variables (latent and observed) that orbit a given construct, and the relationships they share. This approach is useful here because it asks you to go beyond the manipulation check and to consider the effect of a given manipulation on other variables, to best triangulate its actual construct validity.
In practical terms, this means that we should examine the effect of a manipulation not just on our manipulation check and outcome of interest, but also on theoretically related variables that the manipulation should influence. We should also examine the effect of the manipulation on variables it is *not* supposed to affect.
Going back to our example with the essay feedback paradigm, I should test whether the manipulation increases feelings of provocation (the manipulation check) and aggression (the outcome of interest), but also whether it increases states that are theoretically, positively linked to aggression (e.g., anger, approach motivation, hostility), decreases those that are negatively linked to aggression (e.g., empathic concern, inhibition), and has no effect on states that do not relate to aggression (e.g., mating preferences, working memory); this last part is particularly challenging for me to identify.
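To make the triangulation logic concrete, here is a minimal sketch of what that pattern of evidence might look like, using simulated data. Everything below is illustrative: the state names, the effect sizes, and the standardized scoring are assumptions I've made up for the example, not results from any real study.

```python
import random
import statistics

random.seed(1)
n = 200  # participants per condition

def sample(shift):
    """Draw n standardized state scores centered on `shift`."""
    return [random.gauss(shift, 1.0) for _ in range(n)]

# Hypothetical predicted pattern across the nomological network:
# positive shifts for the manipulation check and convergent states,
# a negative shift for empathic concern, and no shift for a
# discriminant state the manipulation should not touch.
shifts = {
    "provocation": 1.0,        # manipulation check: should increase
    "anger": 0.8,              # convergent state: should increase
    "empathic_concern": -0.5,  # should decrease
    "working_memory": 0.0,     # discriminant state: no effect expected
}
harsh = {state: sample(m) for state, m in shifts.items()}
neutral = {state: sample(0.0) for state in shifts}

def cohens_d(a, b):
    """Standardized mean difference with a pooled SD."""
    pooled = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

for state in shifts:
    print(f"{state:16s} d = {cohens_d(harsh[state], neutral[state]):+.2f}")
```

The point of the sketch is the *shape* of the results, not the numbers: a manipulation looks construct valid when the large effects land on the check and the convergent states, the negative effect lands where theory says it should, and the discriminant effect hovers near zero.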
If you're looking to graphically map your given theory's constructs and the relations between them, Dr. Kurt Gray's Theory Mapping website is an awesome tool for this.
3. Train graduate students in experimental psychology in validation techniques.
I don't have any data on this, but my sense of experimental psychology graduate training is that psychometric validation is not a core feature. It's a focus of clinical and personality psychology programs, but experimental psychologists may sometimes fail to emphasize training in these areas and give more focus to things like developing realistic and deceptive manipulations (which is, of course, also very important). There is a cost to this. Indeed, part of our replication crisis may be due to the use of unvalidated experimental manipulations. Our graduate Methods courses could combat this by including training in how to validate measures and manipulations.
4. Set aside time, prestige, and journal pages for studies that purely focus on validating a manipulation.
It can be frustrating to set aside time to validate experimental manipulations, instead of simply using them immediately in a study and relying on manipulation checks. However, if we can properly incentivize such work, then it will not require trading off between more 'substantive' projects and rigorous validation work. Encouragingly, editors from the journals Assessment, Advances in Methods and Practices in Psychological Science, and the Journal of Research in Personality have all welcomed such submissions (see below).
A couple of final thoughts:
We often assume that our manipulations (essay feedback) exert their effects on our outcomes of interest (increased aggression) because they operate on a specific mechanism (increased provocation). However, we need to test these assumptions, which can be done by experimentally manipulating the proposed mediator (which, of course, presupposes the existence of a validated manipulation of the mediator); doing so would likely require massive samples.
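The logic of that test (sometimes called an experimental causal chain) can be sketched with simulated data: one study checks that the manipulation moves the proposed mediator, and a second checks that directly manipulating the mediator moves the outcome. As before, the condition names and effect sizes below are invented for illustration, not drawn from any real dataset.

```python
import random
import statistics

random.seed(2)
n = 150  # participants per condition

def sample(shift):
    """Draw n standardized scores centered on `shift`."""
    return [random.gauss(shift, 1.0) for _ in range(n)]

def mean_diff(a, b):
    """Raw mean difference between two conditions."""
    return statistics.mean(a) - statistics.mean(b)

# Study 1: does the feedback manipulation move the proposed mediator?
provocation_harsh, provocation_neutral = sample(0.9), sample(0.0)

# Study 2: does a direct manipulation of the mediator move the outcome?
# (This presupposes a validated manipulation of provocation exists.)
aggression_provoked, aggression_control = sample(0.6), sample(0.0)

print("feedback -> provocation :",
      round(mean_diff(provocation_harsh, provocation_neutral), 2))
print("provocation -> aggression:",
      round(mean_diff(aggression_provoked, aggression_control), 2))
```

Only when both links hold does the mechanistic story (feedback raises provocation, which raises aggression) have direct experimental support; a significant manipulation check alone establishes neither link.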
A lingering question I have is 'do we need to re-validate manipulations when we use them in conjunction with other manipulations?' If I combine a rejection manipulation with another one that, say, induces feelings of empathy, do I need to re-validate the rejection manipulation to ensure that, even in the presence of experimentally increased empathy, it still does its job?
In closing, I realize that this is yet another blog post that suggests we fix something in experimental psychology that will be very difficult to fix and opens up its own host of methodological questions and issues. Further, I did this by pointing out a bunch of ideas that are not mine or new by any means. Further(er), I am guilty of publishing experimental manipulations without validating them, and it's likely I will publish more papers that fail to do so. Even so, I am hopeful that if I/we put a bit more time and energy into validating our experimental manipulations, our inferences, and therefore our understanding, will improve.