Have your cake and eat it, too! Practical reform in social psychology

This week’s edition of Psych Wednesdays was written by Michael Kraus. It was originally published on Psych Your Mind on February 21, 2013.

The cake we can (1) have, and (2) eat!

If you have been following recent headlines in the social sciences, then you are aware that the field of social psychology has been in some rough water over the past three years. In this time period, we’ve had our flagship journal publish a series of studies providing evidence that ESP exists (and then refuse to publish non-replications of those studies). We’ve suffered through at least three cases of egregious scientific misconduct perpetrated by high-profile researchers. We’ve had an entire popular area of research come under attack because researchers have failed to replicate its effects. And several respected members of the science community have had some harsh words to say about the discipline and its methods.

Listing all of these events in succession makes me feel a bit ashamed to call myself a social psychologist. Clearly our field has been lacking both oversight and leadership if all of this could happen in such a brief period. Now, I’m not one to tuck my tail between my legs. Instead, I’ve decided to look ahead. I think there are relatively simple changes that social psychologists (even ones without tenure) can make in their research that can shore up our science going forward.

I. Small Effect Size, Large Sample Size

When I was an undergraduate, I remember my first exposure to psychological research. I was disappointed when I learned how little we could explain about human behavior with even the best social experiments. We would all like to think that a couple of social variables could easily explain 50-70% of our behavior. The problem, of course, is that reported effects in psychology explain far less of our behavior than we expect. The average size of an effect across 100 years of social psychology is r = .21 (Richard et al., 2003). That’s about 4% of the variance in behavior explained. This analysis reveals that experiments typically have a smallish effect on our behavior. Now, if you know something about statistics, then you know that to detect such a small effect reliably, a researcher needs both a reliable measure and many observations. In short, small effects require large samples.
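To make that concrete, here is a minimal sketch of the kind of power calculation I have in mind, using the Fisher z approximation for a correlation. The 80% power and .05 two-tailed alpha are conventional defaults I’m assuming, not numbers from the meta-analysis itself.

```python
# Sketch: approximate sample size needed to detect a correlation of r = .21,
# using the Fisher z approximation. Power = .80 and alpha = .05 are assumed
# conventional defaults, not figures from the post.
import numpy as np
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect correlation r with a two-tailed test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the test
    z_beta = norm.ppf(power)            # quantile for the desired power
    fisher_z = np.arctanh(r)            # Fisher z transform of r
    return int(np.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3))

print(n_for_correlation(0.21))   # ~176 participants for the field's average effect
print(n_for_correlation(0.50))   # a genuinely large effect needs only ~30
```

Under those assumptions, detecting the field’s average effect reliably takes somewhere around 175 participants–far more than many published studies collect.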

Now contrast this reality with the pages of some of our top journals. We’ve read four different papers, chosen at random, for the social journal club that I supervise at the University of Illinois, and all four papers lacked sufficient sample size in at least one of the reported studies to detect the average effect in social psychology (r = .21). This means that these studies were not designed properly. Running consistently under-powered experiments is a sloppy research practice that has crept into our field at an alarming rate, and it needs to stop now.

The good news is that collecting larger samples is fairly straightforward for many types of research. I don’t think I saw a single talk at our recent social psychology conference where at least some data were not collected via the internet. Internet samples are an easy way for researchers to achieve large samples. I collect fairly high-investment data–using autonomic physiology and coding of nonverbal behavior–but I’m still going to push myself and my graduate students to spend a little extra time collecting a bit more data in each study.

The biggest challenge for data collection occurs in neuroscience–where a single participant costs a sizable amount of money (an arm or a leg? I actually have no idea what this research costs). For social neuroscientists, I think the priority should be either making measurement more precise or starting a discussion about ways institutions can pool their neuroscience resources. The President isn’t willing to give up on neuroscience, so we shouldn’t either!

II. (Effect) Size Matters

Personality psychologists love to take us social psychologists “to the woodshed” when it comes to effect size. Whereas personality psychologists are usually working in the domain of correlations, where the size of an effect is always explicitly visible, social psychologists have been engaging in a lazy practice of failing to report effect size in their experiments (me included). Reports of effect size should accompany all results in empirical papers. Period.
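As a concrete illustration of that rule, here is a minimal sketch of reporting an effect size (Cohen’s d, converted to the r metric) alongside a garden-variety t-test. The two groups below are simulated purely for illustration; nothing about them comes from the post.

```python
# Sketch: report an effect size alongside the significance test.
# The two groups are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=40)
treatment = rng.normal(loc=0.4, scale=1.0, size=40)

t, p = stats.ttest_ind(treatment, control)

# Cohen's d from the pooled standard deviation (equal group sizes)
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
d = (treatment.mean() - control.mean()) / pooled_sd

# Convert d to the r metric used in the Richard et al. (2003) meta-analysis
r = d / np.sqrt(d ** 2 + 4)

print(f"t({len(control) + len(treatment) - 2}) = {t:.2f}, "
      f"p = {p:.3f}, d = {d:.2f}, r = {r:.2f}")
```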

Reporting effect size is essential for two reasons. First, the effect size estimate allows us to get a sense of whether or not we should care about the results of a study. If, for example, you conduct gene research and find an association between a candidate gene and a phenotype, but that association explains essentially .01% of the variance in that phenotype, you could claim that genes influence personality (as some do). Alternatively, because the effect is so small, you might conclude instead that candidate genes don’t directly influence personality in a meaningful way.

The second reason reporting effect sizes is important is that it can draw attention to peculiarly large effects in our field. Recall that the average effect in social psychology explains 4% of the variance in behavior. If a study finds that X explains 20% of Y with a tragically small number of participants, then it should be met with suspicion by other researchers. What I mean is, effect size estimates can help reviewers see that reported effects are larger than would be expected in our field (i.e., too good to be true). At the very least, reviewers could ask researchers to directly replicate large effect size findings in studies using larger samples–where the effect can be estimated with more precision.
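One way to see why this matters: here is a sketch of the 95% confidence interval around a correlation, computed via the Fisher z transform. The specific values (r = .45, roughly 20% of variance, from 20 versus 200 participants) are hypothetical, chosen only to show how imprecise a “large” effect from a small sample really is.

```python
# Sketch: 95% confidence interval for a correlation via the Fisher z transform.
# An r of .45 (~20% of variance) from 20 people is compatible with a huge range
# of true effects; the same r from 200 people is pinned down much better.
import numpy as np
from scipy.stats import norm

def r_confidence_interval(r, n, conf=0.95):
    """Confidence interval for a correlation coefficient r from n observations."""
    z = np.arctanh(r)                       # Fisher z transform
    se = 1 / np.sqrt(n - 3)                 # standard error of Fisher z
    crit = norm.ppf(1 - (1 - conf) / 2)
    return np.tanh(z - crit * se), np.tanh(z + crit * se)

print(r_confidence_interval(0.45, n=20))    # roughly (0.01, 0.74) – barely excludes zero
print(r_confidence_interval(0.45, n=200))   # roughly (0.33, 0.55) – much more precise
```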

III. Methods in Manuscript Review

Last year I reviewed somewhere between 50 and 60 manuscripts for psychology journals, and early indications suggest I’ll review about as many this year. Basically, if you have submitted a journal article to a scientific journal in the past year, chances are good that I’ve read it!

In the manuscript review process, the comments that I typically see have to do with the theory and the cleanliness of the results. Rarely do researchers pay attention to precision in measurement and method. I think reviewers have a responsibility to hold research papers to high methodological standards. This means evaluating the measurement instruments used in the study for precision, and the design of the study for statistical power. Manuscripts shouldn’t be rejected because they do not show clean results. Manuscripts should absolutely be rejected if the study is designed without using good methods. I think it’s time the field started dinging researchers for running studies with small samples in the manuscript review process.

In my own reviews I’ve started this process. So far, I’ve been thanked by a couple of editors for a power analysis I conducted in my review of an article, and ignored by a couple others. I am going to keep going though, and I hope more people will join me!

IV. Stop P-Hacking

I’ve dealt with this issue in several posts on this blog, so I won’t get into too many details here. In general, if researchers were more honest about their methods, and less motivated to generate clean findings over real findings, the field would be in better shape. Most p-hacking involves using unnatural statistical means to transform a non-significant finding into a significant one. This practice brings big short-term gains (more publications = getting a job) and big long-term costs (others may fail to replicate one’s p-hacked effects, and the field is flooded with biased studies).
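To illustrate just one flavor of p-hacking, here is a small simulation of optional stopping: testing after every few participants and stopping the moment p dips below .05. The specific settings (10 new participants per group per “peek,” up to 10 peeks) are assumptions made for the demo, but the qualitative point holds: even when there is no true effect at all, the false-positive rate climbs well above the nominal 5%.

```python
# Sketch: how "peeking" at the data inflates false positives.
# Two groups are drawn from the SAME distribution (no real effect), but we
# test after every 10 participants per group and stop as soon as p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_simulations = 2000
false_positives = 0

for _ in range(n_simulations):
    a, b = [], []
    for _ in range(10):                        # up to 10 "peeks" at the data
        a.extend(rng.normal(size=10))          # 10 more participants per group,
        b.extend(rng.normal(size=10))          # drawn from the same distribution
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:                           # stop as soon as it looks "significant"
            false_positives += 1
            break

print(f"False-positive rate with optional stopping: {false_positives / n_simulations:.1%}")
# The nominal rate is 5%; peeking pushes it into roughly the 15-20% range.
```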

The good news is that there is a legitimate way to increase one’s chances of obtaining significant results: collect data from larger samples. Researchers can spend a little extra effort on data collection and save themselves the effort of massaging data to reach statistical significance. We could discuss at length how the incentives in the system lead researchers to engage in questionable data practices, but I think using larger samples removes most of the temptation to p-hack. I’m determined to continue to be productive in my research while collecting larger samples.

V. What I’m Not Doing

What I have proposed in points I through IV are some very simple changes that every social psychologist, of any rank, could make to reform social psychology. These aren’t the only solutions to the field’s problems, just the ones that I think are easiest to implement and likely to have the most positive impact. I am not keen on using Bayesian statistics to solve our problems–for some reason, all the people who advocate Bayes’ theorem seem either unwilling or unable to explain why it would solve the methodological problems in our science. I’m also not willing to throw out the history of our field and start over. That’s super depressing. I’m not a believer in utopias of any kind–even sciency ones. I’m also not going to stop using null hypothesis testing. I think we are often interested in finding out whether X causes Y to go up or down. As long as we start to care about effect size, null hypothesis testing can stick around. Also, dude, that’s the scientific method we learned in 5th grade! Nostalgia for the win! Finally, I’m not waiting around for sweeping change to come down from above. That sort of change just doesn’t seem to be happening at an acceptable speed.

Recent events in social psychology have earned me my fair share of finger wagging (both literal and figurative) from my colleagues in other fields of psychology. I’m taking action because I’m ready for that to stop! When I think about the field, I’m hopeful that the future will reveal a stronger science that uses better methods and collects better data. I’m glad to be a part of that journey. Onward!

Richard, F., Bond, C., & Stokes-Zoota, J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331-363. DOI: 10.1037/1089-2680.7.4.331


1 comment

  1. Kristina K

    This applies to ALL fields of science. Extremism has more buyers, and p-value hacking obliges, impacting the manuscript review process. Large effects are more likely to appear in smaller samples, and so the process continues: those studies are rewarded with publications while well-conducted non-significant studies are rejected as nothing novel. Glad other fields are considering this: http://www.hexxiiiz.com/FOS