Saturday, January 9, 2016

Asymmetric funnel plots without publication bias

In my last post about standardized effect sizes, I showed how averaging across trials before computing standardized effect sizes such as partial \(\eta^2\) and Cohen's d can produce arbitrary estimates of those quantities. This has drastic implications for meta-analysis, but also for the interpretation of these effect sizes. In this post, I use the same facts to show how one can obtain asymmetric funnel plots, which are commonly taken to indicate publication bias, without any publication bias at all. You should read the previous post if you haven't already.

A funnel plot is a commonly used meta-analytic technique for detecting bias in a subset of the scientific literature. The basic idea is that if a literature is unbiased, the average estimate of an effect should not depend on sample size (or on some other measure of a study's "precision"). For a given sample size, estimates of the effect size should be spread around the true effect size, with the spread decreasing as sample size increases.
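To make the idea concrete, here is a minimal Python sketch (not taken from the post's simulation code) of what an unbiased literature looks like. It assumes each study's estimate is the mean of n normally distributed observations, with a hypothetical true effect of 0.5 and SD of 1; the helper names `funnel_bounds` and `simulate_unbiased_literature` are illustrative. Every simulated study is kept, so the estimates stay centered on the true effect at every sample size while their spread shrinks roughly like 1/sqrt(n): a symmetric funnel.

```python
import math
import random
import statistics


def funnel_bounds(true_effect, sd, n, z=1.96):
    """Approximate 95% region for an unbiased estimate from a study of size n."""
    se = sd / math.sqrt(n)
    return true_effect - z * se, true_effect + z * se


def simulate_unbiased_literature(true_effect, sd, sizes, studies_per_size, seed=1):
    """No publication filter: every study's estimate (a mean of n normal draws) is kept."""
    rng = random.Random(seed)
    literature = {}
    for n in sizes:
        literature[n] = [
            statistics.fmean([rng.gauss(true_effect, sd) for _ in range(n)])
            for _ in range(studies_per_size)
        ]
    return literature


if __name__ == "__main__":
    literature = simulate_unbiased_literature(0.5, 1.0, [10, 40, 160], 500)
    for n, estimates in literature.items():
        low, high = funnel_bounds(0.5, 1.0, n)
        print(f"n={n:4d}  mean={statistics.fmean(estimates):.3f}  "
              f"sd={statistics.stdev(estimates):.3f}  95% band=({low:.3f}, {high:.3f})")
```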

Publication bias, which is often assumed to manifest itself as 1) a tendency for statistically significant results to be published, and 2) a tendency for researchers to publish effects consistent with their theoretical outlook, will result in asymmetric funnel plots. Read this Neuroskeptic post about a paper by Shanks and colleagues for an example of how asymmetric funnel plots are used to argue for publication bias. Notice that the plots use a standardized effect size on the x axis.

A (not so) hypothetical paradigm

Since many priming effects have recently been called into question, I will use a priming example. Suppose we are interested in an emotional face priming effect: we ask participants to perform a lexical decision task, but before every trial we "subliminally" (i.e., very quickly) present either an excited face or a sad face, expecting that the excited face will speed performance on the task. Participants perform a number of trials in both priming conditions, and these are averaged to obtain two "observations" per participant: an average RT in each condition. This is very common in the psychological literature. A paired t test is used to assess the effect of the prime.
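The paradigm can be sketched as follows. This is a hypothetical model, not the code from the post's gist: each trial's RT is a participant-specific baseline plus a 20 ms speed-up after excited faces plus Gaussian trial noise with an SD of 200 ms (all numbers assumed for illustration). Trials are averaged within each condition, and the paired t test is computed by hand from the per-participant difference scores.

```python
import math
import random
import statistics


def simulate_participant(n_trials, prime_effect_ms, trial_sd_ms, rng):
    """Return (mean RT after sad prime, mean RT after excited prime) for one participant.

    Hypothetical model: trial RT = participant baseline + condition shift + Gaussian noise.
    """
    baseline = rng.gauss(600.0, 50.0)  # this participant's typical RT in ms (assumed)
    sad = [rng.gauss(baseline, trial_sd_ms) for _ in range(n_trials)]
    excited = [rng.gauss(baseline - prime_effect_ms, trial_sd_ms) for _ in range(n_trials)]
    # Averaging collapses each condition to a single "observation" per participant.
    return statistics.fmean(sad), statistics.fmean(excited)


def paired_t(condition_means):
    """Paired t test on per-participant (sad, excited) means: returns mean diff, SE, t."""
    diffs = [sad - excited for sad, excited in condition_means]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, se, mean_diff / se


if __name__ == "__main__":
    rng = random.Random(2016)
    data = [simulate_participant(50, 20.0, 200.0, rng) for _ in range(20)]
    mean_diff, se, t = paired_t(data)
    print(f"mean difference = {mean_diff:.1f} ms, SE = {se:.1f} ms, t({len(data) - 1}) = {t:.2f}")
```

Each lab in the scenario reports exactly these three numbers: the mean difference, its standard error, and t.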

Now suppose this same paradigm is used across many labs, differing only in sample size. Each lab reports the standard statistics: the mean difference in RTs across participants, its standard error, and the t statistic. A skeptic comes along, collects these statistics from all the papers, and computes the Hedges' g standardized effect size (a variation on the standardized difference score) from each t statistic. They produce the funnel plot shown below by plotting the sample size¹ (number of participants) against the standardized effect size:
This is a massively asymmetric funnel plot, and it would likely be taken as strong evidence of publication bias. However, because I simulated the data, I know that there is no publication bias at all. The asymmetry is merely an artifact of averaging and standardized effect sizes. You can obtain my simulation code here: GitHub gist
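The post does not say exactly how the t statistics are converted to Hedges' g. One common route for a paired design, sketched here as an assumption, is d_z = t / sqrt(N), which standardizes the mean difference by the SD of the difference scores, followed by the small-sample correction J = 1 - 3 / (4 df - 1) with df = N - 1. The function name is hypothetical.

```python
import math


def hedges_g_from_paired_t(t, n_participants):
    """Hedges' g for a within-subject contrast, computed from a paired t statistic.

    d_z = t / sqrt(N): the mean difference divided by the SD of the difference scores.
    J = 1 - 3 / (4 * df - 1): the usual small-sample bias correction, with df = N - 1.
    """
    df = n_participants - 1
    d_z = t / math.sqrt(n_participants)
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    return j * d_z


if __name__ == "__main__":
    print(round(hedges_g_from_paired_t(2.0, 16), 4))
```

For example, t = 2 with 16 participants gives d_z = 0.5 and g = 0.5 × 56/59 ≈ 0.475. Note that only t and N enter the calculation, so nothing about the number of trials per participant survives into g.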

Why is the funnel plot asymmetric? In all studies, the total number of trials performed was approximately the same: 2000 trials. The way these trials were divided among participants differed. Some studies had 100 trials per condition and 10 participants; others had 10 trials per condition and 100 participants. The standard deviation of the difference scores around their mean is a function of the number of trials performed per participant: the more trials each participant contributes, the less trial-level noise remains in their averaged difference score, so the standard deviation shrinks. When the number of trials is high, the standardized effect size is therefore high, just as discussed in the previous blog post. But here, because the total amount of "effort" per study is conserved (that is, all studies have the same total number of trials), the studies with more trials per participant have fewer participants. The funnel plot therefore looks problematic, but the asymmetry is an artifact.
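The mechanism can be written down under a simple assumed model. If each trial has noise SD σ and participants do not differ in their true effect δ, a participant's averaged difference score over n trials per condition has variance 2σ²/n, so the population d_z is δ / sqrt(2σ²/n), which grows like sqrt(n). The sketch below is not the post's gist; it uses hypothetical values δ = 20 ms and σ = 200 ms (`effect_sd` optionally adds individual differences in the effect), computes that expected value, and simulates Hedges' g for four designs that each use 1000 trials per condition, i.e., 2000 trials in total.

```python
import math
import random
import statistics

TRIAL_SD = 200.0    # hypothetical trial-level RT noise (ms)
TRUE_EFFECT = 20.0  # hypothetical priming effect (ms), identical for every participant

# (trials per condition, participants): every design uses 1000 x 2 = 2000 trials in total.
DESIGNS = [(100, 10), (50, 20), (20, 50), (10, 100)]


def expected_dz(n_trials, effect=TRUE_EFFECT, trial_sd=TRIAL_SD, effect_sd=0.0):
    """Population d_z = effect / SD of difference scores, SD^2 = effect_sd^2 + 2*trial_sd^2/n."""
    return effect / math.sqrt(effect_sd ** 2 + 2.0 * trial_sd ** 2 / n_trials)


def simulate_g(n_trials, n_participants, rng, effect=TRUE_EFFECT, trial_sd=TRIAL_SD):
    """Simulate one study and return Hedges' g for its paired contrast."""
    cond_sd = trial_sd / math.sqrt(n_trials)
    diffs = []
    for _ in range(n_participants):
        # The mean of n_trials i.i.d. normal RTs is normal with SD trial_sd / sqrt(n_trials),
        # so each condition mean is drawn directly; participant baselines cancel in the
        # difference and are omitted.
        sad = rng.gauss(0.0, cond_sd)
        excited = rng.gauss(-effect, cond_sd)
        diffs.append(sad - excited)
    d_z = statistics.fmean(diffs) / statistics.stdev(diffs)  # equals t / sqrt(N)
    df = n_participants - 1
    return (1.0 - 3.0 / (4.0 * df - 1.0)) * d_z


if __name__ == "__main__":
    rng = random.Random(42)
    for n_trials, n_participants in DESIGNS:
        gs = [simulate_g(n_trials, n_participants, rng) for _ in range(200)]
        print(f"N={n_participants:3d}  trials={n_trials:3d}  "
              f"expected d_z={expected_dz(n_trials):.3f}  mean g={statistics.fmean(gs):.3f}")
```

Under these assumptions the expected d_z falls from about 0.71 in the 10-participant design to about 0.22 in the 100-participant design, the same direction as the asymmetry in the funnel plot.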

One wonders if this Cross Validated query was related to this artifact.

Creating a funnel plot from the raw effect sizes (the mean RT differences) removes the asymmetry; a funnel plot with the standard error on the y axis also does so.
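Why the raw scale behaves differently can be seen under a simple assumed model: trial noise SD σ, n trials per condition, N participants, and σ_δ the SD of true effects across participants. The mean RT difference is then an unbiased estimate of the true effect in every design, and its variance is σ_δ²/N + 2σ²/(nN). With σ_δ = 0 the remaining term depends only on the total trial count nN, so designs with the same total number of trials have the same standard error. The sketch below (hypothetical σ = 200 ms; function name illustrative) computes it.

```python
import math


def raw_effect_se(n_trials, n_participants, trial_sd=200.0, effect_sd=0.0):
    """Standard error (ms) of the mean RT difference in a paired design.

    Var(mean difference) = effect_sd^2 / N + 2 * trial_sd^2 / (n * N).
    With effect_sd = 0 this depends only on the total trial count n * N.
    """
    return math.sqrt(effect_sd ** 2 / n_participants
                     + 2.0 * trial_sd ** 2 / (n_trials * n_participants))


if __name__ == "__main__":
    for n_trials, n_participants in [(100, 10), (50, 20), (20, 50), (10, 100)]:
        print(n_participants, n_trials, round(raw_effect_se(n_trials, n_participants), 3))
```

With these values every design gives SE = sqrt(80) ≈ 8.94 ms. If participants genuinely differ in their effect (`effect_sd > 0`), the designs with fewer participants get larger standard errors.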

This does not mean that putting the standard error on the y axis fixes the problem. Consider another way the number of trials and the number of participants can be related: positively correlated, rather than negatively as before. That is, studies that run more participants also run more trials per condition. The funnel plots end up looking very strange, with an asymmetry that is the reverse of the one we would expect: larger effect sizes are obtained with larger numbers of participants.
Without reflection, this pattern might be offered as evidence that something very strange was happening in a literature. But nothing strange is happening here, except in the analysis. If there were publication bias, though, this artifact might actually mask it.
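The reversed pattern can be reproduced with a simple assumed model: no individual differences in the true effect δ, trial noise SD σ, and n trials per condition, so the expected d_z is δ / sqrt(2σ²/n). In the hypothetical designs below (δ = 20 ms, σ = 200 ms), trials per condition rise together with the number of participants, so the expected standardized effect rises with N, while the standard error of the raw difference, sqrt(2σ²/(nN)), falls.

```python
import math


def expected_dz(n_trials, effect=20.0, trial_sd=200.0, effect_sd=0.0):
    """Population d_z = effect / sqrt(effect_sd^2 + 2 * trial_sd^2 / n_trials)."""
    return effect / math.sqrt(effect_sd ** 2 + 2.0 * trial_sd ** 2 / n_trials)


# Hypothetical designs in which labs that test more participants also run more trials.
POSITIVE_DESIGNS = [(10, 10), (20, 20), (50, 50), (100, 100)]  # (trials per condition, N)

if __name__ == "__main__":
    for n_trials, n_participants in POSITIVE_DESIGNS:
        se_raw = math.sqrt(2.0 * 200.0 ** 2 / (n_trials * n_participants))
        print(f"N={n_participants:3d}  expected d_z={expected_dz(n_trials):.3f}  "
              f"raw SE={se_raw:.2f} ms")
```

Here the expected d_z climbs from about 0.22 to about 0.71 as N goes from 10 to 100, so the largest standardized effects come from the largest, most precise studies.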

Wrap up

I suspect there are other artifacts one could generate using standardized effect sizes in a meta-analysis². How can we keep from getting fooled? In some cases, the correction I mentioned in the previous post might be of use. Since a funnel plot is often used for detecting problematic bias in a literature rather than for estimating the effect size, the fact that there is no "true" effect size is not a problem here.
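In the author's reply in the comments, the correction for this case amounts to dividing the standardized effect size by the square root of the number of trials. We read that as trials per participant per condition, since in the first example the total per study is fixed. A minimal sketch under a simple assumed model (true effect 20 ms for everyone, trial noise SD 200 ms; function names hypothetical):

```python
import math


def trial_corrected_effect(standardized_effect, n_trials):
    """Divide a standardized effect by sqrt(trials per participant per condition).

    If individual differences in the effect are negligible, d_z grows like sqrt(n_trials),
    so this ratio no longer depends on how trials were split across participants.
    """
    return standardized_effect / math.sqrt(n_trials)


def expected_dz(n_trials, effect=20.0, trial_sd=200.0, effect_sd=0.0):
    """Population d_z = effect / sqrt(effect_sd^2 + 2 * trial_sd^2 / n_trials)."""
    return effect / math.sqrt(effect_sd ** 2 + 2.0 * trial_sd ** 2 / n_trials)


if __name__ == "__main__":
    for n_trials in [10, 20, 50, 100]:
        d = expected_dz(n_trials)
        print(n_trials, round(d, 3), round(trial_corrected_effect(d, n_trials), 4))
```

With `effect_sd=0` every design maps to the same corrected value, 20 / sqrt(2 × 200²) ≈ 0.0707. If participants differ in their true effects, d_z no longer scales exactly with sqrt(n) and the division over-corrects: with `effect_sd=10.0` the corrected value is about 0.070 at 10 trials but about 0.067 at 100.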

For future research, data sharing and the reporting of different effect size measures will help. Modifications of Cohen's d and Hedges' g exist that reduce this problem (see "Computing d and g from studies that use pre-post scores or matched groups", for instance), but these modified statistics cannot be computed from the statistics that are typically reported. The fact that we need statistics that are not typically reported in order to perform reasonable meta-analyses raises the question of whether current reporting practices really allow a cumulative science.
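As we understand the approach described in that chapter, the mean difference is standardized by the within-condition SD instead of the SD of the difference scores, and the two are related through the correlation r between the condition means across participants: S_within = S_diff / sqrt(2(1 - r)), so d = d_z · sqrt(2(1 - r)). The sketch below (function names hypothetical) shows that r, which papers rarely report, is exactly the missing ingredient. If participants differ in their baseline RTs, S_within includes that stable between-participant variability, which does not shrink as trials are added, so this d should depend less on the number of trials per participant.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def d_from_dz(d_z, r):
    """Rescale d_z to a d standardized by the within-condition SD: d_z * sqrt(2 * (1 - r))."""
    return d_z * math.sqrt(2.0 * (1.0 - r))


def d_from_condition_means(cond_a, cond_b):
    """Same quantity from per-participant condition means; needs raw data (or r)."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    d_z = statistics.fmean(diffs) / statistics.stdev(diffs)
    return d_from_dz(d_z, pearson_r(cond_a, cond_b))
```

A paper that reports only the mean difference, its standard error, and t gives us d_z but not r, so `d_from_dz` cannot be evaluated from the published numbers alone.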


¹ Funnel plots can be created with a variety of statistics on the y axis. Different researchers make different recommendations for both axes (see, for instance, Peters et al., 2006), and as we will see, this can have a dramatic effect on the conclusions.

² Sterne et al. (2011) note minor asymmetries caused by a correlation between an effect estimate and its standard error, as can arise when estimating extreme proportions or similar parameters, but nothing as dramatic or fundamental as shown here. Their asymmetries are mostly problematic for asymmetry tests, which can detect even minor asymmetries when samples are large.


  1. This is a variation on the theme of smaller studies being more rigorous and therefore showing larger effects. In your case, if one suspects that some rigor variable is causing an artifact (here, trials per subject), one could regress it out first.

  2. I can't see why, in the first example, one would call any of the studies more "rigorous" than the others. They all have the same number of trials, and as one can see from Figure 2 (right) they all have essentially the same standard error. This artifact would be solely attributable to funnel-analyst carelessness.

    There's no reason to regress it out, though; one knows how it should affect the effect size (by the square root of the number of trials). The correction I suggested in the previous post --- in this case, dividing the standardized effect size by the square root of the number of trials in a study --- makes the asymmetry go away.

  3. Hi Richard,
    in fixed-, random-, and mixed-effects meta-analysis models, each study's effect size is weighted by the inverse of its variance, which takes the study's sample size into account. This does not solve the problem of the number of trials, but I wonder whether it mitigates the limitations you raised.
