Welcome to our latest author interview! Recently I interviewed Scott Peacor on Peacor et al. (2025 Ecology), “Ecological meta-analyses often produce unwarranted results”. It shows that ecological meta-analyses frequently fail to correct for non-independence of effect sizes reported in the same paper, and that this failure has serious consequences. And it shows this in an entertaining way. I asked Scott about how a non-statistician ended up writing a statistical methods paper, the biggest problems with ecological meta-analyses, the use of rhetoric in scientific papers, and more. Come for the discussion of methodological issues in meta-analyses, stay for the advice (?) on how to wear down your collaborators, what scientific writers can learn from advertisers, and more.
The interview was conducted over Zoom. Both the questions and answers have been edited for clarity and brevity.
Jeremy: What’s the backstory of Peacor et al. (2025)? How did that paper come about?
Scott: Yeah, I’m not a statistician. I did poke around and unfortunately never published a paper on my consternation with how we are doing ANOVA. Ecologists were not paying attention to what the underlying biological model is and how the model determines whether you have an interaction term or not. And then when I saw meta-analysis coming out—this was decades ago now—I would say I got kind of irritated. It seemed like what was coming out of them was way too much confidence relative to what was going into them. And, you know, I’m not a statistician, I’m not an expert in hierarchical models, but if somebody tells me a Little League team just beat a major pro team, even if I don’t know the rules of baseball, I can probably figure out something’s going on. And it really seemed like something was going on here. I thought this was going to be something that concerned everybody when they saw these papers. But that wasn’t the case. These papers have been cited thousands of times, and there has not been concern from the community.
I wasn’t interested in sussing it out myself, but nobody else was doing it. And so I decided to work on it. It took me a really long time, decades. And again, since I’m not a hierarchical statistics expert, and I like to work with others, I convinced Jim Bentz, who is an expert hierarchical modeler, to work with me. That was difficult. He was not interested. But what I would do is I’d put a paper on his desk and I knew he would look at it and he would get upset and he’d say, but Scott, I still don’t want to work on it. But after 3 or 4 meta-analysis papers on his desk, I wore him down. And I’m friends with Craig Osenberg, and Jim actually was from grad school too.
And then Jeremy, it really was eye-opening in trying to get funding. We wanted a grad student or postdoc to work on this, not us. And after the first rejection from NSF, I looked at DEB the last 5 years, and at the time I didn’t see any grants that were on how do we do science, how do we do experiments, and so on. Meta-analysis was being used so much, and we were saying, hey, there are some problems with the way we’re doing meta-analysis, let’s look at this. And NSF were not interested because it’s not novel. We tried to address that a second time, didn’t work again. And we were persistent. I didn’t give up. And the third time they gave us half what we were asking for.
I want my work to have a positive influence, not just have fun. This is not fun for me. I don’t really like statistics and I don’t like criticizing people. I don’t get a kick out of it. And so designing a statistic project to show that we ecologists are doing something wrong is not my idea of fun.
Jeremy: You said that the genesis of this was decades ago. Is it your sense that statistical practices around meta-analysis have changed between the time when you started thinking about this and the time when the paper came out? Because like you, I’ve looked at a lot of ecological meta-analyses. I haven’t done a count, but my offhand impression is that a fair number of them these days are doing what you recommend and using hierarchical models to control for sources of non-independence. And that is a change. That would not have been the case back in the early ’90s when meta-analysis first took off in ecology.
Scott: I’ll answer that question in two ways. Have things gotten better? In one way, I think they have gotten better. We now have easily accessible software to do hierarchical meta-analyses. So in that way we’ve gotten better. Have we become more reflective as a group of ecologists? I don’t think so. I think that MetaWin came out and we as ecologists know better than anybody what the problem with non-independence is and yet we had papers coming out using MetaWin that had 50 data points from the same paper that all looked exactly the same for the most part relative to the other papers, and we had ecologists that are completely willing to use that package knowing the problem with non-independence. Now, of course, you can’t expect a field ecologist to be at the level of a statistician, but you can expect them to have a statistician on their paper if they’re using a complicated tool that hasn’t been fully sussed out.
Jeremy: I wanted to ask you a little bit about the rhetoric in this paper, which I found entertaining. Because you do some analyses where you ask, “How often will we find a statistically significant effect of a moderator variable that truly has no effect, if we fail to correct for non-independence of effect sizes reported in the same paper?” But rather than just randomly generating a moderator variable that’s unassociated with the dependent variable, which I think is what most people would’ve done, you use moderator variables like, “Did the author’s last name contain an odd or even number of vowels?” It reminded me a bit of Fourcade et al. 2018, which you cite, which generated nonsense environmental variables for species distribution models by superimposing Old Master paintings on maps of Europe. So tell me a little bit about the thought process behind your use of nonsense moderator variables. Was it a deliberate attempt to grab the reader’s attention?
Scott: I’m certainly not trying to talk down to the reader or anything like that. It’s a matter of experience. When we were starting to study this, I read this book called Made to Stick. It’s about advertising. It turns out that just changing one word or something in an ad could sell 100% more of the product. I would talk with people about this work and I would say, “Hey, we randomised this and 40% of the randomised combinations came out significant.” Oh, yeah, they’d say, that’s interesting. But if I said, “Hey, I split them up by red state and blue state,” they’d say, “What?! What?! That was significant?!” [laughs] That’s just how our brains work. It was from experience talking about this that I realised, it gets the brain thinking more about how this just doesn’t make sense that these nonsense variables are having an effect. What was interesting with blue versus red states is I actually had people saying, “Well, in red states, you know, you’re in the interior of the continent, blue states are more along the coast. Did you …” They were trying to find an explanation for it, you know? [laughs] And so I had to have some moderator variables that readers would completely not be able to come up with an explanation for. “People with odd numbers of vowels in their last name, you know, maybe tend to work in forests rather than grasslands…” [laughs]
Some of my co-authors weren’t as enthusiastic about those nonsense variables. But one of our reviewers on our first submission really was very positive about it. That was encouraging, it meant that I wasn’t trying to do something nefarious or whatever. And yeah, figure 1 or 2, I forget, the one with the nonsense variables, is one of my favorite figures I’ve ever published. But it’s completely unneeded for the science, it’s just for fun, for salesmanship.
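For readers who want to see the mechanism at work, here is a minimal simulation sketch. It is not the analysis from Peacor et al. (2025): the number of papers, the number of effect sizes per paper, the variance components and the function names are all made-up assumptions. Every effect size is pure noise plus an offset shared within its paper, and a nonsense moderator splits the papers into two arbitrary halves. The naive test treats every effect size as independent. The second test averages within paper first, which is only a crude stand-in for fitting a random effect for paper, not the hierarchical model used in the paper.

```python
import random
import statistics
from math import sqrt

# All settings below are illustrative assumptions, not values from the paper.
N_PAPERS = 20      # papers in the hypothetical meta-analysis
K = 10             # effect sizes reported per paper
TAU = 1.0          # SD of the offset shared by effect sizes in one paper
SIGMA = 1.0        # SD of the noise on each individual effect size
# Two-sided 5% critical values of Student's t for df = 198 and df = 18.
T_CRIT_NAIVE = 1.972
T_CRIT_PAPER = 2.101


def pooled_t(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def false_positive_rates(n_sims=2000, seed=1):
    """Return (naive rate, paper-mean rate) of 'significant' nonsense moderators."""
    rng = random.Random(seed)
    naive_hits = paper_hits = 0
    for _ in range(n_sims):
        papers = []
        for _ in range(N_PAPERS):
            offset = rng.gauss(0, TAU)  # shared by every effect size in the paper
            papers.append([offset + rng.gauss(0, SIGMA) for _ in range(K)])
        # Nonsense moderator: the first half of the papers is labelled "odd",
        # the rest "even". The labels are unrelated to how the data were made.
        half = N_PAPERS // 2
        odd, even = papers[:half], papers[half:]
        # Naive test: pool all effect sizes as if they were independent.
        t_naive = pooled_t([y for p in odd for y in p],
                           [y for p in even for y in p])
        # Paper-level test: one mean per paper, so papers are the units.
        t_paper = pooled_t([statistics.fmean(p) for p in odd],
                           [statistics.fmean(p) for p in even])
        naive_hits += abs(t_naive) > T_CRIT_NAIVE
        paper_hits += abs(t_paper) > T_CRIT_PAPER
    return naive_hits / n_sims, paper_hits / n_sims


if __name__ == "__main__":
    naive, by_paper = false_positive_rates()
    print(f"naive: {naive:.2f}  paper means: {by_paper:.2f}")
```

Under these assumed settings the naive rate should come out several times the nominal 5%, while the paper-mean rate should sit close to 5%. Real datasets are messier (unequal numbers of effect sizes per paper, unequal sampling variances), so treat this as an illustration of the mechanism only.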
Jeremy: Here’s a nitty-gritty technical follow-up. For the datasets you’re looking at in this paper, using a hierarchical mixed effects model that accounts for non-independence of effect sizes from the same paper greatly reduces the Type I error rate, but doesn’t entirely reduce it to the intended level of 5%. You end up with a Type I error rate around 13%, still solidly above 5%. How much should we worry about that 13% and what, if anything, should we do about it? Could you try to take some still more sophisticated statistical approach that would somehow get that Type I error rate down to the desired level of 5%? Or should authors and readers just kind of informally apply a mental discount factor? Just keep in mind that the Type I error rate is probably closer to 0.1 or 0.15 than to 0.05, due to non-independence that can’t be easily accounted for statistically.
Scott: I certainly think we should worry about it. We talk in the paper about a number of reasons for this 13%, which is a rough number. I want to also keep in mind that this 13% is conservative. If there are other sources of non-independence that are not captured in that 13%, the true rate would be larger than that. So I would say, yeah, if you’re going to just use standard meta-analysis techniques and fit a model that includes a random effect for paper, maybe assume alpha is 0.2 or something.
Jeremy: In the discussion section of your paper, you raise the possibility of other sources of non-independence besides just non-independence of effect sizes reported in the same paper. You mention, for instance, the possibility of non-independence of effect sizes from different papers from the same lab group…
Scott: Earl Werner, Rick Relyea, myself, we do experiments pretty much the same way, you know? [laughs]
Jeremy: Okay, so there’s a good example. Anyway, I’ve been thinking about this, because we do already have a quite dramatic example of that sort of non-independence. It’s Clements et al. 2022 PLOS Biology, their meta-analysis of effects of ocean acidification on fish behaviour. They find that studies from one particular lab group, the Munday lab, really stand out as having very different results from others. And so that makes me wonder if there are any other examples–hopefully examples that don’t involve data fabrication as in the papers from the Munday lab! But I dunno, I worry a little that we’d really be opening a can of worms if we start expecting meta-analysts to routinely test for some lab groups consistently obtaining different results than others.
Scott: This problem you’re mentioning, I do think it’s likely a big problem in meta-analysis and we should look at it.
Jeremy: Do ecological meta-analyses have too much heterogeneity? Meta-analysis started out in medicine, of course, and in the big Cochrane compilation of medical meta-analyses, the median heterogeneity, I2, is only 22%, which in ecology would be absurdly low. In ecology, it’s usually over 90%–over 90% of the variance in effect size is due to heterogeneity, not sampling error. I’ve talked to ecological meta-analysts who are totally comfortable with high heterogeneity, who take the view that that’s why you use a hierarchical mixed effects model–to account for heterogeneity, however much of it there might be. And I’ve talked to others who take the opposite view, that if your heterogeneity is anywhere near 90% that means you’re comparing apples and oranges and bricks in the same meta-analysis. Where do you stand on this?
Scott: I think it’s a matter of what makes sense, scientifically. It just does not make sense to me to put, say, metrics of population growth rate and population size from different studies in the same meta-analysis looking at the effect of some factor.
Jeremy: I agree with you on that. My own view on that would be that, I don’t think you should mix, say, population growth rate and population size in the same meta-analysis. But that the reason you shouldn’t is to do with ecological interpretability and not because you’re gonna be jacking up heterogeneity. Because I think our I2 values in ecology are going to be very high, even if you do take a lot of care to only include in your meta-analysis studies that are scientifically comparable.
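For readers unfamiliar with I2, here is a small sketch of how it is commonly computed from Cochran's Q, following the standard Higgins and Thompson definition. The function name and the effect sizes and sampling variances in the usage lines are hypothetical, chosen only to show what "over 90%" looks like.

```python
def i_squared(effects, variances):
    """I2 in percent: share of variation in effect sizes beyond sampling error."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 = (Q - df) / Q, truncated at zero when Q does not exceed df.
    return 0.0 if q <= df else 100.0 * (q - df) / q


# Hypothetical data: precise estimates that disagree strongly with each other.
spread_out = i_squared([0.1, 0.9, -0.6, 1.5, 0.3], [0.01] * 5)
# Hypothetical data: identical estimates, so no heterogeneity at all.
identical = i_squared([0.2, 0.2, 0.2], [0.05, 0.05, 0.05])
print(spread_out, identical)
```

The first call lands well above 90% because the estimates differ by far more than their sampling variances allow. The second returns zero. Note that I2 is a proportion, so it says how much of the variation is heterogeneity, not how large that heterogeneity is in absolute terms.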
Ok, next question: speaking as someone who, like you, has reanalyzed lots of ecological meta-analyses, what tends to worry me most isn’t non-independence of effect sizes within papers, though I do worry about that. It’s just having so few effect sizes from so few papers in the first place. The median ecological meta-analysis only includes something like 60 effect sizes from about 22 papers or so. 25% of ecological meta-analyses have 10 papers or fewer in them. So for most research topics in ecology we just don’t have all that many papers to go on. And further, the 22 papers included in the median meta-analysis were published over a period of 20 years. So ecologists collectively publish about one paper per year on a typical research topic. That’s just not much, right? It’s not a recipe for rapid scientific progress on most of the topics that ecologists study. Is that a problem? And if so, what, if anything, could be done about it?
Scott: Yeah, I feel like you’re asking me if I was NSF director, where would I put funds? And you know, one of the beautiful things about ecology is we’re all studying very different systems, and one of the difficult things about ecology is we’re all studying very different systems. So I don’t know the answer to that. I guess my feeling is it would be good if we had a little bit more top-down approach and somehow created more uniform studies.
In my view, the biggest issue we have in meta-analysis is that we’re using effect size metrics that don’t match the biology. Craig Osenberg has been banging this drum for a long time. I am taking some credit for convincing him to have another go. And so we are writing another paper with examples showing that when you use a typical ecological metric like log ratio or Hedges’ d you get the wrong answer. I hope people will have a look at that paper because I think it is really important to pick the right underlying model for your statistical test or your meta-analysis.



