Michael Spagat is a professor of economics at Royal Holloway, University of London. He has written extensively about survey work done in Iraq, specifically on the two Lancet reports on estimated deaths in Iraq following the 2003 invasion, and two other studies of child mortality rates in the country during the 1990s sanctions period. He has also studied the Iraq Living Conditions Survey, the Iraq Family Health Survey, and the 2013 PLOS Medicine survey on Iraqi fatalities. Professor Spagat has found anomalies in almost all of these papers that undermine their findings. Unfortunately, most of his work is known only in academia. What follows is an interview with Prof. Spagat exploring his critiques of these famous papers.
1. In October 2013 a new survey of estimated deaths in Iraq after 2003 was published in PLOS Medicine. One of the authors, Gilbert Burnham, was also a co-author of the two Lancet reports on Iraqi fatalities. The press reported that the new survey found that 500,000 Iraqis had died as a result of the war since the U.S. invasion, but its estimate for violent adult deaths was actually 132,000. The headline estimate was for what the authors called “excess deaths,” which they view as caused by the war, although many of these are non-violent. What did you think of that new paper?
The Hagopian et al. report (PLOS) did two separate surveys simultaneously. One was a sibling survey. The other was the more typical household survey. These are two different methods of cutting the population up into mutually exclusive groups that exhaust the whole: either a bunch of households that, hopefully, don’t overlap, or groups of individuals matched with their siblings. The traditional household survey, which has attracted pretty much all of the media attention, didn’t give an estimate for violent deaths. The central violent-death estimate for the sibling survey, 132,000 non-elderly adults, is a bit below the Iraq Body Count (IBC) number for civilians plus combatants. (IBC focuses on civilian deaths but they now also publish counts of combatants killed. The IBC total for the Hagopian et al. period is around 160,000, including children and the elderly.)
To their credit, the PLOS authors do post their data online, so you can do your own analysis. I’ve taken the opportunity to investigate the household survey data together with a Royal Holloway PhD student, Stijn Van Weezel. The data yield an estimate of around 200,000 violent war-related deaths, i.e., about 40,000 higher than the IBC number for civilians plus combatants (160,000).
Hagopian et al. have stressed what they call “excess deaths” rather than violent deaths. Excess deaths are meant to be both violent and non-violent deaths that have been caused by the war. Unfortunately, the excess-death concept is a pretty squishy one based on a poorly designed counterfactual exercise. The calculation hinges on constructing a death rate that would have occurred, theoretically, if there had never been a war. This is something we can never measure directly. We’d like to run history twice, once as it actually happened with a war, and once without the war. We would then measure death rates under both scenarios and call the difference between the two the excess death rate caused by the war. Obviously, we can’t ever do this exercise or even anything that resembles it.
There are, nevertheless, a couple of approaches that have been tried with the hope of pinpointing the causal effect of a war on death rates. The most common one is to measure both a pre-war death rate and a during-war death rate and to then assume that the difference between the two is caused by the war and nothing else.
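As a concrete illustration, the before-and-after arithmetic can be sketched in a few lines of Python. All of the numbers below are hypothetical, chosen only to show the mechanics; they are not figures from any of the surveys discussed here.

```python
def excess_deaths(pre_war_rate, during_war_rate, population, years):
    """Standard before-and-after excess-death estimate.

    Rates are annual crude death rates per 1,000 people. Note the
    method's key assumption: the entire rate difference is attributed
    to the war and nothing else.
    """
    rate_diff = during_war_rate - pre_war_rate  # per 1,000 per year
    return rate_diff / 1000 * population * years

# Hypothetical example: pre-war rate 5.5 per 1,000, wartime rate
# 7.5 per 1,000, population 27 million, over a 3-year period.
estimate = excess_deaths(5.5, 7.5, 27_000_000, 3)
print(f"{estimate:,.0f}")  # 162,000
```

The fragility is visible in the first line of the function: any non-war factor that moves the during-war rate (a drought, sanctions, changes in health care) is silently booked as an effect of the war.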
Unfortunately, such a before-and-after exercise is problematic. It pretty much boils down to saying that because “b” comes after “a,” “a” must have caused “b.” This is a known logical fallacy. In the case of Iraq, a lot that can affect death rates has happened since March 2003. The start of the war is the most obvious and dramatic factor, but it is only one of these things. In conflict situations there can be an event like a drought that directly causes deaths but also exacerbates tensions, leading eventually to war. If you attribute all increased deaths to the war alone then you’re missing the fact that the drought was also probably causing deaths. You wind up exaggerating the number of deaths caused by the war.
In fact, the standard excess-deaths concept leads to a conundrum when combined with a striking fact exposed in the next-to-latest Human Security Report: in most countries child mortality rates decline during armed conflict (chapter 6). So if you believe the usual excess-death causality story then you’re forced to conclude that many conflicts actually save the lives of many children. Of course, the idea of wars saving lives is pretty hard to swallow. A much more sensible understanding is that a variety of factors determine child deaths and that in many cases the factors that save children’s lives are stronger than the negative effects that conflict has on child mortality.
Anyway, Hagopian et al. didn’t bother much with the above reflections but, rather, charged straight in and estimated 400,000 excess deaths. However, they have quite a crazy confidence interval around this estimate - 50,000 to 750,000. So even if you accept their notion of excess deaths at face value you still have to say that this is not a very informative estimate.
My student Stijn and I are taking a different tack in our analysis. We say that if the war is causing non-violent death rates to increase then you would expect non-violent deaths to increase more in the violent parts of Iraq than they do in the non-violent parts. To the contrary, we find this just isn’t so. At least in our preliminary analysis, there seems to be very little correlation between violence levels and changes in non-violent death rates. This should make us wonder whether there is any reality behind the excess-deaths claims that have been based on this Iraq survey. In fact, we should question the conventional excess-deaths idea in general.
Nevertheless, the authors and the media have stressed this excess death estimate while obscuring the great uncertainty that surrounds it. Remember, the estimate is 400,000, give or take 350,000. Yet somehow the authors were able to talk that up to 500,000 deaths and assert this number as a sort of minimum. Thus, the uncertainty was expunged, and then there was inflation from 400,000, for which there is some supporting data, to 500,000, which is more of a speculation than a finding. Obviously 500,000 is a media friendly number - people like the idea of half a million.
Although the household survey of Hagopian et al. tells us little about excess non-violent deaths it does bring to bear some useful evidence about violent deaths. The new study suggests that the full number of violent deaths in the Iraq war is a bit higher than the IBC number (200,000 versus 160,000 civilians and combatants). Much other evidence points in this direction but such an understanding has not been universal. Strangely, though, Hagopian et al. seem to believe their findings are at odds with IBC, perhaps because they are unclear in their minds about the distinction between excess deaths and violent deaths.
However, the new survey completely flies in the face of the Burnham et al. study (the second Lancet). In fact, another problem with the media campaign surrounding the PLOS report was that Gilbert Burnham tried to claim their new study is consistent with the Burnham et al. one. It isn’t.
I have looked a little bit at just the time period that was covered by the Burnham et al. 2006 study that had found 600,000 violent deaths. The Hagopian et al. data will come in at around 100,000 deaths for that same time period. So there is a factor-of-six discrepancy between the two. To say these are consistent with each other is really farfetched.
Comparing the way the Hagopian et al. survey has been presented, and the way the Roberts et al. 2004 Lancet survey was presented is also interesting. In both cases you have a central estimate of excess deaths with almost comical uncertainty surrounding it. For Roberts et al. this was an estimate of 98,000 with a confidence interval of 8,000 to 194,000. Then there is a public relations campaign that erases the uncertainty, leaving behind just the central estimate - 100,000 for Roberts et al. and 400,000 for Hagopian et al. Finally, the central estimate is promoted as a sort of minimum, with the “likely” number being even higher than their central estimate. Actually, Hagopian et al. went one step further, inflating up by another 100,000 before declaring a minimum of 500,000.
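The “almost comical uncertainty” can be made concrete with a few lines of Python. The half-width calculation below is my own illustration applied to the two confidence intervals quoted above, not a computation from either paper:

```python
# Confidence intervals quoted above: (low, central, high).
intervals = {
    "Roberts et al. 2004": (8_000, 98_000, 194_000),
    "Hagopian et al. 2013": (50_000, 400_000, 750_000),
}

for name, (low, mid, high) in intervals.items():
    # Half-width of the interval, expressed as a share of the
    # central estimate -- a rough gauge of how informative it is.
    half_width = (high - low) / 2
    print(f"{name}: ±{half_width:,.0f} ({half_width / mid:.0%} of the estimate)")
```

Both intervals have half-widths nearly as large as the central estimates themselves, which is why quoting the central number alone erases almost all of the real information about uncertainty.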
2. When you read the PLOS report it seemed like they definitely recognized all the criticisms of the 2006 Lancet paper, because it said they did all these steps to avoid those problems. Then when they went to the media they said there was no problem with the Lancet paper at all, and our new report backs it up. It seemed like what they said to the press, and what they actually wrote were two different things.
Right, I completely agree with that. Of course, if the numbers had come out similar between the two surveys then they would have said, “Look, Burnham et al. was criticized for all these reasons. We fixed all of those things, but it didn’t make a difference, so the criticism was not important.” In fact, what happened was that the Hagopian et al. report fixed most of those things and then the numbers plummeted. Unfortunately, the authors don’t yet seem willing to come to terms with this fact in the public dialogue.
3. Let’s turn to the two Lancet reports. One of your main critiques of the 2006 Lancet report was what you called the “main street bias.” Could you explain what that was and what you thought were the major problems with that Lancet paper?
The main street bias critique is that the 2006 Lancet survey sampled along main thoroughfares, where there would be a higher likelihood of violence, and would thus overestimate deaths in Iraq (Defence and Peace Economics)
That was among the critiques, and that was the first one I made together with some other people. That was just from reading the description of the sampling method Burnham et al. wrote in the paper. This originally arose in a discussion that Neil Johnson and I were having with Burnham mediated by a reporter at Science magazine. He got our input, and then he forwarded that on to Burnham, and then Burnham made a response that came back to us….and so on and so forth. We read carefully what the report said, which was that the people who did the interviews selected a main street at random from a list of main streets. Then interviewers selected a random cross street to that main street, and did their interviews along that random cross street. We argued that such places would tend to be more violent than average, so sampling using that method would tend to overestimate.
At some point in that discussion (that eventually turned into an article for Science magazine) Burnham said they actually didn’t do what was described in the Lancet paper. He said there was a sentence that had been cut from the paper at the demands of the editors to save space, although the paper was actually well below the maximum length for a paper in the Lancet, and there was a lot of text left in the final paper that was certainly more superfluous than ensuring an accurate description of their sampling methodology. That last sentence supposedly said that if there were streets that were not cross streets to main streets those were included in the procedure as well. I thought it really made no sense to have such a procedure. You take a main street at random. Then you choose a cross street to the main street, but if there are streets that aren’t cross streets to main streets, they also included those in some unspecified way. Of course, pretty much wherever you go there are going to be streets that aren’t cross streets to main streets, so why do you even bother to select a main street and a cross street? Inevitably you’ll just find out that there are other kinds of streets as well so you’ll then have to figure out a way to include these too. And how can you operate without a well-defined procedure for selecting streets? That was the moment I realized that something weird was going on with this survey. It seemed that Burnham didn’t even know what his field teams were doing. It also seemed like he was willing to change arguments on the fly without knowing what he was talking about.
We informed the Lancet that we had been told that the authors hadn’t followed their published sampling procedures so maybe there should be a correction, but there was never any correction.
Neil Johnson and some other colleagues still wanted to pursue the logic of what would be implied if the field teams actually followed the procedures they claimed to have followed. We worked this out in more detail and developed a plausible range of assumptions that suggested that the impact of following main-street-biased procedures could potentially be quite large. We suggested likely scenarios that could lead to overestimations by even a factor of three. It seems that in practice Burnham et al. overestimated by a factor of six or so. Perhaps main-street bias can explain a good chunk of this overestimation. I don’t think it really has the potential to explain all of it. At the end of the day I’m not confident that main-street-bias explains much of anything given that we have a glaring ambiguity about what actually happened on the ground in this survey. Burnham says they didn’t actually do what they claimed to have done in the published paper, but he has never specified a viable alternative. Where does that leave us in the end?
4. Do you have any other critiques of the 2006 Lancet?
There are many others. For example, there's a long sad story having to do with the trend in that survey. If you go back on the Lancet website there’s a podcast that was put out right when the Burnham et al. study was published. Gilbert Burnham was asked by an interviewer how he can be confident in these phenomenally high numbers that are so far out of line with other sources. His answer was that he is very confident, because although the numbers are considerably higher than the Iraq Body Count numbers the trends match IBC’s trends quite closely. So that was the confirmation - they got the same trends as IBC.
Then there’s a graph in the paper (figure 4) where they compare the trends from IBC and their own trends. I never understood that graph until there were letters in the Lancet about it. One of the authors was Jon Pedersen, who was the main person behind the Iraq Living Conditions Survey, and Josh Dougherty of IBC also had a letter about this. There were many flaws with the graph, but a crucial one was how they compared the trends. They have three time periods, each of 13 months. Their own (Burnham et al.) figures are just what you’d expect – one for the first 13 months, one for the second and one for the third. But the IBC figures are cumulative. So the first IBC figure covers a 13-month period just like the comparable Burnham et al. figure. However, the second IBC figure covers 26 months and is compared with a 13-month Burnham et al. figure. The third IBC figure covers 39 months and is compared with a 13-month Burnham et al. figure. In short, they present a graph comparing cumulative figures with non-cumulative figures! And do you know what? The IBC cumulative figures skyrocket up just like the non-cumulative Burnham et al. figures. And that’s the confirmation that makes them so confident in their outlying numbers. However, if you compare like with like you see that the Burnham et al. numbers rise much faster than IBC’s, and follow a different pattern.
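The effect of that mistake is easy to reproduce. In the sketch below the per-period tallies are purely hypothetical, not actual IBC or Burnham et al. figures; the point is that any modestly rising series looks dramatic once it is accumulated:

```python
# Hypothetical deaths recorded in three successive 13-month periods.
per_period = [15_000, 20_000, 25_000]

# Cumulative totals -- the kind of series plotted for IBC in the
# paper's figure 4, alongside per-period figures for the survey itself.
cumulative = []
running_total = 0
for count in per_period:
    running_total += count
    cumulative.append(running_total)

print(per_period)   # [15000, 20000, 25000] -> modest rise
print(cumulative)   # [15000, 35000, 60000] -> steep, accelerating rise
```

Any series of positive numbers rises when accumulated, so matching a cumulative curve against a per-period curve is not evidence that the underlying trends agree.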
There was never any follow up to that interview. If you ever interview Gilbert Burnham you might want to ask him: “now that the basis for your confidence in your numbers has been exposed as false will you now be changing your position?”
5. There were two other surveys, the Iraq Living Conditions Survey and the Iraq Family Health Survey. They had radically different findings from the Lancet surveys. A lot of people compared those, so what were the differences between those other surveys and the two Lancet ones?
The Iraq Family Health Survey (IFHS) covered the same time frame as the 2006 Burnham et al. study. They published a central estimate of 150,000 violent deaths. That would compare to the 600,000 in the Burnham et al., so those were apart by a factor of four. That said, the people who did the IFHS really went into contortions to try to raise their number up as high as possible, so the real distance is actually greater than a factor of four.
The main estimate in the IFHS report was calculated in a different way than is normal. If they had done the usual thing their estimate would have come out around 100,000 or even 80,000. So they did two things to push their number upward. One was to adjust for clusters that had been selected in their randomization procedures, but where they had not been able to complete their interviews because they considered those places too dangerous to enter at the time the survey was done. So they applied an adjustment that had the effect of raising their estimate from about 80,000 up to 100,000. That was not a crazy thing to do although it was quite a dramatic adjustment. It had the implication that the clusters in Baghdad where they hadn’t managed to interview were about four times as violent as the ones where they did. That is a rather bold assumption to make, but leave that aside.
Next the IFHS did an arbitrary fudge upward of an additional 50%. They basically just declared, without evidence, that surveys tend to underestimate violent deaths. So they raised their number from 100,000 to 150,000 with hardly an attempt at justification.
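In round numbers, the IFHS adjustment chain described above can be written out as simple arithmetic. These are the interview’s approximations, not the report’s exact published calculations:

```python
raw_estimate = 80_000          # roughly what the raw survey data yield

# Step 1: adjust for clusters selected by the randomization but never
# visited because they were judged too dangerous -- raises the
# estimate to about 100,000.
cluster_adjusted = 100_000

# Step 2: a flat 50% upward adjustment for assumed under-reporting
# of violent deaths in surveys.
published_estimate = cluster_adjusted * 1.5

print(f"{published_estimate:,.0f}")                    # 150,000
print(f"{published_estimate / raw_estimate:.2f}x raw") # 1.88x raw
```

Nearly half of the published 150,000 figure thus comes from adjustments layered on top of the raw data rather than from deaths the survey actually recorded.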
I would argue that in reality the IFHS found around 80,000 to 100,000 - take your pick.
Even if you accept the fudge up to 150,000 the IFHS is still completely out of line with the Burnham et al. survey, and not just for the overall number. For example, the Burnham et al. survey had a few governorates with incredibly high numbers that aren’t at all supported by other evidence. Burnham et al. also had a dramatic upward trend that isn’t matched by the IFHS or IBC or any kind of other measurement that’s been taken there.
The Iraq Living Conditions Survey really only covered slightly more than the first year of the war. The first Lancet survey by Roberts et al. covered a bit more, about the first 18 months, so they’re not exactly comparable. The best way to think about the first Lancet survey is that it produced virtually no information. They had an estimate of 98,000 excess deaths with a confidence interval of 8,000 to 194,000. Right off the bat it’s just kind of useless because estimates with that kind of uncertainty tell you nothing. They didn’t actually calculate the confidence interval correctly either. If it is calculated correctly it comes out even wider than what was published, although in the end this probably doesn’t even matter.
The Iraq Living Conditions Survey didn’t estimate excess deaths so it is a little bit hard to compare it with Roberts et al. However, you can sort of bridge the gap because there was some data released on Roberts et al., and you can use it to get rid of the deaths after the time period the Iraq Living Conditions Survey was finished. Then you need to focus just on violent deaths. Roberts et al. then has about 70% more violent deaths than the Iraq Living Conditions did. They are not really compatible with one another, but they’re not wildly out of line either. It’s the Burnham et al. survey that is seriously at odds with everything else.
I prefer to focus more on violent deaths. Certainly if you’re trying to compare all of the different sources you have to do this. In some sense you can say that all the excess deaths estimates are kind of compatible with one another because the confidence intervals are so wide that the only reasonable conclusion is that we’ve hardly got any idea about excess deaths, even if you accept that the whole notion of excess deaths as defined in this paper makes sense.
6. You also had problems with how they estimated their excess deaths. You had an article, “The Iraq Sanctions Myth,” that discussed a letter published in the Lancet by Sarah Zaidi in 1995 claiming that half a million children died due to sanctions. The other was “Sanctions and Childhood Mortality in Iraq” by Mohamed Ali and Iqbal Shah, also in the Lancet in 2000. Subsequent work extrapolated from these studies and found 400,000-500,000 excess child deaths in Iraq from 1990-1998. You said that there were problems with their estimates. So were a lot of the subsequent surveys relying on those problematic earlier surveys to establish Iraq’s pre-war death rate, and thus their estimates of excess deaths afterward?
Comparison of survey estimates on child mortality in Iraq during and after the sanctions period (Pacific Standard Magazine)
To answer your last question first, sanctions-era estimates have not been carried forward to feed into excess death estimates made during the war. All the estimates discussed earlier in the interview have used their own surveys to estimate pre-war death rates.
I’m also very critical of the sanctions era estimates of how many children were supposedly killed due to sanctions. These numbers were first based on a survey done and later retracted by Sarah Zaidi. She subcontracted her field work to some government workers in Iraq and, on the basis of the data they gathered, estimated half a million excess child deaths. This number was then cited by Leslie Stahl in her famous interview with Madeleine Albright. Stahl actually won two awards for that interview including an Emmy, but the basis for it turned out to be a survey that was later retracted. The story is that some people found anomalies in the survey. So Zaidi, to her credit, went to Baghdad herself and re-interviewed many of the same households. She found that a lot of the deaths the Iraqi surveyors had reported simply weren’t there. The data were just wrong so this calculation falls even before you question that whole methodology of looking at pre versus post as I do earlier in this interview.
However, the critique of the excess-deaths concept certainly does apply to child deaths in Iraq in the 1990’s. It is not convincing to assume that any differences between pre and post child death rates are due entirely to sanctions. There was so much going on in Iraq besides just sanctions. There was the first Gulf War, there were uprisings both in the south and the north that were suppressed, etc. To the extent that there was an increase in child death rates there could have been a lot of causes besides just sanctions. However, in this case you can just leave that whole critique aside, because the basic measurement was wrong.
Shortly after the Zaidi survey was retracted, UNICEF did a new survey, again subcontracting the fieldwork to Iraqi government officials. They found basically the same thing that Zaidi had found initially, which should have raised red flags straight away. One person goes in and conducts a survey that pretty clearly was manipulated. I don’t think this was Zaidi’s fault and I’ve always praised her for correcting the record, which is rare. However, if the corrected record is true then why is someone else finding something that would completely contradict this newly corrected record? You might also ask why, if we already saw Iraqi government workers manipulate one survey, does UNICEF then create an opportunity for the same thing to happen again? In this particular case we also need to consider that it was a central policy of the Iraqi government to convince the outside world to drop sanctions against it. One of the arguments they were using was that sanctions were hurting Iraqi civilians, in particular Iraqi children. Why then give that government an opportunity to do a UN-sponsored survey to reinforce their foreign policy position? How confident can you be in these results?
So UNICEF got similar results to the ones that Zaidi had just retracted. And those UNICEF results remained the conventional wisdom for several years, going right up to the beginning of the 2003 war and beyond. It was widely believed that sanctions were responsible for the deaths of hundreds of thousands of Iraqi children, but the problem is that since then there have been four further surveys that have all failed to find the massive and sustained spike in the child mortality rate in the 1990’s that Zaidi had found and lost and that the UNICEF survey had supposedly rediscovered. At this point there’s so much evidence piled up against the UNICEF survey that I don’t think a rational individual can believe any more in the sanctions-excess-child-death story that we were sold before the war. You don’t have to even question the excess death concept to grasp this point. All you have to do is look at what all the surveys find. In order to get this massive number of excess deaths you have to have a huge and sustained spike in the child death rate after the sanctions come in, and this simply doesn’t happen in any of the surveys since the UNICEF one from the late '90s.
7. I want to try to address some of the arguments made by people who defend the two Lancet surveys. Some of the most common ones that I’ve heard were that it was published in the Lancet that is a respected journal, it was peer reviewed, and that they did it during a war so you’re never going to get perfect work during that time. Given all that people say that others shouldn’t be so critical of the two surveys. What do you think of that kind of defense?
First of all, saying that something has to be right or is probably right because it has been peer reviewed is quite a weak defense. Peer review is a good thing, and it is a strength of scientific journals that there is that level of scrutiny, but if you look at the list of scientific claims that have turned out to be wrong and that have been published in peer reviewed journals…well…the list just goes on and on. Publishing in a peer reviewed journal is no guarantee that something is right. Some of the people who do the referee reports are more conscientious than others. In almost no cases does refereeing ever include an element of replication. Often referees don’t even know enough about the literature cited to judge whether claims about the current state of knowledge are accurate. Mostly people just assume that what they’re being told by the authors of the paper is correct and valid. Peer review is better than no peer review, but it hardly guarantees that something is going to be correct. (Let’s not forget the graph discussed earlier in this interview, which survived the Lancet’s peer review procedures.)
Journal peer review is just the beginning of a long peer review process. Thinking that journal peer review is the end of this process is a serious misunderstanding. Peer review is an ongoing thing. It is not something that ends with publication. Everything in science is potentially up for grabs, and people are always free to question. Anyone might come up with valid criticisms.
If you look at Burnham et al. there have been a number of peer reviewed articles that have critiqued it, and said it is wrong. So if you think peer review has to always be correct then you’re immediately in a logical conundrum because you’ve got peer reviewed articles saying opposite things. What do you do now?
As for the Lancet, as a scientific journal over the last decade or more it has had quite a spotty record. Much of what it has published has turned out to be wrong. The Lancet is not considered one of the more reliable scientific journals and it has a reputation for sensationalism. You have to remember that at the end of the day the Lancet is a profit making operation. It is chockablock full of advertising. Library subscriptions are extremely expensive. It brings in millions of pounds of revenue. Sensationalism sells, so by some metric Richard Horton has been a successful journal editor, because he’s gotten a lot of media attention. It’s good for subscriptions, good for advertising, but articles in the Lancet still need to be scrutinized on a case-by-case basis, as is the case with any other journal.
I’m happy to give people credit for doing difficult research in war zones. And I’m happy to admire the courage of people who do dangerous field work. But doing courageous field work doesn’t make your findings correct and we shouldn’t accept false claims just because someone had the guts to go out in the field and gather data. Science is a ruthless process. We have to seek the truth. Courage is not an adequate rebuttal to being wrong.
Boseley, Sarah, “UK scientists attack Lancet study over death toll,” Guardian, 10/23/06
Burnham, Gilbert, Doocy, Shannon, Dzeng, Elizabeth, Lafta, Riyadh, Roberts, Les, “The Human Cost of the War in Iraq, A Mortality Study, 2002-2006,” Bloomberg School of Public Health Johns Hopkins University, School of Medicine Al Mustansiriya University, 9/26/06
Burnham, Gilbert, Lafta, Riyadh, Doocy, Shannon, Roberts, Les, “Mortality after the 2003 invasion of Iraq: a cross-sectional cluster sample survey,” The Lancet, 10/11/06
Giles, Jim, “Death toll in Iraq: survey team takes on its critics,” Nature, 3/1/07
Johnson, Neil, Spagat, Michael, Gourley, Sean, Onnela, Jukka-Pekka, and Reinert, Gesine, “Bias in Epidemiological Studies of Conflict Mortality,” Journal of Peace Research, September 2008
Kaplan, Fred, “Number Crunching Taking another look at the Lancet’s Iraq study,” Slate, 10/20/06
Onnela, J.-P., Johnson, N.F., Gourley, S., Reinert, G., and Spagat, M., “Sampling bias in systems with structural heterogeneity and limited internal diffusion,” EPL, January 2009
Roberts, Les, Lafta, Riyadh, Garfield, Richard, Khudhairi, Jamal, Burnham, Gilbert, “Mortality before and after the 2003 invasion of Iraq: cluster sample survey,” The Lancet, 10/29/04
Spagat, Michael, “Ethical and Data-Integrity Problems in the Second Lancet Survey of Mortality in Iraq,” Defence and Peace Economics, February 2010
- “The Iraq Sanctions Myth,” Pacific Standard Magazine, 4/26/13
- “Mainstreaming an Outlier: The Quest to Corroborate the Second Lancet Survey of Mortality in Iraq,” Department of Economics, Royal Holloway, University of London, February 2009