Mark van der Laan is a Professor of Biostatistics and
Statistics at the University of California, Berkeley. In 2005 he won the COPSS
Presidents' Award for his work. In 2006 he wrote a number of articles about the two surveys
of deaths in Iraq since the 2003 invasion that became publicly known as the
Lancet reports. The first Lancet paper was published in October 2004 and estimated 98,000 excess
deaths in the 18 months following the overthrow of Saddam, excluding
the province of Anbar. The second estimated 654,965 excess deaths from March
2003 to July 2006. Van der Laan was one of many who questioned the reliability
of these surveys. Unfortunately, those critiques remained mostly academic and
never reached most of the public. Today, as violence is increasing in
Iraq and the insurgency is making a comeback, the Lancet studies are being
brought up again despite their major flaws. Here is Prof. van der Laan
explaining his views of the Lancet reports.
The two Lancet papers on deaths in Iraq created huge
controversies with their large estimates of 98,000 excess deaths from 2003-2004 and
654,965 from 2003-2006 (BBC)
1. Both Lancet reports used a very small sample size. The
first Lancet survey went to 33 clusters of 30 households each, with each cluster
representing roughly 739,000 people. The second Lancet survey included 47
clusters of 40 households, with each cluster representing an average of about
577,000 people. What can happen when only a few people are surveyed in such a
large population?
A statistical procedure maps the sample into an estimate of
the number of deaths and a 95% confidence interval that is constructed in such
a way that it will contain the true number of deaths (approximately) 95% of the
time. This confidence interval takes into account the uncertainty in the
estimate, and is therefore by far the most important output of a study. When
the sample size (33 or 47 clusters) is small, the estimate of the number of
deaths has a large standard error, and as a consequence the confidence interval
will be wide. In such cases the study may provide little information. This is
exactly what happened in the first Lancet study, where the confidence
interval was 8,000-194,000, showing that the study could only claim
with 95% confidence that there were more than 8,000 deaths, while the Iraq Body
Count at that time was around 25,000. So the only sound conclusion of this
first Lancet study is that it failed to provide any new information, even if we
ignore its potential biases.
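To make the mechanics concrete, here is a minimal sketch in Python of this kind of cluster-sample calculation. The cluster counts are invented, not the actual survey data, and the figures of roughly 240 people interviewed per cluster and a population of 24.4 million are rough assumptions for illustration. It shows how a normal-approximation interval built from only 33 clusters comes out very wide:

import math
import statistics

# Hypothetical post-invasion death counts in 33 sampled clusters: many zeros
# and a few large values, i.e. strongly non-normal across clusters.
deaths = [0, 0, 1, 0, 2, 0, 0, 7, 0, 1, 0, 0, 3, 0, 0, 0, 1,
          0, 0, 12, 0, 0, 2, 0, 0, 1, 0, 0, 4, 0, 0, 1, 0]
persons_per_cluster = 240          # assumed ~30 households x ~8 people
population = 24_400_000            # rough 2004 population of Iraq

# Per-person death rate in each cluster, its mean, and its standard error.
rates = [d / persons_per_cluster for d in deaths]
mean_rate = statistics.mean(rates)
se_rate = statistics.stdev(rates) / math.sqrt(len(rates))

# Scale up to a national total with a normal-approximation 95% interval.
estimate = mean_rate * population
half_width = 1.96 * se_rate * population
print(f"estimate {estimate:,.0f}, 95% CI "
      f"({estimate - half_width:,.0f}, {estimate + half_width:,.0f})")
# With these invented counts: roughly 108,000, with an interval of about
# 22,000 to 193,000, far too wide to be informative.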
Because of its wide range, the first Lancet study could
only claim that 8,000 or more Iraqis were killed from 2003-2004. Van der Laan's
work with the second Lancet data found a lower confidence bound of around
290,000 dead, well below the reported interval of 426,369-793,663. (Reuters)
2. You mentioned the very wide ranges for the possible
number of excess deaths in Iraq after 2003 that the Lancet papers came up with.
The first one had a range of 8,000-194,000 killed, while the second one was
from 426,369-793,663. You received some data on the 2006 Lancet study from a few of
the authors and did your own statistical analysis. First, are there problems
with having such a wide range, and second, what did you find from your study of
the second Lancet numbers?
These ranges of excess deaths are so-called confidence
intervals, and in any scientific journal they represent the only reliable
output of a study: the estimate of the number of excess deaths is itself
not meaningful and can only be interpreted in combination with an estimated standard
error. So the scientific value of the first Lancet study for the scientific
community is that the number of deaths was somewhere between 8,000 and 194,000,
and therefore this study had no news to report. To prevent such studies, which
represent a waste of resources, one typically first carries out so-called
sample-size calculations to determine what sample size is needed to end up with
confidence intervals of small enough width that the study can provide valuable
information to the larger scientific community. Clearly, that was not done.
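A back-of-the-envelope version of such a sample-size calculation might look like the sketch below. All inputs here are assumed pilot values invented for illustration, not figures from the actual studies:

import math

population = 24_400_000          # rough 2004 population of Iraq
persons_per_cluster = 240        # assumed persons covered per sampled cluster
sd_deaths = 2.5                  # assumed between-cluster SD of death counts
target_half_width = 20_000       # desired 95% CI half-width on the total

# The normal-approximation half-width is
#   1.96 * population * (sd_deaths / persons_per_cluster) / sqrt(n),
# so solving for the number of clusters n gives:
n = (1.96 * population * sd_deaths
     / (persons_per_cluster * target_half_width)) ** 2
print(f"clusters needed: {math.ceil(n)}")   # about 620 clusters, not 33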
However, as long as the study, including the statistical
procedures employed, is scientifically sound, at least we can trust the
confidence intervals. Our biggest concern with both studies was that they
suffered from serious sampling bias or measurement error, causing these
out-of-whack estimates and biased confidence intervals. This was confirmed by the
much larger and more reliable Iraq Living Conditions Survey in 2004, which
contradicted the first Lancet study, and by the WHO study in 2008, which
contradicted the second.
Having said this, as a statistician I was interested in
developing statistical methods for constructing confidence intervals
that can be trusted when the sample size is this small, assuming a reliable
random sample. The confidence intervals reported in the studies rely on the
sample averages across the 33 or 47 clusters having a so-called normal
distribution. I therefore presumed that, even if we were to ignore the
potential biases of these studies, the reported confidence intervals should
still be quite unreliable due to the small sample size and the highly
non-normal distribution of the number of deaths across the clusters of
households: the number of deaths across clusters had many zeros and some very
large values, clearly showing that the normality assumption is unreasonable.
So I became interested, as a statistician, in developing more robust confidence
intervals that could be used in future studies of this type. With a robust
confidence-interval method I found a lower bound of around 100,000 for the
second Lancet study. Subsequently, with a post-doctoral researcher of mine at
the time, Michael Rosenblum, I wrote the article "Confidence Intervals for
the Population Mean Tailored to Small Sample Sizes, with Applications to Survey
Sampling," which also develops semi-robust confidence intervals that do not
rely as heavily on the normality assumption but are not assumption-free either,
and we applied them to the data set from the second Lancet study. Using this
semi-robust method we found a lower confidence bound of around 290,000. Again,
these modified confidence intervals assume that the sample is a random sample
and are thus by no means meant to correct for the other potential biases of the
studies.
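The tailored intervals in the Rosenblum-van der Laan paper are a specific construction of their own. Purely as a simpler stand-in that likewise avoids leaning on the normality assumption, one could resample the clusters themselves, as in this hypothetical percentile-bootstrap sketch that reuses the invented counts from above:

import random
import statistics

random.seed(0)
deaths = [0, 0, 1, 0, 2, 0, 0, 7, 0, 1, 0, 0, 3, 0, 0, 0, 1,
          0, 0, 12, 0, 0, 2, 0, 0, 1, 0, 0, 4, 0, 0, 1, 0]
scale = 24_400_000 / 240   # persons represented per person sampled (assumed)

# Resample the 33 clusters with replacement and recompute the national total.
totals = sorted(
    statistics.mean(random.choices(deaths, k=len(deaths))) * scale
    for _ in range(10_000)
)
lo, hi = totals[249], totals[9_749]   # empirical 2.5% / 97.5% quantiles
print(f"bootstrap 95% CI: ({lo:,.0f}, {hi:,.0f})")

With so few clusters even the bootstrap is not fully trustworthy, which is exactly the gap that small-sample-tailored intervals of the kind described in the paper aim to fill.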
3.
You had problems with some of the fieldwork. Many studies explain how they do
their survey work. The Lancet papers did not. Later, the authors said that
their teams were able to complete a cluster of 40 houses in one day in 2006.
Was that enough time to question each household?
This is hard to judge without an explanation of how the
study was actually run. Clearly, it is not enough time if it was done by a
single team. As we wrote: "They moved from one household to the other
within the context of tribal communities brimming with distrust, explained
their mission and succeeded in gaining access to the living quarters, got into
a person's confidence, asked about intimate experiences, listened to personal
stories of loss and grief – and all this within 18 minutes per household,
assuming a 12-hour workday." I read in your blog that the authors have
given contradictory statements about this simple fact (i.e., single or
multiple teams) as well.
Our biggest concern at the time was that we needed to
know how the survey was carried out in order to judge whether the sample was a
reliable random sample. For example: what questionnaire was used; how were the
interviews conducted; were the interviewers supervised; were the houses
randomly sampled a priori, with no preference for areas with high violence;
were the counted deaths independently verified; and, when visiting a village,
how did one arrange to count only the a priori specified households without
upsetting the local population by having to skip households in which violent
deaths had occurred? Many have tried to get answers, but crucial information
was simply not provided.
It is the responsibility of the designers of a study to
document its operations and to be completely open about them, so that
the scientific community is able to assess the study's scientific validity.
To make a long story short, Professor Spagat has investigated the lack
of scientific validity of these Lancet studies in detail and published
on it. Eventually, after many years of the issues being pointed out, and under
pressure from various journalists, the American Association for Public Opinion
Research conducted an eight-month investigation and stated that the main author
of the Lancet study repeatedly refused to make public essential facts about his
research on civilian deaths in Iraq. Shortly thereafter Johns Hopkins
University, where the main author is a faculty member, publicly stated
that the study had violated scientific standards, and the university censured
him accordingly.
Indeed, when Leon de Winter asked me to evaluate these
studies from a statistical perspective, I wanted to get answers about how the
fieldwork was conducted. Not getting those answers, together with the manner in
which the articles were written, made me uncomfortable about the validity of
these studies. Hearing a radio interview in which one of the authors stated
that he was afraid due to tensions in the village and lay low in his car to
stay out of sight while the field workers did their interviews without
supervision only made me more suspicious about the scientific validity of the
operations.
4.
The authors claimed that violence was equally spread throughout Iraq, and that
they would cover all areas of the country as a result. What are some problems
with this argument?
I do not believe they claimed that violence was equally
spread throughout the country, since everybody knew that was not the case.
However, regarding the design of the study, we found it remarkable that a priori
knowledge about the places in which violence had been prevalent was not used:
a better design would have been a stratified sample that samples more
frequently in areas where violence was known to be prevalent and less in areas
that had been relatively calm. Especially given that the design sampled few
clusters, such considerations are extremely important and would have resulted
in smaller confidence intervals.
As we wrote in our article, “Instead the authors advertise
the selected design of the study as the accepted standard, which would then
wrongly imply that it makes sense to sample as many clusters of households in
areas in which violence is non-existent as in violent areas, as long as these
areas have the same population size.” Their design should have used this
available information.
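As a toy illustration of this point (all stratum shares and rates below are invented for the sketch, not taken from the studies), the following compares the standard error of a stratified estimate under a roughly population-proportional allocation of 47 clusters against one that oversamples the violent stratum:

import math

# (population share, assumed between-cluster SD of the death rate) per stratum
strata = {"high-violence": (0.2, 0.020), "calm": (0.8, 0.002)}

def se_stratified(allocation):
    """Standard error of the stratified mean for {stratum: n_clusters}."""
    return math.sqrt(sum((share * sd) ** 2 / allocation[name]
                         for name, (share, sd) in strata.items()))

proportional = {"high-violence": 9, "calm": 38}   # population-proportional
oversampled = {"high-violence": 33, "calm": 14}   # Neyman-style allocation
print(f"proportional SE: {se_stratified(proportional):.5f}")
print(f"oversampled  SE: {se_stratified(oversampled):.5f}")
# Same 47 clusters, but oversampling the violent stratum cuts the SE by ~40%.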
5.
The Lancet authors wrote that 80% of the deaths recorded in 2004 had death
certificates. You didn't think that was a believable figure, and that even if
it was, it represented a missed opportunity for the researchers. Can you
explain those two points?
Firstly, in order to trust the deaths counted in
these Lancet studies, it is important that they were verified in some reliable
way. The authors reported that 80% of the counted deaths were verified with a
death certificate. If we accept that number, and we also accept the roughly
650,000 "excess deaths" in Iraq in the post-invasion period reported by the
second Lancet survey, then we must also accept that more than 500,000
additional death certificates were issued by government organizations. As Leon
de Winter and I wrote in our article: "In other words, these government
organizations are somehow hiding all of these certificates from the public."
In addition, if 80% of the deaths were actually reported to the government and
resulted in an official death certificate, why not simply count the death
certificates? That would be a much better study than trying to obtain insight
through random sampling of relatively few households.
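The arithmetic behind the certificate point is simple enough to check directly:

# Taking both the 80% certificate figure and the second survey's estimate at
# face value implies over half a million certificates on file somewhere.
excess_deaths = 654_965     # second Lancet estimate, March 2003 - July 2006
certificate_rate = 0.80     # share of counted deaths said to be certified
print(f"implied certificates: {excess_deaths * certificate_rate:,.0f}")
# implied certificates: 523,972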
6.
You broke down one 12-month period from the second Lancet study. There were
an estimated 330,000 people killed from June 2005 to June 2006. That breaks down
to 27,500 per month, 6,875 per week, and 982 per day. According to the authors,
that would also mean 40,000 died from Coalition air strikes, 60,000 from car
bombs, 40,000 from explosions, and 174,000 from gunshots. Based upon press
reports, Iraq Body Count had only 22,030 deaths for that period, or an average
of 60 per day. The media can never cover all casualties in a country,
especially in a war zone, but do you think they could miss violence of that
magnitude?
As Leon and I noted in our article: "These are extremely
large numbers – and we would have to believe that the hundreds of independent
radio stations, TV stations, newspapers and magazines that operated in Iraq did
not notice these massacres. According to this survey, American air strikes must
have erased whole neighborhoods without the press noticing." As you can
imagine, the Iraq Body Count was not happy with these claims by the Lancet
study. As we now know, the much larger and better designed WHO study in 2008
came up with a much smaller estimate of around 150,000 instead of 650,000,
even though it applied its rate to a larger population and used a significant
ad hoc upward adjustment that the Lancet study did not. As Professor Spagat
shows, the two central estimates of these two studies actually differ by a
factor of 6.6 when put on a comparable basis.
7.
Most studies take months of peer review before they are published. The first
Lancet paper was finished in September 2004 and published the next month. The
second seemed to have a quick turnaround from completion to its appearance
in The Lancet as well. What was happening in the U.S. when these papers were
published, and do you think that was a coincidence?
It is unheard of to publish a paper within a month, let
alone such a high-profile paper known to have such an impact on the
country. And one should seriously wonder whether it was a coincidence that four
days after The Lancet took the world's headlines with this survey, the American
people would vote for their president. We also have to keep in mind that both
the editor of The Lancet and one of the authors were politically active. It
makes one wonder whether the study was done under enormous time pressure and
the publication was pushed out that fast in order to get it out before the
election. The second Lancet study was again published in October, right before
the 2006 elections. Both publications made national headlines and had an
enormous impact on society, while both should have raised serious warnings.
For example, the first Lancet study contradicted the much larger Iraq Living
Conditions Survey that was being carried out at the same time. The Lancet
should have been alarmed, and in that manner it could have prevented the second
Lancet study, which made even more dramatic claims and was again heavily
contradicted by a much larger, more reliable study.
8.
You mentioned a few other surveys conducted in Iraq during this time period
that also tried to estimate the number killed during the war. Can you
explain some of their findings, and how they compared to the Lancet studies in
terms of their fieldwork?
Right before the first Lancet study, there had been the much
larger Iraq Living Conditions Survey 2004, which sampled 22,000 households
instead of fewer than 1,000. It reported a predicted death count of 24,000 with
a 95 percent confidence interval of 18,000-29,000, completely in line with the
Iraq Body Count death toll. These results were published shortly after the
publication of the first Lancet study. That is, a much more reliable study
yielded a predicted death count a factor of 4 smaller than the predicted death
count in The Lancet's 2004 article. Of course, one should wonder why the
100,000 from the Lancet study became world news, in spite of its 8,000-194,000
confidence interval, while this large study was conveniently ignored by the
media and by the authors of the Lancet study.
Regarding the WHO study that contradicted the second Lancet
study, I quote from a letter by Neil Johnson, David Kane, Seppo Laaksonen, Mark
van der Laan, Peter Lynn, Fritz Scheuren, and Michael Spagat, which we
submitted to Science magazine asking for an independent investigation of the
second Lancet study:
“John Bohannon’s article “Calculating Iraq’s Death Toll: WHO
Study Backs Lower Estimate” (18 January 2008, p. 273) exposes some serious
weaknesses in the second (2006) Lancet study (1) of Iraq mortality (L2).
The WHO study (2) has a much larger sample, is much better supervised and uses
sampling methods that are greatly superior to the published methods of the L2
survey (3). The WHO team even seems to bend over backwards to minimize the
distance between the two surveys. Yet even WHO’s apples-with-oranges
comparison, 151,000 estimated violent deaths for WHO versus 601,000 for L2,
leaves L2 exceeding WHO by 450,000 violent deaths.
Actually the factor-of-four difference greatly understates
the true discrepancy between WHO and L2 for two main reasons. First, the WHO
study applies quite a substantial upward adjustment to its estimate to
compensate for an assumed reporting bias that is “common in household surveys”.
However, if this generic argument applies to the WHO survey then it applies
with equal force to the L2 survey. Second, WHO applies its estimated violent
mortality rate to a larger population estimate than does L2. We address these
distortions by comparing estimated violent-death rates, rather than totals, in
the two surveys. L2 reports a post-war violent death rate of 7.2 per
1,000 per year compared to 1.09 per 1,000 per year for WHO. The two central
estimates differ by a factor of 6.6 when put on a comparable basis.”
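The letter's factor of 6.6 follows directly from the two per-capita rates it cites:

# Comparing the surveys on a per-capita basis rather than by raw totals.
l2_rate = 7.2       # L2: violent deaths per 1,000 per year
who_rate = 1.09     # WHO: violent deaths per 1,000 per year
print(f"ratio of central estimates: {l2_rate / who_rate:.1f}")   # 6.6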
9.
The Lancet papers received a huge amount of press when they were released. They
are still mentioned to the present day even though you and others have found
major flaws with them. How can you explain the staying power of the two papers?
The Lancet should not have published these articles. The
fact that they were published in The Lancet lends them enormous
credibility and allows people to refer to them as if they represent scientific
truth. Combined with the enormous publicity these articles received in the
media, that makes for a powerful story out there for people to use.

Another issue is that countering a dramatic story does not
receive much support from the media. At the time, Leon de Winter and I
submitted our article to many newspapers right after the publication of the
second Lancet study, but none of them were willing to publish our response. For
example, The New York Times had asked me to comment on the second Lancet study
the day before it reached the national headlines, and they were grateful
for my comments, which made them decide to move the story away from the
headlines, in contrast to most other newspapers. However, they were not willing
to publish our response. Part of the problem might have been that it did not
represent what people wanted to hear at the time, and newspapers apparently
need to be sensitive to that. Interestingly, our article somehow reached people
through the Internet, which is probably how you ran into it. So it took on a
life of its own. Similarly, when a few years later we submitted a letter to
Science asking for an independent investigation of the second Lancet study, it
was again rejected. Discrediting what once was headline news is not an easy
task, and it takes time and effort from many people who care about the truth.
SOURCES

van der Laan, Mark, "'Mortality after the 2003 invasion of Iraq: A
cross-sectional cluster sample survey,' by Burnham et al. (2006, Lancet,
www.thelancet.com): An Approximate Confidence Interval for Total Number of
Violent Deaths in the Post Invasion Period," Division of Biostatistics,
University of California, Berkeley, 10/26/06

van der Laan, Mark and de Winter, Leon, "Lancet," November 2006

- "Statistical Illusionism," U.C. Berkeley, 2006

Spagat, Michael, "Ethical and Data-Integrity Problems in the Second Lancet
Survey of Mortality in Iraq," Defence and Peace Economics, February 2010

- "Mainstreaming an Outlier: The Quest to Corroborate the Second Lancet Survey
of Mortality in Iraq," Department of Economics, Royal Holloway, University of
London, February 2009

Rosenblum, Michael and van der Laan, Mark J., "Confidence Intervals for the
Population Mean Tailored to Small Sample Sizes, with Applications to Survey
Sampling," The International Journal of Biostatistics, 2009