April 5, 2005
The recent announcement of a newly discovered, arsenic-based life form has come under fire from microbiologists, who say the research paper is fatally flawed. So far, the paper's authors have refused to respond to the criticism, on the grounds that "any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated." That is to say, they won't acknowledge the informal vetting provided by their colleagues on science blogs and in the press. In the following column, first published in 2005, Daniel Engber explains why formal peer review doesn't always work.
In September 2001, the Journal of Reproductive Medicine weighed in on the healing power of God. A Columbia University research group reported that patients at a fertility clinic in Seoul were twice as likely to get pregnant when Christians prayed for them. Within a month, the study was in the New York Times science section and on Good Morning America, where the medical editor for ABC News called it "very well done" and opined that "getting pregnant involves a lot of biological, psychological, maybe even spiritual factors that we don't yet understand."
The prayer study has since fallen from grace. Scientists around the world wrote angry letters to the journal attacking the methodology, and the research-protections office of the Department of Health and Human Services looked into whether the subjects had properly given consent. Last year, the study's senior author removed his name from the paper, saying that he hadn't directly participated in the research. The real lead author will not discuss the work, and the third author—a parapsychologist, lawyer, and convicted con man—is now serving time in a federal prison (for an unrelated charge of fraud).
Why did this quackery get so far before being exposed? The prayer study seemed legitimate because it appeared in the pages of a "peer-reviewed" medical journal. That means the paper was vetted by an independent panel of experts in the field.
Peer review is the gold standard of modern science. For medical researchers and other scientists, it's the gateway to funding, publication, and career advancement. When they apply for government grants from the National Institutes of Health or the National Science Foundation, their proposals are reviewed by a panel of their colleagues. When they submit their completed work for publication, journals and university presses ask for the opinions of others in the field. And when they apply for jobs or tenure, scientists are judged largely on the basis of their peer-reviewed publications.
Scientists give peer review so much authority because they view it as a part of the grand tradition of scientific inquiry—an extension, even, of the formal experimental method. Peer evaluation is the endpoint of a cautious progression from theories and predictions to experiments and results. The system dates from the 1700s, when the Royal Society of London set up a "Committee on Papers" with the power to solicit expert opinions. It became the standard for scientific publication only after World War II, when the dramatic expansion of scientific research swamped journal editors and made them look to outsiders for help. Ever since, scientists have claimed that peer review filters out lousy papers, faulty experiments, and irrelevant findings. They say it improves the quality of an accepted paper by providing helpful comments for revision. And they can't imagine a better way to accomplish these goals.
So, what explains the Columbia prayer study? Journal editors will tell you that peer review is not designed to detect fraud—clever misinformation will sail right through no matter how scrupulous the reviews. But the prayer study wasn't a clever fraud. It was sprinkled with suspect elements, not the least of which was a set of results that violated known laws of science. The authors also used a needlessly convoluted experimental design; these and other red flags in the study have been cataloged on the Web by obstetrician and enthusiastic debunker Bruce Flamm. Even on its own terms, then, as a filter for lousy papers and bad experiments, peer review of the Columbia prayer study was a spectacular failure. Here's the problem: Despite its authority and influence over every aspect of the scientific community, no one has ever shown that peer review accomplishes anything at all.
In 1986, a deputy editor at the Journal of the American Medical Association named Drummond Rennie announced the first scholarly conference on peer review. A series of high-profile cases of scientific fraud had hit the medical journals in the early '80s, and lousy papers were making their way into print on a regular basis. Editors like Rennie were in the mood for critical self-examination.
If peer review didn't work, it was an extraordinary waste of time. Rennie proposed to study whether the system, with all its shortcomings, actually improved the quality of published research.
He didn't invent the idea—another medical editor, Stephen Lock, had just published a book that asked the same question—but he nurtured and led the small group of maverick medical editors who were ready to challenge the status quo. In 1986, when Rennie announced the First International Congress on Peer Review and Biomedical Publication, there were only a handful of (peer-reviewed) papers on the topic; now there are several hundred, and Rennie's congress convenes every four years. A cartoon in a special issue of JAMA devoted to the group's 2001 meeting in Barcelona shows the white-bearded Rennie dressed like Moses, leading his fellow scientists and editors through the desert of Sinai.
Rennie and his companions have spent almost 20 years in the desert, yet the golden calf is still intact. The study of peer review turns out to be tremendously difficult. To test whether it works, you'd need to compare the quality of papers that had gone through peer review with the quality of those that hadn't. But how would you get papers for the control group, given all the professional benefits that come with peer review? And assuming you could convince scientists to forgo the process, how could you objectively judge the quality of the papers? At Rennie's fifth congress this year in Chicago, several hundred studies will be presented, but no one will claim to have answered the big question: Does peer review work?
In the meantime, the system is threatening to collapse under its own weight. What's the fix? Rennie's gang has produced a few tantalizing bits of data. One set of studies shows that revealing the identities of reviewers has no effect on the quality of their efforts, contradicting the standard argument that anonymity fosters rigorous criticism and high standards. In response to findings like these, some journals, like the British Medical Journal, have switched to "open review," telling authors who has reviewed their papers.
More sweeping changes have also been suggested. Well-trained, full-time editors and professional statisticians might be able to perform the functions of peer review on their own. Or scientists en masse might be recruited: Paul Ginsparg, who runs a digital archive for unpublished physics papers, has suggested that putting "preprints" of scientific papers on the Web could let the community as a whole decide which papers are most useful. Unpublished work could be tracked by an objective measure—like how often it's cited or downloaded—and then passed along for formal publication. Government funders like the NIH could hire professional reviewers to evaluate grants, or they could replace grants with cash prizes for successful research.
When journal editors are asked about these ideas, they often quote Winston Churchill's line, "Democracy is the worst form of government except all those other forms that have been tried from time to time." Or rather, they quote other journal editors quoting that line. But it's a poor analogy, since few alternatives to peer review have been tried in modern times. And democracy isn't really a good description of peer review, either. Sure, peer review allows scientists to participate in a system of self-governance. But wouldn't BMJ's policy of open review or Ginsparg's proposal for Web-published preprints be far more democratic?
So far, though, the Churchill quoters are winning. A good study of peer review will take more than a small band of outsiders working part time with little money. We need a coordinated effort by scientific journals, funding agencies, and research scientists. It wouldn't be that expensive. But the funds have not been forthcoming. Instead, the federal government recently proposed using peer review "to improve the quality, objectivity, utility, and integrity of information" that it gives to the public for things like FDA drug requirements and HHS dietary guidelines. Ironically, most scientists hated the idea when it was first presented; the original version of the proposal stated that only researchers from private industry, who don't receive federal funding, would get to participate. (That provision has since been eliminated.*) But forget arguing over the details of peer review. Let's first figure out whether it works.
*Correction, April 8, 2005: The original version of this column misstated a provision of the federal government's proposal on peer review. The revised version of the proposal would allow federally funded scientists to participate.