Some scientists make their careers by criticising others' research. But who watches the watchmen?
Science, the cliché goes, is self-correcting. This is true in two senses – one lofty, one more mundane. The high-minded one is about the principle of science: the idea that we’re constantly updating our knowledge, finding out how our previous theories were wrong, and stumbling — sometimes only after a fashion — towards the truth. The second is that, for science to work, individual scientists need to do the grinding, boring work of correcting other scientists’ mistakes.
All scientists are part of this to some degree: most prominently, the peer-review system, at least in theory, involves scientists critiquing each other’s studies, throwing out the bad or mistaken ones, and suggesting improvements for those that are salvageable.
Some scientists, though, make their entire reputations as critics, loudly drawing attention to the flaws and failings in their fields. It’s hard not to respect these often-eccentric characters, who stand up against groupthink and intellectual inertia, telling entire fields of research what they don’t want to hear. But recent months have given us some cautionary tales about these scientific watchmen, and have shown in excruciating detail how even the most perceptive critics of science can end up bafflingly wrong.
In my own field of psychology, one of the most prominent examples of an uber-critic was Hans Eysenck. From the 1950s all the way to his death in 1997, Eysenck wrote blistering critiques of psychoanalysis and psychotherapy, noting the unscientific nature of Freudian theories and digging into the evidence base for therapy’s effects on mental health (I should note that Eysenck worked at the Institute of Psychiatry, now part of King’s College London, which is my employer).
In one typically acrimonious exchange in 1978, Eysenck criticised a study that had reviewed all the available evidence on psychotherapy. Eysenck argued that this kind of study — known as a “meta-analysis” because it tries to pool all the previous studies together and draw an overall conclusion — was futile, owing to the poor quality of all the original studies. The meta-analysis, he said, was “an exercise in mega-silliness”: a “mass of reports—good, bad, and indifferent—are fed into the computer”, he explained, “in the hope that people will cease caring about the quality of the material on which the conclusions are based.”
Whether or not this was a sound argument in the case of psychotherapy, Eysenck had put his finger on an important issue all scientists face when they try to zoom out to take an overall view of the evidence on some question: if you put garbage into a meta-analysis, you’ll get garbage out.
You would think that a scientist so concerned with the garbage in the scientific literature would do his best to avoid producing more garbage himself. But around the same time as he was excoriating meta-analysis, Eysenck began collaborating with one Ronald Grossarth-Maticek, a therapist working at Heidelberg University. Grossarth-Maticek had, he claimed, run large-scale studies of how personality questionnaires could predict fatal disease, and how his special kind of behavioural therapy could help people avoid cancer and heart disease. Eysenck worked with him to get these data into the scientific literature, eventually producing many dozens of scientific papers and reports.
The first indication that something might be a little off in this research comes if you look at the questions being asked. Usually a personality questionnaire would include questions like “Can you get a party going?” or “Do you enjoy sunbathing on the beach?” The Eysenck-Grossarth-Maticek questionnaire included questions like the following — to which a yes/no answer is required:
“Do you change your behaviour according to consequences of previous behaviour, i.e., do you repeat ways of acting which have in the past led to positive results, such as contentment, wellbeing, self-reliance, etc., and to stop acting in ways which lead to negative consequences, i.e., to feelings of anxiety, hopelessness, depression, excitement, annoyance, etc.? In other words, have you learned to give up ways of acting which have negative consequences, and to rely more and more on ways of acting which have positive consequences?”
Go on. Yes or no? Quickly, please.
The second issue is with the results. Frankly, they’re unbelievable. Answers to the kind of question quoted above could, the pair claimed, classify people into “Cancer-Prone”, “Heart Disease-Prone” or “Healthy” personalities — with massive implications for their lives. Cancer-Prone personalities were an astonishing 120 times more likely to die of cancer than Healthy personalities in the next ten years (the equivalent number for Heart Disease-Prone personalities dying of heart disease was 27 times, also amazingly high). And in a trial of Grossarth-Maticek’s therapy, not one treated patient died of cancer, whereas 32% of the control participants, who received no therapy, did.
As one of Eysenck’s critics, Anthony Pelosi, has argued, “such results are unheard of in the entire history of medical science”. There are only three possibilities: they’re the most important results ever discovered in medicine (a 100% chance of avoiding cancer after some talking therapy!), or grossly mistaken in some way, or made up. I suspect the answer isn’t the first one.
These results were criticised in the early 1990s (including by a statistician who’d seen the raw data, and who more or less argued they were fraudulent), though they were vociferously defended by Eysenck. He wrote that the criticisms, “…however incorrect, full of errors and misunderstandings, and lacking in objectivity, may have been useful in drawing attention to a large body of work, of both scientific and social relevance, that has been overlooked for too long.”
It was only in 2020 that the self-correcting nature of science really started to bite: after a resurgence of criticism from Pelosi and others, King’s College London investigated Eysenck’s work with Grossarth-Maticek, listed many (though not all) of the articles that used the data as “unsafe”, and wrote to the relevant scientific journals advising them to retract the papers. So far, 14 have been pulled from the literature (with a further 64 given an editorial “expression of concern”) — and given how many papers the duo published on these bizarre studies, this may end up being the tip of a rather large iceberg.
How had such a strong advocate of rigour in science ended up presiding over one of the most ludicrous sets of scientific papers ever published? There are allegations that he received funding from tobacco companies, who would have stood to benefit if it was personality rather than cigarettes that caused cancer, and that this might have influenced his reasoning (a conflict of interest that was never fully declared).
But deeper explanations relate to Eysenck’s personality. When he wasn’t railing against psychotherapy, he was publishing on, and debating, almost every other contentious issue in the book, including crime and violence, astrology, extra-sensory perception, and the genetics of race and intelligence. This was someone who loved argument, loved controversy — and most importantly, refused in almost any case to back down under criticism (see this cringeworthy video for further evidence). Once he found himself deeply involved in the Grossarth-Maticek studies, he felt beholden to defend them despite their transparent absurdity.
The strange story of Eysenck — an arch-critic who conspicuously failed to see the flaws in his own work — has come to mind several times while I’ve been following a far more contemporary controversy: the case of John Ioannidis and COVID-19.
It’s fair to say that Stanford University’s John Ioannidis is a hero of mine. He’s the medical researcher who made waves in 2005 with a paper carrying the firecracker title “Why Most Published Research Findings Are False”, and who has published an eye-watering number of papers outlining problems in clinical trials, economics, psychology, statistics, nutrition research and more.
Like Eysenck, he’s been a critic of meta-analysis: in a 2016 paper, he argued that scientists were cranking out far too many such analyses — not only because of the phenomenon of Garbage-In-Garbage-Out, but because the meta-analyses themselves are done poorly. He’s also argued that we should be much more transparent about conflicts of interest in research: even about conflicts we wouldn’t normally think of, such as nutrition researchers being biased towards finding health effects of a particular diet because it’s the one that they themselves follow.
Ioannidis’s contribution to science has been to make it far more open, honest, and self-reflective about its flaws. How odd it is, then, to see his failure to follow his own advice.
First, in mid-March, as the pandemic was making its way to America, Ioannidis wrote an article for STAT News where he argued that we should avoid rushing into big decisions like country-wide lockdowns without what he called “reliable data” on the virus. The most memorable part of the article was his prediction — on the basis of his analysis of the cursed cruise ship Diamond Princess — that around 10,000 people in the US would die from COVID-19 — a number that, he said, “is buried within the noise of the estimate of deaths from ‘influenza-like illness’”. As US deaths have just hit 125,000, I don’t need to emphasise how wrong that prediction was.
So far, so fair enough: everyone makes bad predictions sometimes. But some weeks later, it emerged that Ioannidis had helped co-author the infamous Santa Clara County study, where Stanford researchers estimated that the number of people who had been infected with the coronavirus was considerably higher than had been previously supposed. The message was that the “infection fatality rate” of the virus (the proportion of people who, once infected, die from the disease), must be very low, since the death rate had to be divided across a much larger number of infections. The study became extremely popular in anti-lockdown quarters and in the Right-wing populist media. The virus is hardly a threat, they argued — lift the lockdown now!
But the study had serious problems. When you do a study of the prevalence of a virus, your sample needs to be as random as possible. Here, though, the researchers had recruited participants using Facebook and via email, emphasising that they could get a test if they signed up to the study. In this way, it’s probable that they recruited disproportionate numbers of people who were worried they were (or had been) infected, and who thus wanted a test. If so, the study was fundamentally broken, with an artificially high COVID infection rate that didn’t represent the real population level of the virus (there were also other issues relating to the false-positive rate of the test they used).
Then, an investigation by Stephanie Lee of BuzzFeed News revealed that the study had been part-funded by David Neeleman, the founder of the airline JetBlue — a company that would certainly have benefited from a shorter lockdown. Lee reported that Neeleman appeared to have been in direct contact with Ioannidis and the other Stanford researchers while the study was going on, and knew about their conclusions before they published their paper. Even if these conversations didn’t influence the conduct of the study in any way (as argued by Ioannidis and his co-authors), it was certainly odd — given Ioannidis’s record of advocating for radical transparency — that none of this was mentioned in the paper, even just to be safe.
Ioannidis didn’t stop there. He then did his own meta-analysis of prevalence studies, in an attempt to estimate the true infection fatality rate of the virus. His conclusion — once again — was that the infection fatality rate wasn’t far off that for the flu. But he had included flawed studies like his own one from Santa Clara, as well as several prevalence studies that included only young people — biasing the estimated death rate substantially downwards and, again, failing to represent the rate in the population (several other issues are noted in a critique by the epidemiologist Hilda Bastian). That German accent you can hear faintly in the background is the ghost of Hans Eysenck, warning us about the “mega-silliness” of meta-analysing low-quality studies.
His most recent contribution is an article on forecasting COVID-19, upbraiding the researchers and politicians who predicted doomsday scenarios with overwhelmed hospitals. His own drastic under-prediction of 10,000 US deaths? Not mentioned once.
Although Ioannidis has at least sounded as if he’s glad to receive criticism, some of his discussion of the more mainstream epidemiological models has sounded Eysenckian — for instance, where he described the Imperial College model of the pandemic as having been “astronomically wrong”. There is, of course, a genuine debate to be had on how and when we should lift our lockdowns. There’s also a great deal that we don’t know about the virus (though more reliable estimates suggest, contra Ioannidis, that its infection fatality rate is many times higher than that of the flu). But Ioannidis’s constant string of findings that all confirm his initial belief — that the virus is far less dangerous than scientists are telling you — gives the impression of someone who has taken a position and is now simply defending it against all comers.
And for that reason, it’s an important reminder of what we often forget: scientists are human beings, and are subject to very human flaws. Most notably, they’re subject to bias, and a strong aversion to having their cherished theories proved wrong. The fact that Ioannidis, the world’s most famous sceptic of science, is himself subject to this bias is the strongest possible confirmation of its psychological power. The Eysenck and Ioannidis stories differ in very many ways, but they both tell us how contrarianism and iconoclasm — both crucial forces for the process of constant scepticism that science needs to progress — can go too far, leading researchers not to back down, but to double-down in the face of valid criticism.
Above, I should really have said that John Ioannidis was a hero of mine. Because this whole episode has reminded me that those self-critical, self-correcting principles of science simply don’t allow for hero-worship. Even the strongest critics of science need themselves to be criticised; those who raise the biggest questions about the way we do research need themselves to be questioned. Healthy science needs a whole community of sceptics, all constantly arguing with one another — and it helps if they’re willing to admit their own mistakes. Who watches the watchmen in science? The answer is, or at least should be: all of us.