Why Much of the Medical Literature Is Wrong
Written by Editor   
Thursday, September 18, 2014 03:26 PM

Much of the medical literature is prone to bias and is, in fact, wrong. A recent article points out some of the reasons why. Given a statistical association between X and Y, most people assume that X caused Y. However, we can easily come up with 5 other scenarios to explain the same situation.

Reverse Causality

Given only an association between X and Y, the data are just as consistent with Y causing X as with X causing Y. In most cases, it is obvious which variable is the cause and which is the effect. If a study showed a statistical association between smoking and coronary heart disease (CHD), it would be clear that smoking causes CHD and not that CHD makes people smoke. Because smoking preceded CHD, reverse causality in this case is impossible. But the situation is not always that clear-cut. Consider a study published in the NEJM that showed an association between diabetes and pancreatic cancer. The casual reader might conclude that diabetes causes pancreatic cancer. However, further analysis showed that much of the diabetes was of recent onset. Therefore, this was not a case of diabetes causing pancreatic cancer but of pancreatic cancer causing the diabetes.

Mistaking what came first in the order of causation is a form of bias. There are numerous examples in the literature. For example, an assumed association between breastfeeding and stunted growth actually reflected the fact that sicker infants were preferentially breastfed for longer periods. Thus, stunted growth led to more breastfeeding, not the other way around. Sometimes it is difficult to disentangle which factor is the cause and which is the effect.

The Play of Chance and the DICE Miracle

Whenever a study finds an association between 2 variables, X and Y, there is always the possibility that the association was simply the result of random chance.  Most people assess whether a finding is due to chance by checking if the P value is less than .05. 

To illustrate the point, consider an elegant experiment using dice of 3 different colors to simulate the outcomes of theoretical clinical trials and a subsequent meta-analysis. Students were asked to roll pairs of dice, with a 6 counting as patient death and any other number counting as survival. The students were told that one die might be more or less "effective" than the others (ie, generate fewer or more sixes, or study deaths). Sure enough, no effect was seen for red dice, but a subgroup of white and green dice showed a 39% risk reduction (P = .02). Some students even reported that their dice were "loaded." This finding was very surprising because only ordinary dice were used. Any difference seen for white and green dice was a completely random result.
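
The dice experiment can be approximated in a few lines of Python. The sketch below is not the students' protocol: the arm size of 200 rolls, the 2000 repeated comparisons, the fixed seed, the function names, and the pooled two-proportion z-test are all assumptions chosen for illustration. It pits two sets of fair dice against each other many times and reports how often the difference in "deaths" reaches P < .05.

```python
import math
import random


def roll_deaths(rng, n_rolls):
    """Count sixes ("deaths") in n_rolls of a fair six-sided die."""
    return sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)


def two_proportion_p_value(deaths_a, deaths_b, n):
    """Two-sided P value from a pooled two-proportion z-test (equal arm sizes)."""
    pooled = (deaths_a + deaths_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no deaths (or only deaths) in both arms: no evidence of a difference
    z = (deaths_a - deaths_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))


def fraction_significant(n_trials=2000, n_rolls=200, alpha=0.05, seed=1):
    """Fraction of fair-dice comparisons that look 'significant' by chance alone."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    hits = 0
    for _ in range(n_trials):
        deaths_a = roll_deaths(rng, n_rolls)
        deaths_b = roll_deaths(rng, n_rolls)
        if two_proportion_p_value(deaths_a, deaths_b, n_rolls) < alpha:
            hits += 1
    return hits / n_trials


print(f"Fraction of chance 'effects' with P < .05: {fraction_significant():.3f}")
```

Because both sets of dice are fair, every "significant" difference in this simulation is a false positive, and the fraction printed should land near the nominal 5%.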

The Frequency of False Positives

Most researchers set their significance level, or rate of type 1 error, at 5%. However, if you perform 2 independent analyses, then the chance of at least one of these tests being "wrong" is 9.75%. Perform 5 tests, and the probability becomes 22.62%; with 10 tests, there is a 40.13% chance of at least 1 spurious association even if none of the associations is real. Because most papers present many different subgroups and composite endpoints, the chance of at least one spurious association is very high. Often, the one spurious association is published, and the other negative tests never see the light of day.
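
These percentages follow from the formula 1 - (1 - alpha)^n, which assumes the n tests are independent of one another. A minimal Python sketch reproduces them; the function name is illustrative, not taken from the article.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 2, 5, 10):
    print(f"{n:>2} tests: {familywise_error_rate(n):.2%}")
# Prints 5.00%, 9.75%, 22.62%, and 40.13% for 1, 2, 5, and 10 tests.
```

Subgroup analyses within a single paper are usually correlated rather than independent, so the exact figures in practice will differ, but the trend is the same: more tests, more spurious hits.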

There is a way to guard against such spurious findings: replication. Unfortunately, the current structure of academic medicine does not favor the replication of published results, and several studies have shown that many published trials do not stand up to independent verification and are likely false positives. In a 2005 review of 45 highlighted studies in major medical journals, 24% were never replicated, 16% were contradicted by subsequent research, and another 16% were shown to have smaller effect sizes than originally reported. Less than half (44%) were truly replicated.

The frequency of these false-positive studies in the published literature can be estimated to some degree. Consider a situation in which 10% of all hypotheses are actually true. Now consider that most studies have a type 1 error rate (the probability of claiming an association when none exists [ie, a false positive]) of 5% and a type 2 error rate (the probability of claiming there is no association when one actually exists [ie, a false negative]) of 20%, which are the standard error rates presumed by most clinical trials. Of 1000 hypotheses tested, 100 would be true and 900 false. With a 20% type 2 error rate, 80 of the 100 true hypotheses would yield a positive finding; with a 5% type 1 error rate, 45 of the 900 false hypotheses would as well. This would imply that of the 125 studies with a positive finding, only 80/125 or 64% are true. Therefore, roughly one third (36%) of statistically significant findings are false positives purely by random chance. That assumes, of course, that there is no bias in the studies, which we will deal with presently.
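
The same arithmetic can be written as a short Python sketch. The function and parameter names are illustrative assumptions; the inputs are the 10% prior, 5% type 1 error rate, and 20% type 2 error rate used in the paragraph above.

```python
def positive_predictive_value(prior, alpha=0.05, beta=0.20):
    """Share of positive findings that reflect a true association.

    prior: fraction of tested hypotheses that are actually true
    alpha: type 1 error rate (false-positive rate)
    beta:  type 2 error rate (false-negative rate), so power = 1 - beta
    """
    true_positives = prior * (1 - beta)
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(prior=0.10)
print(f"True among positive findings:  {ppv:.0%}")      # 64%
print(f"False among positive findings: {1 - ppv:.0%}")  # 36%
```

Lowering the prior makes the picture worse: if only 1 in 100 tested hypotheses were true, the same error rates would leave most positive findings false.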

Bias: Coffee, Cellphones, and Chocolate

Bias occurs when there is no real association between X and Y, but one is manufactured because of the way we conducted our study. The most common biases can be broadly categorized into 2 main types: selection bias and information bias.

One classic example of selection bias occurred in 1981 with an NEJM study showing an association between coffee consumption and pancreatic cancer. The selection bias occurred when the controls were recruited for the study. The control group had a high incidence of peptic ulcer disease, and so as not to worsen their symptoms they drank little coffee. Thus, the association between coffee and cancer was artificially created because the control group was fundamentally different from the general population in terms of their coffee consumption. When the study was repeated with proper controls, no effect was seen.

Information bias, as opposed to selection bias, occurs when there is a systematic error in how the data are collected or measured. Misclassification bias occurs when the measurement of an exposure or outcome is imperfect; for example, smokers who identify themselves as nonsmokers to investigators or individuals who systematically underreport their weight or overreport their height. A special situation, known as recall bias, occurs when subjects with a disease are more likely to remember the exposure under investigation.

An interesting type of information bias is the ecological fallacy. The ecological fallacy is the mistaken belief that population-level exposures can be used to draw conclusions about individual patient risks. A recent example of the ecological fallacy was a tongue-in-cheek study showing that countries with high chocolate consumption won more Nobel prizes. The problem with country-level data is that countries don't eat chocolate, and countries don't win Nobel prizes. People eat chocolate, and people win Nobel prizes. This study, while amusing to read, did not establish the fundamental point that the individuals who won the Nobel prizes were the ones actually eating the chocolate.


Confounding

Confounding, unlike bias, occurs when there really is an association between X and Y, but the magnitude of that association is influenced by a third variable. Whereas bias is a human creation, the product of inappropriate patient selection or errors in data collection, confounding exists in nature. For example, diabetes confounds the relationship between renal failure and heart disease because it can lead to both conditions. Although patients with renal failure are at higher risk for heart disease, failing to account for the inherent risk of diabetes makes that association seem stronger than it actually is.

Confounding is a problem in every observational study, and statistical adjustment cannot always eliminate it. Even some of the best observational studies fall victim to confounding. Hormone replacement therapy was long thought to be protective against cardiac disease until the Women’s Health Initiative randomized trial refuted that notion. Despite the best attempts at statistical adjustment, there can always be residual confounding.

Exaggerated Risk

Finally, let us make the unlikely assumption that we have a trial where nothing went wrong, and we are free of all of the problems discussed above. Even then, the greatest danger lies in our misinterpretation of the findings. It is paramount to remember that statistical significance does not imply clinical significance.

Source:  http://www.medscape.com/viewarticle/829866?src=wnl_edit_specol&uac=151914AX#4