Recent concerns about serious side effects of drugs (culminating in the withdrawal of blockbuster drugs such as Merck’s Vioxx) have led to increased public interest in the safety and toxicity of pharmaceutical products. Drug safety data are commonly included in drug labels or in the fine print on the second page of high-gloss magazine advertisements. For example, the official “Complete Prescribing Information and Medication Guide for ADVAIR DISKUS®”, a medication I had to use recently and one of the best-selling drugs ever, lists 16 common side effects (observed in more than 3% of the study subjects) on page 16. In clinical trials, these side effects are called adverse events. I found similar frequency tables of adverse events in recent issues of Time Magazine (and others) for drugs such as Lamisil, Flonase, Clarinex, Allegra or Botox. Here is an example of adverse events in a drug (from a past phase II clinical trial) for which I actually have the complete data (and not only the marginal percentages displayed below):
How can patients, doctors or other health officials use these data? It is very hard, by just looking at the marginal percentages, to say anything meaningful about how the placebo compares to the drug. (And yes, it is fascinating to see that some percentages of reported adverse events under placebo are on par with or sometimes exceed the ones reported under the drug treatment, a phenomenon that everyone who has read German author Thomas Mann’s famous book “Der Zauberberg” (The Magic Mountain) understands.) Part of my research work is to find ways to supplement these simple frequency tables with useful information about the significance of a difference between the incidence rates under a drug and a placebo, so one can draw more informed conclusions from such data. This could be in the form of simultaneous confidence intervals for the difference (or ratio) of marginal incidence rates, or a simple margin-of-error estimate such as the one people are used to from elections: “The drug leads placebo by 3% (± 5%) in terms of reported diarrhea incidents” or “A urinary tract infection was at least 4 times more likely to occur under the drug than under the placebo.” The difficulty is coming up with “±” or “at least” estimates that are valid.
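To make the kind of statement above concrete, here is a minimal sketch of a standard (Wald) confidence interval for the risk difference of a single adverse event. The counts are made up for illustration; they are not from the trial discussed in this post:

```python
from math import sqrt

def risk_difference_ci(x_drug, n_drug, x_placebo, n_placebo, z=1.96):
    """Wald 95% confidence interval for the difference of two incidence rates."""
    p1, p2 = x_drug / n_drug, x_placebo / n_placebo
    se = sqrt(p1 * (1 - p1) / n_drug + p2 * (1 - p2) / n_placebo)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical counts: 20/100 diarrhea reports under drug, 12/100 under placebo.
lo, hi = risk_difference_ci(20, 100, 12, 100)
# The interval straddles 0 here, so this single comparison is inconclusive.
```

The same construction works on the log scale for the risk ratio; the point is simply that each adverse event yields one interval, which is where the trouble described next begins.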
A naïve approach forms, for each adverse event separately, a 95% confidence interval for the risk difference or the risk ratio (i.e., relative risk). Suppose we do this for the 11 adverse events in the data displayed above. Then, the probability that at least one of the 11 confidence intervals formed will not contain the true risk difference (or risk ratio) is equal to 1 minus the probability that all intervals cover their true parameters: 1 − 0.95^11 ≈ 43%. For the 16 adverse events of the Advair Diskus, this becomes even larger than 50% (56% to be precise). These calculations assume that adverse events occur independently of each other, which is the worst-case scenario for the above probability calculation (under the assumption that the test statistics on which the confidence intervals are based are jointly multivariate normal), but you get the idea: viewed as an ensemble, some of the intervals most likely fail to capture the true risk difference or ratio. This is most unsatisfying from a drug sponsor’s point of view when a confidence interval lies completely above 0 (which means that the occurrence of the corresponding adverse event is significantly higher under the drug than under the placebo treatment), or above 1 for the risk ratio. Many argue that a sponsor’s concerns are irrelevant when it comes to drug safety, but there might also be a loss to society in that a potentially very beneficial drug does not reach the market because of (likely false) safety concerns.
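The familywise miss probabilities quoted above are a one-liner to verify:

```python
# Probability that at least one of k independent 95% intervals misses its
# true parameter: 1 - 0.95**k.
for k in (11, 16):
    fwer = 1 - 0.95 ** k
    print(f"{k} adverse events: P(at least one interval misses) = {fwer:.0%}")
```

With 11 adverse events this gives about 43%, and with 16 about 56%, matching the numbers in the text.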
The shortcomings of the naïve approach are easily remedied by insisting on forming (1 − 0.05)^(1/11) = 99.53% confidence intervals for each comparison. For then, even in the worst-case scenario, the probability of at least one interval failing to capture the true difference is 1 − (0.9953)^11 = 5%, which seems low enough. In fact, it is too low, often much lower than 5%, when correlation among adverse events is taken into account. With such a strict control over the error rate there is not much power left to detect potential safety signals, i.e., to identify those adverse events where there truly is a difference in the adverse event rate.
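This per-comparison adjustment is the Šidák correction; a quick computation shows where the 99.53% comes from, and how it compares to the slightly more conservative Bonferroni level:

```python
k, alpha = 11, 0.05

# Sidak: choose the per-interval confidence level so that, under independence,
# all k intervals jointly cover with probability 1 - alpha.
sidak = (1 - alpha) ** (1 / k)       # about 99.53%
bonferroni = 1 - alpha / k           # about 99.55%, slightly more conservative

# Worst-case familywise miss probability with Sidak-adjusted intervals:
fwer = 1 - sidak ** k                # exactly alpha = 5% under independence
```

When the adverse events are positively correlated, the true familywise error of these adjusted intervals drops below 5%, which is exactly the loss of power the text complains about.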
In my opinion, the solution, as so often, lies between these two extremes, and the key is to incorporate the information on the correlation between adverse events when forming simultaneous confidence intervals. These correlations can be substantial, as many adverse events can be grouped by body function or are based on the same underlying physiological relationship (think about the association between the adverse events “Diarrhea” and “Abdominal Pain” in Table 1). One straightforward way to incorporate the correlation is via so-called resampling methods such as the permutation or bootstrap approach, but this is for another blog.
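As a tiny preview of the bootstrap idea (with made-up subject-level data; the max-deviation construction shown is just one of several possible ways to build simultaneous intervals, not necessarily the one used in my work), the key point is to resample whole subjects, so that each subject's vector of adverse-event indicators, and hence the correlation between adverse events, stays intact:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical subject-level 0/1 adverse-event indicators:
# rows = subjects, columns = three adverse events.
n = 200
drug = rng.binomial(1, [0.20, 0.15, 0.10], size=(n, 3))
placebo = rng.binomial(1, [0.12, 0.14, 0.05], size=(n, 3))

obs_diff = drug.mean(axis=0) - placebo.mean(axis=0)

# Bootstrap: resample subjects (each subject's whole row is kept together,
# preserving the correlation between adverse events).
B = 2000
boot = np.empty((B, 3))
for b in range(B):
    d = drug[rng.integers(0, n, n)]
    p = placebo[rng.integers(0, n, n)]
    boot[b] = d.mean(axis=0) - p.mean(axis=0)

# Simultaneous intervals via the 95th percentile of the maximum absolute
# deviation across all adverse events.
c = np.quantile(np.abs(boot - obs_diff).max(axis=1), 0.95)
intervals = np.column_stack([obs_diff - c, obs_diff + c])
```

Because the critical value c adapts to the actual joint distribution of the differences, correlated adverse events no longer pay the full independence penalty.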