How can patients, doctors or other health officials use these data? It is very hard, just by looking at the marginal percentages, to say anything meaningful about how the placebo compares to the drug. (And yes, it is fascinating to see that some percentages of reported adverse events under placebo are on par with, or sometimes exceed, the ones reported under the drug treatment, a phenomenon that everyone who has read German author Thomas Mann’s famous book “Der Zauberberg” (The Magic Mountain) understands.) Part of my research work is to find ways to supplement these simple frequency tables with useful information about the significance of a difference between the incidence rates under a drug and a placebo, so one can draw more informed conclusions from such data. This could be in the form of simultaneous confidence intervals for the difference (or ratio) of marginal incidence rates, or a simple margin of error estimate such as the one people are used to from elections: “The drug leads placebo by 3% ($\pm$ 5%) in terms of reported diarrhea incidents” or “A urinary tract infection was at least 4 times more likely to occur under the drug than under the placebo.” The difficulty is coming up with $\pm$ or “at least” estimates that are valid.
A naïve approach forms, for each adverse event separately, a 95% confidence interval for the risk difference or the risk ratio (i.e., relative risk). Suppose we do this for the 11 adverse events in the data displayed above. Then the probability that at least one of the 11 confidence intervals fails to contain the true risk difference (or risk ratio) equals 1 minus the probability that all intervals cover the true parameter, i.e., $1 - {0.95}^{11} \approx 43\%$. For the 16 adverse events of the Advair Diskus, this becomes even larger than 50% (56% to be precise). These calculations assume that adverse events occur independently of one another, which is the worst-case scenario for the above probability calculation (under the assumption that the test statistics on which the confidence intervals are based are jointly multivariate normal), but you get the idea: viewed as an ensemble, some of the intervals most likely fail to capture the true risk difference or ratio. This is most troubling from a drug sponsor’s point of view, namely when a confidence interval lies completely above 0 (which means that the occurrence of the corresponding adverse event is significantly higher under the drug than under the placebo treatment) or above 1 for the risk ratio. Many argue that a sponsor’s concerns are irrelevant when it comes to drug safety, but there might also be a loss to society in that a potentially very beneficial drug does not reach the market because of (likely false) safety concerns.
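Under the independence assumption, the naïve per-event interval and the resulting familywise error rate are easy to reproduce. Here is a minimal Python sketch; the Wald interval is one standard (if crude) choice for the risk difference, and the event counts in the usage example are hypothetical, not taken from the tables above:

```python
import math
from statistics import NormalDist

def wald_ci_risk_diff(x1, n1, x0, n0, conf=0.95):
    """Naive Wald confidence interval for the risk difference p1 - p0
    (drug incidence minus placebo incidence), one adverse event at a time."""
    p1, p0 = x1 / n1, x0 / n0
    z = NormalDist().inv_cdf((1 + conf) / 2)          # e.g. 1.96 for 95%
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return p1 - p0 - z * se, p1 - p0 + z * se

def familywise_error(conf, k):
    """P(at least one of k independent (conf)-level CIs misses) = 1 - conf**k."""
    return 1 - conf ** k

# Hypothetical counts: 30/200 events on drug vs. 18/200 on placebo.
lo, hi = wald_ci_risk_diff(30, 200, 18, 200)

print(round(familywise_error(0.95, 11), 2))  # 0.43 -- the 11 adverse events
print(round(familywise_error(0.95, 16), 2))  # 0.56 -- Advair Diskus, 16 events
```

The point of the sketch is only the last two lines: with 11 (or 16) separate 95% intervals, the chance that the ensemble contains at least one miss is 43% (or 56%).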
The shortcoming of the naïve approach is easily remedied by insisting on forming ${0.95}^{1/11}$ = 99.53% confidence intervals for each comparison. Then, even in the worst-case scenario, the probability of at least one interval failing to capture the true difference is $1-[{0.95}^{1/11}]^{11} = 1 - 0.95 = 5\%$, which seems low enough. In fact, it is too low, often much lower than 5% when correlation among adverse events is taken into account. With such strict control over the error rate there is not much power left to detect potential safety signals, i.e., to identify those adverse events where there truly is a difference in the adverse event rates.