Blog 20: I Am (Probably) Wrong, Maybe

Fluvoxamine for Covid-19 Revisited

Background

I am writing this blog as a follow-up to a previous blog on fluvoxamine as a treatment for Covid-19 (see Blog 19: We Won’t Get Fooled Again, Again). In that blog I did my assessment of the results of a randomized, double-blind, placebo-controlled clinical trial of an old anti-depressant drug for the treatment of newly diagnosed Covid-19 patients. [1] As a brief recap, I took a Bayesian approach and used my favorite paradigm for evaluating the evidence:

What does the data say?

What do I believe?

What do I decide?

The evaluation in that Blog went like this:

  • There was some laboratory evidence that fluvoxamine improved the outcomes of a small number of mice that were induced to have sepsis. There was some additional, but speculative, evidence about the cellular-level mechanism of action suggesting why fluvoxamine might have anti-inflammatory properties.
  • Most treatments fail in drug development in general, and the pandemic is no exception. Many, many repurposed treatments have failed when subjected to the scrutiny of well-designed clinical trials.
  • Thus, my honest prior belief that fluvoxamine works was about 1/1000, but I was generous and settled on a prior of 1/100 (1 in a hundred) chance that fluvoxamine works. In an interview, the primary clinical investigator noted that he thought doing a study with fluvoxamine in Covid-19 patients and getting a successful outcome was “a long shot.”
  • Based on this, researchers did a study that had some very good design elements (randomized, double-blind, placebo-controlled), but was small/modest in size (N=152 patients total), used remote or self-assessments by patients in their homes, and had ~25% lost-to-follow-up (that is, about 20 patients in each group were never observed as to their disease outcome). That trial produced a p-value of 0.009 for the difference in response observed – a combination of patient self-assessments about shortness of breath or the need for oxygen supplementation.
  • Using my favorite formula for combining evidence (as reported in previous Blogs like Blog #7), the upper bound on the posterior probability that fluvoxamine works (p1) is given by

p1 ≤  p0*BFB/(1-p0+p0*BFB) = 0.081,                                         (Equation 1)

where p0 is the prior probability (1/100 = 0.01) and the Bayes Factor Bound (BFB) is calculated using the p-value from the clinical trial (p=0.009)

BFB=1/[-e*p*ln(p)].

Thus, my conclusion was that there was, at most, about an 8% chance that fluvoxamine actually works in mild Covid-19 patients in terms of the breathing complications that were the primary assessment in the study.

  • My decision was that, based on this albeit low probability, a subsequent, larger trial should be done with fluvoxamine since there is a huge unmet medical need for Covid-19 treatments. This initial trial provided evidence that was far better than the many non-randomized, non-controlled, non-blinded studies that were and are being done capriciously around the world. It is still a long shot, but one that is worth it in my opinion.
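For readers who want to check the arithmetic behind Equation 1, here is a minimal Python sketch. The function names are my own, chosen for illustration. The restriction to p < 1/e is the standard condition under which this Bayes Factor Bound is defined; it is not spelled out above.

```python
import math

def bayes_factor_bound(p_value):
    """Bayes Factor Bound: BFB = 1 / (-e * p * ln(p)).

    The bound is only defined for 0 < p < 1/e (a standard condition,
    not stated in the text above).
    """
    if not 0 < p_value < 1 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    return 1.0 / (-math.e * p_value * math.log(p_value))

def posterior_upper_bound(prior, p_value):
    """Equation 1: p1 <= p0*BFB / (1 - p0 + p0*BFB)."""
    bfb = bayes_factor_bound(p_value)
    return prior * bfb / (1 - prior + prior * bfb)

# Inputs from the text: prior p0 = 1/100, p-value = 0.009 from the initial trial.
bfb = bayes_factor_bound(0.009)
p1 = posterior_upper_bound(0.01, 0.009)
print(f"BFB = {bfb:.2f}, p1 <= {p1:.3f}")
```

With the prior of 0.01 and the p-value of 0.009, this reproduces the bound of about 0.081 quoted in Equation 1.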

The Large Follow-up Study

To the credit of the original researchers, the clinical investigator, Dr. Eric Lenze, stated in a 60 Minutes interview, “I have to be a scientist about this. We’ve tested it in one study. But– in my view, it needs to be confirmed in a larger study.” [2] They have done that study (see ClinicalTrials.gov: https://clinicaltrials.gov/ct2/show/NCT04668950?term=fluvoxamine&cond=Covid19&draw=2), and the results were published in a very notable clinical journal, The Lancet Global Health. [3]

Once again, this trial followed the rigorous approach of using randomization, double-blinding, and a placebo control group. Excellent! They even stepped up their clinical outcome measure to be something that is more important and relevant – preventing hospitalizations, for which they had the following definition: either (a) retention in a COVID-19 emergency setting for at least 6 hours or (b) transfer to a tertiary hospital due to COVID-19. The trial included 1497 patients (741 on fluvoxamine and 756 on placebo) who were acutely diagnosed with Covid-19 and at high risk for disease complications. [Note: ALL patients received the best standard of care for Covid-19. Thus, placebo patients were NOT left untreated but rather treated as best as possible in the standard care setting. Those who received fluvoxamine received the best standard of care PLUS fluvoxamine. I note this so that those unfamiliar with the details of clinical research realize that “placebo” does NOT mean “untreated.” That would not be ethical.]

This study also used more rigorous evaluations and follow-up of patients. There were in-person clinical visits as well as telephone follow-up and tele-medicine interactions. Patients were followed for 28 days and there were only 6 patients who were not accounted for in the final analysis (recall: ~25% of follow-up was missing in the original smaller trial). So, this is also indicative of the thorough nature of this trial and one for which the researchers should be lauded.

For fluvoxamine, 79/741 (11%) met the hospitalization criteria and for placebo 119/756 (16%) met the criteria resulting in a relative risk of 0.68. The authors even used a Bayesian statistical approach for the analysis and quoted a 99.8% probability that fluvoxamine was superior to placebo! The study was planned to enroll 681 patients per treatment, but it was extended to accumulate additional data to ensure the findings of the study were reliable. [By the way, in case you are wondering, these results translate into a frequentist null hypothesis significance testing Z-statistic of -2.9 and a p-value of 0.0037. We will need this a bit later.]
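If you want to see where the relative risk, Z-statistic and p-value come from, the sketch below recomputes them from the counts above. I am assuming a standard pooled-variance two-proportion Z-test with a two-sided p-value; the function name is illustrative only.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test (assumed test; two-sided p-value).

    Returns (relative risk, Z-statistic, two-sided p-value).
    """
    rate1, rate2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (rate1 - rate2) / se
    # For a standard normal, 2 * P(Z > |z|) equals erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return rate1 / rate2, z, p_two_sided

# Counts from the text: 79/741 on fluvoxamine, 119/756 on placebo.
rr, z, p = two_proportion_z_test(79, 741, 119, 756)
print(f"RR = {rr:.2f}, Z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions the output agrees with the figures quoted above: a relative risk of 0.68, a Z-statistic of about -2.9 and a p-value of about 0.0037.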

So, in the parlance of pharmaceutical drug development, these researchers did a Phase 2 trial to test the hypothesis of whether fluvoxamine was useful in treating mild Covid-19. They used a surrogate clinical outcome (breathing measures) in a small trial (N=152) to find a signal that fluvoxamine may indeed be a useful treatment in patients who are newly diagnosed with Covid-19. They then did a much larger Phase 3 confirmatory trial (N=1497) of a more meaningful clinical outcome (hospitalization) and found significant or compelling results in favor of fluvoxamine. By many common standards of evidence, this is considered a success story and one that stands in clear contrast to many other Covid-19 stories in which far less evidence and rigor have been used. The authors of The Lancet publication state, “[These] Results provide compelling evidence of fluvoxamine’s benefit in reducing acute morbidity from COVID-19 illness.” Note the use of the word “compelling.”

In my original Blog (#19), I estimated that there was an 8% chance of success in the larger trial and therefore I stated, “As a betting man, I would place my bet on the study not being successful.”

So, one could say I was wrong. Fluvoxamine works in newly diagnosed patients at high risk of disease progression – “works” in the sense of reducing hospitalizations. And I am happy to be wrong if indeed we have a new and effective treatment to mitigate the progression of Covid-19, and one that is already generic and can be provided at a low cost!

Are you sure? Are the results “compelling”? Let’s take a further look at the study and the results to gain more insight into the confirmatory study.

A Deeper Look

At the outset, let me say that the investigators did all the right things for this study – well-defined protocol and evaluation criteria; pre-specified statistical analysis plan (i.e., no playing with the data until you get the answer you want); conformance with international standards for the conduct of clinical trials and the ethical treatment of patients. So, this is excellent and, again, in stark contrast with other sloppy Covid-19 clinical research. [This paper quotes that over 2800 clinical trials are registered in ClinicalTrials.gov with fewer than 11% of those trials reporting results and an even smaller percentage having sample sizes of over 100 patients. Wow! What a waste of resources and patient lives.]

Now, on to some observations and critical commentary.

First, the study was done in 11 clinical sites in Brazil. This is a little concerning in that most confirmatory studies in the pharmaceutical industry required to gain regulatory approval by FDA or the European Medicines Agency or the Ministry of Health, Labor and Welfare in Japan, etc. generally use studies done in multiple countries and many dozens, if not hundreds, of clinical sites. This is because the practice of medicine can differ around the world and even across clinical sites within a given country. To show the clinical benefits and acceptable safety of a new treatment, it must be shown to have utility in a broad range of clinical circumstances.

Second, the primary outcome was “a composite endpoint of medical admission to a hospital setting due to COVID-19-related illness defined as [a] COVID-19 emergency setting visits with participants remaining under observation for more than 6 h or [b] referral to further hospitalisation due to the progression of COVID-19 within 28 days of randomisation.” This endpoint is a little bit on the “soft” side, by which I mean there is some subjectivity involved.

Let’s take the first criterion [a]. Whether a patient stays in the hospital for 5 hours and 50 minutes versus 6 hours and 10 minutes is inconsequential. How long they stay may be a matter of personal preference or administrative matters in the hospital emergency room. So, in my opinion it is a vague endpoint for determining whether a patient had an outcome of interest or not. The investigators tried to overcome this by noting that, “The 6 h threshold referred only to periods of time recommended for observation by a clinician and does not include waiting times.” OK, fair enough, but still a little on the “soft” side. The authors also state in the Discussion, in an apparent contradiction to this, “The event adjudication committee did count patients wait times as contributing to a primary endpoint.” Hmmm?

For the second criterion [b], I have learned from some that referral to a hospital is also a subjective judgment. Some doctors may be more aggressive with a referral, and some may not. In cardiovascular studies, “hospital admission due to angina (chest pain)” is often considered a soft endpoint and more concrete endpoints – death, confirmed MI or confirmed stroke – are preferred.

I will address additional clinical measures subsequently, which are important to my evaluation or the evaluation of any new treatment.

Third, the primary outcome measure was a composite endpoint as noted previously, with two components – staying in an emergency room setting for at least 6 hours and referral for actual hospital admission. For the latter component of hospitalization, the results slightly favored fluvoxamine (10% on F and 13% on P), but the difference was not statistically significant (p=0.09). For the former component of 6 hours in the emergency room, the results more strongly favored F versus P – 7/741 (1%) on F versus 36/756 (5%) for P – with a p-value of 0.0001. Thus, the overall result (the combination of these components) is largely driven by the softest endpoint of whether a patient was observed in an emergency room for 6 hours.

Furthermore, based on Table 3 of the publication, which lists the pre-defined secondary clinical outcomes of interest, most were slightly in favor of fluvoxamine, but were not statistically significant. In fact, the median duration of hospitalization was 8 days on F and only 6 days on P, which almost was statistically significant in favor of placebo (p=0.059)! Very important outcomes like death, time to death, use of mechanical ventilation and time on ventilator were nowhere near statistically significant (p-values ranging from 0.24 to 0.90) and quite small – perhaps not clinically meaningful (here I am playing a doctor, so take this judgment with a grain of salt).

Now, for those who follow my blogs, you know that I am critical at times of using p-values, so you may be wondering about my reference to such p-values as I review this publication. First, it is because that is what is reported and what the authors used. Second is because when p-values are quite large (say, greater than 0.20), then I think the researchers have very weak evidence in the frequentist inferential approach and probably even weaker evidence in the Bayesian approach.

On a positive note, the authors did analyze the data according to compliance with the randomized study medication. It turns out that for fluvoxamine, 26% of patients did not comply with the consumption of their medication (compliance = taking their study medication for at least 8 of the 10 days they were to be treated) and placebo had 18% non-compliance. I do like the idea of examining the treatment effect WHEN patients take their medication, but I also recognize that this is a difficult analysis to do properly and requires more sophisticated causal inference approaches. [That’s the subject of another blog someday.] For the primary composite clinical measure of hospitalization, the treatment effect was even more pronounced – relative risk of 0.34. Also, the authors did summarize another very important outcome for the compliant subset of patients in the trial – death. Only 1 compliant patient died on fluvoxamine whereas 12 placebo compliant patients died. Though I would have to examine the details more carefully (e.g., is this a “cherry-picked” result from many different analyses?), on the surface this is a noteworthy finding.

Finally, this study was done in high-risk patients (those with other comorbidities such as diabetes, coronary artery disease, asthma, etc.), which is good, but let’s not forget that restriction. The authors rightly note that the results may not be applicable to the larger population of patients who do not have such comorbidities and are therefore at lower risk.

A Little Deeper Still (maybe a Lot Deeper)

Now the next step of my analysis gets deeper into their Bayesian analysis of the data. The primary analysis was defined a priori to be a Bayesian approach, which if you follow these Blogs, you will know is of great interest to me. This is one of the few trials that I have seen published in a major medical journal that has used a Bayesian inferential approach for the primary analysis and conclusion. The paper states, “Posterior efficacy of fluvoxamine for the primary outcome is calculated by means of the beta-binomial model for event rates, as detailed in the appendix of the statistical analysis plan, assuming informed priors on the basis of observational data for both placebo and fluvoxamine.”

I was particularly interested in the “informed prior” that was used since this can have an important and influential impact on the posterior distribution of the treatment effect estimate from which study conclusions are drawn. So, I went to reference #9 in the publication [4], which took me to reference #4 in that publication [5], which took me to an OSF website where I found a PDF of the Statistical Analysis Plan (SAP). [6] Whew!! That PDF is 123 pages long, but as far as I can tell, in Section 4.2.3.1. “Bayesian inference for dichotomous outcomes with covariate adjustment” it states, “Assigning a noninformative prior distribution 𝑝(β, γ) ∝ 1 …” This refers to a uniform prior distribution over the range of possible values.

So, without trying to scour through the mathematics of it all, I am actually confused as to whether the prior was informed as stated in the publication or noninformative as stated in the SAP. With that ambiguity, allow me to make some comments on the selection of a prior in this case. Some of this may be at odds with conventional thinking, but hey, that’s what I do. Note that I covered some of these concepts in Blog #11: Some Belief in Priors.

Quite often, when statisticians use a noninformative prior they are referring to a uniform distribution – denoted U(a, b) where a and b far exceed the range of possible values of the parameters of interest. Typically, this uniform distribution is centered at zero. Alternatively, they may use a normal distribution centered at zero with large variance – denoted N(0, s) where s is the variance of the distribution and once again s exceeds the range of likely or possible values (Figure 1). In the case of most drug research, we are interested in the treatment response being better than the control response (i.e., superiority trials) and this is definitely the case when placebo is the control. The treatment should be better than the control, and therefore, the treatment effect is hopefully greater than zero (when a larger response is indicative of a treatment benefit).

However, when using one of the above so-called noninformative priors for the treatment effect as is often done, half of the prior distribution is greater than zero which indicates that there is a 50% probability that the treatment is effective (Figure 1). This hardly seems like a noninformative prior! OK, these distributions have substantial spread/variance, but that only means that the observed data dominates the prior and therefore the posterior distribution for the treatment effect is nearly identical to the observed data.
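To illustrate how a flat prior lets the observed data dominate, here is a minimal beta-binomial sketch using the hospitalization counts reported earlier (79/741 on fluvoxamine, 119/756 on placebo). The Beta(1, 1) priors, the Monte Carlo approach, the seed and the function name are all my assumptions for illustration; this is not the trial's documented prior or analysis code.

```python
import random

def prob_treatment_better(x_t, n_t, x_c, n_c, a=1.0, b=1.0,
                          draws=100_000, seed=1):
    """Monte Carlo estimate of P(treatment event rate < control event rate).

    Each arm gets an independent Beta(a, b) prior (flat when a = b = 1, an
    assumption here), so its posterior is Beta(a + events, b + non-events).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_t = rng.betavariate(a + x_t, b + n_t - x_t)
        rate_c = rng.betavariate(a + x_c, b + n_c - x_c)
        wins += rate_t < rate_c
    return wins / draws

prob = prob_treatment_better(79, 741, 119, 756)
print(f"P(fluvoxamine event rate < placebo event rate) ~ {prob:.3f}")
```

Under these flat priors the estimate lands close to the 99.8% quoted by the authors, which is what one would expect when the prior contributes almost nothing and the posterior simply mirrors the data.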

In the case of fluvoxamine, my previous Blog put the upper bound on the probability that fluvoxamine was effective at 0.081. That’s a far cry from 0.50 as implied by the “noninformative prior.” If you really believe that the probability that fluvoxamine is effective in this Covid-19 population is 50%, then you can use such a symmetric noninformative prior, but I was not that optimistic based on the initial study of 152 patients for all the reasons stated previously.

Now, the published manuscript stated that they used an informative prior in the analysis, but I cannot find any clear documentation as to precisely what that prior was. Maybe I didn’t look in the right places or I didn’t spend enough time combing through all the documentation.

Nonetheless, I think the prior distribution should look something like Figure 2 in which most of the prior distribution for the treatment effect of fluvoxamine is less than zero (i.e., fluvoxamine is not effective) and only 8% of the treatment effect distribution is greater than zero. Now, without the data, I cannot compute the posterior distribution for the fluvoxamine treatment effect and therefore I cannot calculate the probability that fluvoxamine is effective – that is, the effect size is greater than zero. However, I can use Equation 1 to get an upper bound on that probability.
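To make the contrast between the two kinds of prior concrete, here is a small sketch using normal distributions. The normal shape, the standard deviations and the variable names are my own illustrative assumptions; the only inputs taken from the text are the 50% and 8.1% prior probabilities of a positive treatment effect.

```python
from statistics import NormalDist

# A "noninformative" prior centered at zero: wide spread, but still symmetric.
symmetric_prior = NormalDist(mu=0.0, sigma=10.0)  # sigma is an arbitrary choice
mass_symmetric = 1 - symmetric_prior.cdf(0.0)

# A skeptical prior with only 8.1% of its mass above zero.
target_mass = 0.081
sigma = 1.0  # hypothetical scale for the treatment effect
mu = -sigma * NormalDist().inv_cdf(1 - target_mass)
skeptical_prior = NormalDist(mu=mu, sigma=sigma)
mass_skeptical = 1 - skeptical_prior.cdf(0.0)

print(f"symmetric prior: P(effect > 0) = {mass_symmetric:.3f}")
print(f"skeptical prior: mean = {mu:.2f}, P(effect > 0) = {mass_skeptical:.3f}")
```

However wide the symmetric prior is made, it always places half of its mass on a beneficial effect. The skeptical prior has to be shifted well below zero (about 1.4 standard deviations in this sketch) to encode an 8% prior belief.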

Using my posterior probability of 0.081 from the initial trial as my prior for this larger confirmatory study, and the p-value of 0.0037 computed earlier from the confirmatory study, the upper bound on the posterior probability that fluvoxamine is truly effective is

p1 ≤ 0.61.

So, the large confirmatory study moved the evidentiary needle (as I like to say) from 0.08 to 0.61 – quite a substantial leap in confidence that fluvoxamine actually works in newly diagnosed Covid-19 patients at high risk of progression. But, it is certainly not compelling.
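The same update can be written in odds form, which some readers find easier to follow: the posterior odds are at most the prior odds multiplied by the Bayes Factor Bound. This is algebraically equivalent to Equation 1. The sketch below uses the prior of 0.081 and the p-value of 0.0037 from the text; the variable names are mine.

```python
import math

def bayes_factor_bound(p_value):
    """BFB = 1 / (-e * p * ln(p)), as defined alongside Equation 1."""
    return 1.0 / (-math.e * p_value * math.log(p_value))

prior = 0.081                      # posterior bound from the initial trial
bfb = bayes_factor_bound(0.0037)   # p-value from the confirmatory trial

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * bfb  # upper bound on the posterior odds
posterior = posterior_odds / (1 + posterior_odds)

print(f"BFB = {bfb:.2f}")
print(f"prior odds = {prior_odds:.3f}, posterior odds <= {posterior_odds:.2f}")
print(f"p1 <= {posterior:.2f}")
```

In odds terms, the confirmatory trial multiplies odds of roughly 1 to 11 by a factor of at most about 18, which is how a prior of 0.08 becomes a posterior bound of about 0.61.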

For the last question in my analytics thinking paradigm, I need to answer, “What do I decide?” Given the extent of the pandemic that is still ongoing in many countries around the world, I think the totality of evidence is sufficient to warrant consideration of an Emergency Use Authorization from FDA if anyone wanted to submit such a dossier. Other treatments, like monoclonal antibodies, have been approved on such single studies with a p-value<0.05. However, they have a well-established mechanism of action, and they have shown additional impact on other important secondary clinical measures/outcomes. Furthermore, they have been done in a multi-national setting. In my opinion, there is a need to replicate the findings of this large trial with another confirmatory trial. That trial should include investigative sites in the US and European countries at a minimum. If such a trial were successful, then I would be fully convinced of the effectiveness of fluvoxamine.

In fact, starting now with a prior of 0.61 (the posterior I have calculated from this reported trial), if a subsequent trial (with the right characteristics and geographic spread) were to produce another p-value of 0.0037, then using Equation 1, the posterior probability of fluvoxamine being effective would be

p1 ≤ 0.965.

That’s something that I think most would find compelling.
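Putting the whole chain together, the sketch below applies Equation 1 sequentially, with each posterior bound serving as the prior for the next trial. The third trial and its p-value of 0.0037 are hypothetical, as stated above. The function and variable names are illustrative, and the code carries unrounded values between steps, so intermediate figures may differ from the rounded ones in the text in the third decimal place.

```python
import math

def posterior_upper_bound(prior, p_value):
    """Equation 1 with BFB = 1 / (-e * p * ln(p))."""
    bfb = 1.0 / (-math.e * p_value * math.log(p_value))
    return prior * bfb / (1 - prior + prior * bfb)

trials = [
    ("initial trial (N=152)", 0.009),
    ("confirmatory trial (N=1497)", 0.0037),
    ("hypothetical replication", 0.0037),  # assumed p-value, per the text
]

belief = 0.01  # the prior of 1/100 from the previous Blog
history = []
for label, p_value in trials:
    belief = posterior_upper_bound(belief, p_value)
    history.append(belief)
    print(f"after {label}: p1 <= {belief:.3f}")
```

The three bounds come out at roughly 0.08, 0.61 and 0.965, matching the progression described above.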

Discussion

First, I recognize that some may see the work on fluvoxamine and the positive results from a large, well-designed trial as “compelling” as the authors have noted. Furthermore, I recognize that some may see this Blog as poking holes in the confirmatory study and casting doubt on its conclusions as merely a personal defense mechanism to deflect admitting that I was wrong with my prediction in my initial Blog on fluvoxamine. I just want the reader of this Blog to realize that I have reflected on my personal motivations and self-interest while writing this Blog.

As I have noted elsewhere, I deal in probabilities, and therefore it is difficult to say whether a statistician is right or wrong (kind of like the weatherman!). I could say that I would not bet on a pinch-hitter in the bottom of the 9th inning of a baseball game getting a home run to win the game. That probability is small … but it has happened. That is just one at-bat, and this is just one confirmatory trial. Would that batter be able to hit another home run in another game in the bottom of the 9th inning … and will fluvoxamine show any benefit in another trial that is set on a global stage rather than in 11 clinics in Brazil?

There is mounting evidence that fluvoxamine, a selective serotonin reuptake inhibitor (SSRI), helps slow the progression of Covid-19 in newly diagnosed patients who have high-risk comorbidities. If this SSRI works, perhaps other SSRI drugs will as well. As noted by the authors, “A large observational study from France involved a different population, 7230 hospitalised COVID-19 patients, and reported a reduction in use of intubation or death with use of SSRIs.”

Yet there are some inconsistencies in the data in which secondary outcome measures do not show clear benefit, and for some measures, patients on fluvoxamine did worse than placebo. This is coupled with the authors’ admission, “The underlying mechanism of fluvoxamine for COVID-19 disease remains uncertain.” Just because scientists have not figured out the mechanism of action does not mean we should ignore empirical evidence … if it is strong and convincing evidence. After all, the mechanism of action of acetaminophen is not clearly understood, [7] and it is one of the most used OTC medications in the world, but that is based on multiple clinical trials and decades of clinical experience.

Perhaps the writers of the famous 1962 Kefauver-Harris Amendment to the Food, Drug, and Cosmetic Act had an inadvertent wisdom about requiring substantial evidence for FDA approval of a new treatment. In part, they wrote, “‘substantial evidence’ means evidence consisting of adequate and well-controlled investigations, including clinical investigations, by experts qualified by scientific training and experience.” The plural form “investigations” has been interpreted to mean that FDA expects (at least) two confirmatory trials for approval (though there are exceptions). This also satisfies the general scientific principle of replicability.

Fluvoxamine is in the lead going down the back stretch of this race. It has a 61% chance of winning. That’s promising. One more study is needed to get it across the scientific finish line for the findings to be compelling.

References

[1] Lenze, E. J., et al. Fluvoxamine vs Placebo and Clinical Deterioration in Outpatients With Symptomatic COVID-19: A Randomized Clinical Trial. JAMA. 2020;324(22):2292-2300.

[2] 60 Minutes Transcript. https://www.cbsnews.com/news/fluvoxamine-antidepressant-drug-covid-treatment-60-minutes-2021-03-07/

[3] Reis, G, et al. Effect of early treatment with fluvoxamine on risk of emergency care and hospitalisation among patients with COVID-19: the TOGETHER randomised, platform clinical trial. Published Online October 27, 2021, The Lancet, Global Health. https://doi.org/10.1016/S2214-109X(21)00448-4

[4] Reis G, Silva EAdSM, Silva DCM et al. A multi-center, adaptive, randomized, platform trial to evaluate the effect of repurposed medicines in outpatients with early coronavirus disease 2019 (COVID-19) and high-risk for complications: the TOGETHER master trial protocol [version 2; peer review: 1 approved, 1 approved with reservations]. Gates Open Res 2021, 5:117. https://doi.org/10.12688/gatesopenres.13304.2

[5] Zannat NE, Reis G, Silva EADSM, et al.: A multi-center, adaptive, randomized, platform trial to evaluate the effect of repurposed medicines in outpatients with early coronavirus disease 2019 (COVID-19) and high-risk for complications: the TOGETHER master trial. 2021. http://www.doi.org/10.17605/OSF.IO/EG37X

[6] Statistical Analysis Plan – https://osf.io/dnpek/ (Accessed Nov 11, 2021)

[7] Ghanem CI, Pérez MJ, Manautou JE, Mottino AD. Acetaminophen from liver to brain: New insights into drug pharmacological action and toxicity. Pharmacol Res. 2016;109:119-131. doi:10.1016/j.phrs.2016.02.020
