Those of you who know me know that I have done some thinking and writing about the topic of estimands, specifically how the concept relates to biomedical research and to the estimation of a treatment effect. Some have noted that this topic is conspicuously missing from my blog, a situation I will remedy presently. This posting is also timely in that the ICH Guideline on Estimands and Sensitivity Analysis was released just TODAY.
For those not familiar with the topic of estimands, suffice to say it is as simple and complex as this: In a clinical trial of a new treatment (or more generally an intervention), WHAT treatment effect do you want to estimate? To be clear, much has been written in the statistical literature about HOW to estimate a treatment effect under many, many different clinical trial designs and complex data scenarios. For each HOW there is an implied WHAT, but often the WHAT has not been given suitable consideration prior to defining the HOW. Statisticians (a group in which I count myself) often like to ‘do things right’ (HOW), sometimes at the expense of ‘doing the right things’ (WHAT). In my enlightened moments, I like to think: first WHAT, then HOW. For if we cannot agree on the WHAT, then we will never agree on the HOW. Perhaps this will be clearer if I illustrate it with a story.
A (Somewhat) Real Story
I like to hike in the mountains – the Great Smoky Mountains in the US, the Alps in Switzerland, the Andes in Peru. Please note: I emphasize hiking and NOT mountain climbing. So, I thought about hiking the tallest free-standing mountain in the world that is accessible to the fit and able hiker – Mount Kilimanjaro in Africa. Its majestic summit rises above the clouds and stands 19,341 ft tall. I thought it best to go to the experts to plan such a trip, so I went to a travel agency specializing in this expedition.
The agent told me that it is a spectacular experience. The weather is most often clear enough that when you get to the summit, the views are breathtaking, as he showed me some pictures. I asked him to explain the journey to the summit in more detail. It went something like this:
- Steve: How long does it take to hike to the summit?
- Agent: Well, on day 6, hikers …
- Steve: What do you mean Day 6?
- Agent: Well it takes time to work your way up the mountain, and you must acclimate to the altitude etc. It is really a very high mountain.
- Steve: OK, I get it. Go on.
- Agent: On Day 6, hikers take on average 4.65 hours to get to the top. So, you get up at 4AM and start getting ready – get dressed, something to eat, organizing your gear, …
- Steve: Wait! You said 4.65 hours to hike on Day 6. Why do I have to get up at 4AM?!?
- Agent: Well, actually, 35% of the hikers only hike for 2 hours. Then they experience some sort of adverse event – altitude sickness, shortness of breath, etc. Then about 20% of hikers go for 4 hours before they give up. They just do not have the endurance it takes to make the ascent. Then there are the 45% who can stick to the plan and make it to the top. For them, it takes 7 hours of hiking to get there. Thus, you get up at 4AM, start hiking by 5AM, reach the summit at Noon, spend 15-30 minutes enjoying the view and taking pictures, and then you hike back down. You must be down the mountain before dark because you do NOT want to be caught on the mountain after dark!
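For the quantitatively inclined, the agent’s 4.65 hours is nothing more than the weighted average across the three hiker groups. A quick sketch, with the percentages taken straight from the story:

```python
# Weighted average of Day 6 hiking time across the three hiker groups:
# 35% stop after 2 hours (altitude sickness, etc.), 20% stop after 4 hours
# (endurance), and 45% complete the full 7-hour ascent.
groups = [(0.35, 2.0), (0.20, 4.0), (0.45, 7.0)]  # (proportion, hours hiked)

avg_hours = sum(p * h for p, h in groups)
print(round(avg_hours, 2))  # 4.65
```

The single number 4.65 is arithmetically correct, yet it describes the experience of exactly nobody on the mountain.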
I will spare you the rest of the details, and suffice to say, I am still planning that trip. But I did learn a very valuable lesson. When someone asks, “How long does it take to hike to the top of Mt Kilimanjaro?”, what do they mean? When someone answers, “4.65 hours,” what do they mean? Furthermore, WHAT IS THE RIGHT ANSWER?
- Is it the “intent-to-hike” estimate of 4.65 hours?
- Is it the completers (i.e. adherers) estimate of 7 hours?
- Or is it the whole story?
- If I were asking other friends to join me on this trip, what would I tell them?
- What should the advertising agency put in their travel brochures (assuming they are interested in full disclosure)?
- As a traveler, what would you want to know? (Please think about your answer to this and we will come back to it at the end of this blog.)
OK, for those familiar with clinical trials, the analogy should be obvious. All randomized clinical trials of any reasonable size and duration start as controlled experiments but turn into something akin to observational studies due to patients’ or their physician-investigators’ decisions to deviate from the plan (i.e. the protocol). Those deviations may be minor mishaps or omissions during the course of the trial (e.g. missing a scheduled visit because the patient is on vacation) or may be caused by the treatments themselves. That is, a patient may discontinue their randomized (usually blinded) study treatment because they are experiencing adverse events or unsatisfactory efficacy. Thus, the balance created by randomization at the beginning of the trial is perturbed, and the treatment groups in the study may no longer be directly comparable. When the deviations are such that they affect either the interpretation or the existence of the clinical outcomes of interest that are associated with the clinical question of interest, they are called “intercurrent events” (see ICH-E9(R1) Estimands and Sensitivity Analysis in Clinical Trials). These intercurrent events can, and often do, confound the estimation of the treatment effect in a randomized, controlled trial.
So, what appears to be a simple question, “What is the treatment effect?” that should/could be easily measured by doing a randomized, controlled experiment becomes a complex assessment of groups of patients that have different journeys during the clinical trial. Some take their medication as intended and continue to the end of the trial, some do not. So, what’s a statistician to do?
Analyze as Randomized (or A Little History for the Intellectually Curious)
OK, so I like the history of all this stuff. Those who do not study and understand history are doomed to repeat its mistakes! So, here goes my version, at least an abbreviated one.
In the landmark streptomycin trial conducted by the United Kingdom Medical Research Council beginning in 1946, Sir Austin Bradford Hill designed and orchestrated what is widely regarded as the first properly randomized, controlled clinical trial. Thereafter, through the 1950’s, the US National Institutes of Health and other federal agencies began to fund randomized clinical trials as the medical and scientific community came to embrace their value and power. For example, the 1954 Salk polio vaccine trial boosted the prominence of this approach.
In the late 1950’s and early 1960’s, debates and hearings were ongoing in the US Congress on what was to become the Kefauver-Harris Amendment to the US Food, Drug and Cosmetic Act (the US law that requires “adequate and well-controlled investigations” to establish the efficacy of a new treatment). Around the same time, Congress allocated funding to the NIH to study the long-term cardiovascular (CV) effects of a popular and widely used anti-diabetic drug – tolbutamide, a sulfonylurea – effects which had never been studied before. The University Group Diabetes Program (UGDP) was initiated in 1961, and a group of university researchers developed a large, multi-center, randomized, placebo-controlled study to assess whether control of blood glucose led to more favorable CV outcomes. The study included not only tolbutamide as a treatment, but also other glycemic control agents that were in prominent use at the time, including insulin.
It was a ground-breaking effort in that it was conducted with several key elements that were novel at the time, most notably for this blog, the principle of intent-to-treat (ITT). As statisticians realized that the long-term follow-up of patients would be complicated by many (what we now call) intercurrent events, they argued for “respecting randomization” by assessing patients based on their initial treatment allocation. Suffice to say it was quite controversial, but over time, with many vehement philosophical arguments as well as hard-fought battles over individual trials, statisticians gradually won the day with the notion of ITT. I am reminded of a lecture given in 1968 by Donald S. Fredrickson when he was the Director of the National Heart Institute in which he stated in a quite positive tone, “The anarchy of guess and intuition [in the design and analysis of clinical trials] has given way to a benevolent tyranny of statisticians.”
Anyway, as far as I can tell [and I will take feedback or insights on this], the principle of intent-to-treat was thrust into the limelight by the UGDP study, which was the inflection point for the ITT principle. It became cemented into the statistical codex over the ensuing decades. In fact, with the emergence of the International Conference on Harmonisation (ICH) in the early 1990’s, an Expert Working Group of international statisticians took on the challenge of writing what was eventually approved by the ICH Steering Committee as ICH-E9: Statistical Principles for Clinical Trials. It was in this internationally agreed treatise that ITT was formally advocated as the default, if not necessary, statistical analysis approach for confirmatory clinical trials, particularly in the regulated drug development process. As defined therein, ITT is “The principle that asserts that the effect of a treatment policy can be best assessed by evaluating on the basis of the intention to treat a subject (i.e., the planned treatment regimen) rather than the actual treatment given. It has the consequence that subjects allocated to a treatment group should be followed up, assessed, and analyzed as members of that group irrespective of their compliance with the planned course of treatment.”
Let’s examine this in a bit more detail since it is such a fundamental and ubiquitous principle underlying most analyses of regulated clinical research trials as well as most academic research clinical trials. The key phrase is “the effect of a treatment policy.” For the uninitiated, and to make it simple, that means a study in which patients are randomized to an Experimental treatment (E) or a Control treatment (C) is actually testing the hypothesis that
H0: E+(whatever happens after randomization) = C+(whatever happens after randomization)
Ha: E+(whatever happens after randomization) ≠ C+(whatever happens after randomization).
Now, if “whatever happens after randomization” is balanced between treatment groups and all patients are followed and efficacy measured throughout the clinical study to the end, then the groups are directly comparable, and the above hypotheses are indeed what we really want:
H0: E = C versus Ha: E ≠ C.
However, in long-term trials with the potential for patients to deviate from their randomized/allocated treatments, (think of the UGDP with 10-15 years of follow-up), many intercurrent events can occur and consequently confound the estimation of the comparison of treatment E versus treatment C. Thus, an ITT approach is testing the hypothesis of “the effect of initiating treatment E versus initiating treatment C” which is decidedly NOT testing the hypothesis of “the effect of Treatment E versus the effect of Treatment C.”
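To see the dilution numerically, here is a toy simulation; all parameters are invented for illustration and come from no real trial. Suppose E confers a 10-point benefit while taken as planned, but 35% of E patients discontinue early and, under a deliberately crude assumption, revert to control-like outcomes. The ITT contrast then estimates the effect of initiating E, not the effect of E itself:

```python
import random

random.seed(42)
n = 100_000                 # hypothetical patients per arm
on_treatment_effect = 10.0  # assumed benefit of E while on treatment
p_discontinue = 0.35        # assumed early-discontinuation rate on E

# Crude assumption: discontinuers lose the entire benefit (outcome = 0);
# control patients have outcome 0 throughout.
e_arm = [on_treatment_effect if random.random() >= p_discontinue else 0.0
         for _ in range(n)]
c_arm = [0.0] * n

itt_estimate = sum(e_arm) / n - sum(c_arm) / n
print(itt_estimate)  # near (1 - 0.35) * 10 = 6.5, well short of 10
```

Both numbers are legitimate answers to "what is the treatment effect?" – roughly 6.5 for initiating E, and 10 for staying on E – which is exactly why the WHAT must be settled before the HOW.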
The Advent of Estimands
Clinical research studies have become more complex as the medical research community, including pharma, has tackled more difficult/chronic diseases, used more innovative trial designs and studied diverse treatment modalities. In such trials, patients may discontinue their study medication or even discontinue participation in the trial (as is their right), thereby creating so-called missing data – that is, there are no data on the primary efficacy outcome of interest for such patients. This complexity has led to greater complication in the evaluation of a treatment effect, even to the point of disagreements on the definition of WHAT a treatment effect is and, subsequently, how it should be estimated.
Following the publication of ICH-E9 and the increased emphasis on analyzing clinical trials using ITT, statisticians had to find ways to implement the ITT principle so as to include all randomized patients in the analysis. Certainly, work on the “missing data problem” was already underway, but in this blogger’s opinion, there was an acceleration of work on how to analyze clinical trials for which data were missing. The discussions and debates on how to handle missing data in the ITT analysis of a clinical trial were contentious and at times even rancorous (at least as rancorous as statisticians can be). This led the FDA to commission the National Research Council in the US to study the issue, which produced the 2010 report “The Prevention and Treatment of Missing Data in Clinical Trials”. It is in that report that the notion of the estimand (“that which is to be estimated”) took greater prominence.
Not long after that report, the notion that clinical studies should be primarily interested in an ITT estimand (i.e. the effect of initiating a treatment) came under greater scrutiny. Statisticians proposed to ICH that there be a working group on an Addendum to ICH-E9 to address the issue of estimands and sensitivity analysis in randomized clinical trials. That effort has now come to fruition, and as of TODAY, ICH-E9(R1) “Estimands and Sensitivity Analysis in Clinical Trials” has been released. I will save commentary on that document for a later blog on estimands, but suffice to say that it opens the door to broader perspectives (i.e. beyond or different than ITT) for defining the primary estimand in a clinical trial.
Now, speaking of broader perspectives, I will use this blog to shamelessly promote a publication that I wrote with Mouna Akacha and Frank Bretz – “Estimands in clinical trials – broadening the perspective” – in which we propose the tripartite approach to estimands. “Tripartite” comes from the notion that treatments have multiple causal effects that can be aggregated into three fundamental categories of clinical interest:
- a treatment can cause a discontinuation of that treatment due to an adverse event;
- a treatment can cause a discontinuation of that treatment due to lack of efficacy;
- a treatment can allow a patient to have an acceptable benefit-risk profile, resulting in an observed efficacy (and safety) response at the intended endpoint of the trial.
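As a sketch of how a tripartite summary might look in practice, one can report all three categories for each randomized arm rather than a single averaged number. The patient counts and category labels below are invented for illustration (and, not coincidentally, mirror the percentages in the Kilimanjaro story):

```python
from collections import Counter

# Hypothetical dispositions for 100 patients on one treatment arm; each
# patient falls into exactly one of the three tripartite categories.
dispositions = (
    ["discontinued: adverse event"] * 35
    + ["discontinued: lack of efficacy"] * 20
    + ["completed with acceptable benefit-risk"] * 45
)

summary = Counter(dispositions)
for category in sorted(summary):
    print(f"{category}: {summary[category]}/100")
```

Contrasting these three proportions (and the outcomes within each) between arms tells the whole story, rather than collapsing the journeys of very different patients into one number.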
Sound like the hike up Mt. Kilimanjaro? It should. So, what answer(s) did you decide you would prefer to hear from the travel agent about how long it takes to climb to the top of Mt. Kilimanjaro? What answer(s) do you want when you ask, “What is the treatment effect?”
Subsequent blogs will delve into the tripartite approach in more detail as well as the ICH-E9(R1) Addendum. Stay tuned and send any feedback along. I am always anxious to hear if I am hitting the mark or adding value to your thinking and understanding. If nothing else, perhaps I am tracking down some interesting references for you to read.
ICH Expert Working Group (2019). ICH Harmonised Guideline Addendum on Estimands and Sensitivity Analysis in Clinical Trials to the Guideline on Statistical Principles for Clinical Trials E9(R1). https://database.ich.org/sites/default/files/E9-R1_Step4_Guideline_2019_1203.pdf
 Marks HM (2011). The 1954 Salk poliomyelitis vaccine field trial. Clin Trials. Apr;8(2):224-34.
 Drug Amendments Act of 1962, Public Law 87–781, 76 STAT 780. http://prescriptiondrugs.procon.org/sourcefiles/1962Amendments.pdf (accessed March 10, 2015).
Blackburn, H., Jacobs, D.R. Jr. (2017). The University Group Diabetes Program 1961–1978: pioneering randomized controlled trial. International Journal of Epidemiology, 46(5), 1354–1364.
 Fredrickson, D. (1968). The Field Trial: Some Thoughts on The Indispensable Ordeal. Bull NY Acad Med, 44 (8) 985-993.
 International Conference on Harmonisation. ICH harmonised tripartite guideline E9: statistical principles for clinical trials. Available at: http://www.ich.org [Accessed on 1 June 2015].
 National Research Council. The prevention and treatment of missing data in clinical trials. In Panel on Handling Missing Data in Clinical Trials, Committee on National Statistics, Division of Behavioral and Social Sciences and Education. The National Academies Press: Washington, DC, 2010.
 Akacha, M., Bretz, F. & Ruberg, S.J. Estimands in clinical trials — broadening the perspective. Stat. Med. 36, 5–19 (2017).