No. 25 – Estimands Part 2 – What Exactly Is an Estimand?

The word estimand has exploded into the vernacular of statistical thinking for clinical trials in the last decade or so. Where did it come from and why is it so important now?

Welcome back to the sequel of Blog 24 – Estimands Part 1 – Just Do ITT? I will continue with some historical background to help make sense of the “perspectives paper” on estimands published by Fleming et al. (Fleming, 2025) as well as the ensuing commentaries by three different groups of authors. Since the term “estimands” is a relatively new addition to the lexicon of everyday statistics, I will trace its origin and its contemporary use, including the recent ICH E9(R1) guidance on Estimands and Sensitivity Analysis.

The Origins of the Term “Estimand”

As far as I can tell, the first use of the word estimand comes from a 1939 economics paper in the Journal of the American Statistical Association (JASA) entitled “The Concept of Demand and Price Elasticity—The Dynamics of Automobile Demand.” I will not endeavor to explain or to belabor the authors’ subject matter and intent, but only note that the authors at one point state (rather matter-of-factly), “The analysis ought to include explicitly as many of the major variables affecting the estimand* as possible.” [Note: Their italics, not mine.] The asterisk refers to a footnote that states, “Statistics needs a term like estimand to replace the lengthy phrase independent variable.” [Note: Again, their italics, not mine.] So, we see that their use of the word does not quite coincide with the use we have today.

So, what happened next?

Again, as far as I can tell, the term “estimand” lay dormant until 1962 when it was used by none other than John Tukey in his very famous paper, “The Future of Data Analysis,” published in the Annals of Mathematical Statistics (Tukey, 1962). In that paper (p. 60 to be precise), Tukey writes a bit cryptically, so I have rearranged a couple of key sentences in a way that I think makes the point more clearly for our present-day discussion. My version of his argument using his words is “We must give even more attention … to discovering what it is reasonable to think of an estimator as estimating” rather than “starting with an estimator and discovering what is a reasonable estimand.”

There it is! The second use of the term “estimand” that I can find, and it is directly aligned with the use of the word as we think of it today. As many have heard me say in my talks on estimands – first WHAT; then HOW. This is Tukey’s message. Tukey is writing to mathematical statisticians and imploring them to go beyond or behind the mathematical development of estimators and their properties and to think more carefully about the target of estimation … what needs to be estimated … the estimand.

In 1968, Tukey and Fred Mosteller (whom we met ever so briefly in Blog 24) wrote a chapter entitled “Data Analysis—Including Statistics” that appeared in the Handbook of Social Psychology in which they use the word “estimand” on one occasion as follows, “We speak of the estimator’s target as an estimand … rather than just as a parameter.” Thereafter, the term seemed to disappear from the statistical lexicon, and (I am speculating here) statisticians may not have adopted the term because they felt the concept was adequately covered by the term “parameter.” After all, a parameter in some model is the object/target of our estimation/estimator. Why do we need to call it an “estimand”?

The next reference I can find that uses the word “estimand” supports my speculation above. It appeared in 1983 in the book Theory of Point Estimation by Erich Lehmann. Very early in the book (p. 4 to be exact) Lehmann writes in technical language, “a real-valued function defined over a parameter space Ω, whose value at θ is to be estimated; we shall call g(θ) the estimand.” [Note: his italics, not mine.] Yikes! For the purposes of this blog, suffice it to say that the estimand is a parameter from some model involving a probability distribution. I will note that p. 4 is the only reference to the word estimand in the book that I can find.

The Estimand Explosion

As noted in Blog 24, intention-to-treat was firmly established in 1998 with the issuance of the international guidance ICH E9 Statistical Principles for Clinical Trials. However, in the 2000s, some statisticians and clinicians were uncomfortable with its use in some, perhaps many, situations, especially those involved with evaluating symptomatic treatments. I will repeat the concern from Blog 24 as a reminder.

Let’s say a diabetic patient is randomized to an experimental treatment (X) or control (C). A patient on X has an adverse event mid-way through a 52-week trial and is given a “rescue medication (R)” that is a known, approved, marketed anti-diabetic treatment. The patient does well on R with a notable reduction in HbA1c and acceptable safety. Using the ITT approach, that favorable outcome is attributed to X and analyzed as if the HbA1c result that followed R was caused by X. A similar scenario could occur for patients on C; the patient fails on C, gets R and succeeds, with the successful outcome being attributed to C for inference. And by the way, the R following X could be different from the R following C. Of course, this is further complicated by the fact that, inevitably, there are some patients who leave the study and for whom we cannot make a 52-week measurement of their outcome – HbA1c or safety.

The US FDA commissioned a group of experts to address the clinical trial reality that not all patients follow the protocol, and therefore, there is incomplete data on the randomized study treatments. Remember our key logical expression:

Randomization + complete data implies cause-and-effect (C&E).

Thus, incomplete data breaks the logic for making C&E inference. That expert committee issued a report in 2010 entitled The Prevention and Treatment of Missing Data in Clinical Trials. [Note: I prefer to use the term “incomplete data” since complete data is the foundation of C&E inference. I won’t spend time elaborating on this, but I give some more detailed explanations in my book Does This Treatment Cause that Outcome? The Science of Estimating a Treatment effect and Why It Matters.] In that 160-page report the term “estimand” is used over 100 times, though its only definition is given as “that which is to be estimated.” The report does devote considerable attention to describing the importance of defining what treatment effect is of greatest interest for a given clinical trial.

Using that report as a starting point, in 2014 the pharmaceutical statistical community created a Concept Paper for ICH to revise ICH E9 to include the concepts of estimands and sensitivity analysis. The result is what is known as the ICH E9(R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials, which was approved in various parts of the pharmaceutical regulatory world starting in early 2020, with further adoption ongoing as of this writing.

ICH E9(R1) – The Basics

ICH E9(R1) also describes an estimand as “that which is to be estimated” and further elaborates, “An estimand is a precise description of the treatment effect reflecting the clinical question posed by a given clinical trial objective.” So, here we see the importance of the clinical question of interest. The clinical question – keep that in mind. This is why I say that the estimand is essentially a clinical matter, although much of its development and implementation to date has swirled in the statistical world.

To its credit, ICH E9(R1) digs deeper into the concept and defines an estimand using four attributes and five strategies. I think of the four attributes as the WHAT, and the five strategies as the HOW. [Recall: In Blog 24 I argued (along with many others) that we should focus first on the WHAT and then on the HOW.]

The four attributes are

  1. Treatment Condition
  2. Population
  3. Outcome Variable
  4. Population-level Summary Statistic.

The five strategies are

  1. Treatment policy
  2. Hypothetical
  3. Composite
  4. While on Treatment
  5. Principal Stratification.

The four attributes may look very familiar and some critics of ICH E9(R1) have said, “What’s new?” The novelty in ICH E9(R1) is in its attempt to be explicit and precise with these concepts. I do not have the time or space here for thorough descriptions of each of these elements (It would fill a book! In fact, it did … in my forthcoming book). However, I will take a high-level view of the concepts contained in ICH E9(R1).

Importantly, for this blog and the ensuing blogs on the papers I have previously mentioned, these strategies are all general methodologies for handling incomplete data on the randomized study treatment (RST). Allow me to explain briefly.

  1. The treatment policy strategy is to ignore the discontinuation of the RST and collect data through the end of the study so that there is an outcome on every randomized patient.
  2. The hypothetical strategy “creates” complete data by imputing what a patient’s response would have been if they had completed the trial.
  3. The composite strategy creates a new variable that, in general, includes an efficacy measure and a measure to accommodate reasons for discontinuing the RST.
  4. The while on treatment strategy only uses efficacy measurements during the time the patient is on the RST, and thus, every patient has an outcome.
  5. One version of the principal stratification strategy is to assess the treatment effect in the principal stratum of patients who can adhere to the RST and complete the study.

In each case, assumptions are made to “restore” complete data so that cause-and-effect inference can be made.

This is a crucial point.

When statisticians analyze data from a clinical trial, they write a mathematical model to describe the data that includes important design features of the clinical trial as well as other variables of interest (i.e., covariates). The model necessarily includes a parameter θ that represents the treatment effect. That is the estimand! That is the target of estimation! That model also includes assumptions about the probability distribution governing the model (e.g., normally distributed errors, equal variances across treatments, proportional hazards across treatments, etc.). The set of assumptions will undoubtedly contain elements for how to deal with the problem of incomplete data using one of the strategies noted above, thereby creating “complete” data. When the “complete” data is put into the model, the statistical analysis will produce an estimate of θ as well as some measure of its uncertainty. That estimate is used to describe the treatment effect and plays a central role in declaring whether and to what extent the treatment “works.”
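
To make this concrete, one possible model (chosen here purely for illustration; neither ICH E9(R1) nor the discussion above prescribes it) is a simple analysis of covariance for a continuous outcome such as the week-52 change in HbA1c. The symbols Y_i, Z_i, B_i and the betas are assumptions introduced for this example, and the treatment effect parameter is written as theta:

```latex
Y_i = \beta_0 + \theta\, Z_i + \beta_1 B_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```

Here Y_i is the outcome for patient i, Z_i is 1 for the experimental treatment and 0 for control, B_i is a baseline covariate, and theta is the treatment effect parameter, that is, the estimand implied by this model. The normality and common-variance assumptions sit in the error term. How each Y_i is obtained when a patient stops the RST is a further assumption that the formula itself does not show.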

The key is that the model, including the data manipulations involved with implementing one or more of the above strategies, is the embodiment of the estimand – literally. The estimand – the treatment effect parameter – is embedded in the model. Thus, regardless of what a researcher says their estimand is, the model and its embedded treatment effect parameter explicitly define the estimand.

What data is used for the analysis – whether observed, derived, imputed or manipulated in any other way – and the model applied to that data define the estimand.
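
A minimal Python sketch can illustrate the point. Everything in it is an assumption made up for the example: the six patient records, the field names, the responder cut-off of -0.5, and the "imputed" values, which stand in for whatever imputation model a real analysis would use. The analysis model is deliberately trivial (a difference in arm means), and principal stratification is left out because stratum membership cannot be read off observed data this simply.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Patient:
    arm: str              # "X" (experimental) or "C" (control)
    rescued: bool         # True if rescue medication R was started
    wk52_observed: float  # HbA1c change at week 52, regardless of rescue
    last_on_rst: float    # last HbA1c change measured while still on the RST
    wk52_imputed: float   # hypothetical value "had rescue not been given" (assumed)


# Invented toy data; negative values mean HbA1c went down.
PATIENTS = [
    Patient("X", False, -1.2, -1.2, -1.2),
    Patient("X", False, -0.8, -0.8, -0.8),
    Patient("X", True, -1.5, -0.1, -0.4),
    Patient("C", False, -0.3, -0.3, -0.3),
    Patient("C", True, -1.4, 0.1, 0.0),
    Patient("C", True, -1.0, 0.2, 0.1),
]


def treatment_policy(p: Patient) -> float:
    # Ignore the switch to rescue; use whatever was observed at week 52.
    return p.wk52_observed


def hypothetical(p: Patient) -> float:
    # Replace post-rescue outcomes with the assumed imputed value.
    return p.wk52_imputed if p.rescued else p.wk52_observed


def while_on_treatment(p: Patient) -> float:
    # Use only the last measurement taken while on the RST.
    return p.last_on_rst


def composite(p: Patient) -> float:
    # Responder (1.0) only if not rescued AND change is at most -0.5.
    return 1.0 if (not p.rescued and p.wk52_observed <= -0.5) else 0.0


STRATEGIES = {
    "treatment policy": treatment_policy,
    "hypothetical": hypothetical,
    "while on treatment": while_on_treatment,
    "composite": composite,
}


def effect(derive) -> float:
    """Difference in arm means (X minus C) of the derived analysis value."""
    x = mean(derive(p) for p in PATIENTS if p.arm == "X")
    c = mean(derive(p) for p in PATIENTS if p.arm == "C")
    return round(x - c, 2)


for name, derive in STRATEGIES.items():
    print(f"{name:>20}: {effect(derive)}")
```

The same six patients and the same difference-in-means model give four different numbers, because each derivation rule quietly changes what is being estimated. For the composite rule the number is a difference in responder proportions rather than in HbA1c change, so it is not even on the same scale as the other three.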

Of course, this is why ICH E9(R1) was written … because statisticians argued about the data and the analysis model without deeper consideration for the estimand they were implicitly defining. This is putting the cart before the horse. ICH E9(R1) was intended to drive a conversation first about the estimand, and then to encourage statisticians to find the best model, strategy, and assumptions to estimate that estimand. We must ask, as Tukey implored, does the parameter in our model (i.e., the estimand) accurately reflect the clinical question of interest?

Yet quite often I see and hear statisticians yammering away about models for imputations, assumptions, and data manipulations to “create” complete data to rectify the problem of incomplete data without recognizing the implicit parameter they are estimating and whether it is clinically meaningful. The statistical world is still getting it backwards, even years after the publication of ICH E9(R1) and the implementation efforts that followed. The old ways are still too ingrained, and change is difficult to enact. This will be described in more detail in the ensuing blogs in this thread, and a thorough review of ICH E9(R1) and a detailed implementation approach are presented in my book.

Stay tuned.

References

Fleming TR, Carroll KJ, Wittes JT, Emerson SS, Rothmann MD, Collins S, Levin G. A Perspective on the Appropriate Implementation of ICH E9(R1) Addendum Strategies for Handling Intercurrent Events. Stat Med. 2025 May;44, 10-12.

Charles F. Roos & Victor von Szeliski (1939) The Concept of Demand and Price Elasticity—The Dynamics of Automobile Demand, J Am Statist Assoc, 34:208, 652-664, DOI: 10.1080/01621459.1939.10502402.

Tukey, J. The Future of Data Analysis. Ann of Math Stat. 1962 Mar; 33(1), p. 60.

Lehmann, E. L. Theory of Point Estimation. John Wiley & Sons, 1983. p. 4.
