No. 25 – Estimands Part 2 – What Exactly Is an Estimand?

The word estimand has exploded into the vernacular of statistical thinking for clinical trials in the last decade or so. Where did it come from and why is it so important now?

Welcome back to the sequel of Blog 24 – Estimands Part 1 – Just Do ITT? I will continue with some historical background and context to help make sense of the “perspectives paper” on estimands published by Fleming et al (Fleming, 2025) as well as the ensuing commentaries by three different groups of authors. Since the term “estimand” is a relatively new addition to the lexicon of everyday statistics, I will provide some historical context for its origin and its contemporary use, including the recent ICH E9(R1) guidance on Estimands and Sensitivity Analysis.

The Origins of the Term “Estimand”

As far as I can tell, the first use of the word estimand comes from a 1939 economics paper in the Journal of the American Statistical Association (JASA) entitled “The Concept of Demand and Price Elasticity—The Dynamics of Automobile Demand.” I will not endeavor to explain or to belabor the authors’ subject matter and intent but only note that the authors at one point state (rather matter-of-factly), “The analysis ought to include explicitly as many of the major variables affecting the estimand* as possible.” [Note: Their italics, not mine.] The asterisk refers to a footnote that states, “Statistics needs a term like estimand to replace the lengthy phrase independent variable.” [Note: Again, their italics, not mine.] So, we see that their use of the word does not quite coincide with the use we have today.

So, what happened next?

Again, as far as I can tell, the term “estimand” lay dormant until 1962 when it was used by none other than John Tukey in his very famous paper, “The Future of Data Analysis,” published in the Annals of Mathematical Statistics (Tukey, 1962). In that paper (p. 60 to be precise), Tukey writes a bit cryptically, so I have rearranged a couple of key sentences in a way that I think makes the point more clearly for our present-day discussion. My version of his argument, using his words, is “We must give even more attention … to discovering what it is reasonable to think of an estimator as estimating” rather than “starting with an estimator and discovering what is a reasonable estimand.”

There it is! The second use of the term “estimand” that I can find, and it is directly aligned with the use of the word as we think of it today. As many have heard me say in my talks on estimands – first WHAT; then HOW. This is Tukey’s message. Tukey is writing to mathematical statisticians and imploring them to go beyond or behind the mathematical development of estimators and their properties and to think more carefully about the target of estimation … what needs to be estimated … the estimand.

In 1968, Tukey and Fred Mosteller (whom we met ever so briefly in Blog 24) wrote a chapter entitled “Data Analysis—Including Statistics” that appeared in the Handbook of Social Psychology, in which they use the word “estimand” on one occasion as follows: “We speak of the estimator’s target as an estimand … rather than just as a parameter.” Thereafter, the term seemed to disappear from the statistical lexicon, and (I am speculating here) statisticians may not have adopted the term because they felt the concept was adequately covered by the term “parameter.” After all, a parameter in some model is the object/target of our estimation/estimator. Why do we need to call it an “estimand”?

The next reference I can find that uses the word “estimand” supports my speculation above. It appeared in 1983 in the book Theory of Point Estimation by Erich Lehmann. Very early in the book (p. 4 to be exact) Lehmann writes in technical language, “a real-valued function defined over a parameter space Ω, whose value at θ is to be estimated; we shall call g(θ) the estimand.” [Note: his italics, not mine.] Yikes! For the purposes of this blog, suffice it to say that the estimand is a parameter from some model involving a probability distribution. I will note that p. 4 is the only reference to the word estimand in the book that I can find.
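To make Lehmann's definition a little more concrete, here is a minimal illustration. The two-arm normal model and the symbols for the arm means and common variance are assumptions chosen for this example; they are not taken from Lehmann's text. With θ the full parameter vector and Ω the parameter space, the estimand g(θ) could be the difference in means:

```latex
\theta = (\mu_X,\ \mu_C,\ \sigma^2) \in \Omega,
\qquad
g(\theta) = \mu_X - \mu_C
```

In this reading, the model may carry several parameters, but the estimand is the single real number that g extracts from them.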

The Estimand Explosion

As noted in Blog 24, intention-to-treat was firmly established in 1998 with the issuance of the international guidance ICH E9 Statistical Principles for Clinical Trials. However, in the 2000s, some statisticians and clinicians were uncomfortable with its use in many situations, especially those involving the evaluation of symptomatic treatments. I will repeat the concern from Blog 24 as a reminder.

Let’s say a diabetic patient is randomized to an experimental treatment (X) or control (C). A patient on X has an adverse event mid-way through a 52-week trial and is given a “rescue medication (R)” that is a known, approved, marketed anti-diabetic treatment. The patient does well on R with a notable reduction in HbA1c and acceptable safety. Using the ITT approach, that favorable outcome is attributed to X and analyzed as if the HbA1c reduction that followed R was caused by X. A similar scenario could occur for patients on C: the patient fails on C, gets R, and succeeds, with the successful outcome being attributed to C for inference. And by the way, the R following X could be different than the R following C. Of course, this is further complicated by the fact that, inevitably, there are some patients who leave the study and for whom we cannot make a 52-week measurement of their outcome – HbA1c or safety.

The US FDA commissioned a group of experts to address the clinical trial reality that not all patients follow the protocol, and therefore, there is incomplete data on the randomized study treatments. Remember our key logical expression:

Randomization + complete data implies cause-and-effect (C&E).

Thus, incomplete data breaks the logic for making C&E inference. That expert committee issued a report in 2010 entitled The Prevention and Treatment of Missing Data in Clinical Trials. [Note: I prefer to use the term “incomplete data” since complete data is the foundation of C&E inference. I won’t spend time elaborating on this, but I give some more detailed explanations in my book Does This Treatment Cause that Outcome? The Science of Estimating a Treatment effect and Why It Matters.] In that 160-page report, the term “estimand” is used over 100 times, though its only definition is given as “that which is to be estimated.” The report does devote considerable attention to the importance of defining which treatment effect is of greatest interest for a given clinical trial.

Using that report as a starting point, in 2014 the pharmaceutical statistical community created a Concept Paper for ICH to revise ICH E9 to include the concepts of estimands and sensitivity analysis. The result is what is known as the ICH E9(R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials, which was approved in various parts of the pharmaceutical regulatory world starting in early 2020, with further adoption ongoing as of this writing.

ICH E9(R1) – The Basics

ICH E9(R1) also describes an estimand as “that which is to be estimated” and further elaborates, “An estimand is a precise description of the treatment effect reflecting the clinical question posed by a given clinical trial objective.” So, here we see the importance of the clinical question of interest. The clinical question – keep that in mind. This is why I say that the estimand is essentially a clinical matter, although much of its development and implementation to date has swirled in the statistical world.

To its credit, ICH E9(R1) digs deeper into the concept and defines an estimand using four attributes and five strategies. I think of the four attributes as the WHAT, and the five strategies as the HOW. [Recall: In Blog 24 I argued (along with many others) that we should focus first on the WHAT and then on the HOW.]

The four attributes are

Treatment Condition
Population
Outcome Variable
Population-level Summary Statistic.

The five strategies are

Treatment policy
Hypothetical
Composite
While on Treatment
Principal Stratification.

The four attributes may look very familiar and some critics of ICH E9(R1) have said, “What’s new?” The novelty in ICH E9(R1) is in its attempt to be explicit and precise with these concepts. I do not have the time or space here for thorough descriptions of each of these elements (It would fill a book! In fact, it did … in my forthcoming book). However, I will take a high-level view of the concepts contained in ICH E9(R1).

Importantly, for this blog and the ensuing blogs on the papers I have previously mentioned, these strategies are all general methodologies for handling incomplete data on the randomized study treatment (RST). Allow me to explain in short.

The treatment policy strategy is to ignore the discontinuation of the RST and collect data through the end of the study so that there is an outcome on every randomized patient.
The hypothetical strategy “creates” complete data by imputing what a patient’s response would have been if they had completed the trial.
The composite strategy creates a new variable that, in general, includes an efficacy measure and a measure to accommodate reasons for discontinuing the RST.
The while on treatment strategy only uses efficacy measurements during the time the patient is on the RST, and thus, every patient has an outcome.
One version of the principal stratification strategy is to assess the treatment effect in the principal stratum of patients who can adhere to the RST and complete the study.

In each case, assumptions are made to “restore” complete data so that cause-and-effect inference can be made.
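As a rough sketch of how the choice of strategy changes the data that reach the analysis, the Python below derives an analysis value per patient under three of the five strategies and compares arms. Everything here is hypothetical: the six patient records, the field names, the -0.5 responder threshold, and the use of a simple difference in means are assumptions made for illustration only. The hypothetical and principal stratification strategies are left out because they need an explicit imputation or causal model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Patient:
    arm: str             # "X" (experimental) or "C" (control)
    on_rst: float        # last HbA1c change observed while still on the RST
    week52: float        # HbA1c change at week 52, whatever the patient was taking
    discontinued: bool   # True if the RST was stopped before week 52


# Hypothetical records: the second patient in each arm stops the RST
# and then does well on rescue medication.
PATIENTS = [
    Patient("X", -1.0, -1.0, False),
    Patient("X", -0.2, -1.2, True),
    Patient("X", -0.8, -0.8, False),
    Patient("C", -0.3, -0.3, False),
    Patient("C", 0.1, -1.1, True),
    Patient("C", -0.1, -0.1, False),
]


def treatment_policy(p: Patient) -> float:
    # Ignore discontinuation: use the week-52 value as collected.
    return p.week52


def while_on_treatment(p: Patient) -> float:
    # Use only what was measured while the patient was on the RST.
    return p.on_rst


def composite(p: Patient, threshold: float = -0.5) -> float:
    # Responder (1.0) only if the patient stayed on the RST AND reached the
    # assumed HbA1c threshold; discontinuation counts as non-response.
    return 1.0 if (not p.discontinued and p.week52 <= threshold) else 0.0


def effect(patients, derive) -> float:
    # Difference between arms in the mean of the derived analysis value.
    x = [derive(p) for p in patients if p.arm == "X"]
    c = [derive(p) for p in patients if p.arm == "C"]
    return mean(x) - mean(c)


for strategy in (treatment_policy, while_on_treatment, composite):
    print(f"{strategy.__name__:>20}: {effect(PATIENTS, strategy):+.3f}")
```

The same six records give three different numbers, and the composite result is on a different scale altogether (a difference in responder proportions rather than in HbA1c change). Each derivation is quietly answering a different question.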

This is a crucial point.

When statisticians analyze data from a clinical trial, they write a mathematical model to describe the data that includes important design features of the clinical trial as well as other variables of interest (i.e., covariates). The model necessarily includes a parameter θ that represents the treatment effect. That is the estimand! That is the target of estimation! That model also includes assumptions about the probability distribution governing the model (e.g., normally distributed, equal variances across treatments, proportional hazards across treatments, etc.). The set of assumptions will undoubtedly contain elements for how to deal with the problem of incomplete data using one of the strategies noted above, thereby creating “complete” data. When the “complete” data is put into the model, the statistical analysis will produce an estimate of θ as well as some measure of its uncertainty. That estimate is used to describe the treatment effect and plays a central role in declaring whether and to what extent the treatment “works.”
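For a sense of what "an estimate plus a measure of its uncertainty" can look like in the simplest case, here is a minimal sketch that takes the treatment effect parameter to be a difference in means. The function name, the data values, the unequal-variance standard error, and the normal-approximation 95% interval are all assumptions for illustration; a real trial analysis would typically adjust for covariates and follow the prespecified model.

```python
from math import sqrt
from statistics import mean, variance


def estimate_theta(treated, control):
    """Return (estimate, standard error, approximate 95% interval)."""
    # Point estimate: difference in arm means.
    theta_hat = mean(treated) - mean(control)
    # Standard error allowing a different variance in each arm.
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    # Normal-approximation interval; crude for samples this small.
    interval = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
    return theta_hat, se, interval


# Hypothetical "complete" HbA1c changes after some strategy has been applied.
treated = [-1.0, -1.2, -0.8, -0.9]
control = [-0.3, -0.5, -0.1, -0.3]

theta_hat, se, (lower, upper) = estimate_theta(treated, control)
print(f"estimate = {theta_hat:.3f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The arithmetic is the easy part. What the resulting number means depends entirely on how the values in `treated` and `control` were constructed.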

The key is that the model, including the data manipulations involved with implementing one or more of the above strategies, is the embodiment of the estimand – literally. The estimand – the treatment effect parameter – is embedded in the model. Thus, regardless of what a researcher says their estimand is, the model and its embedded treatment effect parameter explicitly define the estimand.

What data is used for the analysis – whether observed, derived, imputed or manipulated in any other way – and the model applied to that data define the estimand.

Of course, this is why ICH E9(R1) was written … because statisticians argued about the data and the analysis model without deeper consideration for the estimand they were implicitly defining. This is putting the cart before the horse. ICH E9(R1) intended to drive a conversation primarily about an estimand, and then encouraged statisticians to find the best model, strategy, and assumptions to estimate that estimand. We must ask, as Tukey implored, does the parameter in our model (i.e., the estimand) accurately reflect the clinical question of interest?

Yet quite often I see and hear statisticians yammering away about models for imputations, assumptions, and data manipulations to “create” complete data to rectify the problem of incomplete data without recognizing the implicit parameter they are estimating and whether it is clinically meaningful. The statistical world is still getting it backwards, even after years of ICH E9(R1) publication and implementation efforts. The old ways are still too ingrained, and change is difficult to enact. This will be described in more detail in the ensuing blogs in this thread, and a thorough review of ICH E9(R1) and a detailed implementation approach are presented in my book.

Stay tuned.

References

Fleming TR, Carroll KJ, Wittes JT, Emerson SS, Rothmann MD, Collins S, Levin G. A Perspective on the Appropriate Implementation of ICH E9(R1) Addendum Strategies for Handling Intercurrent Events. Stat Med. 2025 May;44, 10-12.

Charles F. Roos & Victor von Szeliski (1939) The Concept of Demand and Price Elasticity—The Dynamics of Automobile Demand, J Am Statist Assoc, 34:208, 652-664, DOI: 10.1080/01621459.1939.10502402.

Tukey, J. The Future of Data Analysis. Ann of Math Stat. 1962 Mar; 33(1), p. 60.

Lehmann, E. L. Theory of Point Estimation. John Wiley & Sons, 1983. p. 4.

2 thoughts on “No. 25 – Estimands Part 2 – What Exactly Is an Estimand?”

Frank Harrell says:

July 2, 2026 at 11:57 am
This is a truly excellent article Steve.

You implied that the treatment effect estimand is usually a single parameter. This holds only under certain conditions:
- The model is linear, residuals Gaussian (usually), conditional variance is constant, and no baseline variable interacts with treatment. Then the regression coefficient for treatment (difference in means) is “the” parameter.
- The model is nonlinear, you are interested only in relative effects (odds or hazard ratios, etc.), and the treatment effect does not interact with baseline variables.
In other cases the estimand will usually be a many parameter -> one quantity mapping. Here are some examples:
- For time-to-event outcomes we may be interested in the covariate-specific difference in cumulative incidence at a specific time horizon, or restricted mean survival time (RMST) (note that many statisticians make the mistake of computing marginal cumulative incidence or marginal RMSTs which do not have an interpretation and do not transport to any population). Interpretable, generalizable estimands are a function of multiple survival curve parameters in addition to covariate effects.
- For general outcomes including recurrent events, state transition models provide the most general framework and one may be interested in covariate-specific difference in expected time in a given set of states, e.g., mean time alive and well. The treatment estimand is a summation over time periods of conditional state occupancy probabilities.
Regarding the overall approaches you so nicely outlined, I submit that only the composite outcome approach provides an estimand that is (1) interpretable and (2) estimates a meaningful quantity outside the study. In my opinion we need to account for the need for rescue therapies, and to count these occurrences as bad outcomes. For example a MOST (Markov Ordinal State Transition) model might have death as the worst outcome and initial treatment failure as an intermediate outcome. In a state transition model the need for rescue therapy would create an absorbing state, and the patient would not need to be followed beyond that point except for safety analyses. An estimand might be the covariate-conditional number of months gained with a good outcome and without the need to switch therapies.

The composite outcome approach happens to be one of the simplest approaches to apply also.

1. Steve Ruberg says:
  
  July 14, 2026 at 1:42 pm
  
  Frank, Many thanks for reading and taking the time to comment. I always learn from your insights and experience. I will try to make a concise response to your several points.
  
  I imply that the estimand is a single parameter because, following Lehmann, we can say our estimand theta is a function of model parameters. That is, theta = f(parameters) [sorry for the crude English text; I don’t know how to use mathematical formulas or Greek letters in the comment editor.] Thus, it can all be crunched down into a single parameter that we estimate, point and interval, and make inference about to decide whether the treatment effect represented by that single parameter is meaningful clinically and statistically (with a strong preference for Bayesian inference and decision-making). I think this is what you meant by “many parameter -> one mapping.” That one mapped item is the estimand in my mind.
  
  Your Markov state model reminds me of the Desirability of Outcome Ranking (DOOR) approach advanced by Scott Evans and others as a way to handle the multiple effects (plural) of a treatment distilled into a single outcome measure. But this is more of the “variable” attribute of the estimand.
  
  I still like the Tripartite Estimand Approach. It targets three key clinical questions that I believe are relevant to ALL clinical trials of novel interventions. (1) What are the chances of discontinuing the intervention due to adverse events? (2) What are the chances of discontinuing the intervention due to lack of efficacy? (3) For those who can adhere to the intervention, what are the benefits (treatment effect estimate for the efficacy variable(s)) and risks (longer-term side effects)? I believe these “provide an estimand that is (1) interpretable and (2) estimates a meaningful quantity outside the study” as you have phrased it. The Adherers Average Causal Effect (AdACE) in Estimand 3 is controversial in some circles due to the causal inference approach, potential outcomes framework, and assumptions underlying the current methodology. Nonetheless, as a patient and as a caregiver to a patient, I am willing to accept an approximate answer to Estimand 3 while keeping in mind both Estimands 1 and 2. Many doctors have told me that they wish they had those three pieces of information to give to their patients, recognizing that ALL treatment effect estimates coming from clinical trials are approximations to real-life clinical care.
  
  I agree that the composite outcome is simple and easy for clinicians to interpret. It just seems that the statistical world likes to deal with continuous data whenever possible rather than dichotomizing or categorizing information … for reasons that are well-known.
  
  OK, that’s all for now. THANKS again for your comments. I am still learning.
  

Published by Steve Ruberg
