The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes.Stat Methods Med Res. 2017 Aug; 26(4):1654-1670.SM
There is increasing interest in estimating the causal effects of treatments using observational data. Propensity-score matching methods are frequently used to adjust for differences in observed characteristics between treated and control individuals in observational studies. Survival or time-to-event outcomes occur frequently in the medical literature, but the use of propensity score methods in survival analysis has not been thoroughly investigated. This paper compares two approaches for estimating the Average Treatment Effect (ATE) on survival outcomes: Inverse Probability of Treatment Weighting (IPTW) and full matching. The performance of these methods was compared in an extensive set of simulations that varied the extent of confounding and the amount of misspecification of the propensity score model. We found that both IPTW and full matching resulted in estimation of marginal hazard ratios with negligible bias when the ATE was the target estimand and the treatment-selection process was weak to moderate. However, when the treatment-selection process was strong, both methods resulted in biased estimation of the true marginal hazard ratio, even when the propensity score model was correctly specified. When the propensity score model was correctly specified, bias tended to be lower for full matching than for IPTW. The reasons for these biases and for the differences between the two methods appeared to be due to some extreme weights generated for each method. Both methods tended to produce more extreme weights as the magnitude of the effects of covariates on treatment selection increased. Furthermore, more extreme weights were observed for IPTW than for full matching. However, the poorer performance of both methods in the presence of a strong treatment-selection process was mitigated by the use of IPTW with restriction and full matching with a caliper restriction when the propensity score model was correctly specified.