Propensity score analysis (in observational studies) typically involves two stages:
Stage 1. Estimate the propensity score, by a logistic regression model or by machine learning methods.
Stage 2. Given the estimated propensity score, estimate the causal effects through sub-classification/stratification, matching, or weighting.
The general idea is to use the estimated propensity scores to correct for lack of balance between groups, then go on to estimate the causal effect using the "balanced" data.
The main purpose of estimating the propensity score is to ensure overlap and balance of covariates between treatment groups, rather than to find a perfect fit for the propensity score model.
As long as the important covariates are balanced, model overfitting is not a concern; underfitting, however, can be a problem.
Essentially any balancing score (not necessarily the propensity score) is good enough for practical use.
A standard procedure for estimating propensity scores includes:
an initial fit;
discarding outliers (units with too large or too small propensity scores);
checking covariate balance; and
re-fitting if necessary.
Step 1. Estimate propensity score using a logistic regression:
W_i | X_i \sim \textrm{Bernoulli}(\pi_i); \ \ \ \ \textrm{log}\left(\dfrac{\pi_i}{1-\pi_i}\right) = X_i\boldsymbol{\beta}.
Include all covariates in this initial model, or perform a stepwise selection on the covariates and their interactions, to get an initial estimate of the propensity scores. That is,
\hat{e}^0(X_i) = \dfrac{e^{X_i\hat{\boldsymbol{\beta}}}}{1 + e^{X_i\hat{\boldsymbol{\beta}}}}.
Can also use machine learning methods.
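A minimal sketch of Step 1 in Python (not part of the original slides), assuming a hypothetical pandas DataFrame df with a binary treatment column "W" and covariate columns listed in covariates; statsmodels' Logit stands in for the logistic regression above.

```python
# A sketch only: `df`, `covariates`, and the column name "W" are hypothetical.
import pandas as pd
import statsmodels.api as sm

def estimate_propensity(df: pd.DataFrame, covariates: list, treatment: str = "W") -> pd.Series:
    X = sm.add_constant(df[covariates])           # design matrix X_i (with intercept)
    fit = sm.Logit(df[treatment], X).fit(disp=0)  # W_i | X_i ~ Bernoulli(pi_i), logit link
    return fit.predict(X)                         # e-hat^0(X_i) = expit(X_i * beta-hat)
```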
Step 2. Check overlap of propensity score between treatment groups. If necessary, discard the observations with non-overlapping propensity scores.
Step 3. Assess balance given by initial model in Step 1.
Step 4. If one or more covariates are seriously unbalanced, include some of their higher-order terms and/or interactions, re-fit the propensity score model, and repeat Steps 1-3 until most covariates are balanced.
Note: In some situations, important covariates will still not be completely balanced after repeated attempts; these should then be taken into account in Stage 2 (the outcome stage) of the propensity score analysis.
In practice, balance checking in the PS estimation stage can be done via sub-classification/stratification, matching or weighting.
The workflow is the same: fit initial model, check balance (sub-classification, matching or weighting), then refit.
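A sketch of Steps 2-3 under the same hypothetical setup (df, covariates, and the scores returned by estimate_propensity above): trim to the common support of the propensity score, then use standardized mean differences as a simple balance diagnostic.

```python
import numpy as np

def trim_to_overlap(df, ps, treatment="W"):
    # Step 2: keep only units whose propensity score lies in the common support.
    lo = max(ps[df[treatment] == 1].min(), ps[df[treatment] == 0].min())
    hi = min(ps[df[treatment] == 1].max(), ps[df[treatment] == 0].max())
    keep = (ps >= lo) & (ps <= hi)
    return df[keep], ps[keep]

def standardized_differences(df, covariates, treatment="W"):
    # Step 3: standardized mean difference of each covariate between groups.
    treated, control = df[df[treatment] == 1], df[df[treatment] == 0]
    smd = {}
    for x in covariates:
        pooled_sd = np.sqrt((treated[x].var() + control[x].var()) / 2)
        smd[x] = (treated[x].mean() - control[x].mean()) / pooled_sd
    return smd  # |SMD| larger than about 0.1 is commonly flagged as imbalance
```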
Given the estimated propensity score, we can estimate the causal estimands through sub-classification/stratification, weighting or matching.
Let's start with stratification.
Recall the classic result that subclassification into 5 strata on a single covariate removes roughly 90% of the bias due to that covariate.
Stratification using the propensity score as the summary score should have approximately the same effect.
Divide the subjects into K strata by the corresponding quantiles of the estimated propensity scores.
ATE: estimate ATE within each stratum and then average by the block size. That is,
\hat{\tau}^{ATE} = \sum_{k=1}^K \left(\bar{Y}_{k,1} - \bar{Y}_{k,0} \right) \dfrac{N_{k,1}+N_{k,0}}{N},
with N_{k,1} and N_{k,0} being the numbers of treated and control units in class k, respectively.
ATT: weight within-block ATE by proportion of treated units N_{k,1}/N_1.
A variance estimator for \hat{\tau}^{ATE} is
\mathbb{Var}\left[\hat{\tau}^{ATE}\right] = \sum_{k=1}^K \left(\mathbb{Var}[\bar{Y}_{k,1}] + \mathbb{Var}[\bar{Y}_{k,0}] \right) \left(\dfrac{N_{k,1}+N_{k,0}}{N}\right)^2,
or use bootstrap.
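A sketch of the subclassification estimators above, assuming hypothetical NumPy arrays y, w (binary treatment), and ps; it returns the stratified ATE and ATT and the plug-in variance estimator for the ATE.

```python
import numpy as np

def stratified_estimates(y, w, ps, K=5):
    edges = np.quantile(ps, np.linspace(0, 1, K + 1))
    strata = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, K - 1)
    N, N1 = len(y), w.sum()
    ate = att = var_ate = 0.0
    for k in range(K):
        in_k = strata == k
        y1, y0 = y[in_k & (w == 1)], y[in_k & (w == 0)]
        diff = y1.mean() - y0.mean()            # within-stratum difference in means
        share = in_k.mean()                     # (N_{k,1} + N_{k,0}) / N
        ate += diff * share                     # weight by stratum size for the ATE
        att += diff * len(y1) / N1              # weight by N_{k,1} / N_1 for the ATT
        # Var[Ybar_{k,1}] + Var[Ybar_{k,0}], times the squared stratum weight
        var_ate += (y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)) * share**2
    return ate, att, var_ate
```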
Five blocks are usually not enough; consider a larger number, such as 10.
Stratification is a coarsened version of matching.
Empirical results from real applications: usually not as good as matching or weighting.
Good for cases with extreme outliers (smoothing): less sensitive, but also less efficient.
Can be combined with regression: first estimate causal effects using regression within each block and then average the within-subclass estimates.
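A sketch of the last point, assuming a hypothetical DataFrame df with an outcome column "Y", treatment "W", the covariates in covariates, and a stratum label column "stratum" formed from propensity score quantiles.

```python
import statsmodels.formula.api as smf

def regression_within_strata(df, covariates, K=5):
    formula = "Y ~ W + " + " + ".join(covariates)
    ate = 0.0
    for k in range(K):
        block = df[df["stratum"] == k]
        fit = smf.ols(formula, data=block).fit()       # regression adjustment within block k
        ate += fit.params["W"] * len(block) / len(df)  # average the within-block estimates by block size
    return ate
```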
In propensity score matching, potential matches are compared using (estimated) propensity score.
1-to-n closest neighbor matching is common when the control group is large compared to the treatment group.
In most software packages, the default is actually 1-to-1 closest neighbor matching.
Pros: robust; matched pairs (so you can do within-pair analysis).
Sometimes, dimension reduction via the propensity score may be too drastic; recent methods advocate matching on the multivariate covariates directly.
Nonetheless, this is what we will focus on for our minimum wage data.
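A minimal sketch of 1-to-1 closest neighbor matching on the estimated propensity score, without replacement, again assuming hypothetical arrays y, w, and ps; averaging the within-pair differences estimates the ATT.

```python
import numpy as np

def nn_match_att(y, w, ps):
    treated = np.where(w == 1)[0]
    controls = list(np.where(w == 0)[0])
    pair_diffs = []
    for i in treated:
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))  # closest control on the propensity score
        pair_diffs.append(y[i] - y[j])                       # within-pair difference
        controls.remove(j)                                   # match without replacement
    return np.mean(pair_diffs)
```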
Remember the key propensity score property:
\{Y_i(0), Y_i(1)\} \perp W_i | X_i \ \ \Rightarrow \ \ \{Y_i(0), Y_i(1)\} \perp W_i | e(X_i)
Idea: in a regression estimator, adjust for e(X) instead of the whole X; that is, in regression models for Y(w), use e(X) as the single predictor (see the sketch below).
Clearly, modeling \mathbb{Pr}(Y(w)|\hat{e}(X)) is simpler than modeling \mathbb{Pr}(Y(w)|X); there are effectively more data per parameter due to the dimension reduction.
However,
we lose interpretation of the effects of individual covariates, e.g. age, sex; and
reduction to the one-dimensional propensity score may be too drastic.
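The sketch referred to above: separate outcome regressions for the treated and control groups with the estimated propensity score as the single predictor, assuming a hypothetical DataFrame df with columns "Y", "W", and "ps" holding \hat{e}(X).

```python
import statsmodels.formula.api as smf

def ps_only_regression_ate(df):
    m1 = smf.ols("Y ~ ps", data=df[df["W"] == 1]).fit()  # model for Y(1) given e-hat(X)
    m0 = smf.ols("Y ~ ps", data=df[df["W"] == 0]).fit()  # model for Y(0) given e-hat(X)
    return (m1.predict(df) - m0.predict(df)).mean()      # average imputed contrast over all units
```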
Idea: instead of using the estimated \hat{e}(X) as the single predictor, use it as an additional predictor in the model. That is, \mathbb{Pr}(Y(w)|X,\hat{e}(X)).
Turns out that \mathbb{Pr}(Y(w)|X,\hat{e}(X)) gives both efficiency and robustness.
Also, if we are unable to achieve full balance on some of the predictors, using \mathbb{Pr}(Y(w)|X,\hat{e}(X)) will help further control for those unbalanced predictors.
Empirical evidence (e.g., from simulations) supports this claim.
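A sketch of \mathbb{Pr}(Y(w)|X,\hat{e}(X)) under the same hypothetical setup: the estimated propensity score enters alongside the covariates rather than replacing them.

```python
import statsmodels.formula.api as smf

formula = "Y ~ W + ps + " + " + ".join(covariates)  # adjust for both X and e-hat(X)
fit_full = smf.ols(formula, data=df).fit()
print(fit_full.params["W"])                         # treatment coefficient as the effect estimate
```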