Item nonresponse:
Unit nonresponse:
We will only focus on item nonresponse.
If you are interested in building models for both unit and item nonresponse, here is a paper on some of the research I have done on the topic: https://arxiv.org/pdf/1907.06145.pdf
What can happen when using available case analyses with different types of missing data?
MCAR: unbiased when disregarding missing data; variance increase (losing partially complete data)
MAR: biased (depending on the strength of MAR and amount of missing data) when missing data mechanism is not modeled; variance increase (losing partially complete data).
NMAR: generally biased!
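A small simulation can make the three cases concrete. This is only a sketch under assumed settings: the population model, the response probabilities, and all variable names are hypothetical and are not taken from the slides. It compares the mean of the observed cases with the full-data mean under each mechanism.

```python
import random
import statistics

random.seed(0)
n = 20000

# Hypothetical population: x is always observed, y depends on x.
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 + xi + random.gauss(0, 1) for xi in x]


def observed_mean(values, keep):
    """Mean of the values whose indicator in `keep` is True."""
    return statistics.fmean(v for v, k in zip(values, keep) if k)


# MCAR: every y has the same 50% chance of being observed.
mcar = [random.random() < 0.5 for _ in range(n)]
# MAR: the chance of observing y depends only on the observed x.
mar = [random.random() < (0.8 if xi < 0 else 0.2) for xi in x]
# NMAR: the chance of observing y depends on y itself.
nmar = [random.random() < (0.8 if yi < 2 else 0.2) for yi in y]

full_mean = statistics.fmean(y)
mcar_mean = observed_mean(y, mcar)
mar_mean = observed_mean(y, mar)
nmar_mean = observed_mean(y, nmar)

print(f"full data: {full_mean:.2f}")
print(f"MCAR:      {mcar_mean:.2f}")
print(f"MAR:       {mar_mean:.2f}")
print(f"NMAR:      {nmar_mean:.2f}")
```

Under these assumed settings the MCAR mean stays close to the full-data mean, while the MAR and NMAR means are pulled downward because low values are more likely to be observed. The MAR bias here comes from ignoring x; it is not a statement about analyses that model the missing data mechanism.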
Marginal/conditional mean imputation
Nearest neighbor imputation:
Use observation from one of the previous time periods (for panel data)
Plug in the variable mean for missing values.
Point estimates of means OK under MCAR
Variances and covariances underestimated.
Distributional characteristics altered.
Regression coefficients inaccurate.
Similar problems for plug-in conditional means.
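The variance shrinkage listed above is easy to see numerically. The following is a minimal sketch with simulated data; the distribution, sample size, and missingness rate are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(1)
n = 5000

# Hypothetical complete data, then about 40% deleted completely at random.
y = [random.gauss(10, 3) for _ in range(n)]
observed = [random.random() < 0.6 for _ in range(n)]

y_obs = [yi for yi, o in zip(y, observed) if o]
obs_mean = statistics.fmean(y_obs)

# Marginal mean imputation: every missing value becomes the observed mean.
y_imputed = [yi if o else obs_mean for yi, o in zip(y, observed)]

var_obs = statistics.variance(y_obs)
var_imputed = statistics.variance(y_imputed)

print(f"mean, observed cases:     {obs_mean:.2f}")
print(f"mean, after imputation:   {statistics.fmean(y_imputed):.2f}")
print(f"variance, observed cases: {var_obs:.2f}")
print(f"variance, after imputing: {var_imputed:.2f}")
```

The mean is unchanged by construction, but every imputed value sits exactly at the mean and contributes nothing to the sum of squares, so the sample variance shrinks roughly in proportion to the fraction of observed cases.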
Plug in donors' observed values.
Hot deck: for each non-respondent, find a respondent who "looks like" the non-respondent in the same dataset
Cold deck: find potential donors in an external but similar dataset. For example, respondents from a 2016 election poll survey might serve as potential donors for non-respondents in the 2018 version of the same survey.
Common metrics: Statistical distance, adjustment cells, propensity scores.
Point estimates of means OK under MAR.
Variances and covariances underestimated.
Distributional characteristics OK.
Regression coefficients OK under MAR.
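As an illustration of hot deck imputation with a statistical distance as the matching metric, here is a minimal sketch. The function name `hot_deck_impute`, the single matching covariate, and the toy records are all hypothetical; real applications typically match on several covariates, adjustment cells, or propensity scores.

```python
def hot_deck_impute(records):
    """Fill each missing y with the y of the nearest respondent.

    `records` is a list of (x, y) pairs, with y set to None when missing.
    Distance is the absolute difference in x; ties go to the first donor.
    """
    donors = [(x, y) for x, y in records if y is not None]
    if not donors:
        raise ValueError("need at least one respondent to act as a donor")

    completed = []
    for x, y in records:
        if y is None:
            # Borrow the observed value of the closest donor in the same dataset.
            _, y = min(donors, key=lambda d: abs(d[0] - x))
        completed.append((x, y))
    return completed


# Toy data: the last two units did not report y.
records = [(1.0, 10.0), (2.0, 12.0), (5.0, 20.0), (1.8, None), (4.6, None)]
completed = hot_deck_impute(records)
print(completed)
```

Because the imputed values are real observed values, the completed data keep a realistic distribution, but a single filled-in dataset still treats the borrowed values as if they had been observed.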
Fill in dataset m times with imputations.
Analyze repeated data sets separately, then combine the estimates from each one.
Imputations drawn from probability models for missing data.
Suppose
Y= income (unit of measurement is $10,000)
X= level of education (0 = undergraduate, 1 = graduate)
Rubin (1987)
Population estimand: Q
Sample estimate: q
Variance of q: u
In each imputed dataset $d_j$, where $j = 1, \ldots, m$, calculate $q_j = q(d_j)$ and $u_j = u(d_j)$.
Suppose we are interested in estimating the mean income in our example. Then
$Q = \mu_Y$
$q = \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
$u = \hat{V}[\bar{y}] = \frac{s^2}{n}$
In each imputed dataset $d_j$, calculate $q_j = \bar{y}_j$ and $u_j = \frac{s_j^2}{n}$.
$\bar{q}_m = \sum_{j=1}^{m} \frac{q_j}{m}$
$b_m = \sum_{j=1}^{m} \frac{(q_j - \bar{q}_m)^2}{m - 1}$
$\bar{u}_m = \sum_{j=1}^{m} \frac{u_j}{m}$
MI estimate of Q:
$\bar{q}_m$
MI estimate of variance is:
$T_m = (1 + 1/m) b_m + \bar{u}_m$
Use t-distribution inference for Q
$\bar{q}_m \pm t_{1-\alpha/2} \sqrt{T_m}$
Notice that the variance incorporates uncertainty both from within and between the m datasets.
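The combining rules can be written as one short function. This is a sketch, not a reference implementation: the name `pool_rubin` and the returned dictionary keys are hypothetical. The degrees of freedom are not given on the slides; the formula used here, $\nu_m = (m - 1)\left(1 + \bar{u}_m / ((1 + 1/m) b_m)\right)^2$, is the one from Rubin (1987).

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m point estimates q_j and their variances u_j.

    Returns the MI point estimate, the between and within variances,
    the total variance T_m, and Rubin's (1987) degrees of freedom.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = statistics.fmean(estimates)   # MI estimate of Q
    b = statistics.variance(estimates)    # between variance, divisor m - 1
    u_bar = statistics.fmean(variances)   # within variance
    total = (1 + 1 / m) * b + u_bar       # T_m

    # Degrees of freedom for the t reference distribution (Rubin, 1987).
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return {"q_bar": q_bar, "b": b, "u_bar": u_bar, "T": total, "df": df}


# Hypothetical inputs, chosen only to exercise the function.
result = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(result)
```

The interval is then $\bar{q}_m \pm t_{1-\alpha/2} \sqrt{T_m}$ with the t quantile taken at `df` degrees of freedom; the quantile itself needs a statistics library or a table, since Python's standard library does not provide it.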
Back to our income example,
By the way, $\bar{y} = 12.64$ from the "true complete dataset".
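MI estimate of Q:
$\bar{q}_m = \sum_{j=1}^{m} \frac{q_j}{m} = \frac{12.66 + 13.14 + 12.90}{3} = 12.90$
Between variance
$b_m = \sum_{j=1}^{m} \frac{(q_j - \bar{q}_m)^2}{m - 1} = 0.06$
Within variance
$\bar{u}_m = \sum_{j=1}^{m} \frac{u_j}{m} = \frac{0.37 + 0.29 + 0.32}{3} = 0.33$
MI estimate of variance is:
$T_m = (1 + 1/m) b_m + \bar{u}_m = (1 + 1/3)(0.06) + 0.33 = 0.41$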
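The arithmetic for the income example can be checked in a few lines. The three estimates and three variances are the ones quoted in the example; the variable names are just for illustration.

```python
import statistics

# Per-dataset estimates q_j and variances u_j from the income example.
q = [12.66, 13.14, 12.90]
u = [0.37, 0.29, 0.32]
m = len(q)

q_bar = statistics.fmean(q)        # MI estimate of Q
b_m = statistics.variance(q)       # between variance, divisor m - 1
u_bar = statistics.fmean(u)        # within variance
t_m = (1 + 1 / m) * b_m + u_bar    # MI estimate of variance

print(round(q_bar, 2), round(b_m, 2), round(u_bar, 2), round(t_m, 2))
```

Carrying unrounded intermediate values gives a total variance of about 0.40; the 0.41 in the worked example comes from plugging in the rounded 0.06 and 0.33.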