
IDS 702: Module 5.2

Imputation methods I

Dr. Olanrewaju Michael Akande

1 / 16

Strategies for handling missing data

  • Item nonresponse:

    • use complete/available cases analyses
    • single imputation methods
    • multiple imputation
    • model-based methods
  • Unit nonresponse:

    • weighting adjustments
    • model-based methods (identifiability issues!).
  • We will only focus on item nonresponse.

  • If you are interested in building models for both unit and item nonresponse, here is a paper on some of the research I have done on the topic: https://arxiv.org/pdf/1907.06145.pdf

2 / 16

Complete/available cases analyses

What can happen when using available case analyses with different types of missing data?

  • MCAR: estimates remain unbiased when the missing data are disregarded, but variance increases because partially complete records are discarded.

  • MAR: estimates are biased when the missing data mechanism is not modeled (how badly depends on the strength of the MAR dependence and the amount of missing data); variance also increases because partially complete records are discarded.

  • NMAR: generally biased!
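
A small simulation can make the MCAR/MAR contrast concrete. This is only a sketch under assumed settings: the data-generating model, the missingness probabilities, the sample size, and the function name `complete_case_means` are all hypothetical choices made for illustration, not part of the course material.

```python
import random


def complete_case_means(n=20000, seed=702):
    """Compare the mean of y on full data with complete-case means.

    Assumed (hypothetical) model: x ~ N(0, 1), y = 10 + 2x + N(0, 1).
    """
    rng = random.Random(seed)
    full, mcar, mar = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 10.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        full.append(y)
        # MCAR: every y has the same 50% chance of being observed.
        if rng.random() < 0.5:
            mcar.append(y)
        # MAR: the chance of observing y depends only on the observed x.
        p_observed = 0.1 if x > 0 else 0.9
        if rng.random() < p_observed:
            mar.append(y)

    def mean(values):
        return sum(values) / len(values)

    return mean(full), mean(mcar), mean(mar)


full_mean, mcar_mean, mar_mean = complete_case_means()
print("full data:          ", round(full_mean, 2))
print("complete cases MCAR:", round(mcar_mean, 2))
print("complete cases MAR: ", round(mar_mean, 2))
```

Under these assumptions the MCAR complete-case mean should land close to the full-data mean, while the MAR complete-case mean should fall well below it (about 1.3 lower in expectation), because cases with large x, and hence large y, are mostly dropped.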

3 / 16

Single imputation methods

  • Marginal/conditional mean imputation

  • Nearest neighbor imputation:

    • hot deck imputation
    • cold deck imputation
  • Use observation from one of the previous time periods (for panel data)

    • LOCF -- last observation carried forward
    • BOCF -- baseline observation carried forward
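
The two carry-forward rules can be sketched for a single subject's measurements over time. The helper names `locf` and `bocf`, the use of `None` for a missing value, and the assumption that the first entry is the observed baseline are all illustrative choices.

```python
def locf(series):
    """Last observation carried forward: fill each gap with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        # A gap before any observed value stays None.
        filled.append(last)
    return filled


def bocf(series):
    """Baseline observation carried forward: fill every gap with the first (baseline) value."""
    baseline = series[0]  # assumed to be observed
    return [baseline if value is None else value for value in series]


visits = [5.0, None, None, 7.0, None]  # hypothetical panel measurements
print(locf(visits))  # [5.0, 5.0, 5.0, 7.0, 7.0]
print(bocf(visits))  # [5.0, 5.0, 5.0, 7.0, 5.0]
```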
4 / 16

Mean imputation

Plug in the variable mean for missing values.

  • Point estimates of means OK under MCAR

  • Variances and covariances underestimated.

  • Distributional characteristics altered.

  • Regression coefficients inaccurate.

Similar problems for plug-in conditional means.
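
To see why the mean survives but the variance shrinks, here is a minimal sketch of marginal mean imputation on a tiny made-up variable. The data and the helper name `mean_impute` are hypothetical.

```python
import statistics


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


y = [2.0, 4.0, 6.0, 8.0, None, None]  # hypothetical data with two missing values
observed = [v for v in y if v is not None]
imputed = mean_impute(y)

# The mean is unchanged by construction.
print(statistics.mean(observed), statistics.mean(imputed))
# The sample variance drops: imputed points add no spread but increase n.
print(round(statistics.variance(observed), 2), round(statistics.variance(imputed), 2))
```

Here the sample variance falls from about 6.67 to 4.0, since every imputed value sits exactly at the mean.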

5 / 16

Nearest neighbor imputation

Plug in donors' observed values.

  • Hot deck: for each non-respondent, find a respondent who "looks like" the non-respondent in the same dataset

  • Cold deck: find potential donors in an external but similar dataset. For example, respondents from a 2016 election poll survey might serve as potential donors for non-respondents in the 2018 version of the same survey.

  • Common metrics for finding donors: statistical distance, adjustment cells, propensity scores.
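
A hot deck based on a simple statistical distance might look like the sketch below. It assumes one fully observed covariate `x` and an outcome `y` with `None` for nonrespondents; the records, field names, and the function `hot_deck` are hypothetical, and absolute difference in `x` stands in for whatever distance metric is appropriate in practice.

```python
def hot_deck(records):
    """Fill each missing y with the y of the respondent closest in x (same dataset)."""
    donors = [r for r in records if r["y"] is not None]
    completed = []
    for r in records:
        if r["y"] is None:
            # Nearest donor by absolute distance on the covariate x.
            donor = min(donors, key=lambda d: abs(d["x"] - r["x"]))
            completed.append({"x": r["x"], "y": donor["y"]})
        else:
            completed.append(dict(r))
    return completed


records = [  # hypothetical survey: x = age, y = income in $10,000
    {"x": 25, "y": 4.0},
    {"x": 40, "y": 7.5},
    {"x": 60, "y": 9.0},
    {"x": 42, "y": None},
    {"x": 23, "y": None},
]
completed = hot_deck(records)
print([r["y"] for r in completed])  # [4.0, 7.5, 9.0, 7.5, 4.0]
```

A cold deck would follow the same logic, with `donors` drawn from an external dataset instead of from `records`.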

6 / 16

Nearest neighbor imputation

  • Point estimates of means OK under MAR.

  • Variances and covariances underestimated.

  • Distributional characteristics OK.

  • Regression coefficients OK under MAR.

7 / 16

Multiple imputation (MI)

  • Fill in dataset m times with imputations.

  • Analyze repeated data sets separately, then combine the estimates from each one.

  • Imputations drawn from probability models for missing data.

8 / 16

MI example

Suppose

  • Y = income (unit of measurement is $10,000)

  • X = level of education (0 = undergraduate, 1 = graduate)

9 / 16

MI: inferences from multiply-imputed datasets

Rubin (1987)

  • Population estimand: Q

  • Sample estimate: q

  • Variance of q: u

  • In each imputed dataset $d_j$, where $j = 1, \ldots, m$, calculate $q_j = q(d_j)$ and $u_j = u(d_j)$

10 / 16

MI example: inferences from multiply-imputed datasets

Suppose we are interested in estimating the mean income in our example. Then

  • $Q = \mu_Y$

  • $q = \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

  • $u = \hat{V}[\bar{y}] = \frac{s^2}{n}$

  • In each imputed dataset $d_j$, calculate $q_j = \bar{y}_j$ and $u_j = \frac{s_j^2}{n}$
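
The per-dataset step can be sketched as follows. The three completed datasets are made-up numbers used only to show the mechanics (they are not the course's income data), and `estimate_and_variance` is a hypothetical helper name.

```python
import statistics


def estimate_and_variance(y):
    """Return (q_j, u_j): the sample mean and its estimated variance s^2 / n."""
    n = len(y)
    q_j = statistics.mean(y)
    u_j = statistics.variance(y) / n  # statistics.variance uses the n - 1 divisor
    return q_j, u_j


# Hypothetical completed (imputed) datasets d_1, d_2, d_3, incomes in $10,000.
completed_datasets = [
    [10.0, 12.0, 14.0, 16.0],
    [10.0, 12.0, 15.0, 17.0],
    [10.0, 12.0, 13.0, 15.0],
]
per_dataset = [estimate_and_variance(d) for d in completed_datasets]
for j, (q_j, u_j) in enumerate(per_dataset, start=1):
    print(f"d_{j}: q_j = {q_j:.2f}, u_j = {u_j:.2f}")
```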

11 / 16

MI: quantities needed for inference

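  • $\bar{q}_m = \sum_{j=1}^{m} \frac{q_j}{m}$

  • $b_m = \sum_{j=1}^{m} \frac{(q_j - \bar{q}_m)^2}{m - 1}$

  • $\bar{u}_m = \sum_{j=1}^{m} \frac{u_j}{m}$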

12 / 16

MI: inferences from multiply-imputed datasets

  • MI estimate of Q:

    $$\bar{q}_m$$

  • MI estimate of variance is:

    $$T_m = (1 + 1/m) b_m + \bar{u}_m$$

  • Use t-distribution inference for Q

    $$\bar{q}_m \pm t_{1-\alpha/2} \sqrt{T_m}$$

    Notice that the variance incorporates uncertainty both from within and between the m datasets.
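
The slide does not state the degrees of freedom for the t reference distribution. Assuming the standard large-sample result from Rubin (1987), with the within and between components written as $\bar{u}_m$ and $b_m$, they would be:

$$\nu_m = (m - 1) \left( 1 + \frac{\bar{u}_m}{(1 + 1/m) \, b_m} \right)^2$$

When the between-imputation variance is small relative to the within-imputation variance, $\nu_m$ is large and the interval is close to a normal-based one.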

13 / 16

MI example

Back to our income example,

By the way, $\bar{y} = 12.64$ from the "true complete dataset".

14 / 16

MI example

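  • MI estimate of Q:

    $$\bar{q}_m = \sum_{j=1}^{m} \frac{q_j}{m} = \frac{12.66 + 13.14 + 12.90}{3} = 12.90$$

  • Between variance

    $$b_m = \sum_{j=1}^{m} \frac{(q_j - \bar{q}_m)^2}{m - 1} = 0.06$$

  • Within variance

    $$\bar{u}_m = \sum_{j=1}^{m} \frac{u_j}{m} = \frac{0.37 + 0.29 + 0.32}{3} = 0.33$$

  • MI estimate of variance is:

    $$T_m = (1 + 1/m) b_m + \bar{u}_m = (1 + 1/3)(0.06) + 0.33 = 0.41$$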
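
The arithmetic above can be checked with a short sketch of the combining rules. The function name `pool_estimates` is hypothetical; the inputs are the three point estimates and three variances quoted in the example.

```python
def pool_estimates(q, u):
    """Combine m point estimates q and their variances u (Rubin's rules)."""
    m = len(q)
    q_bar = sum(q) / m                                   # MI point estimate
    b = sum((q_j - q_bar) ** 2 for q_j in q) / (m - 1)   # between-imputation variance
    u_bar = sum(u) / m                                   # within-imputation variance
    t = (1 + 1 / m) * b + u_bar                          # total variance
    return q_bar, b, u_bar, t


q = [12.66, 13.14, 12.90]  # q_j from the example
u = [0.37, 0.29, 0.32]     # u_j from the example
q_bar, b, u_bar, t = pool_estimates(q, u)
print(round(q_bar, 2), round(b, 2), round(u_bar, 2), round(t, 3))
```

Carrying full precision through gives a total variance of about 0.403; the 0.41 in the example comes from plugging in the already rounded values 0.06 and 0.33.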

    Where should the imputations come from? We will answer that soon!
15 / 16

What's next?

Move on to the readings for the next module!

16 / 16
