
IDS 702: Module 1.8

Transformations

Dr. Olanrewaju Michael Akande

1 / 5

Transformations

  • As we have already seen, we sometimes have to deal with data that violate the linearity and normality assumptions.

  • Transforming variables can help with linearity and normality (for the response variable, since we do not need normality of the predictors).

  • The most common transformation is the natural logarithm; for the response variable, that is, \log_e(y) or \ln(y).

  • This is often because it is the easiest to interpret.

  • Suppose

    \ln(y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i.

  • Then it is easy to see that

    y_i = e^{\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i} = e^{\beta_0} \times e^{\beta_1 x_{i1}} \times e^{\beta_2 x_{i2}} \times \cdots \times e^{\beta_p x_{ip}} \times e^{\epsilon_i}.

  • That is, the predictors actually have a multiplicative effect on y.

2 / 5
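As a quick numerical check of the multiplicative form (a sketch with made-up coefficients, not course code):

```python
import math

# Hypothetical coefficients for ln(y) = b0 + b1*x1 + b2*x2 (error term omitted)
b0, b1, b2 = 1.0, 0.10, -0.05
x1, x2 = 3.0, 2.0

# Additive on the log scale ...
log_y = b0 + b1 * x1 + b2 * x2

# ... is multiplicative on the original scale:
y_from_sum = math.exp(log_y)
y_from_product = math.exp(b0) * math.exp(b1 * x1) * math.exp(b2 * x2)

print(math.isclose(y_from_sum, y_from_product))  # → True
```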


Natural log transformation

  • The estimated \beta_j's can be interpreted in terms of approximate proportional differences.

  • For example, suppose \beta_1 = 0.10; then e^{\beta_1} = 1.1052.

  • Thus, a difference of 1 unit in x_1 corresponds to an expected positive difference of approximately 11% in y.

  • Similarly, \beta_1 = -0.10 implies e^{\beta_1} = 0.9048, which means a difference of 1 unit in x_1 corresponds to an expected negative difference of approximately 10% in y.

  • When making predictions from the regression of the transformed variable, remember to transform back to the original scale so that your predictions are meaningful.

3 / 5
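The multipliers above are easy to reproduce; a short sketch (the slope values are illustrative, not estimates from any dataset):

```python
import math

# Hypothetical log-scale slope estimates
for b1 in (0.10, -0.10):
    multiplier = math.exp(b1)           # e^{beta_1}
    pct_change = (multiplier - 1) * 100  # approximate % difference in y per unit of x1
    print(f"beta1 = {b1:+.2f} -> e^beta1 = {multiplier:.4f} "
          f"(~{pct_change:+.1f}% per unit of x1)")
```

Note that for small coefficients, e^{\beta_1} - 1 is close to \beta_1 itself, which is why the "approximately 10%" shorthand works.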


Other transformations

  • While the natural logarithm is the most common transformation, there are several other options.

  • For example, logarithms with other bases, squares, square roots, etc.

  • Which one should you use?
  • Well, it depends on what you are trying to fix.

  • For example, you may need a logarithm transformation on the response variable but a square root transformation on one of the predictors to fix violations of linearity and normality.

  • Overall, if you do not know which options to consider, you could try Box-Cox power transformations (to fix non-normality).

  • We will not spend time on those in this course, but I am more than happy to provide resources to anyone who is interested.

  • First, see the boxcox function in R's MASS library.

4 / 5
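The Box-Cox family itself is simple to write down. A rough sketch of the transformation (in Python for illustration; unlike MASS's boxcox in R, this only applies the transform for a given λ and does not estimate the best λ by maximum likelihood):

```python
import math

def boxcox_transform(y, lmbda):
    """Box-Cox power transformation of a single positive value y.

    Defined as (y**lmbda - 1) / lmbda for lmbda != 0, and ln(y) for
    lmbda == 0. Setting lmbda = 0 recovers the natural log transformation
    from the earlier slides; lmbda = 0.5 behaves like a square root.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        return math.log(y)
    return (y ** lmbda - 1) / lmbda

print(boxcox_transform(2.0, 0))    # ln(2)
print(boxcox_transform(4.0, 0.5))  # square-root-style transformation
```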

What's next?

Move on to the readings for the next module!

5 / 5
