[Notes on Mathematics for ESL] Chapter 10: Boosting and Additive Trees

10.5 Why Exponential Loss?

Derivation of Equation (10.16)

Since $Y\in{-1,1}$, we can expand the expectation as follows:

In order to minimize the expectation, we equal derivatives w.r.t. $f(x)$ as zero:

which gives:

Notes on Equation (10.18)

If $Y=1$, then $Y’=1$, which gives

Likewise, if $Y=-1$, then $Y’=0$, which gives

As a result, the binomial log-likelihood loss is equivalent to the deviance. In the language of neural networks, the cross-entropy is equivalent to the softplus. The only difference is that $0$ is used to indicate negative examples in cross-entropy; while $-1$ is used in softplus.

10.6 Loss Functions and Robustness

This section explains the choice of loss functions for both classification and regression. It gives a very direct expalanation about why square loss is undesirable for classification. Highly recommended!