### 10.5 Why Exponential Loss?

#### Derivation of Equation (10.16)

Since $Y \in \{-1, 1\}$, we can expand the expectation as follows:

$$E_{Y|x}\left[e^{-Y f(x)}\right] = \Pr(Y=1 \mid x)\, e^{-f(x)} + \Pr(Y=-1 \mid x)\, e^{f(x)}$$

In order to minimize the expectation, we set the derivative with respect to $f(x)$ equal to zero:

$$-\Pr(Y=1 \mid x)\, e^{-f(x)} + \Pr(Y=-1 \mid x)\, e^{f(x)} = 0,$$

which gives:

$$f^*(x) = \frac{1}{2}\log\frac{\Pr(Y=1 \mid x)}{\Pr(Y=-1 \mid x)}$$
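The closed-form minimizer can be sanity-checked numerically. A minimal sketch (assuming a single point $x$ with $\Pr(Y=1 \mid x) = 0.8$, a value chosen only for illustration): a grid search over $f$ should not beat the half-log-odds solution.

```python
import math

# Assumed setup: one point x with Pr(Y = 1 | x) = 0.8 (illustrative value).
p = 0.8

def expected_exp_loss(f):
    """E[e^{-Y f(x)} | x] = p * e^{-f} + (1 - p) * e^{f}."""
    return p * math.exp(-f) + (1 - p) * math.exp(f)

# Closed-form minimizer from the derivation: half the log-odds.
f_star = 0.5 * math.log(p / (1 - p))

# A coarse grid search over f in [-5, 5] should land next to f_star.
grid = [i / 1000 for i in range(-5000, 5001)]
f_grid = min(grid, key=expected_exp_loss)

print(round(f_star, 4))            # half the log-odds of p = 0.8
print(abs(f_star - f_grid) < 1e-3)
```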

#### Notes on Equation (10.18)

Recall that $Y' = (Y+1)/2 \in \{0, 1\}$ and $p(x) = 1/(1 + e^{-2f(x)})$. If $Y=1$, then $Y'=1$, which gives

$$-l(Y, f(x)) = -\log p(x) = \log\left(1 + e^{-2f(x)}\right)$$

Likewise, if $Y=-1$, then $Y'=0$, which gives

$$-l(Y, f(x)) = -\log\left(1 - p(x)\right) = \log\left(1 + e^{2f(x)}\right)$$

Both cases agree with the single expression $\log\left(1 + e^{-2Yf(x)}\right)$.

As a result, maximizing the *binomial log-likelihood* is equivalent to minimizing the *deviance* $\log(1 + e^{-2Yf(x)})$. In the language of neural networks, the *cross-entropy* loss is equivalent to this *softplus* form of the loss. The only difference is the label encoding: *cross-entropy* uses $0$ to indicate negative examples, while the *softplus* (margin) form uses $-1$.
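The equivalence of the two encodings is easy to verify numerically. A minimal sketch: the cross-entropy under $Y' \in \{0,1\}$ with $p(x) = 1/(1+e^{-2f(x)})$ matches the margin form $\log(1+e^{-2Yf(x)})$ under $Y \in \{-1,1\}$ for every $f$.

```python
import math

def cross_entropy(y01, f):
    """Binomial deviance with Y' in {0, 1} and p(x) = 1 / (1 + e^{-2f})."""
    p = 1.0 / (1.0 + math.exp(-2.0 * f))
    return -(y01 * math.log(p) + (1 - y01) * math.log(1.0 - p))

def softplus_loss(y_pm1, f):
    """Margin form with Y in {-1, +1}: log(1 + e^{-2 Y f})."""
    return math.log1p(math.exp(-2.0 * y_pm1 * f))

# The two losses agree once labels are relabeled by Y' = (Y + 1) / 2.
for f in (-2.0, -0.5, 0.0, 1.3):
    for y_pm1 in (-1, 1):
        y01 = (y_pm1 + 1) // 2
        assert abs(cross_entropy(y01, f) - softplus_loss(y_pm1, f)) < 1e-12
print("losses agree")
```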

### 10.6 Loss Functions and Robustness

This section explains the choice of loss functions for both classification and regression. It gives a very direct explanation of why squared-error loss is undesirable for classification. Highly recommended!