# [Notes on Mathematics for ESL] Chapter 4: Linear Methods for Classification

### 4.3 Linear Discriminant Analysis

#### Derivation of Equation (4.9)

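Suppose each class's density is a multivariate Gaussian with a common covariance matrix $\Sigma$ (the LDA assumption $\Sigma_k=\Sigma$ for all $k$):

$$f_k(x)=\frac{1}{(2\pi)^{p/2}\lvert\Sigma\rvert^{1/2}}\exp\left\{-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)\right\}.$$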

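Taking the logarithm of $f_k(x)$, we get

$$\begin{aligned}\log f_k(x) &= c-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)\\ &= c-\frac12x^T\Sigma^{-1}x+x^T\Sigma^{-1}\mu_k-\frac12\mu_k^T\Sigma^{-1}\mu_k,\end{aligned}$$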

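where $c = -\log [(2\pi)^{p/2}\lvert\Sigma\rvert^{1/2}]$, and we used $\mu_k^T\Sigma^{-1}x=x^T\Sigma^{-1}\mu_k$ (a scalar equals its own transpose and $\Sigma^{-1}$ is symmetric). By Bayes' theorem, $\Pr(G=k\mid X=x)\propto f_k(x)\pi_k$. In the difference $\log f_k(x)-\log f_l(x)$ the terms $c$ and $-\frac12x^T\Sigma^{-1}x$ cancel, and $\mu_k^T\Sigma^{-1}\mu_k-\mu_l^T\Sigma^{-1}\mu_l=(\mu_k+\mu_l)^T\Sigma^{-1}(\mu_k-\mu_l)$, so Equation (4.9) follows directly:

$$\log\frac{\Pr(G=k\mid X=x)}{\Pr(G=l\mid X=x)}=\log\frac{f_k(x)}{f_l(x)}+\log\frac{\pi_k}{\pi_l}=\log\frac{\pi_k}{\pi_l}-\frac12(\mu_k+\mu_l)^T\Sigma^{-1}(\mu_k-\mu_l)+x^T\Sigma^{-1}(\mu_k-\mu_l).$$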
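As a sanity check, the identity behind Equation (4.9) can be verified numerically. The sketch below assumes NumPy; the means, priors, shared covariance and test point are arbitrary made-up values, and `log_gaussian` is a helper written just for this check.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log-density of the multivariate normal N(mu, sigma) evaluated at x."""
    p = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return (-0.5 * p * np.log(2 * np.pi) - 0.5 * logdet
            - 0.5 * diff @ np.linalg.solve(sigma, diff))

rng = np.random.default_rng(0)
p = 3
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)          # hypothetical shared covariance (symmetric positive definite)
mu_k, mu_l = rng.normal(size=p), rng.normal(size=p)
pi_k, pi_l = 0.3, 0.7                    # hypothetical priors
x = rng.normal(size=p)

# Left side: log posterior ratio via Bayes' rule (the marginal density of x cancels)
lhs = (np.log(pi_k) + log_gaussian(x, mu_k, sigma)
       - np.log(pi_l) - log_gaussian(x, mu_l, sigma))

# Right side: the linear-in-x form of Equation (4.9)
sigma_inv = np.linalg.inv(sigma)
rhs = (np.log(pi_k / pi_l)
       - 0.5 * (mu_k + mu_l) @ sigma_inv @ (mu_k - mu_l)
       + x @ sigma_inv @ (mu_k - mu_l))

print(np.isclose(lhs, rhs))  # True
```

Because the quadratic term $x^T\Sigma^{-1}x$ cancels, the right side is linear in $x$, which is exactly why the LDA decision boundaries are hyperplanes.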

#### Notes on Computations for LDA

It’s stated in the book that the LDA classifier can be implemented by the following pair of steps:

• Sphere the data with respect to the common covariance estimate $\hat \Sigma$: $X^*\leftarrow D^{-1/2}U^TX$, where $\hat \Sigma=UDU^T$. The common covariance estimate of $X^*$ will now be the identity.
• Classify to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities $\pi_k$.

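However, the book does not explain these steps in detail. Here I fill in some skipped mathematical steps that may aid understanding. Since $\hat\Sigma$ is symmetric, its eigendecomposition $\hat\Sigma=UDU^T$ has an orthogonal $U$ ($U^TU=I$), so

$$\widehat{\mathrm{Cov}}(X^*)=D^{-1/2}U^T\hat\Sigma\,UD^{-1/2}=D^{-1/2}U^TUDU^TUD^{-1/2}=D^{-1/2}DD^{-1/2}=I,$$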

which shows that the covariance estimate of $X^*$ is the identity.
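A quick numerical check of the sphering step, as a sketch assuming NumPy. For simplicity it uses a single sample covariance rather than the pooled within-class estimate LDA actually uses; the algebra is the same. The data and the mixing matrix `M` are hypothetical. Since NumPy stores observations as rows, $x^*=D^{-1/2}U^Tx$ is applied in its transposed form $x^{*T}=x^TUD^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 500, 3
M = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.5]])              # hypothetical invertible mixing matrix
X = rng.normal(size=(N, p)) @ M              # rows are observations
Xc = X - X.mean(axis=0)                      # centre each column
sigma_hat = Xc.T @ Xc / (N - 1)              # sample covariance estimate

d, U = np.linalg.eigh(sigma_hat)             # sigma_hat = U diag(d) U^T, U orthogonal
X_star = Xc @ U @ np.diag(d ** -0.5)         # row form of x* = D^{-1/2} U^T x

cov_star = X_star.T @ X_star / (N - 1)       # covariance estimate of the sphered data
print(np.allclose(cov_star, np.eye(p)))      # True
```

`np.linalg.eigh` is used because $\hat\Sigma$ is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors, the property $U^TU=I$ that the derivation relies on.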

Note that LDA classifies using the linear discriminant functions

$$\delta_k(x)=x^T\Sigma^{-1}\mu_k-\frac12\mu_k^T\Sigma^{-1}\mu_k+\log\pi_k,$$

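which is Equation (4.10) in the book, and $x$ is assigned to the class with the largest $\delta_k(x)$. Since the input $x$ is the same for every class, we can add back the class-independent term $-\frac12x^T\Sigma^{-1}x$ that was dropped in the previous derivation without changing which class attains the maximum. Redefining $\delta_k$ to include this term, the functions become

$$\delta_k(x)=-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)+\log\pi_k.$$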

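In the transformed space $\Sigma=I$, so $\delta_k(x^*)=-\frac12\lVert x^*-\mu_k^*\rVert_2^2+\log\pi_k$, where $\mu_k^*=D^{-1/2}U^T\mu_k$ is the centroid of the $k$th class in the transformed space. Maximizing $\delta_k$ is therefore the same as choosing the closest class centroid, adjusted by $\log\pi_k$, which proves the claimed method.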
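The equivalence can be sketched numerically, assuming NumPy. The class means `mus`, priors `priors`, shared covariance `sigma` and query points `x` below are all hypothetical. The code computes the linear discriminants of Equation (4.10) directly, then scores the same points by squared distance to the sphered centroids plus $\log\pi_k$, and checks that the two scores differ only by $\frac12x^T\Sigma^{-1}x$, a term that does not depend on $k$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, K = 2, 3
mus = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 2.5]])   # hypothetical class means (rows)
priors = np.array([0.5, 0.3, 0.2])                       # hypothetical priors pi_k
sigma = np.array([[1.0, 0.6], [0.6, 2.0]])               # hypothetical shared covariance
x = 2.0 * rng.normal(size=(10, p))                       # hypothetical points to classify (rows)

# Linear discriminants: delta_k(x) = x^T S^-1 mu_k - 1/2 mu_k^T S^-1 mu_k + log pi_k
S_inv = np.linalg.inv(sigma)
delta = x @ S_inv @ mus.T - 0.5 * np.sum(mus @ S_inv * mus, axis=1) + np.log(priors)

# Sphere with sigma = U D U^T (row form x -> x U D^{-1/2}), then use nearest centroid + log prior
d, U = np.linalg.eigh(sigma)
W = U @ np.diag(d ** -0.5)
x_star, mus_star = x @ W, mus @ W
sq_dist = ((x_star[:, None, :] - mus_star[None, :, :]) ** 2).sum(axis=2)
score = -0.5 * sq_dist + np.log(priors)

# The two scores differ by 1/2 x^T S^-1 x, which is the same for every class
half_quad = 0.5 * np.sum(x @ S_inv * x, axis=1)
print(np.allclose(delta - score, half_quad[:, None]))           # True
print(np.array_equal(delta.argmax(axis=1), score.argmax(axis=1)))  # True
```

Since the offset is constant across classes for each point, both rules pick the same class, which is the "closest centroid, modulo the priors" statement from the book.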

### 4.4 Logistic Regression

#### Derivation of Equations (4.21) and (4.22)

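In the two-class case, $p_1(x;\beta)=p(x;\beta)$ and $p_2(x;\beta) = 1-p(x;\beta)$, where

$$p(x;\beta)=\frac{e^{\beta^Tx}}{1+e^{\beta^Tx}}$$

and the input vector $x$ includes the constant term 1 to accommodate the intercept.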

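Starting from the log-likelihood in Equation (4.20),

$$\ell(\beta)=\sum_{i=1}^N\left\{y_i\beta^Tx_i-\log\left(1+e^{\beta^Tx_i}\right)\right\},$$

Equation (4.21) follows by differentiating with respect to $\beta$ and setting the result to zero:

$$\frac{\partial\ell(\beta)}{\partial\beta}=\sum_{i=1}^N\left(y_ix_i-\frac{e^{\beta^Tx_i}}{1+e^{\beta^Tx_i}}x_i\right)=\sum_{i=1}^Nx_i\big(y_i-p(x_i;\beta)\big)=0.$$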
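The score formula can be checked against central finite differences of the log-likelihood. This is a sketch assuming NumPy; the design matrix `X` (with a leading column of ones for the intercept), the 0/1 labels `y` and the coefficient vector `beta` are randomly generated placeholders, and `sigmoid`, `log_lik` and `score` are helpers written only for this check.

```python
import numpy as np

def sigmoid(t):
    """p(x; beta) as a function of t = beta^T x."""
    return 1.0 / (1.0 + np.exp(-t))

def log_lik(beta, X, y):
    # Log-likelihood: sum_i [ y_i beta^T x_i - log(1 + exp(beta^T x_i)) ]
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def score(beta, X, y):
    # Gradient, Equation (4.21): sum_i x_i (y_i - p(x_i; beta))
    return X.T @ (y - sigmoid(X @ beta))

rng = np.random.default_rng(3)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])  # hypothetical inputs, intercept first
y = rng.integers(0, 2, size=N).astype(float)                     # hypothetical 0/1 responses
beta = 0.5 * rng.normal(size=p)

# Central difference of the log-likelihood along each coordinate direction
h = 1e-5
num_grad = np.array([(log_lik(beta + h * e, X, y) - log_lik(beta - h * e, X, y)) / (2 * h)
                     for e in np.eye(p)])
print(np.allclose(num_grad, score(beta, X, y), atol=1e-5))  # True
```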

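Note that

$$\frac{\partial p(x;\beta)}{\partial\beta^T}=\frac{e^{\beta^Tx}}{\left(1+e^{\beta^Tx}\right)^2}x^T=p(x;\beta)\big(1-p(x;\beta)\big)x^T.$$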

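Differentiating Equation (4.21) with respect to $\beta^T$ and substituting this, we get Equation (4.22):

$$\frac{\partial^2\ell(\beta)}{\partial\beta\,\partial\beta^T}=-\sum_{i=1}^Nx_ix_i^T\,p(x_i;\beta)\big(1-p(x_i;\beta)\big).$$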
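The Hessian formula can be checked the same way, by finite-differencing the score. This sketch assumes NumPy and uses randomly generated placeholder data (`X` with an intercept column, labels `y`, coefficients `beta`); `score` and `hessian` are helpers written for this check, not library functions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def score(beta, X, y):
    # Equation (4.21): sum_i x_i (y_i - p(x_i; beta))
    return X.T @ (y - sigmoid(X @ beta))

def hessian(beta, X):
    # Equation (4.22): -sum_i x_i x_i^T p(x_i; beta) (1 - p(x_i; beta))
    prob = sigmoid(X @ beta)
    return -(X.T * (prob * (1.0 - prob))) @ X   # scales column i of X^T by p_i (1 - p_i)

rng = np.random.default_rng(4)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])  # hypothetical inputs, intercept first
y = rng.integers(0, 2, size=N).astype(float)                     # hypothetical 0/1 responses
beta = 0.5 * rng.normal(size=p)

# Column j of the numerical Hessian: central difference of the score along beta_j
h = 1e-5
num_hess = np.column_stack([(score(beta + h * e, X, y) - score(beta - h * e, X, y)) / (2 * h)
                            for e in np.eye(p)])
H = hessian(beta, X)
print(np.allclose(num_hess, H, atol=1e-5))  # True
print(np.all(np.linalg.eigvalsh(H) < 0))    # True
```

The second check reflects a consequence of Equation (4.22): when $X$ has full column rank and every $p(x_i;\beta)$ lies strictly between 0 and 1, the Hessian is negative definite, so the log-likelihood is strictly concave.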