4.3 Linear Discriminant Analysis
Derivation of Equation (4.9)
Recall that each class's density is modeled as a multivariate Gaussian with a common covariance matrix $\Sigma_k=\Sigma$ for all $k$:

$$f_k(x)=\frac{1}{(2\pi)^{p/2}\lvert\Sigma\rvert^{1/2}}\,e^{-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)}.$$

Taking the logarithm of $f_k(x)$, we get

$$\log f_k(x)=c-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)=c-\frac12x^T\Sigma^{-1}x+x^T\Sigma^{-1}\mu_k-\frac12\mu_k^T\Sigma^{-1}\mu_k,$$

where $c = -\log [(2\pi)^{p/2}\lvert\Sigma\rvert^{1/2}]$ and we used $\mu_k^T\Sigma^{-1}x=x^T\Sigma^{-1}\mu_k$. From this expansion, Equation (4.9) follows directly, since both $c$ and the quadratic term $-\frac12x^T\Sigma^{-1}x$ cancel in the log-ratio:

$$\log\frac{\Pr(G=k\mid X=x)}{\Pr(G=\ell\mid X=x)}=\log\frac{f_k(x)}{f_\ell(x)}+\log\frac{\pi_k}{\pi_\ell}=\log\frac{\pi_k}{\pi_\ell}-\frac12(\mu_k+\mu_\ell)^T\Sigma^{-1}(\mu_k-\mu_\ell)+x^T\Sigma^{-1}(\mu_k-\mu_\ell).$$
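As a quick numerical sanity check (mine, not from the book), the sketch below evaluates both sides of Equation (4.9) with randomly drawn parameters; all names and dimensions are made up for illustration. It confirms that the quadratic terms cancel, leaving a function linear in $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3  # hypothetical dimension

# Random common covariance (symmetric positive definite), means, and priors.
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)
mu_k, mu_l = rng.normal(size=p), rng.normal(size=p)
pi_k, pi_l = 0.3, 0.7
x = rng.normal(size=p)
Sinv = np.linalg.inv(Sigma)

def log_f(x, mu):
    # log f_k(x) up to the constant c, which cancels in the ratio
    d = x - mu
    return -0.5 * d @ Sinv @ d

# Left-hand side: log posterior ratio = log f_k/f_l + log pi_k/pi_l.
lhs = log_f(x, mu_k) - log_f(x, mu_l) + np.log(pi_k / pi_l)

# Right-hand side: Equation (4.9), linear in x.
rhs = (np.log(pi_k / pi_l)
       - 0.5 * (mu_k + mu_l) @ Sinv @ (mu_k - mu_l)
       + x @ Sinv @ (mu_k - mu_l))

print(np.isclose(lhs, rhs))  # True
```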
Notes on Computations for LDA
It’s stated in the book that the LDA classifier can be implemented by the following pair of steps:
- Sphere the data with respect to the common covariance estimate $\hat\Sigma$: $X^*\leftarrow D^{-1/2}U^TX$, where $\hat\Sigma=UDU^T$. The common covariance estimate of $X^*$ will now be the identity.
- Classify to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities $\pi_k$.
However, a detailed explanation is not given in the book. Here I fill in some of the skipped mathematical steps, which may help the understanding.

For the first step, write the eigendecomposition $\hat\Sigma=UDU^T$. The covariance estimate of the transformed data $X^*=D^{-1/2}U^TX$ is then

$$\widehat{\operatorname{Cov}}(X^*)=D^{-1/2}U^T\hat\Sigma UD^{-1/2}=D^{-1/2}U^TUDU^TUD^{-1/2}=D^{-1/2}DD^{-1/2}=I,$$

which shows that the covariance estimate of $X^*$ is the identity.
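A minimal numerical check of this identity, assuming the sample covariance is computed with `np.cov` (the toy matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated toy data

Sigma_hat = np.cov(X, rowvar=False)
D, U = np.linalg.eigh(Sigma_hat)       # Sigma_hat = U D U^T
X_star = X @ U @ np.diag(D ** -0.5)    # x* = D^{-1/2} U^T x, applied row-wise

# The covariance estimate of the sphered data is (numerically) the identity.
print(np.allclose(np.cov(X_star, rowvar=False), np.eye(4)))  # True
```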
Note that the classification for LDA is based on the linear discriminant functions

$$\delta_k(x)=x^T\Sigma^{-1}\mu_k-\frac12\mu_k^T\Sigma^{-1}\mu_k+\log\pi_k,$$

which is Equation (4.10) in the book. Since the input $x$ is the same for each class, we can add back the term $-\frac12x^T\Sigma^{-1}x$ that was cancelled in the previous derivation without changing the classification. The functions then become

$$\delta_k(x)=-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)+\log\pi_k.$$

We know that $\Sigma=I$ in the transformed space, so $\delta_k(x)=-\frac12\lVert x-\mu_k\rVert_2^2+\log\pi_k$, where $\mu_k$ is the centroid of the $k$th class. Maximizing $\delta_k(x)$ therefore amounts to choosing the closest class centroid, modulo the $\log\pi_k$ adjustment, which proves the claimed classification procedure.
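Putting the two steps together, here is a sketch of the whole procedure on hypothetical toy data (all data-generating choices are made up for illustration): it spheres the data with the pooled covariance estimate, classifies by the closest centroid adjusted by $\log\hat\pi_k$, and checks that this agrees with the discriminant functions of Equation (4.10).

```python
import numpy as np

rng = np.random.default_rng(2)

# --- hypothetical toy data: two Gaussian classes with a shared covariance ---
n, p = 400, 3
pis = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, -1.0]])
y = rng.choice(2, size=n, p=pis)
A = rng.normal(size=(p, p)) * 0.7
X = mus[y] + rng.normal(size=(n, p)) @ A

# --- estimates: priors, centroids, pooled within-class covariance ---
pi_hat = np.bincount(y) / n
mu_hat = np.array([X[y == k].mean(axis=0) for k in range(2)])
resid = X - mu_hat[y]
Sigma_hat = resid.T @ resid / (n - 2)

# --- step 1: sphere the data using Sigma_hat = U D U^T ---
D, U = np.linalg.eigh(Sigma_hat)
W = U @ np.diag(D ** -0.5)      # x* = D^{-1/2} U^T x, applied row-wise
X_star = X @ W
mu_star = mu_hat @ W

# --- step 2: closest centroid in the sphered space, modulo log-priors ---
dist2 = ((X_star[:, None, :] - mu_star[None, :, :]) ** 2).sum(axis=2)
pred_sphere = np.argmax(-0.5 * dist2 + np.log(pi_hat), axis=1)

# --- reference: the linear discriminant functions of Equation (4.10) ---
Sinv = np.linalg.inv(Sigma_hat)
delta = (X @ Sinv @ mu_hat.T
         - 0.5 * np.einsum('kp,pq,kq->k', mu_hat, Sinv, mu_hat)
         + np.log(pi_hat))
pred_delta = np.argmax(delta, axis=1)

print(np.array_equal(pred_sphere, pred_delta))  # True: the two rules agree
```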
4.4 Logistic Regression
Derivation of Equations (4.21) and (4.22)
In the two-class case, $p_1(x;\beta)=p(x;\beta)$ and $p_2(x;\beta) = 1-p(x;\beta)$, where

$$p(x;\beta)=\frac{e^{\beta^Tx}}{1+e^{\beta^Tx}}.$$

Equation (4.21) can be derived easily by differentiating the log-likelihood of Equation (4.20) with respect to $\beta$:

$$\frac{\partial\ell(\beta)}{\partial\beta}=\sum_{i=1}^N\left(y_ix_i-\frac{e^{\beta^Tx_i}}{1+e^{\beta^Tx_i}}\,x_i\right)=\sum_{i=1}^Nx_i\bigl(y_i-p(x_i;\beta)\bigr)=0.$$

Note that

$$\frac{\partial p(x_i;\beta)}{\partial\beta^T}=\frac{e^{\beta^Tx_i}\bigl(1+e^{\beta^Tx_i}\bigr)-e^{\beta^Tx_i}e^{\beta^Tx_i}}{\bigl(1+e^{\beta^Tx_i}\bigr)^2}\,x_i^T=p(x_i;\beta)\bigl(1-p(x_i;\beta)\bigr)x_i^T.$$

Plugging it into Equation (4.21), we get Equation (4.22):

$$\frac{\partial^2\ell(\beta)}{\partial\beta\,\partial\beta^T}=-\sum_{i=1}^Nx_i\,\frac{\partial p(x_i;\beta)}{\partial\beta^T}=-\sum_{i=1}^Nx_ix_i^T\,p(x_i;\beta)\bigl(1-p(x_i;\beta)\bigr).$$
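Equations (4.21) and (4.22) are exactly the score and Hessian used by the Newton–Raphson algorithm of Section 4.4.1, so a short sketch may help. The function name `fit_logistic_newton` and the toy data are my own; the update $\beta\leftarrow\beta-H^{-1}\,\partial\ell/\partial\beta$ uses the two formulas just derived.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25):
    """Two-class logistic regression by Newton-Raphson, using the score
    (Equation 4.21) and Hessian (Equation 4.22). X should include a
    column of ones if an intercept is desired."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # p(x_i; beta)
        score = X.T @ (y - p)                   # Equation (4.21)
        w = p * (1.0 - p)                       # weights p(1 - p)
        hessian = -(X * w[:, None]).T @ X       # Equation (4.22)
        beta -= np.linalg.solve(hessian, score) # beta <- beta - H^{-1} score
    return beta

# Hypothetical usage on toy data.
rng = np.random.default_rng(3)
X = np.hstack([np.ones((300, 1)), rng.normal(size=(300, 2))])
true_beta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
print(fit_logistic_newton(X, y))  # roughly recovers true_beta
```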