Billy Ian's Short Leisure-time Wander

into language, learning, intelligence and beyond

[Notes on Mathematics for ESL] Chapter 4: Linear Methods for Classification


4.3 Linear Discriminant Analysis

Derivation of Equation (4.9)

Recall that each class's density is modeled as a multivariate Gaussian (with the common covariance $\Sigma$ assumed by LDA):

$$f_k(x)=\frac{1}{(2\pi)^{p/2}\lvert\Sigma\rvert^{1/2}}\exp\Big(-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)\Big)$$

Taking the logarithm of $f_k(x)$, we get

$$\log f_k(x)=c-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)=c-\frac12x^T\Sigma^{-1}x+\mu_k^T\Sigma^{-1}x-\frac12\mu_k^T\Sigma^{-1}\mu_k,$$

where $c = -\log [(2\pi)^{p/2}\lvert\Sigma\rvert^{1/2}]$ and $\mu_k^T\Sigma^{-1}x=x^T\Sigma^{-1}\mu_k$ by the symmetry of $\Sigma^{-1}$. Following the above formula, we can derive Equation (4.9) easily:

$$\begin{aligned}\log\frac{\Pr(G=k\mid X=x)}{\Pr(G=\ell\mid X=x)}&=\log\frac{f_k(x)}{f_\ell(x)}+\log\frac{\pi_k}{\pi_\ell}\\&=\log\frac{\pi_k}{\pi_\ell}-\frac12(\mu_k+\mu_\ell)^T\Sigma^{-1}(\mu_k-\mu_\ell)+x^T\Sigma^{-1}(\mu_k-\mu_\ell),\end{aligned}$$

where the quadratic terms $-\frac12x^T\Sigma^{-1}x$ cancel.
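As a quick sanity check, the cancellation of the quadratic terms can be verified numerically; all parameters below are arbitrary made-up values, not data from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)          # an arbitrary positive-definite common covariance
Sigma_inv = np.linalg.inv(Sigma)
mu_k, mu_l = rng.normal(size=p), rng.normal(size=p)
pi_k, pi_l = 0.3, 0.7
x = rng.normal(size=p)

def log_density(x, mu):
    """Log of the multivariate Gaussian density, keeping every term."""
    d = x - mu
    return -0.5 * d @ Sigma_inv @ d - 0.5 * np.linalg.slogdet(2 * np.pi * Sigma)[1]

# Left-hand side: log posterior odds via Bayes' rule
lhs = log_density(x, mu_k) - log_density(x, mu_l) + np.log(pi_k / pi_l)

# Right-hand side: Equation (4.9)
rhs = (np.log(pi_k / pi_l)
       - 0.5 * (mu_k + mu_l) @ Sigma_inv @ (mu_k - mu_l)
       + x @ Sigma_inv @ (mu_k - mu_l))

print(np.isclose(lhs, rhs))  # True: the quadratic terms cancel
```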

Notes on Computations for LDA

It’s stated in the book that the LDA classifier can be implemented by the following pair of steps:

  • Sphere the data with respect to the common covariance estimate $\hat \Sigma: X^*\leftarrow D^{-1/2}U^TX$, where $\hat \Sigma=UDU^T$. The common covariance estimate of $X^*$ will now be the identity.
  • Classify to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities $\pi_k$.
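The pair of steps above can be sketched in NumPy as follows; `lda_predict` is my own illustrative helper, not code from the book.

```python
import numpy as np

def lda_predict(X_train, y_train, X_new):
    """Two-step LDA rule: sphere with the pooled covariance,
    then classify to the closest centroid modulo the log prior."""
    classes = np.unique(y_train)
    N, p = X_train.shape
    centroids = np.array([X_train[y_train == k].mean(axis=0) for k in classes])
    # Pooled (common) covariance estimate
    resid = X_train - centroids[np.searchsorted(classes, y_train)]
    Sigma_hat = resid.T @ resid / (N - len(classes))
    # Step 1: sphere with Sigma_hat = U D U^T  ->  X* = D^{-1/2} U^T X
    D, U = np.linalg.eigh(Sigma_hat)
    W = U / np.sqrt(D)                       # W = U D^{-1/2}, applied to row vectors
    Xs, Cs = X_new @ W, centroids @ W        # transformed inputs and centroids
    priors = np.array([(y_train == k).mean() for k in classes])
    # Step 2: closest centroid in the transformed space, plus log prior
    scores = (-0.5 * ((Xs[:, None, :] - Cs[None, :, :]) ** 2).sum(-1)
              + np.log(priors))
    return classes[scores.argmax(axis=1)]
```

On two well-separated simulated Gaussian classes this recovers the labels almost perfectly.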

However, a detailed explanation is not given in the book. Here are the skipped mathematical steps, which may help the understanding.

For the transformed data $X^*=D^{-1/2}U^TX$, the covariance estimate is

$$\hat\Sigma^* = D^{-1/2}U^T\hat\Sigma UD^{-1/2} = D^{-1/2}U^T(UDU^T)UD^{-1/2} = D^{-1/2}DD^{-1/2} = I,$$

which shows that the covariance estimate of $X^*$ is the identity.
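A quick numerical check of this fact, on arbitrary made-up correlated data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # correlated sample
Sigma_hat = np.cov(X, rowvar=False)
D, U = np.linalg.eigh(Sigma_hat)              # Sigma_hat = U diag(D) U^T
X_star = X @ U / np.sqrt(D)                   # X* = D^{-1/2} U^T X, row form
print(np.allclose(np.cov(X_star, rowvar=False), np.eye(3)))  # True
```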

Note that the classification for LDA is based on the linear discriminant functions

$$\delta_k(x)=x^T\Sigma^{-1}\mu_k-\frac12\mu_k^T\Sigma^{-1}\mu_k+\log\pi_k,$$

which is Equation (4.10) in the book. Since the input $x$ is the same for each class, we can add back the term $-\frac12x^T\Sigma^{-1}x$ that was cancelled in the previous derivation without changing the classification. Now the functions become:

$$\delta_k(x)=-\frac12(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)+\log\pi_k.$$

We know that $\Sigma=I$ in the transformed space, so $\delta_k(x)=-\frac12\lVert x-\mu_k\rVert_2^2+\log\pi_k$, where $\mu_k$ is the centroid of the $k$th class. Maximizing $\delta_k(x)$ is exactly classifying to the closest centroid modulo the log prior, so the claimed method is proved.
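The equivalence can also be checked numerically: with arbitrary made-up parameters, the linear discriminants of Equation (4.10) and the sphered nearest-centroid scores differ only by the constant $\frac12x^T\Sigma^{-1}x$, so they pick the same class.

```python
import numpy as np

rng = np.random.default_rng(2)
p, K = 3, 4
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)              # arbitrary common covariance
Sigma_inv = np.linalg.inv(Sigma)
mus = rng.normal(size=(K, p))                # arbitrary class centroids
pis = np.full(K, 1 / K)
x = rng.normal(size=p)

# Equation (4.10): linear discriminant functions
delta_lin = np.array([x @ Sigma_inv @ m - 0.5 * m @ Sigma_inv @ m + np.log(pk)
                      for m, pk in zip(mus, pis)])

# Sphered space: minus half squared distance to each centroid, plus log prior
D, U = np.linalg.eigh(Sigma)
W = U / np.sqrt(D)
xs, cs = x @ W, mus @ W
delta_sph = -0.5 * ((xs - cs) ** 2).sum(axis=1) + np.log(pis)

print(delta_lin.argmax() == delta_sph.argmax())  # True: same classification
```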

4.4 Logistic Regression

Derivation of Equation (4.21) and (4.22)

In the two-class case, $p_1(x;\beta)=p(x;\beta)$ and $p_2(x;\beta) = 1-p(x;\beta)$, where

$$p(x;\beta)=\frac{\exp(\beta^Tx)}{1+\exp(\beta^Tx)}.$$

Starting from the log-likelihood

$$\ell(\beta)=\sum_{i=1}^N\left\{y_i\beta^Tx_i-\log(1+e^{\beta^Tx_i})\right\},$$

Equation (4.21) can be derived as follows,

$$\frac{\partial\ell(\beta)}{\partial\beta}=\sum_{i=1}^N\left\{y_ix_i-\frac{e^{\beta^Tx_i}}{1+e^{\beta^Tx_i}}x_i\right\}=\sum_{i=1}^Nx_i(y_i-p(x_i;\beta))=0.$$

Note that

$$\frac{\partial p(x;\beta)}{\partial\beta^T}=p(x;\beta)(1-p(x;\beta))x^T.$$

Plugging it into the derivative of Equation (4.21) with respect to $\beta^T$, we get Equation (4.22):

$$\frac{\partial^2\ell(\beta)}{\partial\beta\partial\beta^T}=-\sum_{i=1}^Nx_ix_i^Tp(x_i;\beta)(1-p(x_i;\beta)).$$
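Equations (4.21) and (4.22) are exactly the score and Hessian needed by the Newton-Raphson updates discussed in the book. A minimal sketch, assuming `X` already carries an intercept column; `logistic_newton` is my own illustrative helper, not code from ESL:

```python
import numpy as np

def logistic_newton(X, y, n_iter=25):
    """Fit a two-class logistic regression by Newton-Raphson,
    using the score (4.21) and Hessian (4.22)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))          # p(x_i; beta)
        score = X.T @ (y - p)                    # Equation (4.21)
        W = p * (1 - p)                          # diagonal weights
        hessian = -(X * W[:, None]).T @ X        # Equation (4.22)
        beta = beta - np.linalg.solve(hessian, score)
    return beta
```

On simulated data the fitted coefficients land close to the generating ones, as expected from the maximum-likelihood theory.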