Code in R for Coursera-ML and CMPUT466/551 at the University of Alberta
This post covers programming exercise 1 of Coursera-ML. All the code can be found in this repo.
setwd("~/Coding/R/CMPUT466/mlclass-ex1")
library(ggplot2)  # plotting
library(MASS)     # provides ginv() for the normal equation
Linear regression with one variable
Plotting the Data
Use a scatter plot to visualize the data.
data <- read.csv("ex1data1.txt", header=FALSE)
ggplot(data, aes(V1, V2, col="Training data")) +
geom_point(shape=I(4), size=I(3)) +
labs(x="Population of City in 10,000s", y="Profit in $10,000s") +
scale_color_discrete(guide=FALSE)
Gradient Descent
Compute the cost function:
\[J(\theta)=\frac1{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2\]
where the hypothesis \(h_\theta(x)\) is given by the linear model
\[h_\theta(x)=\theta^T x\]
computeCost <- function(X, y, theta) {
  m <- length(y)  # number of training examples
  sum((X %*% theta - y)^2) / (2 * m)
}
X <- as.matrix(data$V1)
X <- cbind(1, X)
y <- as.matrix(data$V2)
theta <- matrix(0, 2, 1)
computeCost(X, y, theta)
## [1] 32.07273
Use the batch gradient descent algorithm to minimize the cost function. In each iteration, every parameter \(\theta_j\) is updated simultaneously:
\[\theta_j = \theta_j - \alpha\frac1m\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}\]
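The loop below implements this rule in its vectorized form, updating all components of \(\theta\) at once:
\[\theta := \theta - \frac{\alpha}{m}X^T(X\theta - y)\]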
iterations <- 1500
alpha <- 0.01
m <- length(y)
J_history <- matrix(0, iterations, 1)
for(i in 1:iterations) {
theta <- theta - alpha/m*t(X) %*% (X %*% theta - y)
J_history[i] <- computeCost(X, y, theta)
}
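J_history is recorded but not plotted above; a quick sketch (not part of the original code) confirms that the cost decreases steadily over the iterations:
ggplot(data.frame(iter=1:iterations, cost=J_history[,1]), aes(iter, cost)) +
geom_line() +
labs(x="Number of iterations", y="Cost J")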
ggplot(data, aes(V1, V2, col="Training data")) +
geom_point(shape=I(4), size=I(3)) +
labs(x="Population of City in 10,000s", y="Profit in $10,000s") +
geom_line(aes(X[,2], X %*% theta, col="Linear regression")) +
scale_color_manual(name="", values=c("Blue", "Red"))
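As a small aside not in the original listing, the fitted \(\theta\) can be used to predict profits for new city populations, e.g. 35,000 and 70,000 people (3.5 and 7 in units of 10,000s):
predict1 <- c(1, 3.5) %*% theta  # predicted profit for a city of 35,000 people
predict2 <- c(1, 7) %*% theta    # predicted profit for a city of 70,000 people
c(predict1, predict2) * 10000    # profits are in units of $10,000s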
Linear regression with multiple variables
Feature Normalization
When features differ by orders of magnitude, feature scaling can make gradient descent converge much more quickly. R's scale function handles this, storing the per-feature means and standard deviations as attributes of the result.
data <- read.csv("ex1data2.txt", header=FALSE)
X <- as.matrix(data[,c("V1", "V2")])
y <- as.matrix(data$V3)
m <- length(y)
X <- scale(X)
mu <- attr(X, "scaled:center")
sigma <- attr(X, "scaled:scale")
X <- cbind(1, X)
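As a quick sanity check (not in the original code), the scaled feature columns should now have mean 0 and standard deviation 1, while mu and sigma hold the original per-feature means and standard deviations:
colMeans(X[, 2:3])      # approximately zero after centering
apply(X[, 2:3], 2, sd)  # exactly one after scaling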
Gradient Descent
The cost function and the gradient descent update are the same as in the univariate case. Four different learning rates are tried in order to compare their rates of convergence.
alpha <- c(0.01, 0.03, 0.1, 0.3)
num_iters <- 400
theta <- matrix(0, 3, 4)  # one column of parameters per learning rate
J_history <- matrix(0, num_iters, 4)
for(i in 1:num_iters) {
for(j in 1:4) {
theta[,j] <- theta[,j] - alpha[j]/m*t(X) %*% (X %*% theta[,j] - y)
J_history[i, j] <- computeCost(X, y, theta[,j])
}
}
df <- data.frame(x=1:num_iters, y=J_history)
ggplot(df, aes(x)) + labs(x="Number of iterations", y="Cost J") +
geom_line(aes(y=y.1, col="0.01")) +
geom_line(aes(y=y.2, col="0.03")) +
geom_line(aes(y=y.3, col="0.1")) +
geom_line(aes(y=y.4, col="0.3")) +
scale_color_discrete(name="Learning Rate")
Use the \(\theta\) values we have found to predict the price of a house with \(1650\) square feet and 3 bedrooms. Remember to apply the same feature normalization first! Each column of the result corresponds to one of the four learning rates.
X_predict <- (c(1650., 3.) - mu)/sigma
price <- c(1, X_predict) %*% theta
price
## [,1] [,2] [,3] [,4]
## [1,] 289314.6 293150 293081.5 293081.5
Normal Equations
The closed-form solution to linear regression is:
\[\theta=(X^T X)^{-1}X^T y\]
data <- read.csv('ex1data2.txt', header=FALSE)
X <- as.matrix(data[, c("V1", "V2")])
y <- as.matrix(data$V3)
m <- length(y)
X <- cbind(1, X)
theta <- ginv(t(X) %*% X, tol=0) %*% t(X) %*% y  # ginv() from MASS computes the pseudoinverse
price <- c(1, 1650, 3) %*% theta
price
## [,1]
## [1,] 293081.5
The predicted price matches the one obtained with the converged gradient descent runs (the \(\alpha=0.1\) and \(\alpha=0.3\) columns above) almost exactly.
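As a further cross-check not in the original exercise, R's built-in lm fits the same least-squares model and should reproduce both the coefficients and the prediction:
fit <- lm(V3 ~ V1 + V2, data=data)       # ordinary least squares on the raw features
coef(fit)                                # should agree with theta above
predict(fit, data.frame(V1=1650, V2=3))  # should agree with the price above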