Billy Ian's Short Leisure-time Wander

into language, learning, intelligence and beyond

Notes on Convex Optimization (3): Unconstrained Minimization Problems

| Comments

Unconstrained optimization problems are defined as follows:

\begin{equation} \text{minimize}\quad f(x) \tag{1} \label{eq:1} \end{equation}

where $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is convex and twice continously differentiable (which implies that $\mathbf{dom}\enspace f$ is open). We denote the optimal value $\inf_xf(x)=f(x^\ast)$, as $p^\ast$. Since $f$ is differentiable and convex, a necessary and sufficient condition for a point $x^\ast$ to be optimal is

\begin{equation} \nabla f(x^\ast)=0 \tag{2} \label{eq:2} \end{equation}

. Thus, solving the unconstrained minimization problem \eqref{eq:1} is the same as finding a solution of \eqref{eq:2}, which is a set of $n$ equations in the $n$ variables $x_1, \dots, x_n$. Usually, the problem must be solved by an iterative algorithm. By this we mean an algorithm that computes a sequence of points $x^{(0)}, x^{(1)}, \dots \in \mathbf{dom}\enspace f$ with $f(x^{(k)})\rightarrow p^\ast$ as $k\rightarrow\infty$. The algorithm is terminated when $f(x^{k}) - p^\ast \le \epsilon$, where $\epsilon>0$ is some specified tolerance.

Initial point and sublevel set

The starting point $x^{(0)}$ must lie in $\mathbf{dom}\enspace f$, and in addition the sublevel set

must be closed. This condition is satisfied for all $x^{(0)}\in\mathbf{dom}\enspace f$ if the function $f$ is closed.

Note: 1) Continuous functions with $\mathbf{dom}\enspace f=\mathbf{R}^n$ are closed; 2) Another important class of closed functions are continuous functions with open domains.

Examples

Quadratic minimization and least-squares

The general convex quadratic minimization problem has the form

where $P\in\mathbf{S}_+^n$, $q\in\mathbf{R}^n$, and $r\in\mathbf{R}$. This problem can be solved via the optimality conditions, $Px+q=0$, which is a set of linear equations.

One special case of the quadratic minimization problem that arises very frequently is the least-squares problem

The optimality condition

are called the normal equations of the least-squares problem.

Unconstrained geometric programming

Analytic center of linear inequalities

where the domain of $f$ is the open set

Analytic center of a linear matrix inequality

where $F: \mathbf{R}^n\rightarrow\mathbf{S}^p$ is affine. Here the domain of $f$ is

Strong convexity and implications

The objective function is strongly convex on $S$, which means that there exists an $m>0$ such that

\begin{equation} \nabla^2f(x) \succeq mI \tag{3} \label{eq:3} \end{equation}

for all $x\in S$. For $x, y \in S$ we have

for some $z$ on the line segement $[x, y]$. By the strong convexity assumption \eqref{eq:3}, the last term on the righthand side is at least $(m/2)\lVert y-x\rVert^2_2$, so we have the inequality

\begin{equation} f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{m}2\lVert y-x \rVert_2^2 \tag{4} \label{eq:4} \end{equation}

for all $x$ and $y$ in $S$.

Bound $f(x)-p^\ast$ in terms of $\lVert \nabla f(x) \rVert_2$

Setting the gradient of the righthand side of \eqref{eq:4} with respect to $y$ equal to zero, we find that $\tilde y = x-(1/m)\nabla f(x)$ minimizes the righthand side. Therefore we have

Since this holds for any $y\in S$, we have

\begin{equation} p^* \ge f(x) - \frac1{2m}\lVert \nabla f(x)\rVert_2^2 \tag{5} \label{eq:5} \end{equation}

This inequality shows that if the gradient is small at a point, then the point is nearly optimal.

Bound $\lVert x-x^\ast\rVert_2^2$ in terms of $\lVert \nabla f(x) \rVert_2$

Apply \eqref{eq:4} with $y=x^\ast$ to obtain

where we use the Cauchy-Schwarz inequality in the second inequality. Since $p^\ast \le f(x)$, we must have

Therefore, we have

\begin{equation} \lVert x - x^\ast \rVert_2 \le \frac2m\lVert \nabla f(x) \rVert_2. \tag{6}\label{eq:6} \end{equation}

Uniqueness of the optimal point $x^\ast$

If there are two optimal point $x^\ast_1, x^\ast_2$, according to \eqref{eq:6},

Hence, $x_1^\ast = x_2^\ast$, the optimal point $x^\ast$ is unique.

Upper bound on $\nabla^2f(x)$

There exists a constant $M$ such that

for all $x \in S$. This upper bound on the Hessian implies for any $x, y \in S$,

minimizing each side over $y$ yields

Condition number of sublevel sets

The ratio $\kappa=M/m$ is an upper bound on the condition number of the matrix $\nabla^2 f(x)$, i.e., the ratio of its largest eigenvalue to its smallest eigenvalue.

We define the width of a convex set $C \subseteq \mathbf{R}^n$, in the direction $q$, where $\lVert q \rVert_2 = 1$, as

The minimum width and maximum width of $C$ are given by

The condition number of the convex set $C$ is defined as

Suppose $f$ satisfies $mI \preceq \nabla^2 f(x) \preceq MI$ for all $x\in S$. The condition number of the $\alpha$-sublevel $C_\alpha={x \vert f(x) \le \alpha}$, where $p^\ast < \alpha \le f(x^{(0)})$, is bounded by

The strong convexity constants

It must be kept in mind that the constants $m$ and $M$ are known only in rare cases, so they cannot be used in a practical stopping criterion.

Comments