Saturday, February 11, 2012

Convexity

Some things to prove:
  1. The distance to a set $S$ from some vector $\mathbf{x}$, given by $d\left(\mathbf{x}, S\right) = \displaystyle\min_{\mathbf{y}\in S}\,\,\|\mathbf{x} - \mathbf{y}\|$ is convex.
  2. The set of of all symmetric positive definite matrices is convex.
  3. Let $f\left(x\right)=\exp\left(Q\left(x\right)\right)$. Prove that $f$ is log concave if $Q\left(x\right)$ is concave.

To do this, I'm going to employ the use of some tools from Chapter 2 of the book Convex Optimization, by Steven Boyd and Lieven Vandenberghe. One of the great things about this book is that it is very accessible.

First, recall the definition of lines and line segments: If two points $x,y$ in $\mathbb{R}^{n}$ are not equal, then points of the form the form $y = \theta x_1 + \left(1 - \theta\right)x_2$ where $\theta \in \mathbb{R}$ form the line passing through $x_1$ and $x_2$.

This is nice and all, but it's not exactly the most intuitive way to define a line. The authors graciously give another interpretation which I find to be very intuitive: $$y=x_2+\theta\left(x_1-x_2\right).$$ They explain it in an equally intuitive way: "... $y$ is the sum of the base point $x_2$ (corresponding to $\theta = 0$) and the direction $x_1 - x_2$ (which points from $x_2$ to $x_1$) scaled by the parameter $\theta$." (p. 21)

Let's look at this in vector notation. $y$ should actually be $\mathbf{y}$ because we're not talking about scalar values here, we're talking about points/vectors on the $n$-dimensional real line, which have more than one component. The same goes for $x_1$ and $x_2$. So, it should be written $$\mathbf{y} = \mathbf{x}_2 + \theta\left(\mathbf{x}_1 - \mathbf{x}_2\right).$$ It's one of my math pet peeves when people don't use something to indicate whether they are talking about vectors or scalars, even if you define it that way from the start. We're so used to working with symbols like $x$ and $y$ as scalar values, that it's nice to be able to let the paper take care of the notation and not have to think "Am I looking at a scalar or vector here?". When vectors are bold you can immediately see that the data type is a vector, whereas with the other convention $x$ and $y$ could be a scalar or a vector depending on how it was defined at the beginning of the [proof, book, paper, etc.]. OK, rant over!

Assuming you have NumPy and matplotlib installed you can use the following Python code to demonstrate the definition of a line i just described.
from pylab import *

def randbetween(a, b, *size):
    assert a < b, 'a must be less than b'
    for s in size:
        assert isinstance(s, int), \ 
            'the elements of size must all be integers'
    bma = b - a
    return bma * rand(*size) + a

b = 10
a = 1
x1 = randbetween(a, b, 1, 2)
x2 = randbetween(a, b, 1, 2)

# 0 <= theta <= 1
theta = linspace(0, 1, 10000)[newaxis]

# compute the line over theta
y = dot(theta.T, x1) + dot(1 - theta.T, x2)

# plot the points and the line
figure()
hold('on')
plot(x1[0, 0], x1[0, 1], '.', ms=20)
plot(x2[0, 0], x2[0, 1], '.', ms=20)
plot(y[:, 0], y[:, 1])
hold('off')

# show the plots
show()
You can literally just copy this code, then in an IPython console type: paste. This will run all of the code I just listed and should result in a nice plot. Obviously for $\theta > 1$ and $\theta < 0$ the line will extend beyond $\mathbf{x}_1$ and beyond $\mathbf{x}_2$ respectively. What's more, you can rewrite the second equation in vector-matrix form. Given two $k$-dimensional row vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ and given a column vector $\boldsymbol\theta = \left(\theta_1, \theta_2, \dotsc, \theta_n\right)$ where $\theta_1 = 0$ and $\theta_n = 1$ and monotonically increasing elements, then a line in $\mathbb{R}^{n}$ can be written as $$\mathbf{Y} = \boldsymbol\theta\cdot\mathbf{x}_1 + \left(\mathbf{1}_n - \boldsymbol\theta\right)\cdot\mathbf{x}_2,$$ where $\mathbf{1}_n$ is an $n\times{1}$ vector with each element equal to 1, and $\mathbf{Y}$ is an $n\times{k}$ matrix with each column being the points of the line in the $k$th dimension. The next important concept is the notion of an affine set. A set $C\subseteq{\mathbb{R}^{n}}$ is said to be affine if the line through any points in $C$ not equal to each other is in $C$. Said differently, "$C$ contains the linear combinations of any two points in $C$, provided the coefficients in the linear combination sum to one (Boyd & Vandenberghe)." (p. 22). In symbolic terms, $$x_1, x_2\in{C}\textrm{ and }\theta\in{\mathbb{R}}\Rightarrow{\theta{x_1}+\left(1-\theta\right)x_2\in{C}}$$ OK, now on to convex sets. The definition is exactly the same as an affine set except that $\theta$ is bounded by 0 and 1. A set $C$ is convex if, $$x_1,x_2\in{C}\textrm{ and }\theta\in\left[0,1\right]\Rightarrow{\theta{x_1}+\left(1-\theta\right)x_2\in{C}}$$ Let's now define a convex function. A function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is said to be convex if the domain of $f$ is a convex set and if for all $x$, $y$ is an element of the domain of $f$, and $\theta$ with $0\leq\theta\leq{1}$ we have, $$f\left(\theta x + \left(1-\theta\right)y\right)\leq\theta f\left(x\right) + \left(1-\theta\right)f\left(y\right).$$ Strict convexity is when $x\neq{y}$ and $0\lt\theta\lt{1}$ and concavity is when $-f$ is convex, with strict concavity if $-f$ is strictly convex.

Let's use the definition of convexity to show that $d$ is convex.

The triangle inequality and homogeneity show us that $$f\left(\theta\mathbf{x} + \left(1-\theta\right)\mathbf{y}\right)\leq f\left(\theta\mathbf{x}\right) + f\left(\left(1-\theta\right)\mathbf{y}\right)=\theta f\left(\mathbf{x}\right) + \left(1-\theta\right)f\left(\mathbf{y}\right)$$ is true. Then substituting $d$ for $f$ allows us to see that $d$ is convex.

In my next few posts I'll show the 2 and 3 parts of the list at the beginning of this post.

No comments:

Post a Comment