Sunday, February 26, 2012

Definition of a Confidence Interval

"95% of intervals of the form $$\left({\overline X} - \frac{2\sigma}{\sqrt{n}},{\overline X} + \frac{2\sigma}{\sqrt{n}}\right),$$ which are based on samples of size $n$ each, will contain the mean of the normal density in their interior."
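A quick way to convince yourself of this definition is to simulate it. In the sketch below, the values of $\mu$, $\sigma$, $n$, and the number of trials are arbitrary choices of mine, not from the quote:

```python
import numpy as np

# Simulation sketch of the definition above (mu, sigma, n, and the
# number of trials are arbitrary choices): draw many samples of size n
# from N(mu, sigma^2) and count how often the interval
# (xbar - 2*sigma/sqrt(n), xbar + 2*sigma/sqrt(n)) contains mu.
rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 25, 100_000
xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
half_width = 2 * sigma / np.sqrt(n)
coverage = np.mean((xbar - half_width < mu) & (mu < xbar + half_width))
print(coverage)  # about 0.954
```

The quote says 95% because 2 is a rounding of the exact quantile 1.96; the $2\sigma$ interval actually covers about 95.4% of the time.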

Saturday, February 11, 2012

Convexity

Some things to prove:
  1. The distance from some vector $\mathbf{x}$ to a convex set $S$, given by $d\left(\mathbf{x}, S\right) = \displaystyle\min_{\mathbf{y}\in S}\,\,\|\mathbf{x} - \mathbf{y}\|$, is convex.
  2. The set of all symmetric positive definite matrices is convex.
  3. Let $f\left(x\right)=\exp\left(Q\left(x\right)\right)$. Prove that $f$ is log concave if $Q\left(x\right)$ is concave.

To do this, I'm going to use some tools from Chapter 2 of the book Convex Optimization, by Stephen Boyd and Lieven Vandenberghe. One of the great things about this book is that it is very accessible.

First, recall the definition of lines and line segments: if two points $x_1, x_2$ in $\mathbb{R}^{n}$ are not equal, then points of the form $y = \theta x_1 + \left(1 - \theta\right)x_2$, where $\theta \in \mathbb{R}$, form the line passing through $x_1$ and $x_2$.

This is nice and all, but it's not exactly the most intuitive way to define a line. The authors graciously give another interpretation which I find to be very intuitive: $$y=x_2+\theta\left(x_1-x_2\right).$$ They explain it in an equally intuitive way: "... $y$ is the sum of the base point $x_2$ (corresponding to $\theta = 0$) and the direction $x_1 - x_2$ (which points from $x_2$ to $x_1$) scaled by the parameter $\theta$." (p. 21)

Let's look at this in vector notation. $y$ should actually be $\mathbf{y}$ because we're not talking about scalar values here; we're talking about points/vectors in $n$-dimensional real space, which have more than one component. The same goes for $x_1$ and $x_2$. So, it should be written $$\mathbf{y} = \mathbf{x}_2 + \theta\left(\mathbf{x}_1 - \mathbf{x}_2\right).$$ It's one of my math pet peeves when people don't use notation that indicates whether they are talking about vectors or scalars, even if everything is defined that way from the start. We're so used to working with symbols like $x$ and $y$ as scalar values that it's nice to let the notation do the work and not have to think, "Am I looking at a scalar or a vector here?" When vectors are bold you can immediately see that the data type is a vector, whereas with the other convention $x$ and $y$ could be a scalar or a vector depending on how it was defined at the beginning of the [proof, book, paper, etc.]. OK, rant over!

Assuming you have NumPy and matplotlib installed, you can use the following Python code to demonstrate the definition of a line I just described.
from pylab import *

def randbetween(a, b, *size):
    assert a < b, 'a must be less than b'
    for s in size:
        assert isinstance(s, int), \
            'the elements of size must all be integers'
    bma = b - a
    return bma * rand(*size) + a

b = 10
a = 1
x1 = randbetween(a, b, 1, 2)
x2 = randbetween(a, b, 1, 2)

# 0 <= theta <= 1
theta = linspace(0, 1, 10000)[newaxis]

# compute the line over theta
y = dot(theta.T, x1) + dot(1 - theta.T, x2)

# plot the points and the line
figure()
plot(x1[0, 0], x1[0, 1], '.', ms=20)
plot(x2[0, 0], x2[0, 1], '.', ms=20)
plot(y[:, 0], y[:, 1])

# show the plots
show()
You can literally just copy this code, then in an IPython console type %paste. This will run all of the code I just listed and should result in a nice plot. Obviously, for $\theta > 1$ and $\theta < 0$ the line will extend beyond $\mathbf{x}_1$ and beyond $\mathbf{x}_2$, respectively.

What's more, you can rewrite the second equation in vector-matrix form. Given two $k$-dimensional row vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ and a column vector $\boldsymbol\theta = \left(\theta_1, \theta_2, \dotsc, \theta_n\right)$ with monotonically increasing elements, $\theta_1 = 0$, and $\theta_n = 1$, the sampled line segment in $\mathbb{R}^{k}$ can be written as $$\mathbf{Y} = \boldsymbol\theta\cdot\mathbf{x}_1 + \left(\mathbf{1}_n - \boldsymbol\theta\right)\cdot\mathbf{x}_2,$$ where $\mathbf{1}_n$ is an $n\times{1}$ vector with each element equal to 1, and $\mathbf{Y}$ is an $n\times{k}$ matrix whose $j$th column holds the coordinates of the sampled points in the $j$th dimension.

The next important concept is the notion of an affine set. A set $C\subseteq{\mathbb{R}^{n}}$ is said to be affine if the line through any two distinct points in $C$ lies in $C$. Said differently, "$C$ contains the linear combinations of any two points in $C$, provided the coefficients in the linear combination sum to one" (Boyd & Vandenberghe, p. 22). In symbols, $$x_1, x_2\in{C}\textrm{ and }\theta\in{\mathbb{R}}\Rightarrow{\theta{x_1}+\left(1-\theta\right)x_2\in{C}}.$$

OK, now on to convex sets. The definition is exactly the same as for an affine set, except that $\theta$ is bounded between 0 and 1. A set $C$ is convex if $$x_1,x_2\in{C}\textrm{ and }\theta\in\left[0,1\right]\Rightarrow{\theta{x_1}+\left(1-\theta\right)x_2\in{C}}.$$ Let's now define a convex function.
A function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is said to be convex if the domain of $f$ is a convex set and if, for all $x, y$ in the domain of $f$ and all $\theta$ with $0\leq\theta\leq{1}$, we have $$f\left(\theta x + \left(1-\theta\right)y\right)\leq\theta f\left(x\right) + \left(1-\theta\right)f\left(y\right).$$ $f$ is strictly convex if the inequality is strict whenever $x\neq{y}$ and $0\lt\theta\lt{1}$; $f$ is concave if $-f$ is convex, and strictly concave if $-f$ is strictly convex.
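Here's a quick numerical spot-check of that inequality for the Euclidean norm $f\left(\mathbf{x}\right)=\|\mathbf{x}\|$ (a sketch, not a proof; the dimension and number of random trials are arbitrary choices):

```python
import numpy as np

# Spot-check of the convexity inequality for f(x) = ||x||: for random
# x, y, and theta in [0, 1], verify
# f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y).
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    theta = rng.uniform()
    lhs = np.linalg.norm(theta * x + (1 - theta) * y)
    rhs = theta * np.linalg.norm(x) + (1 - theta) * np.linalg.norm(y)
    ok = ok and lhs <= rhs + 1e-12  # small tolerance for floating point
print(ok)  # True
```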

Let's use the definition of convexity to show that $d$ is convex. (The convexity of $S$ matters here: for a nonconvex $S$, say two isolated points, the distance function is not convex.)

The triangle inequality and homogeneity of the norm show that $$f\left(\theta\mathbf{x} + \left(1-\theta\right)\mathbf{y}\right)\leq f\left(\theta\mathbf{x}\right) + f\left(\left(1-\theta\right)\mathbf{y}\right)=\theta f\left(\mathbf{x}\right) + \left(1-\theta\right)f\left(\mathbf{y}\right)$$ holds for $f\left(\mathbf{x}\right)=\|\mathbf{x}\|$, so the norm is convex. Now take any $\mathbf{x}_1, \mathbf{x}_2$ and $\theta\in\left[0,1\right]$, and pick $\mathbf{y}_1, \mathbf{y}_2\in S$ achieving the minima $d\left(\mathbf{x}_1, S\right)=\|\mathbf{x}_1-\mathbf{y}_1\|$ and $d\left(\mathbf{x}_2, S\right)=\|\mathbf{x}_2-\mathbf{y}_2\|$. Because $S$ is convex, $\theta\mathbf{y}_1+\left(1-\theta\right)\mathbf{y}_2\in S$, so $$d\left(\theta\mathbf{x}_1+\left(1-\theta\right)\mathbf{x}_2, S\right)\leq\left\|\theta\left(\mathbf{x}_1-\mathbf{y}_1\right)+\left(1-\theta\right)\left(\mathbf{x}_2-\mathbf{y}_2\right)\right\|\leq\theta d\left(\mathbf{x}_1, S\right)+\left(1-\theta\right)d\left(\mathbf{x}_2, S\right),$$ where the last step uses the convexity of the norm. Hence $d$ is convex.
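We can also spot-check this numerically for a concrete convex set. Taking $S$ to be the closed unit ball (my choice for illustration), the distance has the closed form $d\left(\mathbf{x}, S\right)=\max\left(\|\mathbf{x}\|-1, 0\right)$:

```python
import numpy as np

# Numerical spot-check (not a proof) that the distance to a convex set
# is convex, using the closed unit ball as S, where
# d(x, S) = max(||x|| - 1, 0).
rng = np.random.default_rng(0)

def d(x):
    return max(np.linalg.norm(x) - 1.0, 0.0)

ok = True
for _ in range(1000):
    x1, x2 = 3 * rng.normal(size=2), 3 * rng.normal(size=2)
    t = rng.uniform()
    ok = ok and d(t * x1 + (1 - t) * x2) <= t * d(x1) + (1 - t) * d(x2) + 1e-12
print(ok)  # True
```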

In my next few posts I'll show parts 2 and 3 of the list at the beginning of this post.

Uninstall All Ruby Gems

Today I ran into an issue when trying to do a nice Rails tutorial. Having two versions of Ruby installed on my machine was (I think) causing some conflicts. At best, it was confusing me. So, here's how to uninstall all Ruby gems:

gem list | cut -d" " -f1 | xargs gem uninstall -aIx
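To see what each stage of the pipeline does, you can feed it a fake line of `gem list` output (the gem name here is just an example):

```shell
# `gem list` prints one gem per line, e.g. "rake (10.0.4)".
# cut -d" " -f1 keeps only the name before the first space, and xargs
# then passes each name to `gem uninstall -aIx` (-a all versions,
# -I ignore dependency checks, -x remove executables too).
out=$(echo "rake (10.0.4)" | cut -d" " -f1)
echo "$out"   # prints "rake"
```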

Tuesday, February 7, 2012

Gamma

I was working on some mathematical statistics problems and I came across a slick estimation problem that I thought I would share.

Problem:
Given $$f\left(x\mid k\right) = \frac{x^{k-1}\exp\left(-x/\theta\right)}{\Gamma\left(k\right)\cdot\theta^{k}},\,\,\,\textrm{ where } x \gt 0,\,\,\, k \gt 0,\,\,\, \theta = 1,$$ determine whether it is possible to find a value of $c$ such that $d\left(X\right) = cX^{2}$ will be an unbiased estimator of $k$.

My attempt at a solution:

For $cX^{2}$ to be an unbiased estimator of $k$ we need $$E\left[c\cdot X^{2}\right] = k.$$ By the linearity of $E\left[\cdot\right]$, $$c\cdot E\left[X^{2}\right] = k.$$ Rearranging terms we can see that $$c = \frac{k}{E\left[X^{2}\right]} = k\cdot\left(\int_{0}^{\infty}x^{2}\frac{x^{k-1}e^{-x}}{\Gamma\left(k\right)}\,\mathrm{d}x\right)^{-1} = k\cdot\left(\int_{0}^{\infty}\frac{x^{k+1}e^{-x}}{\Gamma\left(k\right)}\,\mathrm{d}x\right)^{-1}.$$ The remaining integral is $\int_{0}^{\infty}x^{k+1}e^{-x}\,\mathrm{d}x = \Gamma\left(k+2\right) = \left(k+1\right)\cdot k\cdot\Gamma\left(k\right)$, so $$c = \frac{k\cdot\Gamma\left(k\right)}{\Gamma\left(k+2\right)} = \frac{k\cdot\Gamma\left(k\right)}{\left(k+1\right)\cdot k\cdot\Gamma\left(k\right)} = \frac{1}{k+1}.$$ So the only value of $c$ that would make $cX^{2}$ unbiased for $k$ is $1/\left(k+1\right)$, which depends on $k$ itself. Since, by definition, an estimator of a parameter cannot depend on the value of that parameter, no such $c$ exists.
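A Monte Carlo sketch backs up the key computation (the value of $k$ and the sample size are arbitrary choices): for $X\sim\textrm{Gamma}\left(k,\theta=1\right)$ we should have $E\left[X^{2}\right] = k\left(k+1\right)$, which is exactly what forces $c = k/E\left[X^{2}\right]$ to depend on $k$.

```python
import numpy as np

# Check that E[X^2] = k*(k+1) for X ~ Gamma(k, theta=1) by simulation.
rng = np.random.default_rng(0)
k = 4.0
x = rng.gamma(shape=k, scale=1.0, size=1_000_000)
print(np.mean(x**2))  # close to k*(k+1) = 20
```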

Saturday, February 4, 2012

Setting up Git for my website

Just wanted to log the process of setting up the Git version control system for editing the code of my website.

  • Local Machine
    • Make a directory for your code
    • Run git init inside of it
    • Add some content and commit it
  • Server
    • Password stuff
      • Make an SSH public encryption key if you haven't already
        • ssh-keygen -t rsa -C 'someone@somewebsite.com'
        • Follow the instructions it spits out
        • Give a password if you want
      • ssh -p 2222 me@myserver.com 'mkdir -p .ssh'
        • You'll have to enter a password here
      • cat ~/.ssh/id_rsa.pub | ssh -p 2222 me@myserver.com 'cat >> .ssh/authorized_keys'
    • Git stuff
      • ssh -p 2222 me@myserver.com
      • cd public_html && mkdir website.git && cd website.git && git init --bare
      • cd hooks && cp post-receive.sample post-receive
      • echo 'GIT_WORK_TREE=/dir/where/you/want/to/store/code git checkout -f' >> post-receive
  • Local Machine
    • git remote add website ssh://ip.add.re.ss:portnumber/~/public_html/website.git
    • git push -u website +master:refs/heads/master && git push
Now, if I want to edit the code of my website on my local machine I can do so and have changes pushed to the website whenever I want to. I just have to be careful not to push breaking changes. Time to get intimate with Git branches!
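For reference, the finished hook ends up looking roughly like this (a sketch; the work-tree path is the same placeholder as in the echo step above, and your copied post-receive.sample will contain its own boilerplate before this line):

```shell
#!/bin/sh
# website.git/hooks/post-receive -- runs on the server after each push.
# Force-check-out the newly pushed commit into the live directory.
GIT_WORK_TREE=/dir/where/you/want/to/store/code git checkout -f
```

Make sure the hook is executable (chmod +x post-receive), or Git will silently skip it.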

Funny

Saw this picture while trolling some links from Hacker News.

#4 reminds me of some arguments I've had...

Wednesday, February 1, 2012

Advanced Topics in Statistics: Day 2

Time for some more probability review and some new material on classes of distributions. Chapter numbers refer to Casella and Berger (2002).

More review of probability:
2.2:  Expected Values: *Very Important*
$$E\left[g\left(X\right)\right] = \int_{-\infty}^{\infty}g\left(x\right)f_{X}\left(x\right)\,\,\mathrm{d}x,$$ when $X$ is a continuous random variable.
$$E\left[g\left(X\right)\right] = \sum_{x}g\left(x\right)f_{X}\left(x\right),$$ when $X$ is a discrete random variable.
Properties of $E\left[\cdot\right]$:
  • $E\left[a\right] = a, a\in\mathbb{R}$
  • $E\left[ag_{1}\left(X_{1}\right) + bg_{2}\left(X_{2}\right) + c\right] = aE\left[g_{1}\left(X_{1}\right)\right] + bE\left[g_{2}\left(X_{2}\right)\right] + c$
  • If $g_{1}\left(x\right)\le g_{2}\left(x\right)\le \cdots \le g_{n}\left(x\right), \forall x,$ then $E\left[g_{1}\left(X\right)\right]\le E\left[g_{2}\left(X\right)\right]\le\cdots\le E\left[g_{n}\left(X\right)\right]$.
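The linearity property is easy to illustrate with a quick Monte Carlo sketch, where the sample mean stands in for $E\left[\cdot\right]$. The choices of $X\sim N\left(0,1\right)$, $g_{1}\left(x\right)=x^{2}$, $g_{2}\left(x\right)=\left|x\right|$, and the constants are arbitrary, and a single $X$ is used for both functions, which linearity allows:

```python
import numpy as np

# Illustrate E[a*g1(X) + b*g2(X) + c] = a*E[g1(X)] + b*E[g2(X)] + c,
# with the sample mean standing in for E[.].
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
a, b, c = 2.0, -3.0, 5.0
lhs = np.mean(a * x**2 + b * np.abs(x) + c)
rhs = a * np.mean(x**2) + b * np.mean(np.abs(x)) + c
print(abs(lhs - rhs) < 1e-9)  # True: the sample mean is itself linear
```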

Remark I
In measure-theoretic notation the generalized expected value is $$E\left[X\right] = \int_{\omega\in\Omega}X\left(\omega\right)\,\mathrm{d}P\left(\omega\right).$$ I'm not sure exactly what this means; I'll have to come back to it later.

Remark II
What about interchanging sums and integrals?$\newcommand{\?}{\stackrel{?}{=}}$
$$\begin{eqnarray*}
\int\int f\left(x,y\right)\,\,\mathrm{d}x\,\mathrm{d}y&\?&\int\int f\left(x,y\right)\,\,\mathrm{d}y\,\,\mathrm{d}x\\
\sum_{j}\sum_{k}a_{jk}&\?&\sum_{k}\sum_{j}a_{jk}\\
\sum_{j}\left(\int f_{j}\left(x\right)\,\,\mathrm{d}x\right) &\?& \int\left(\sum_{j}f_{j}\left(x\right)\right)\,\,\mathrm{d}x
\end{eqnarray*}$$
For example, if $a_{jk} \ge 0$ (or $f \ge 0$, $f_{j} \ge 0$), then the interchange is always valid; this is Tonelli's theorem. Otherwise, take the absolute values inside the sum (integral), and show that one of the sides is finite; then Fubini's theorem justifies the interchange.
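A classic example of why some hypothesis is needed: let $a_{jk} = 1$ if $k = j$, $a_{jk} = -1$ if $k = j+1$, and $a_{jk} = 0$ otherwise. Then $$\sum_{j}\sum_{k}a_{jk} = \sum_{j}\left(1 - 1\right) = 0,\qquad\textrm{but}\qquad\sum_{k}\sum_{j}a_{jk} = a_{11} + \sum_{k\ge 2}\left(1 - 1\right) = 1,$$ so the two iterated sums disagree. Here the array has both signs and $\sum_{j,k}\left|a_{jk}\right| = \infty$, so neither Tonelli nor Fubini applies.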

More to follow on classes of distributions.
Mounting and Unmounting an ISO Image

Here's how to mount and unmount an ISO image in Linux, from Bash.

Mounting:

sudo mkdir /media/example
sudo mount -o loop example.iso /media/example


Unmounting:

sudo umount /media/example
sudo rmdir /media/example

Basic Probability

Approximately 1/3 of all human twins are identical (one-egg) and 2/3 are fraternal (two-egg) twins. Identical twins are necessarily the same sex, with male and female being equally likely. Among fraternal twins, approximately one-fourth are both female, one-fourth are both male, and half are one male and one female. Finally, among all U.S. births, approximately 1 in 90 is a twin birth. Define the following events:

$A = \left\{\textrm{a U.S. birth results in twin females}\right\}$
$B = \left\{\textrm{a U.S. birth results in identical twins}\right\}$
$C = \left\{\textrm{a U.S. birth results in twins}\right\}$

(a) State, in words, the event $A\cap{B}\cap{C}$.

  • A U.S. birth results in identical twin females.

(b) Find $P\left(A\cap{B}\cap{C}\right)$.

  • $P\left(A\cap{B}\cap{C}\right) = \frac{1}{90}\cdot \frac{1}{3}\cdot \frac{1}{2}=\frac{1}{540}$
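Part (b) is simple enough to verify with exact rational arithmetic:

```python
from fractions import Fraction

# P(twins) * P(identical | twins) * P(both female | identical twins)
p = Fraction(1, 90) * Fraction(1, 3) * Fraction(1, 2)
print(p)  # 1/540
```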