# Hamming Codes III

We are now in a position to actually write down the Hamming (7,4) code.  As explained in the previous three entries, we want some way of both detecting that an error occurred and of correcting that error.  As with many problems, this is much easier once you know it is possible (thanks, Richard Hamming!)  In particular, last time we proved that it is necessary to send a 7-bit message in our scheme of correcting errors for a 4-bit message, but is it sufficient?  An easy way to deduce the solution in this case (and then to see the pattern that proves the general case) is to require that our parity check detects the location of any 1-bit error.

Specifically, someone will receive a (possibly corrupt) 7-bit string v, and we want a matrix that will output 0 if all is well, or communicate a number from 1 to 7 if one of the bits is corrupt.  It takes 3 bits to communicate 8 numbers ($8 = 2^3$), so our parity matrix H (following Wikipedia's notation) must be 3 x 7.  To make it easy to remember, we'll even have column j be the binary representation of j.  More directly:

$H = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}$

(reading each column from top to bottom, least significant bit first, column j spells out j in binary).

Now we can work backwards (again, we're assuming an answer exists), and for reasons that may be clear later, we'll set our three parity bits to be the three "singleton" columns of H, so that the "coded" message is v = (p1,p2,d1,p3,d2,d3,d4).  Then if everything goes smoothly, we have that Hv = 0 (mod 2), so that

0 = p1+d1+d2+d4
0 = p2+d1+d3+d4
0 = p3+d2+d3+d4.

Notice also that if one bit gets corrupted, this is equivalent to sending the message v+ej, and
H(v+ej) = Hv + Hej = 0 + hj,
where hj is the jth column of H (which is the binary representation of the number j). Hence multiplying a message with a 1-bit mistake by H gives us the index of the corrupt bit.
But this tells us how we must encode our message m = (d1,d2,d3,d4) as well.  We want a matrix G so that Gm = (p1,p2,d1,p3,d2,d3,d4).  The equations above give us a linear condition for what this matrix must look like (and an explanation for why the parity bits are all "singletons").

Finally we want to “decode” our message, which is also straightforward at this point, since it will just be the matrix which returns the non-parity bits from the encoded message.

As a review, and to wrap everything up:

1. Start with the message m = (1,0,0,1).
2. Transmit the message v = Gm = (0,0,1,1,0,0,1).
3. Check the parity by confirming that Hv = (0,0,0).
4. Decode the message Rv = (1,0,0,1), as desired.
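To make the whole pipeline concrete, here is a short Python sketch of the scheme above (the function names are my own, and all arithmetic is mod 2):

```python
# A sketch of the Hamming (7,4) code, with bits ordered as in the post:
# v = (p1, p2, d1, p3, d2, d3, d4).

# Parity-check matrix H: column j is the binary representation of j,
# least significant bit in the first row.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(m):
    """Encode a 4-bit message (d1, d2, d3, d4) as a 7-bit codeword."""
    d1, d2, d3, d4 = m
    p1 = (d1 + d2 + d4) % 2
    p2 = (d1 + d3 + d4) % 2
    p3 = (d2 + d3 + d4) % 2
    return [p1, p2, d1, p3, d2, d3, d4]

def syndrome(v):
    """Hv mod 2; read in binary (LSB first) it names the corrupt bit, or 0."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

def correct(v):
    """Flip the corrupt bit, if the syndrome reports one."""
    s = syndrome(v)
    j = s[0] + 2 * s[1] + 4 * s[2]  # syndrome read as a number 0-7
    if j:
        v = v.copy()
        v[j - 1] ^= 1
    return v

def decode(v):
    """Extract the data bits (positions 3, 5, 6, 7)."""
    return [v[2], v[4], v[5], v[6]]

m = [1, 0, 0, 1]
v = encode(m)   # gives (0,0,1,1,0,0,1), as in the example above
```

Flipping any single bit of v and running `correct` then `decode` recovers the original four data bits.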

Wikipedia’s also got an explanation involving Venn diagrams which I did not much like, though I may write a bit about Venn diagrams themselves in the future…

# Hamming Error-Correcting Codes

This is part 1 in hopefully a series on the Hamming error-correcting codes, to be continued on Friday.  The problem is this:  suppose you wish to send a message composed entirely of 0’s and 1’s, but suspect part of it might get “corrupted”.  More concretely, I am going to send you the message

01101,

but there’s a chance that you receive instead something like 00101 (so the second bit was flipped), or 01111 (so the fourth bit was flipped).  Is there any way to control for this?  An idea that is both natural and wrong is to send two copies of the message.  In this case, I’d send along

0110101101.

Now if a bit gets flipped (note that there are now more bits that could be flipped), you can see exactly where — perhaps you receive

0010101101,
where the second bit no longer matches its counterpart (the seventh).  The problem here is that you cannot tell whether the offending bit should be interpreted as a 0 or a 1 (which might be OK if you are only interested in error detection).  But if we want to be able to correct the error, we'll have to be a little more clever.  As a very broad outline of our strategy, we are going to take each of the symbols we would like to transmit and encode it into a "big enough" space, so that each symbol has a buffer zone around it to absorb any errors in transmission.

In some sense, this is a familiar strategy: on touchscreen keyboards you do not have to hit the exact spot for a key to work, only be close enough.  In this case, the “symbols” I’m referring to are letters, and the “buffer zone” is the area of the screen you can touch so the computer recognizes that you are trying to touch this particular letter.

The trick here (and what is sort of confusing about this) is that the symbols we wish to transmit are bits, and the space that we will be embedding our symbols into will also be made of bits (rather than a shiny retina display!)  As a hint of things to come, I've displayed below a representation of the space of 2-bit strings (which will not be big enough to correct 1-bit errors), and a representation of the space of 3-bit strings, which is exactly the right size to correct 1-bit errors in a 1-bit message.

All 2-bit strings, with a line connecting them if the distance between them is 1, so you can get to one from the other by switching a single bit.

All 3-bit strings, connected whenever the distance between them is 1. Notice that the points (0,0,0) and (1,1,1) have disjoint balls of radius 1 around them. This will be the basis of the error correcting code, by agreeing that anything received in the red “ball” will be interpreted as the symbol (0,0,0), and anything received in the blue “ball” will be interpreted as (1,1,1). Then this can be turned into a 1-bit error correcting code.
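The disjoint-balls picture can be checked by brute force; here is a small Python sketch (the function names are my own), enumerating all eight 3-bit strings and decoding each to the nearer codeword:

```python
# In the space of 3-bit strings, the balls of radius 1 around (0,0,0) and
# (1,1,1) are disjoint and cover everything, so every received string
# decodes unambiguously to the nearer codeword.
from itertools import product

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def decode(received):
    """Interpret a 3-bit string as the nearer of the two codewords."""
    return 0 if hamming_distance(received, (0, 0, 0)) <= 1 else 1

for w in product([0, 1], repeat=3):
    print(w, "->", decode(w))
```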

# Volume of Balls

I like the following question:

“Which dimension has the ‘biggest’ unit ball?”

To define a few words, a “ball” of radius r about a point x in a metric space X with metric d (in particular, in $\mathbb{R}^n$ with the standard metric) is defined by $B_X(x,r) = \{ y \in X: d(x,y) \leq r \}$.  So the unit ball in Euclidean space is the collection of points of distance less than or equal to one from, say, the origin.  If you think I am being overly careful with this definition, you're right!

1-, 2-, and 3-dimensional balls.

By “biggest”, I will mean the n-dimensional volume.  So if we define V(n) to be the volume of the unit ball in $\mathbb{R}^n$, then V(1) = 2 (the 1-dimensional ball is a line segment), $V(2) = \pi \approx 3.14$, and $V(3) = \frac{4}{3} \pi \approx 4.19$.  So the volumes are getting bigger, and maybe we have a good question on our hands.  But now what's the volume of the four-dimensional ball?

It turns out that one can derive the formula

$V(n) = \frac{\pi^{n/2}}{\Gamma(n/2+1)}$,

where $\Gamma$ is the gamma function, a generalization of the factorial (!) function!  In particular, $\Gamma(n+1) = n!$ for integers n.  Then, for even dimensions,

$V(2n) = \frac{\pi^{n}}{n!}$.

Now so long as we are only worried about integer dimensions, we can use the half-integer identity for the gamma function:
$\Gamma(n+1/2) = \frac{(2n)!}{4^nn!}\sqrt{\pi},$
to get a similar formula for odd dimensions:
$V(2n-1) = \frac{4^n\pi^{n-1}n!}{(2n)!}$.

Then a quick calculation gives:

V(1) = 2.00
V(2) = 3.14
V(3) = 4.19
V(4) = 4.93
V(5) = 5.26
V(6) = 5.17
V(7) = 4.72
V(8) = 4.06
V(9) = 3.30.

Since the denominator grows much faster than the numerator for both odd and even dimensions, the volumes tend to zero, and we conclude (in academia we could only nod our heads suggestively at this point, but we're in industry now!) that the 5-dimensional ball has the most volume.

As an aside, and possibly a subject for a later post, if we allow for fractional dimensions (everything in V(n) is defined for n a real number, not just an integer), then the maximum value of V is at about 5.2569, where the unit ball has volume around 5.2778.
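All of this is easy to check numerically, since Python's standard-library `math.gamma` implements the gamma function; the grid search below for the real maximum is a crude sketch, not an exact computation of the critical point:

```python
# Numerical check of V(n) = pi^(n/2) / Gamma(n/2 + 1).
import math

def V(n):
    """Volume of the unit ball in dimension n (n may be any positive real)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

# The integer table above:
for n in range(1, 10):
    print(n, round(V(n), 2))

# A crude grid search on [1, 10] for the real maximum, near n = 5.2569:
n_max = max((k / 10000 for k in range(10000, 100000)), key=V)
print(n_max, V(n_max))
```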

The function V(n), plotted for real numbers n, with max around (5.26, 5.28).

# Large data sets

1200 points on the sphere, each connected to its 7 nearest neighbors.

I was thinking about the topic that Rice’s VIGRE program has focused on in the past, namely how to visualize large data sets.  Now, discrete data is not the greatest thing for an analyst, as opposed to, say, a manifold.  Along this line of thinking, one might ask:

If you know that a data set comes from some manifold, is there a way to detect which manifold?

Towards answering this, I wrote a quick program in MATLAB that:
1. Draws random points from a parametrized surface (so far just a sphere and a torus, but it would be easy to use any parametrized surface),
2. Draws a line connecting each point to its n nearest neighbors.

The data without a "ghosted" surface. You can still tell the data set has a nonzero genus (i.e., a "hole"), but it is somewhat hard to see depth without rotating the picture (so I've included the surface in all the others). This is 400 points on a torus, each connected to its 3 nearest neighbors.

The program also “ghosts” in the parametrized surface to see how well this method does at illustrating the surface.

At this stage, the program is just something to play around with a bit, but I think the images you get from it are really great!   One of the problems it still has is that not every vertex has n edges leaving from it (since the relation “is one of the n nearest vertices” is not symmetric, which is easy to see with a sketch).  This also slows down computation since really I am drawing 2*N*n lines, even though only about N*n of them are displayed.  Scroll down to see all the images.
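The original program was MATLAB; here is a rough Python sketch of the same construction (my own naming, and building only the edge lists rather than drawing them), including a count of how often the “n nearest” relation fails to be symmetric:

```python
# Sample N points on the unit sphere, connect each to its n nearest
# neighbors, and count the asymmetric edges described above.
import math
import random

random.seed(0)
N, n = 400, 3

# Uniform points on the unit sphere, via normalized Gaussians.
points = []
for _ in range(N):
    x, y, z = (random.gauss(0, 1) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    points.append((x / r, y / r, z / r))

def dist(p, q):
    return math.dist(p, q)

# For each point, the indices of its n nearest neighbors.
neighbors = []
for i, p in enumerate(points):
    order = sorted((j for j in range(N) if j != i),
                   key=lambda j: dist(p, points[j]))
    neighbors.append(set(order[:n]))

# Directed edges i -> j; the undirected picture draws each line only once.
directed = {(i, j) for i in range(N) for j in neighbors[i]}
undirected = {frozenset(e) for e in directed}
asymmetric = sum(1 for (i, j) in directed if i not in neighbors[j])
print(len(directed), len(undirected), asymmetric)
```

Note that `len(directed)` is exactly N*n, while the number of distinct lines is smaller, matching the 2*N*n observation above.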

The sphere with N = 400, n = 3.

The torus with N = 400, n = 3.

N = 400, n = 7.

N = 1200, n = 3.

N = 1200, n = 3.

N = 2000, n = 7.

# Hausdorff dimension II

I left off yesterday asking how to measure a surface (something sort of 2-dimensional) in 3-space, and noting that Lebesgue measure would give this set 0 area.  This is where Hausdorff measure comes in.  There is in fact a continuum of Hausdorff measures, one for each nonnegative real number.  First it is helpful to define the diameter of a set, which is the greatest distance between any two points in the set.  For a circle, this agrees with the usual definition of diameter.  Also, we'll denote the diameter of a set U by |U|.

The construction of the measure is typically described in two steps: Suppose we are trying to find the m-dimensional Hausdorff measure of a set U.  Then in the first step, we cover U by sets no larger than some number, d, and look at the sum of the diameters of these sets raised to the mth power. The infimum of these sums, over all such coverings, will be the number we associate with the parameter d.  In the second step, we let d go to zero.  The Hausdorff measure is the limit, if it exists.  In symbols,

Two sets with (approximately) the same diameter, indicated with the red line.

$H^m_d(U) = \inf \sum_{k = 1}^{\infty} |U_k|^m$, where the infimum is over covers $\{U_k\}$ of U with $|U_k| < d$ for each k.  Then $H^m(U) = \lim_{d \to 0} H^m_d(U)$.  There is an easy result (see, for example, Falconer's “Geometry of Fractal Sets“) that says that for any set U, there is a unique value, called the Hausdorff dimension of U, so that $H^m(U) = \infty \text{ if } 0 \leq m < \dim U$ and $H^m(U) = 0 \text{ if } \dim U < m < \infty$.  Note that the Hausdorff measure at $m = \dim U$ itself might be 0, finite, or infinite, but this number will still exist.

We should also note that:

1. Nowhere in the definition did I use anything but distance, making all these concepts valid in metric spaces, as well as in Euclidean (vector) spaces.
2. Nowhere in the definition did I require the Hausdorff dimension to be an integer.

Expanding on this second point brings us into discussing fractals, which I have discussed a bit in the past week, but have not defined.  One reason for this is that I have not seen a great definition.  Self-similarity is important, as is having a Hausdorff dimension different from the topological dimension.  At the very least, any self-similar set whose Hausdorff dimension is not an integer would be given “fractal” status, though this would miss many good examples.  In any case, for a self-similar set (whose pieces do not overlap too much), the Hausdorff dimension is log(m)/log(s), where m is the number of copies at each step, and s is the (linear) scaling factor.

So if you view the real plane as a self-similar set obtained by starting with a square, then dividing it into four squares, and then each of those into four squares, and so on, then the number of copies is 4 and the scaling factor is 2 (Remember, it is the linear scaling factor! Each side is half the length it used to be.), so the Hausdorff dimension is log(4)/log(2), which is precisely 2, since $\log{4}/\log{2} = \log{2^2}/\log{2} = 2\log{2}/\log{2} = 2$.

I have printed below two more examples of fractals and their dimensions.  Thanks to Leonid Kovalev for pointing out how to generate the Sierpinski carpet.  Also see this great Wikipedia page for a much more complete list of fractals and their dimensions.

Dimension log(3)/log(2).
The Sierpinski carpet, with dimension log(8)/log(3) = 3*log(2)/log(3).
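The log(m)/log(s) formula is easy to tabulate; a quick Python sketch for the examples mentioned above:

```python
# The similarity dimension log(m)/log(s): m copies at each step, each
# scaled down linearly by a factor of s.
import math

def similarity_dimension(m, s):
    return math.log(m) / math.log(s)

examples = {
    "square (the plane)":  (4, 2),  # 4 copies at half scale: dimension 2
    "Sierpinski triangle": (3, 2),
    "Sierpinski carpet":   (8, 3),
}
for name, (m, s) in examples.items():
    print(name, round(similarity_dimension(m, s), 4))
```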

# Riddles.

At a conference in Palo Alto, I heard a few clever riddles:

• Suppose you have a 15 x 20 sheet of chocolate.  What is the least number of breaks you must make to be left with only 1 x 1 pieces of chocolate?
• Suppose you are blindfolded, and sitting in front of a table with 40 tiles on it.  Each tile is black on one side and white on the other.  You are told that precisely 10 of the tiles are turned white side up.  You are asked to split the tiles into two groups, each with the same number of white tiles.  Further, you are allowed to flip some tiles if you would like (though, remember, you are blindfolded).  How do you do this?
• Also, to satisfy my mathematically curious reader(s), a speaker at the conference yesterday was discussing “optimal Lipschitz extensions”.  Roughly, the question is this: if you have an open, bounded region in $\mathbb{R}^n$, with a Lipschitz function defined on the boundary, is there a “best” Lipschitz function defined on the entire region?  By “best”, we’ll mean that on any open bounded subset, the function has the smallest Lipschitz constant given its boundary.
For example, these two graphs are both Lipschitz with the same boundary data, though the second graph has very small crinkles in it.  Really, we are trying to define a notion of “best” that will choose the first extension, but not the second one.  It is not obvious that such an extension always exists, since we are imposing a number of conditions on the map, nor that there is only one if it does (existence, in fact, does not hold for vector-valued functions).  In any case, there has been some research done into this, but the speaker mentioned a proof of the existence and uniqueness of such extensions using a game-theoretic argument (pdf link to the paper by Peres, Schramm, Sheffield and Wilson).  This is great.

How it goes is that you choose a point x in the interior of the region, and then have two players play the following game.  At each turn, a (fair) coin is flipped, and the winner gets to move the point by a certain fixed distance d.  The game ends when the point reaches the boundary of the region, at which point player 2 pays player 1 the value of the function there.  Intuitively, player 1 will steer towards high values, and player 2 will steer towards low values.  Now if each player plays optimally, then you can calculate an expected payoff for player 1 starting from each point x (that is, the average payoff as one plays the game over and over again).  As d approaches zero, this expected payoff converges to a value that defines the optimal Lipschitz extension.  I think this is a fantastic result.
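To get a feel for this, here is a toy Python sketch of the game value on a 1-D grid.  It is my own simplification, not the paper's construction: with optimal play, each interior value ends up being the average of the max and min of its two neighbors, and we can find the fixed point by iteration (in 1-D the limit is just linear interpolation between the boundary values):

```python
# Game value on a 1-D "region": grid points 0..K, with the boundary payoffs
# fixed at the two endpoints. Under optimal play the value u satisfies
# u(i) = (max of neighbors + min of neighbors) / 2 at interior points.
K = 20
u = [0.0] * (K + 1)
u[0], u[K] = 1.0, 3.0  # boundary values of the function

for _ in range(5000):
    for i in range(1, K):
        lo, hi = min(u[i - 1], u[i + 1]), max(u[i - 1], u[i + 1])
        u[i] = (lo + hi) / 2

print(u[K // 2])  # converges to the linear interpolant
```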

Back to Houston on Saturday.

1. Lipschitz functions. I mentioned that the notion of “addition” is not (necessarily) defined in a metric space, so neither is differentiability (though I defined a notion of a graph Laplacian… note that the graph Laplacian is a particular differential operator, defined on a particular metric space).  However, we do have a notion of Lipschitz functions, which stand in for differentiable functions.  A Lipschitz function is one where the distance between two points is not changed too much by the function.  Specifically, if $f: X \to Y$ is a map between metric spaces X and Y, then f is Lipschitz if there is a constant C with  $d_Y(f(x),f(y)) \leq C d_X(x,y)$ for all x,y in X.  Notice that if we had a good notion of a limit, we might take a limit as y gets close to x, and call C the derivative of f at x.  In fact, this is precisely what we do in a Euclidean space.  However, Lipschitz is a little weaker (i.e., easier to satisfy) than differentiability on compact sets.  For example, the absolute value function $f(x) = |x|$ is Lipschitz with constant 1.  Notice, however, that a function like $f(x) = x^2$ is not Lipschitz on the real line, since the slope of f grows without bound.
3. Embedding theorems. For work that absolutely requires adding together elements, there exist embedding theorems.  One useful theorem states that any separable metric space (i.e., one with a countable dense subset) may be isometrically embedded into the Banach space $\ell^{\infty}$ of bounded sequences (note that $\ell^{\infty}$ itself is not separable, but the image lies in a separable subspace) via the following process:  choose a countable dense subset $\{ x_j \}_{j = 0}^{\infty}$.  Then we define the isometry $\iota$ by $\iota(x) = (d(x,x_0) - d(x_0,x_0),d(x,x_1)-d(x_0,x_1),d(x,x_2)-d(x_0,x_2),\ldots)$.  This turns out to preserve the metric (i.e., to be an isometry) while giving us a reasonable way of adding points and a way of reversing the operation.  A powerful tool is to prove results in the Banach space setting, then send the results back to the metric space world.  An interesting exercise is to explicitly construct this isometry for, say, the real line or the plane.  One quickly realizes how non-canonical the isometry is, as well as how hard it can be to write these maps down!
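The exercise at the end can also be attempted numerically.  Here is a Python sketch (my own setup: a random "dense" subset of the unit square, truncated to finitely many coordinates), checking that the truncated embedding approximately preserves one distance:

```python
# Embed points of the plane into (a truncation of) l-infinity using the
# coordinates d(x, x_k) - d(x_0, x_k), and compare the sup-norm distance
# of two embedded points with their true Euclidean distance. The triangle
# inequality makes the sup-norm distance at most the true distance, and
# density makes it approach the true distance as the subset grows.
import math
import random

random.seed(1)
# A "dense" subset of the square [0,1]^2, truncated to its first 2000 points.
dense = [(random.random(), random.random()) for _ in range(2000)]

def d(p, q):
    return math.dist(p, q)

def embed(x):
    """(d(x, x_k) - d(x_0, x_k)) for each x_k, truncated to len(dense) terms."""
    x0 = dense[0]
    return [d(x, xk) - d(x0, xk) for xk in dense]

def sup_dist(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

p, q = (0.2, 0.7), (0.9, 0.1)
print(d(p, q), sup_dist(embed(p), embed(q)))
```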