Hamming codes in python

As a bit of an experiment I’ve started up a GitHub repository with some code from this blog here [LATE EDIT: you need numpy (and might as well get scipy and matplotlib) to run this].  In particular, this week I’ve implemented a script in Python which contains a script for generating Hamming matrices (one for encoding, one for parity check, one for decoding), which constitutes as much of a proof of existence as I’m willing to get into.

There is also a “Message” class inside where you can play around with how many corrupted bits your message might have versus how long your message is versus the size of Hamming matrix you use.  The defaults are set at 3 corrupt bits in a set of 4000 bits, with the error checking done with the Hamming(7,4) code.  You can run this by downloading hamming_codes.py and running “python hamming_codes.py” from the command line.

The specific directory with this project is located inside the “HammingCodes” folder.  Possible experiments with this code later, but now I need sleep!

Hamming Codes II

No reasonable pictures for today! Have a cactus!

No reasonable pictures for today! Have a cactus!

So now we know the general idea behind Hamming error correcting codes, and how one might construct and visualize hypercubes.  Now suppose we want to encode, in an error correcting way, 4 bits.  Recall that this means finding a hypercube with enough vertices that we can designate 16 (=42) of them, *and* pick those 16 so that no two “symbol vertices” are closer than distance 2.  This means each “symbol vertex” has a disjoint neighborhood of distance 1.

A back of the envelope calculation gives a necessary condition to allow this: an n dimensional hypercube has 2n vertices and each vertex has n neighbors (so a “symbol neighborhood” takes up n+1 vertices).  Hence it is necessary that n satisfy

16*(n+1) ≤ 2n.

More generally, to encode m bits in n bits, we require 2m*(n+1) ≤ 2n.  Note without proof (for now, hopefully soon by construction) that this is also a sufficient condition.  Interesting from an efficiency point of view is seeing where equality exists.

Taking logs (base 2) of both sides, and realizing that log(n+1) is an integer only when (n+1) is a power of 2, so m = n-log(n+1), or, letting :=2k-1, m = 2k-1 – k.  In fact, one may (and we may) describe a whole class of Hamming (2k-1, 2k-1 – k) codes.

A brief foray into hypercubes

The discussion on error-correcting codes is about to get a little hypercube heavy (never a good state to be in), and a brief foray into how to construct/visualize them may be in order.  I’ll take the liberty of defining an n-dimensional (unit) hypercube as a shape whose

1. vertices are located at coordinates made of entirely 0’s and 1’s, and
2. has an edge wherever two vertices are distance 1 apart.

This would take two more things to make a complete definition: I should let you move the cube about however you like (no reason to have it fixed is space), and I should tell you about the 2-D faces, 3-D hyperfaces, and so on up to the (n-1)-D hyperfaces.  You can use that first one if you want, but I’ll ignore the second.  I think I did a good job of defining what’s called the 1-skeleton of a very particular n-dimensional hypercube.

Two sample vertices representing (1,1,0,0) and (1,0,1,0). These will not be connected in the hypercube.

Two sample vertices representing (1,1,0,0) and (1,0,1,0). These will not be connected in the hypercube.

Anyways.  Wednesday had pictures of a 2-cube and 3-cube.  What about the 4-cube?  Or 5-cube?  It will help to consider this all from a less analytic, more graph theory (or, if that sounds technical, “pictures and problem solving”) point of view.  Condition 1 for a hypercube says that there are 2n vertices, all the binary sequences of length n. Then condition 2 says that two vertices are connected if you can change one vertex’s binary sequence to the other’s by changing a single bit.   We’ll go one step further, by just coloring particles on a line: white for 0, black for 1 (this is something of a homage to my undergraduate thesis advisor’s work with polyhedra).

The only two things left to do are to draw the vertices and arrange them in nice ways (that is, fine a “nice” projection).

tesseract

A projection of a 4-cube, with vertices labelled.

Below is the image from the wikipedia 5-, 6-, and 7- cubes.  Note the some of the vertices are laying on top of eachother.  I’ll leave it as an exercise to the reader to label these vertices with the appropriate binary sequences.

5-cube

5-cube

6-cube

6-cube

 

7-cube

7-cube

Networks

Theme of the day is networks, and let’s even say self-organizing networks.  I expect to flesh this out in the future, but I have some pretty pictures now, so that’s a pretty good start.

A graph with 30 cliques of various sizes.

A general problem will be to take a graph (nodes and edges, etc), and identify sub-populations of that graph.  Two great questions, (with some reasonable answers):

– What kind of graph?
We could restrict the discussion to nodes connected by edges, with only one edge allowed between two nodes, and no self connections (this might be a “friend” graph in facebook), but perhaps it would be more useful to look at a directed graph (like a “follow” graph in Google+) where a connection is only one-way.  A weighted graph could let you have strong or weak connections – maybe you are interested in a “normalized” facebook friend graph, where the “strength” of a friendship (it strikes me that the quotation marks might just as well be around “friendship”) depends on how many friends each person has.  A bipartite graph might model facebook “likes”, since a person can like a product (but not other people).

The adjacency matrix for the above graph.

-What does it mean to “identify sub-populations”?
I hope to have some images below that help to give an intuitive understanding of this, and maybe flesh out that understanding in a later post.  This is actually a hard question to answer — must every node belong to some sub-population?  Must a node belong to only one sub-population?  In the first case, where does the crunchy peanut-butter eater belong, in a land of creamy peanut-butter eaters?  In the second case, where does a triathlete belong in a population of runners, swimmers and bikers?

The above matrix with the columns/rows acted on by the same permutation. This is what one might expect their network data to look like, and you’d hope to find the block structure above (or find an indication that such structure does not exist).

Anyways, in order to investigate some of this I’ve got some basic scripts set up that:
1) Generate a graph with 30 cliques of various sizes, sparsely connected to each other
2) Print an image of this graph
3) Print the adjacency matrix of the graph
4) Scramble the matrix, and print that
5) Run the affinity propagation algorithm from sklearn on the scrambled matrix, and print that

If affinity propagation was perfect, it would return a result very close to the original block matrix (possibly with the blocks permuted).

The results of affinity propagation. Notice that it identifies many more clusters than originally existed.  Also notice that this is also an adjacency matrix for the above graph.