I’ve moved over to a new website, and hopefully once the DNS records finish updating, that will be where “colindcarroll.com” sends you, though I hope to have new, exciting material here in the future!
[NOTE: At the end of editing this, I found that the substitution used below is famous enough to have a name, and for Spivak to have called it the “world’s sneakiest substitution”. Glad I’m not the only one who thought so.]
In the course of working through some (very good) material on neural networks (which I may try to work through here later), I noticed that it was beneficial for a so-called “activation function” to be expressible as the solution of an “easy” differential equation. Here by “easy” I mean something closer to “short to write” than “easy to solve”. In particular, two often-used activation functions are
One might observe that these satisfy the equations
By invoking some theorems of Picard, Lindelöf, Cauchy and Lipschitz (I was only going to credit Picard until Wikipedia set me right), we recall that we could start from these (separable) differential equations and fix a single point to guarantee we would end up at the functions above. In seeking to solve the second, I found after substituting cos(u) = τ that
and shortly after that, I realized I had no idea how to integrate csc(u). Obviously the internet knows (substitute v = cot(u) + csc(u) to get −log(cot(u) + csc(u))), which is a really terrible answer, since I would never have gotten there myself.
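If, like me, you don’t quite trust an answer you never would have found, here is a quick numeric sanity check (a plain-Python sketch; the name F is mine) that −log(cot(u) + csc(u)) really is an antiderivative of csc(u), and that it agrees with log(tan(u/2)) on (0, π), since cot(u) + csc(u) = (cos(u) + 1)/sin(u) = cot(u/2):

```python
import math

# F is the internet's antiderivative of csc(u).
def F(u):
    return -math.log(1 / math.tan(u) + 1 / math.sin(u))

h = 1e-6
for u in (0.3, 1.0, 2.0):
    # central-difference approximation of F'(u); should be csc(u)
    dF = (F(u + h) - F(u - h)) / (2 * h)
    assert abs(dF - 1 / math.sin(u)) < 1e-5
    # same function as the half-angle form, up to floating point
    assert abs(F(u) - math.log(math.tan(u / 2))) < 1e-9
```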
Instinctually, I might have tried the approach to the right, which gets you back to where we started, or changing the numerator to cos²(u) + sin²(u), which leads to some amount of trouble, though intuitively, this feels like the right way to do it. Indeed, eventually this might lead you to using half angles (and avoiding integrals of inverse trig functions). We find
Avoiding the overwhelming temptation to split this integral into summands (which would leave us with a cot(u)), we instead divide the numerator and denominator by sin²(u) to find
Now substituting v = tan(u/2), we find that dv = ½(1 + tan²(u/2))du = ½(1 + v²)du, so making this substitution, and then undoing all our old substitutions:
Using the half angle formulae that everyone of course remembers and dropping the C (remember, there’s already a constant on the other side of this equation), this simplifies to (finally)
Subbing back in and solving for gives, as desired,
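In symbols, the last leg of the computation runs as follows (a sketch, using v = tan(u/2), so that sin(u) = 2v/(1 + v²)):

```latex
\int \csc(u)\,du
  = \int \frac{1+v^2}{2v}\,\frac{2\,dv}{1+v^2}
  = \int \frac{dv}{v}
  = \log\left|\tan\tfrac{u}{2}\right| + C,
\qquad v = \tan\tfrac{u}{2},\quad \sin(u) = \frac{2v}{1+v^2}.
```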
As a bit of an experiment I’ve started up a GitHub repository with some code from this blog here [LATE EDIT: you need numpy (and might as well get scipy and matplotlib) to run this]. In particular, this week I’ve implemented a Python script for generating Hamming matrices (one for encoding, one for parity check, one for decoding), which constitutes as much of a proof of existence as I’m willing to get into.
There is also a “Message” class inside where you can play around with how many corrupted bits your message might have versus how long your message is versus the size of Hamming matrix you use. The defaults are set at 3 corrupt bits in a set of 4000 bits, with the error checking done with the Hamming(7,4) code. You can run this by downloading hamming_codes.py and running “python hamming_codes.py” from the command line.
The specific directory with this project is located inside the “HammingCodes” folder. Possible experiments with this code later, but now I need sleep!
We are now in a position to actually write down the Hamming (7,4) code. As explained in the previous three entries, we want some way of both detecting that an error occurred and of correcting that error. As with many problems, this is much easier once you know it is possible (thanks, Richard Hamming!). In particular, last time we proved that it is necessary to send a 7-bit message in our scheme of correcting errors for a 4-bit message, but is it sufficient? An easy way to deduce the solution in this case (and then to see the pattern that proves the general case) is to require that our parity check detects the location of any 1-bit error.
Specifically, someone will receive a (possibly corrupt) 7-bit string v, and we want a matrix that will output 0 if all is well, or communicate a number from 1 to 7 if one of the bits is corrupt. It takes 3 bits to communicate 8 numbers (8 = 2³), so our parity matrix H (following Wikipedia’s notation) must be 3 × 7. To make it easy to remember, we’ll even have column j be the binary representation for j. More directly:
Now we can work backwards (again, we’re assuming an answer exists), and for reasons that may be clear later, we’ll set our three parity bits to be the three “singleton” columns of H, so that the “coded” message v = (p1,p2,d1,p3,d2,d3,d4). Then if everything goes smoothly, we have that Hv = 0, so that
0 = p1+d1+d2+d4
0 = p2+d1+d3+d4
0 = p3+d2+d3+d4.
Notice also that if one bit gets corrupted, this is equivalent to sending the message v+ej, and
H(v+ej) = Hv + Hej = 0 + hj,
where hj is the jth column of H (which is the binary representation of the number j). Hence multiplying a message with a 1-bit mistake by H gives us the index of the corrupt bit.
But this tells us how we must encode our message m = (d1,d2,d3,d4) as well. We want a matrix G so that Gm = v = (p1,p2,d1,p3,d2,d3,d4). But the above gives us a linear condition for what this matrix must look like (and an explanation for why the parity bits are all “singletons”).
Finally we want to “decode” our message, which is also straightforward at this point, since it will just be the matrix which returns the non-parity bits from the encoded message.
As a review, and to wrap everything up:
1. Start with a message m = (1,0,0,1)
2. Transmit the message v = Gm = (0,0,1,1,0,0,1)
3. Check the parity by confirming that Hv = (0,0,0).
4. Decode the message Rv = (1,0,0,1), as desired.
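For concreteness, here is a small numpy sketch of the whole round trip (my own sketch, not the repository code; the matrix entries just read off the parity equations above, and the syndrome decoding at the end assumes row i of H carries bit 2ⁱ of the column index):

```python
import numpy as np

# Parity check matrix: column j is the binary representation of j.
H = np.array([[1, 0, 1, 0, 1, 0, 1],   # 0 = p1 + d1 + d2 + d4
              [0, 1, 1, 0, 0, 1, 1],   # 0 = p2 + d1 + d3 + d4
              [0, 0, 0, 1, 1, 1, 1]])  # 0 = p3 + d2 + d3 + d4

# Encoder: maps m = (d1,d2,d3,d4) to v = (p1,p2,d1,p3,d2,d3,d4).
G = np.array([[1, 1, 0, 1],   # p1 = d1 + d2 + d4
              [1, 0, 1, 1],   # p2 = d1 + d3 + d4
              [1, 0, 0, 0],   # d1
              [0, 1, 1, 1],   # p3 = d2 + d3 + d4
              [0, 1, 0, 0],   # d2
              [0, 0, 1, 0],   # d3
              [0, 0, 0, 1]])  # d4

# Decoder: reads off the non-parity bits d1, d2, d3, d4.
R = np.zeros((4, 7), dtype=int)
R[[0, 1, 2, 3], [2, 4, 5, 6]] = 1

m = np.array([1, 0, 0, 1])
v = G @ m % 2                         # transmit (0,0,1,1,0,0,1)
assert list(H @ v % 2) == [0, 0, 0]   # parity check passes
assert list(R @ v) == [1, 0, 0, 1]    # decoding recovers m

# Corrupt bit 5; the syndrome spells out 5 in binary.
corrupted = v.copy()
corrupted[4] ^= 1
s = H @ corrupted % 2
assert s[0] + 2 * s[1] + 4 * s[2] == 5
```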
Wikipedia’s also got an explanation involving Venn diagrams which I did not much like, though I may write a bit about Venn diagrams themselves in the future…
Another week down, 56 miles planned, and 56.6 done. Definitely dialed back the easy day effort this week — wore a heart rate monitor that kept the pace low most days, but threw in some Oslerians (2-1-:30-:30) on a hilly Wednesday run and had a good 6 mile tempo on Saturday. Tempo speed still not quite up to snuff — mile splits Saturday were 6:04-5:58-6:06-5:49-5:47-5:59 for a 5:57 average — looking for something more like 5:40 eventually. Overall, happy with the effort. We’ve got 66 on the menu for next week, including 5 x mile at <5:30 on Wednesday and 17(!) mi on Saturday with ~7mi/40mins tempo.
Week summary on Strava.
So now we know the general idea behind Hamming error correcting codes, and how one might construct and visualize hypercubes. Now suppose we want to encode, in an error correcting way, 4 bits. Recall that this means finding a hypercube with enough vertices that we can designate 16 (= 2⁴) of them, *and* pick those 16 so that no two “symbol vertices” are within distance 2 of each other. This means each “symbol vertex” has a disjoint neighborhood of radius 1.
A back-of-the-envelope calculation gives a necessary condition to allow this: an n-dimensional hypercube has 2ⁿ vertices and each vertex has n neighbors (so a “symbol neighborhood” takes up n+1 vertices). Hence it is necessary that n satisfy
16(n+1) ≤ 2ⁿ.
More generally, to encode m bits in n bits, we require 2ᵐ(n+1) ≤ 2ⁿ. Note without proof (for now; hopefully soon by construction) that this is also a sufficient condition. Interesting from an efficiency point of view is seeing where equality holds.
Taking logs (base 2) of both sides in the equality case gives m = n − log(n+1), and log(n+1) is an integer only when n+1 is a power of 2; so letting n := 2ᵏ − 1, we get m = 2ᵏ − 1 − k. In fact, one may (and we may) describe a whole class of Hamming (2ᵏ − 1, 2ᵏ − 1 − k) codes.
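A few lines of Python confirm both the bound and where equality lands (a sketch; the helper name `bound_holds` is mine):

```python
# Necessary condition for packing 2^m radius-1 neighborhoods
# (of n+1 vertices each) into the 2^n vertices of an n-cube.
def bound_holds(m, n):
    return 2 ** m * (n + 1) <= 2 ** n

# n = 7 is the smallest dimension that can carry m = 4 data bits:
assert not bound_holds(4, 6)
assert bound_holds(4, 7)

# Equality exactly at the Hamming parameters (n, m) = (2^k - 1, 2^k - 1 - k):
for k in (2, 3, 4, 5):
    n = 2 ** k - 1
    m = n - k
    assert 2 ** m * (n + 1) == 2 ** n
```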
The discussion on error-correcting codes is about to get a little hypercube heavy (never a good state to be in), and a brief foray into how to construct/visualize them may be in order. I’ll take the liberty of defining an n-dimensional (unit) hypercube as a shape whose
1. vertices are located at coordinates made entirely of 0’s and 1’s, and
2. edges appear wherever two vertices are distance 1 apart.
This would take two more things to make a complete definition: I should let you move the cube about however you like (no reason to have it fixed in space), and I should tell you about the 2-D faces, 3-D hyperfaces, and so on up to the (n-1)-D hyperfaces. You can use that first one if you want, but I’ll ignore the second. I think I did a good job of defining what’s called the 1-skeleton of a very particular n-dimensional hypercube.
Anyways. Wednesday had pictures of a 2-cube and 3-cube. What about the 4-cube? Or 5-cube? It will help to consider this all from a less analytic, more graph theory (or, if that sounds technical, “pictures and problem solving”) point of view. Condition 1 for a hypercube says that there are 2ⁿ vertices, all the binary sequences of length n. Then condition 2 says that two vertices are connected if you can change one vertex’s binary sequence to the other’s by changing a single bit. We’ll go one step further, by just coloring particles on a line: white for 0, black for 1 (this is something of a homage to my undergraduate thesis advisor’s work with polyhedra).
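That graph-theoretic description translates almost directly into code; here is a small sketch (the helper name `hypercube` is mine):

```python
from itertools import product

# Build the 1-skeleton of the n-cube: vertices are binary n-tuples,
# edges join pairs of vertices differing in exactly one bit.
def hypercube(n):
    vertices = list(product((0, 1), repeat=n))
    edges = [
        (u, v)
        for i, u in enumerate(vertices)
        for v in vertices[i + 1:]
        if sum(a != b for a, b in zip(u, v)) == 1
    ]
    return vertices, edges

verts, edges = hypercube(3)
assert len(verts) == 2 ** 3       # 2^n vertices
assert len(edges) == 3 * 2 ** 2   # n * 2^(n-1) edges
```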
The only two things left to do are to draw the vertices and arrange them in nice ways (that is, find a “nice” projection).
Below are the images from Wikipedia of the 5-, 6-, and 7-cubes. Note that some of the vertices are lying on top of each other. I’ll leave it as an exercise to the reader to label these vertices with the appropriate binary sequences.