# Large data sets

1200 points on the sphere, each connected to its 7 nearest neighbors.

I was thinking about the topic that Rice’s VIGRE program has focused on in the past, namely how to visualize large data sets.  Now, discrete data is not the greatest thing for an analyst, as opposed to, say, a manifold.  Along this line of thinking, one might ask:

If you know that a data set comes from some manifold, is there a way to detect which manifold?

Towards answering this, I wrote a quick program in MATLAB that:
1. Draws random points from a parametrized surface (so far just a sphere and a torus, but it would be easy to use any parametrized surface),
2. Draws a line connecting each point to its n nearest neighbors.

The data without a "ghosted" surface. You can still tell the data set has a nonzero genus (i.e., a "hole"), but it is somewhat hard to see depth without rotating the picture (so I've included the surface in all the others). This is 400 points on a torus, each connected to its 3 nearest neighbors.

The program also “ghosts” in the parametrized surface to see how well this method does at illustrating the surface.

At this stage, the program is just something to play around with a bit, but I think the images you get from it are really great!   One of the problems it still has is that not every vertex has n edges leaving from it (since the relation “is one of the n nearest vertices” is not symmetric, which is easy to see with a sketch).  This also slows down computation since really I am drawing 2*N*n lines, even though only about N*n of them are displayed.  Scroll down to see all the images.

The sphere with N = 400, n = 3.

The torus with N = 400, n = 3.

N = 400, n = 7.

N = 1200, n = 3.

N = 1200, n = 3.

N = 2000, n = 7.

1. Looks nice, but you are drawing a kind of 1-skeleton of something that you expect to be 2-dimensionals. Why not draw a 2-dimensional approximation, e.g., by filling in a triangle for every triple of points $v_1, v_2, v_3$ in which $v_j$ is among the n nearest neighbors of $v_i$ for all $i,j$? What is the smallest value of n for which the resulting set is (with high probability) connected?