Large data sets

1200 points on the sphere, each connected to its 7 nearest neighbors.

I was thinking about the topic that Rice’s VIGRE program has focused on in the past, namely how to visualize large data sets.  Now, discrete data is not the greatest thing for an analyst, as opposed to, say, a manifold.  Along this line of thinking, one might ask:

If you know that a data set comes from some manifold, is there a way to detect which manifold?

Towards answering this, I wrote a quick program in MATLAB that:
1. Draws random points from a parametrized surface (so far just a sphere and a torus, but it would be easy to use any parametrized surface),
2. Draws a line connecting each point to its n nearest neighbors.

The data without a "ghosted" surface. You can still tell the data set has a nonzero genus (i.e., a "hole"), but it is somewhat hard to see depth without rotating the picture (so I've included the surface in all the others). This is 400 points on a torus, each connected to its 3 nearest neighbors.

The program also “ghosts” in the parametrized surface to see how well this method does at illustrating the surface.

At this stage, the program is just something to play around with a bit, but I think the images you get from it are really great!   One of the problems it still has is that not every vertex has n edges leaving from it (since the relation “is one of the n nearest vertices” is not symmetric, which is easy to see with a sketch).  This also slows down computation since really I am drawing 2*N*n lines, even though only about N*n of them are displayed.  Scroll down to see all the images.

The sphere with N = 400, n = 3.

The torus with N = 400, n = 3.

N = 400, n = 7.

N = 1200, n = 3.

N = 1200, n = 3.

N = 2000, n = 7.


2 comments on “Large data sets

  1. Looks nice, but you are drawing a kind of 1-skeleton of something that you expect to be 2-dimensionals. Why not draw a 2-dimensional approximation, e.g., by filling in a triangle for every triple of points v_1, v_2, v_3 in which v_j is among the n nearest neighbors of v_i for all i,j? What is the smallest value of n for which the resulting set is (with high probability) connected?

  2. Yes, this stuff was mostly just playing with some code. I think what you describe is a logical next step, though one would have to play around with the conditions to fill in a triangle. The torus I think is particularly interesting, because I can think of many filling algorithms that might “fill in” the hole in the middle.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s