Gravity Monkey

projects: cloudbrain

see cloudbrain in action:
click on a name to view:
<%
  // List the saved cloudbrains: every ".dat" file in the data directory, shown without its extension.
  File ff = new File("/usr/share/tomcat/gravitymonkey/ROOT/cloudbrain/data");
  String[] flist = ff.list(); // null if the directory is missing or unreadable
  if (flist == null) { flist = new String[0]; }
  for (int w = 0; w < flist.length; w++) {
    String tfile = (flist[w] != null) ? flist[w].trim() : "";
    if (tfile.endsWith(".dat")) {
      tfile = tfile.substring(0, tfile.length() - 4);
%> <%=tfile%>
<% } } %>
click here to request a cloudbrain based on your tags

Ingredients: This project is built in Processing, using RSS feeds from del.icio.us and Classifier4J.

Time to take a stretch after a successful, fun and intense upgrade to Mologogo. I've started to amass a bunch of links in my del.icio.us account. It's not just a bunch of random junk; it's stuff that I made a point of noting because I wanted to remember it -- at least enough to go to del.icio.us to post it. Tag clouds (snapshot on the right) are cool, and they're a nice way to quickly see the tags, and thus the topics, that are most interesting to me. But I wanted to know more about each tag, to know more about what's under each: What makes this topic more important to me than that topic? How are my tags interrelated? Are there things that connect seemingly disparate topics -- such as "buddhism" and "J2ME" and "wifi"? That is, other than me?
[front view] [top, right-ish view]
Here is the weekend's worth of wondering: cloudbrain. Read details about what it does and how it works below, or just go and view the applet.
Here's how it works:
First, I grab all my tags and the associated links off my RSS feed from del.icio.us.
Next, I go and grab each and every page that I've bookmarked, parsing off the needless stuff (which is exactly what I wrote for shrunq, to deal with full HTML, so that was easy).
Then, I run the pages through Classifier4J. It was super easy to use, especially with this simple tutorial. Using the VectorClassifier class, I got the terms (and their weights) that result from each tag.
Using this information in Processing, I plotted each tag (in white) in a 3-dimensional space, where the tags that have more associated links are closer to you (see [front view] to the left). This approximates the del.icio.us cloud -- the tags that are more important look larger (actually, they are the same size as the others, just closer in Z-space when viewed from the front).
Behind each tag I drew (almost) every term that Classifier4J had segmented out, in a mostly transparent grey.
Spinning the whole thing (see [top, right-ish view] on the left), you can see that some tags have a lot of terms tailing off behind. For my tags shown here, you can really see how far "J2ME" is in front of the rest.
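The steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not the cloudbrain source: the class and method names (`TagCloudSketch`, `stripHtml`, `termWeights`, `zDepth`) are made up for this example, the HTML stripping is a crude regex stand-in for whatever shrunq does, and the term weights are simple relative frequencies rather than the weights Classifier4J's VectorClassifier actually produces.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the pipeline described above; not Classifier4J's API.
class TagCloudSketch {

    // Crude HTML stripping: drop script/style blocks, then tags, then collapse whitespace.
    static String stripHtml(String html) {
        return html
            .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
            .replaceAll("(?s)<[^>]+>", " ")
            .replaceAll("\\s+", " ")
            .trim();
    }

    // Term weights for one tag: relative frequency of each word across that tag's pages.
    static Map<String, Double> termWeights(List<String> pages) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (String page : pages) {
            for (String word : stripHtml(page).toLowerCase().split("[^a-z0-9]+")) {
                if (word.length() < 3) continue; // skip empty tokens and very short words
                counts.merge(word, 1, Integer::sum);
                total++;
            }
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            weights.put(e.getKey(), e.getValue() / (double) total);
        }
        return weights;
    }

    // Map a tag's link count to a Z coordinate: more links means closer to the viewer.
    static float zDepth(int linkCount, int maxLinks, float farZ, float nearZ) {
        if (maxLinks <= 0) return farZ;
        float t = Math.min(1f, Math.max(0f, linkCount / (float) maxLinks));
        return farZ + t * (nearZ - farZ);
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
            "<html><body><p>MIDlet signing on a J2ME phone</p></body></html>",
            "<p>Packaging a MIDlet for older phones</p>");
        System.out.println(termWeights(pages));
        System.out.println(zDepth(7, 10, -400f, 0f));
    }
}
```

In a Processing sketch, the value from `zDepth` would be the Z argument when positioning each tag's text, so every tag keeps the same font size and only its distance from the camera changes.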
Ok, so, that's kinda cute, showing my tag cloud in 3-D. But not enough, right? I had really wanted to see what kind of learning or intelligence is sitting there within all that information. Let's keep going, then:
Next, the applet will go and get some URLs which I haven't seen (in this case it gets URLs from del.icio.us/popular...no tags, just the URLs). If I click on the lower right of the applet, it goes and grabs that URL, parses it the same way, and loads it into the same system.
Now I shuffle the cloudbrain:
If there is a term on the new page that exists within the current terms, make it a brighter blue.
Move the tags that are a better match closer and brighter; the tags that don't match up move further back and get darker. The weights are shown in smaller type below each tag.
Show the text of the document, word-by-word, in red in front of the cloudbrain -- unless it's the same as a term within the cloudbrain, then show it in place behind the tags. This creates a cool flashing, vaguely sequential effect throughout the cloudbrain.
Across the top you can see the current URL and the tags that best match (and their weight -- from 0.0 to 1.0). You can also see the overall rating for the page (also from 0.0 to 1.0), which is the average of all the tags. I don't really think that's the best way to get at an overall rating, ultimately, but it's just another reference point for the purposes of this little exercise.
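The shuffle step can be sketched like this, with the caveat that the real weights come from Classifier4J and this is not its code. The sketch assumes a cosine similarity over raw term counts, which is one common way to get a match score between 0.0 and 1.0. All names here (`ShuffleSketch`, `scoreTags`, `overallRating`, `brightness`) are hypothetical, as is the linear score-to-brightness mapping.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical scoring sketch: cosine similarity between each tag's terms and a new page.
class ShuffleSketch {

    // Count how often each lower-cased word occurs in the text.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine of the angle between two term-count vectors; 0.0 when either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / Math.sqrt(normA * normB);
    }

    // Score the new page against the accumulated text of every tag.
    static Map<String, Double> scoreTags(Map<String, String> tagText, String pageText) {
        Map<String, Integer> page = termCounts(pageText);
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tagText.entrySet()) {
            scores.put(e.getKey(), cosine(termCounts(e.getValue()), page));
        }
        return scores;
    }

    // Overall rating as described in the post: the plain average of all tag scores.
    static double overallRating(Map<String, Double> scores) {
        if (scores.isEmpty()) return 0.0;
        double sum = 0;
        for (double s : scores.values()) sum += s;
        return sum / scores.size();
    }

    // Better matches are drawn brighter: map a 0.0..1.0 score to a 0..255 grey level.
    static int brightness(double score) {
        double clamped = Math.max(0.0, Math.min(1.0, score));
        return (int) Math.round(255 * clamped);
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("J2ME", "midlet phone java midlet");
        tags.put("buddhism", "zen meditation");
        Map<String, Double> scores = scoreTags(tags, "Java MIDlet for your phone");
        System.out.println(scores + " overall=" + overallRating(scores));
    }
}
```

Averaging over every tag pulls the overall rating down whenever a page matches only one or two tags strongly, which is one reason the average may not be the best overall measure.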
So, some nice eye-candy. Does it work? Well, amazingly, yeah. Obviously the tags that had more content were more successful in auto-classifying new content -- based on my tags, a site clearly more about "linux" was overridden by the "programming" category, for example. But it is able to assert a best match that is quite appropriate for my tag categories, even based on the limited data that my del.icio.us tags are able to represent (in my case, approximately 31 different URLs in my RSS feed). Pages that are relevant to my cloudbrain appear alive and bright -- reds bouncing throughout the brain, bright white tags showing relevance...while pages that aren't relevant at all are quiet and dark. A nice dramatization, in fact.
Now what? There are lots of possibilities to consider:
use a different, perhaps larger, initial source for establishing my cloudbrain (browser history?)
compare more diverse documents -- from general news RSS feeds? check out Feed on Feeds, perhaps?
focus on recommendations -- define a set of "not-interested" documents to try out the Bayesian filtering
expand the visualization -- order all the terms, connect the terms and nodes, set it to music!
support live creation of cloudbrains (right now, this would be too processor- and bandwidth-intensive)
For now, I would like to invite you to play with the cloudbrains that I have put up, and if you're interested, lemme know and I can try to add your del.icio.us profile to the list.