As you may have noticed, I’ve added categories to this site. This means that each post is categorized into one or more categories, categorically. I initially shunned this feature because it so often seems pointless and leads to having a dozen categories, most of which have one or two posts associated with them while everything else goes into a kind of demilitarized blogging Shangri-la like “General” or “Daily Life” or “Misc.” But I wanted a way for people who were only interested in, say, Samantha to find an archive of stories only related to her while ignoring the rest of my inane ramblings. You can do that now.
Once I decided to do this, though, the main task before me was to define the categories. A cursory glance at my archives showed a fair variety in the subject matter of posts, but an underlying factor structure was not crystal clear. To resolve this, I endeavored to apply my six years of graduate school in psychology and do some kind of scientific data reduction. Specifically, I applied cluster analysis, which is a multivariate statistical procedure that takes a sample of observations about entities and organizes those entities into more homogeneous groups.
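For the curious, here is a minimal sketch of how this kind of clustering works, in plain Python rather than SAS. It assumes Ward's minimum-variance method (the clustering method isn't specified here), and the function name `ward_merge_history` and the ratings are made up purely for illustration. At each step it merges the two clusters whose union adds the least within-cluster variance, and records that added variance as a share of the total.

```python
def ward_merge_history(points):
    """Agglomerative clustering using Ward's criterion (an assumed method).

    points: list of equal-length numeric tuples, not all identical.
    Returns a list of (clusters_remaining, share_of_total_variance) pairs,
    one per merge, in the order the merges happen.
    """
    n = len(points)
    dims = len(points[0])
    grand_mean = [sum(p[d] for p in points) / n for d in range(dims)]
    total_ss = sum((p[d] - grand_mean[d]) ** 2 for p in points for d in range(dims))

    # Every observation starts out as its own cluster: (size, centroid).
    clusters = [(1, list(p)) for p in points]
    history = []

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if i and j merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, i, j)

        cost, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((na + nb, centroid))
        history.append((len(clusters), cost / total_ss))

    return history


# Invented ratings for four hypothetical posts on two dimensions.
ratings = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (8.0, 9.0)]
for remaining, share in ward_merge_history(ratings):
    print(remaining, round(share, 3))
```

A sharp jump in that share at a given step suggests the merge combined two groups that were better left apart, which is the usual hint for where to stop.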
To start, I took all 322 blog posts and had a group of subject matter experts rate each one on a variety of dimensions related to content, tone, voice, subject matter, reading level, word count, and the frequency with which I had used the word “poop.” These data were entered into a SAS dataset and analyzed using SAS’s PROC CLUSTER procedure. The output provided a wealth of information about the data’s possible underlying structure, but of particular interest was the Semipartial R². Using this statistic for each of the solutions in the last fifteen iterations of the clustering procedure, I created the following Fusion Plot:
As you can see, there is a sharp dropoff in the Semipartial R² at around 4 clusters, suggesting that to be an optimal solution to the data. Indeed, the four-cluster model explained over 85% of the variance in the original data, and this hypothesis was further supported by a dendrogram that suggests a four- to six-cluster solution:
Finally, a plot of the four-cluster solution in multidimensional space using canonical variables pretty strongly suggested four (or possibly five) clusters:
Given these scientific results, I arrived at the following four categories for my blog:
The “General” category could have been further broken down upon rational review of the data, resulting in smaller categories like Gaming, Books, Movies, Family News, and Stupid Observations, but I decided that none of those individual topics would be of interest enough to most visitors to warrant splitting them out.
You guys are totally buying that I did all this work, right? Right? Pffftt.
Anyway, the way Movable Type handles category archives has me pulling my hair out a bit. I want to have date-based archives, but I also want category-specific archives for the Photo of the Day. Having both kind of messes up the other archives, so that you can only browse individual entries (like through a permalink) within a category and not across categories. It really ticks me off, so if you know of a solution let me know. If I can’t figure anything out, I’ll probably end up creating a separate blog for the Photo of the Day, output it (and its archives) to a static file, and include it in this main site with server-side includes. What a pain.
Finally, you may also notice that I changed the layout of blog entries. It occurred to me that there were three types of elements to a blog entry: those about the entry (the date, the title, the author), the entry itself, and those related to what you can do in response to the entry (link to it, comment on it, find similar entries). So I separated them. Title and date are at the top (I trimmed author, since I’m the only one on this site), and I put the comment link, the permalink, and the category archive link at the bottom. The latter also makes sense in that you don’t force people to scroll back up in order to comment or get a permalink.
So, hope you enjoy the fruits of my labor. There are more tweaks to come, as well as a total redesign if I can get around to it.
You have too much time on your hands!
I did most of this last night while watching “Harold and Kumar go to White Castle.” There may be a connection.
As I said, you have too much time on your hands. 🙂
How was dodge ball?
We saw Garden State this weekend.
Dodge Ball was surprisingly funny. I laughed out loud a lot more often than I expected to. Ben Stiller’s character was typical goofball, but when Rip Torn started whipping wrenches at the guys I about busted a gut. The two commentators for “ESPN 8 -The Ocho!” (a great joke in and of itself) were also a riot.
I went in to Blockbuster meaning to pick up a “highbrow” movie like Garden State, Eternal Sunshine of the Spotless Mind, or Napoleon Dynamite, but they were all checked out. So I walked out with Dodge Ball and Harold and Kumar. Still, both funny.
We saw Eternal Sunshine, very funky and not quite what I expected but it was pretty good. We can only do so much “highbrow” at one time. Dodge Ball is on my list to see also. We also saw Anchorman which was pretty good. Will Ferrell cracks me up.
I heard Shaun of the Dead was really good. That is on my “to see” list next.
I thought Harold & Kumar go to White Castle was a lot funnier than Dodgeball. It had a lot of totally random gags (like battlesh**s and the cheetah). Although it could have been because I was making the cranberry bread during Dodgeball and didn’t really catch everything.