Tuesday, May 15, 2007

HKUST research on web technologies

While I was skimming the proceedings of the WWW2007 conference for interesting ideas (e.g. this one, a simple method of adding security to AJAX mashups) this morning, I saw a paper from HKUST:

Exploring in the Weblog Space by Detecting Informative and Affective Articles

The paper describes a method that classifies blogs into various degrees between "informative" and "affective". Informative blogs, like Alex Russell's, dispense useful information that interests their readers. Affective blogs are diaries describing things that mostly interest only the author. High-quality blogs (i.e. those that people want to read) are usually informative.

The method shouldn't be treated as an absolute measure of blog quality, however. Let's take a look at a random paragraph from Joel on Software, a popular informative blog:

That's why I'm incredibly honored that they invited me to write a guest editorial about recruiting and internships in this month's issue. Thanks to professional editing, it feels a little bit polished compared to my usual style. I don't think I would write, "Ah, college." I do remember writing, "Get me a frosty cold orange juice, hand-squeezed, and make it snappy!"

Highlighted in red is one of the top feature phrases indicating affective blogs, as described by the paper. Guess how the algorithm would classify the above paragraph, and Joel's blog entries in general? I don't have the software on hand, so I can't test it and get the data, but the above paragraph contains a word that is ranked as a top representative feature of the affective category, and none from the informative category. And that's not an isolated example:

Microsoft finally put Lookout back up for download, but they sure weren't happy about it. ... The story has a happy ending.

A number of years ago a programmer friend of mine worked for a company...

... it wouldn't be such a bad thing to take Air France and change planes at CDG.

Among other things, this week I've been working on the new office design with our architect, Roy Leone [flash site].

Microsoft did the only thing that made sense...

I've been nattering on about this topic for well over 5000 words and I don't really feel like we're getting anywhere.

Thanks to professional editing, it feels a little bit polished compared to my usual style.

I had a chance to visit 7 World Trade Center today...

The "like" example above assumes that there's no word sense disambiguation (or something similar) in their algorithm, since the "like" in the paper and the "like" in my example have different meanings. But hey, the paper didn't mention WSD at all.

On the other hand, the only informative features I could find in Joel's blog today are "project" and "report", and they appear much less frequently there than the affective features do.
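To make the comparison concrete, here is a minimal sketch of the kind of surface-level counting I did by eye. It is not the paper's algorithm, which I don't have. It only tallies how often each feature word appears, with no word sense disambiguation. The feature sets contain just the words named in this post ("like" as affective, "project" and "report" as informative). The function name count_features and the tokenizing regular expression are my own assumptions.

```python
import re
from collections import Counter

# Only the feature words named in this post; the paper's full lists
# are not reproduced here.
AFFECTIVE = {"like"}
INFORMATIVE = {"project", "report"}


def count_features(text):
    """Count surface-form matches of feature words (hypothetical helper).

    There is no word sense disambiguation: "like" as a verb and "like"
    as a conjunction are counted the same way.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "affective": sum(counts[w] for w in AFFECTIVE),
        "informative": sum(counts[w] for w in INFORMATIVE),
    }


sample = "I don't really feel like we're getting anywhere."
print(count_features(sample))
```

A counter like this treats every "like" as an affective signal regardless of its meaning in the sentence, which is exactly the weakness the Joel on Software quotes expose.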

Joel's blog, however, is widely regarded as highly informative by software engineers. It's just that Joel prefers to write his blog entries in an informal and personal style. But anyway, the paper is still an interesting read on how computers can attempt to "understand" and filter information these days.
