Monday, December 31, 2007

New year presents from Opera

Thank you Hallvord :D

Also, a late thank you to Yusuf from Outblaze Ltd. for the FON router and T-shirt (but most importantly, the experience sharing).

Thursday, December 27, 2007

GWT's solution to the JavaScript dynamic memory leak problem.

Any experienced AJAX developer should have seen this: You wrote a web application with a lot of JavaScript logic for the front end. It looks beautiful, it runs fast, it passes all feature tests. But there's one problem: it occupies more and more memory as it runs.

There's no way you can prevent it in Firefox, due to memory fragmentation at the very least (technically that's not a memory leak, but it makes no difference to users), caches, and possible memory mismanagement by plugins. Memory leaks in IE, however, are preventable, and preventing them was one of the major topics of my FYP - WT Toolkit. The idea is to solve the memory leak problem as far as possible, and leave the rest to the browser developers (Firefox/Gecko, that is) where it can't be solved in JavaScript.

WT Toolkit's approach is straightforward - since memory leaks in IE usually arise from circular references between DOM nodes and JavaScript objects (including functions), we try to eliminate the need to reference any DOM nodes in JavaScript programming. It's simple in theory, but difficult to implement in practice.

Now I've graduated from college, but I'm facing the same problem in another project - FCKeditor. More precisely, I'm facing it in FCKeditor's floating dialog branch. Tools like Drip or sIEve, while excellent, can only help a little in detecting JavaScript memory leaks - it is entirely possible for both of them to detect no leaks while your application still leaks dynamically (i.e. memory usage keeps increasing BEFORE a page refresh); I talked about that in my FYP report.

So now I have to do the same research all over again. But this time the problem is harder. For WT Toolkit, I could simply survey the existing literature on the nature of IE memory leaks, build the toolkit in a way that can be mathematically proven not to leak under certain circumstances (the "circumstances" have to be flexible enough to keep the toolkit practical, of course), and develop web applications only within the allowed circumstances. FCKeditor, on the other hand, is not built on a toolkit with such provisions.

Right... So I have to read the literature on JavaScript memory leaks all over again, and code again. How wonderful. While reading, I found a gem on this topic from the Google Web Toolkit team:

It discusses Google's solution to the very same problem that I tried to solve in WT Toolkit. It's not ideal, but it's simpler than WT Toolkit's approach. The only things it assumes are that the web developer won't call element.removeChild(widget) and won't create reference cycles of his own in JavaScript - not a big problem for GWT, since the web developer is supposed to write in Java.

Saturday, December 22, 2007

Got Mac OS X Leopard today.

It got me thinking... How many screw-ups does it take to end up with Windows Vista's current UI design, given that they already had a good example to copy from?

Saturday, December 8, 2007

WriteArea: Use FCKeditor everywhere you want!

How many times have you encountered this: You want to write a blog post - it may be on Blogger, it may be on Xanga, it may be on Wordpress - but the HTML editor provided by the blog site does not do what you want. Say you want to draw a table with Xanga's HTML editor - which button is that? Well, sorry, there's no such thing. How about making indented list items in Blogger? No can do. Want to add a Flash animation to your blog post to spice things up a bit? Well, you need to resort to editing HTML source - a tedious, painful, and error-prone process for non-professionals.

Introducing WriteArea

WriteArea is a new Firefox plugin that lets you write your blog posts, or anything requiring HTML, with FCKeditor - the most popular web-based rich text editor in the world. It converts any HTML text area on any website into a full-featured FCKeditor editing dialog. With it, you can create tables in your Xanga posts, make indented list items in Blogger, add Flash animations to your blog posts without fiddling with arcane HTML code, write things in subscript and superscript, and edit the properties of tables and images with right-click menus... WriteArea is a little plugin that gives you a whole world of possibilities in web authoring.

Using WriteArea

WriteArea can be activated from any text area box on any website. For blog sites, that is usually the text area provided by the "Edit Source" or "Edit HTML" feature. To give you a rough idea of how WriteArea works, here's an example of how I wrote this post with WriteArea in Blogger.

Step 1: Switch to "Edit HTML" in Blogger, and activate the plugin by right clicking on the text area.

Step 2: Write your message in the WriteArea dialog that pops up.

Step 3: Click "Save" in the WriteArea dialog to save the HTML code into the blog site's text area. If you want to edit your post again just right click on the text area and activate WriteArea again.

Step 4: Publish your blog post when you're done editing it.

Getting WriteArea

WriteArea can be downloaded here. Since the plugin is new, it is being sandboxed right now and is available only to registered users of the Mozilla plugin site. You'll also need to enable showing sandboxed plugins in your user preferences, after you've registered and logged in, before downloading it. A sandboxed plugin is also a beta, so it's likely to have some bugs. So be a responsible open source software user: report any bugs you find, and write a favorable review if you think it is useful.

Tuesday, September 18, 2007

Early history of the infotech industry - Triumph of the Nerds

From the first affordable microcomputer (MITS Altair 8800), to the Homebrew Computer Club in Stanford University, to IBM and Microsoft's deals and struggles, to Netscape and the beginning of the web just before the 21st century. I almost forgot the documentary's name when someone asked me about the history between IBM and Microsoft today. So I figured, I should write this down.

It is not a computer science lecture, but interestingly, it tells me more about how far the infotech industry has come - CPUs, compilers, operating systems, networking, 3D graphics, artificial intelligence; accounting packages, customer relationship management systems (CRM), enterprise resource planning (ERP), electronic communication networks (ECN), quant funds, Google... it all started from people playing with boxes without keyboards, monitors or even printers. The substance behind the tech IPO crazes, tech bubbles, and Web 2.0 is not just hard work - hard work alone cannot do these things.

"Do you want to spend the rest of your life selling sugared water, or do you want a chance to change the world?"

And Apple's 1984 advertisement... it may look like just another annoying advertisement in the eyes of the public. But anyone who's read George Orwell knows that whoever is behind the advertisement has a far greater goal than just earning money.

It would be a damned shame if I ever forget about these things.

Episode 1, Part 1
Episode 1, Part 2
Episode 1, Part 3
Episode 1, Part 4
Episode 1, Part 5
Episode 1, Part 6

Episode 2, Part 1
Episode 2, Part 2
Episode 2, Part 3
Episode 2, Part 4
Episode 2, Part 5
Episode 2, Part 6

Episode 3, Part 1
Episode 3, Part 2
Episode 3, Part 3
Episode 3, Part 4
Episode 3, Part 5
Episode 3, Part 6

Thursday, September 6, 2007

Tyan GT20 B5191 1U server

Bought it yesterday, although the server chassis had to be booked 2 to 3 weeks in advance.

The server will be co-located in one of HKNet's data centers next week.

Server configuration:
Chassis: Tyan GT20 B5191 barebone
CPU: Intel Core 2 Quad Q6600
Harddisk: Seagate 7200.10 250GB

Thursday, August 30, 2007

Content aware resizing algorithm for images

This computer graphics research has been greatly hyped lately due to the recent report on Slashdot. It is based on a simple idea, but it achieves amazing effects. Not only can it resize images with minimal information loss or perceptible distortion, it can also be used to achieve effects similar (but not identical) to image inpainting.

The paper can be read here,
Video demonstration of the algorithm, and... next one is the best
Third party implementation of the algorithm that you can download and hack.

Debugging, profiling JavaScript and detecting memory leaks by source code instrumentation

I found a paper from Microsoft Research yesterday which describes a reliable way for debugging JavaScript remotely on multiple platforms and... here comes my favorite part... reliably detecting IE6 memory leaks in JavaScript.

Why not use Drip instead? While Drip is useful for catching memory leaks, there are many cases where Drip does not work. For example, Drip 0.5 does not catch memory leaks in FCKeditor.

How about lapsed listeners? The paper talks about them in section 6.2, but I can't see any solution mentioned. Perhaps I've overlooked it.

The paper also talks about remote, multi-client performance profiling and many other nice things that could be of tremendous help in modern web application development. While products like Firebug and Tito Web Studio already provide profiling and basic debugger support, they are browser-specific, and the performance data applies only to the developer's workstation. It is always possible for a web application to behave dramatically differently on computers the developer never anticipated, and that's where the paper's approach can help.

You can read the paper here.

Monday, August 20, 2007

Kernel based virtualization on Fedora Core 7

Linux kernel running Windows XP (well, the installer, at least) via Intel Vanderpool technology. Hmm... the irony.

The installer runs noticeably slower than VMWare, though. Also, there are some show-stopping usability bugs in Fedora Core 7's Virtualization Manager which stopped me from fully installing Windows XP in the VM console... thus forcing me to call the VM engine (QEMU/KVM) from the command line to do the second stage of the WinXP installation.

Update: Just found the reason for the slowdown. While installing Windows XP on QEMU/KVM, the user must press F5 (instead of F7 as indicated by KVM's FAQ) at the beginning of the installation and choose "Standard PC" as the computer type. The problem is documented here.

Friday, August 10, 2007

Microsoft Seadragon and Photosynth demonstration

The first demo (Seadragon) is a ZUI implementation of a photo manager. The second demo (Photosynth) is a 3D photo stitching and modeling software.

Both are amazing stuff.

Wednesday, June 27, 2007

Moving up in the open source world

BitAnarch... WT Toolkit... FCKeditor

I've just been hired as a core developer of the FCKeditor project, currently one of the top 10 projects on SourceForge. It's a very popular piece of software in the Web 2.0 universe.

Alas, very few people in Hong Kong understand what I'm doing (and have been doing). The business culture of Hong Kong simply doesn't appreciate new, cutting-edge technology, seeing it as just "toys".

Sunday, June 24, 2007

Amway Pan-PRD project competition - catastrophic failure

I passed the "self-demo" part but screwed up the final presentation part. The HKUST team that went to the final competition in China was the Fung Shui Design System team. The whole failure was quite funny - none of the common reasons apply.

First off, I was not stressed at all - the most common reason. When was the last time you saw me nervous about anything? It's not that I don't care, but I always have the mindset that it's useless to struggle at the last minute - so why not relax and see how things turn out? I was very tired from listening to people talk all day, though. To me, talking and listening are much more tiresome than reading and writing, or even thinking.

Then, the presentation was wrong from the beginning (the judges were not looking for demonstrations in the second part), and there's the difficulty of explaining AJAX, the hundreds of current problems with AJAX app development, the toolkit itself, the theories and algorithms inside, and its business value - all within 10 minutes, to a panel that hasn't been looking at this specific area recently.

So in the end... the judges looked very puzzled. Which was kind of expected, seeing that I used 50 minutes for the FYP presentation at HKUST - and even with 50 minutes I was skipping lots of details. Same with that 115-page FYP report: many details missing. Current problems, others' solutions, algorithms, human factors, businesses... there were simply too many things to cover in 10 minutes.

And to add further insult to the whole screwy presentation, the judges asked me about the practical uses of the project - so the whole time they hadn't gotten anything. I answered by giving them a demo of how to program with it, but they were expecting to hear how businesses could use it. EditGrid, PCMS, FCKeditor, Google Maps, Google Office... it was all in my brain. That TnC company even did the web page for iProA, and they are living off AJAX technologies. I had all the answers and more, but sorry, time's up. GG.

Permanent fake job adverts, in America

Why are there so many job advertisements requiring you to know everything, plus 20 years of .NET experience? It's not ignorance on the employer's part... don't be so naive.

Now, the exact same thing doesn't happen in Hong Kong. But I can't help noticing that local college graduates seem to have a "single measure of fitness" mindset toward job seeking, as if it were the HKCEE or HKALE. The real world doesn't work like that - the "fitness" in the employer's mind can be completely different from what you have in mind. They could be looking for someone who is stupid, cheap, and doesn't complain.

Sunday, June 10, 2007

Lunch with Prof. Paul Chu

I, along with the other CSE Best FYP Award recipients, attended an award ceremony at the President's Lodge on the 8th of June. I was honored to have lunch with some of the most famous people at HKUST, such as President Chu and Prof. Roland Chin.

I drank a little bit too much wine in the lunch though. You can see my face was slightly red in the photo, and I was feeling somewhat dizzy.

Monday, May 28, 2007

Things to do...

29 May 12:30 - ECON115 final exam (Sports Hall)
31 May 09:30 - HUMA099G final exam (LTA)
1 June 12:45 - CPEG appreciation lunch (LG7)
1 June 15:00 - Meeting at TnC Ltd. Office
4 June 17:00 - Deadline for submitting Pan-PRD competition materials
8 June 12:30 - Best FYP Award Presentation
1H June - Submit industrial training logbook (long overdue)

Sunday, May 20, 2007

Final examination

Right... after the FYP presentation comes the final examination of the final semester of my UG life.

Thursday, May 17, 2007

WT Toolkit presentation at HKUST

There will be a presentation of WT Toolkit in Hong Kong University of Science and Technology this Saturday. In addition to demonstrating WT Toolkit, we (i.e. me and Marco) will also discuss the difficulties and pitfalls facing AJAX developers, and we will compare WT Toolkit with other popular AJAX toolkits like Prototype and Dojo.

Date: 19th May, 2007
Time: 17:20 - 18:00 HKT (+0800)
Venue: Hong Kong University of Science and Technology, Room 4480

Wednesday, May 16, 2007

WT Toolkit broke into SourceForge top 500

Just in time for me to present my FYP on Saturday.

Tuesday, May 15, 2007

HKUST research on web technologies

While I was skimming the proceedings of the WWW2007 conference for interesting ideas (e.g. this one, a simple method of adding security to AJAX mashups) this morning, I saw a paper from HKUST:

Exploring in the Weblog Space by Detecting Informative and Affective Articles

The paper describes a method that classifies blogs into various degrees between "informative" and "affective". Informative blogs, like Alex Russell's, dispense useful information that interests the readers. Affective blogs are diaries describing things that mostly interest the author only. High quality blogs (i.e. those that people want to read) are usually informative.

The method shouldn't be treated as an absolute measure of blog quality, however. Let's take a look at a random paragraph from Joel on Software, a popular informative blog:

That's why I'm incredibly honored that they invited me to write a guest editorial about recruiting and internships in this month's issue. Thanks to professional editing, it feels a little bit polished compared to my usual style. I don't think I would write, "Ah, college." I do remember writing, "Get me a frosty cold orange juice, hand-squeezed, and make it snappy!"

One of the phrases in that paragraph (highlighted in red in the original post) is among the top feature phrases indicating affective blogs, as described by the paper. Guess how the algorithm would classify the above paragraph, and Joel's blog entries in general? I don't have the software at hand so I can't test it and get the data, but the above paragraph has a word that is ranked as a top representative feature in the affective category, and none in the informative category. And that's not just an isolated example:

Microsoft finally put Lookout back up for download, but they sure weren't happy about it. ... The story has a happy ending.

A number of years ago a programmer friend of mine worked for a company...

... it wouldn't be such a bad thing to take Air France and change planes at CDG.

Among other things, this week I've been working on the new office design with our architect, Roy Leone [flash site].

Microsoft did the only thing that made sense...

I've been nattering on about this topic for well over 5000 words and I don't really feel like we're getting anywhere.

Thanks to professional editing, it feels a little bit polished compared to my usual style.

I had a chance to visit 7 World Trade Center today...

The "like" example above assumes there's no word sense disambiguation (or anything similar) in their algorithm, since the "like" in the paper and the "like" in my example have different meanings. But hey, the paper didn't mention WSD at all.

On the other hand, the only informative features I could find in Joel's blog today are "project" and "report". They appear much less frequently than the affective features do.

Joel's blog, however, is widely regarded as highly informative by software engineers. It's just that Joel prefers to write his blog entries in an informal and personal style. Still, this is an interesting read on how computers can attempt to "understand" and filter information these days.

Monday, May 14, 2007

WT Toolkit broke into SourceForge top 1000

Well, a day in the top 1000 isn't too hard to achieve. BitAnarch broke into the top 10 for a few days in 2003. But still, this is good: considering the 190,000 ranked projects on SourceForge, we're firmly in the top 1% of all projects.

Sunday, May 13, 2007

AJAX frameworks are NOT pointless

This was a response I posted to Slashdot a week ago. Why repost it here? Because I ran into my own post while searching on Google today - in particular, I found other bloggers and websites bookmarking and discussing it. So I guess what I wrote was useful? Then maybe you'd want to know about it too. Here it is:

There are many funny little things that just happen when you're coding a web application in JavaScript without a framework/library/toolkit helping you. Unless you're really an AJAX/JavaScript wizard, coding an AJAX-enabled web application on your own by mixing online code recipes is a very dangerous thing to do.

Browser inconsistencies

This is the most obvious one, but it's only the entrance to the rabbit hole. If you're not familiar with the example I give below (maybe not exactly the same one, but any AJAX web developer worth his salt will have seen one like it), then please, PLEASE, do yourself, your fellow developers and your users a favor: resist the urge to hack things together for once, and use a mature AJAX framework.

An important part of AJAX is that you need to update what is displayed on the web browser in the client side (by JavaScript), without refreshing the page. This implies that you're very likely to have to create and destroy DOM nodes on the fly. Now, how do you create a radio button in JavaScript?

How about...

var node = document.createElement("input");
node.type = "radio";
node.name = ...;
node.value = ...;
That's what you would do if you followed the DOM standard. But sorry, it does not work. Try creating a radio button with the above code segment in Internet Explorer 6, and you'll get a broken radio button - one you can't select. The correct way to create a radio button by DOM manipulation is described in this MSDN article []:

newRadioButton = document.createElement("<INPUT TYPE='RADIO' NAME='RADIOTEST' VALUE='Second Choice'>")

Memory leaks

The last one was easy. Did you know you can make a web application leak memory like a sieve in Internet Explorer 6 just by making a simple circular reference like the following?

var node = document.createElement("div");
node.someAttr = node;
If you're a good programmer, an alarm might have sounded in your head right now - any circular reference involving DOM nodes in IE6 results in a memory leak that persists across URL changes and page refreshes - unless you use an AJAX toolkit that takes care of the issue for you. Have you ever assigned a DOM node as an attribute value under another DOM node? Yes? Then you'd better check your web application for memory leaks with Drip [], now.

What's more, it's not just assigning DOM nodes as attributes that results in memory leaks - closures in JavaScript can also form circular references and cause leaks. What makes closures particularly dangerous is that their circular references are not easy to spot. For example, the following code segment leaks:

var node = document.createElement("div");
var clickHandler = function(){};
node.onclick = clickHandler;
Looks innocent enough, but you've already formed a leaky circular reference here: node -> clickHandler (through the closure's scope chain) -> node.

For more information about memory leaks under IE6, read these:

Mihai Bazon's blog entry []
MSDN's lengthy and confusing description of the problem []

The XMLHttpRequest object is not as simple as you think

Much of the magic of AJAX comes from the XMLHttpRequest object (or its ActiveX equivalent, or an iframe, etc.), right? Sure. If you're only doing something simple via AJAX (like, updating the server time), then you can just copy an XMLHttpRequest code snippet from sites like this [] and hack away, right?

Wrong! Those XMLHttpRequest code snippets are one of the very reasons people think of AJAX as a hack - it sometimes doesn't work! The XMLHttpRequest code snippet on Apple's site can break in commonly encountered situations, and you can simulate that yourself:

  1. Write a simple AJAX web application that retrieves and displays the current server time on a web browser using Apple's code snippet.

  2. Test it yourself under normal conditions. So it works and it's safe to use, right? Let's see...

  3. Change your computer's routing table such that you can have no route to the web server.

  4. Now test your application again in Firefox. It should fail - but does it fail gracefully? No. You'll see an error message in Firefox's error console stating that the XMLHttpRequest object's status attribute cannot be read. If you have coded something to handle AJAX request failures, your handler won't be called.

Why does that happen? Because any socket error during an AJAX request causes the onreadystatechange handler to be called under Firefox, yet the status attribute cannot be read. Reading it throws a JavaScript error and stops JavaScript execution (unless you add a try...catch block there - but that assumes you already know about the problem, so it's moot)! Under Internet Explorer, reading the status attribute in the same situation gives you the socket error code instead. Didn't know about these things? Please, use a mature AJAX framework.

Performance problems

Coding AJAX applications is just like writing things in C++ or Java - so long as you're using efficient algorithms, your application should run fast, right?

Of course, you're wrong again. Let's say that in some part of your application you want to concatenate a lot of string fragments into one long string in a for loop. How do you do it? How about...

var targetString = "";
for(var i=0;i<someArray.length;i++)
targetString += someArray[i];
That's the way most programmers would do it, intuitively. But its performance sucks under Internet Explorer. The correct way to combine strings in JavaScript is to use the Array.join() operation. You can read more about this here []. The optimization I'm talking about is also implemented in Dojo Toolkit (kudos to Alex Russell), and I believe any reasonably robust AJAX framework should have it too. Not knowing about such problems, had you hacked together a fairly sophisticated AJAX web application yourself, you would run into performance hell sooner or later.

Taking 646ms to combine strings still doesn't sound very slow to you? There are many more performance traps in JavaScript. Did you know there's a very significant performance difference between the following two code snippets?

First code snippet:

// placing 5000 "Hello World" messages in random positions
for(var i=0;i<5000;i++)
{
    var node = document.createElement("div");
    node.appendChild(document.createTextNode("Hello World!"));
    document.body.appendChild(node);
    node.style.position = "absolute";
    node.style.top = parseInt(Math.random() * 800) + "px";
    node.style.left = parseInt(Math.random() * 800) + "px";
}
Second code snippet:

// placing 5000 "Hello World" messages in random positions
for(var i=0;i<5000;i++)
{
    var node = document.createElement("div");
    node.appendChild(document.createTextNode("Hello World!"));
    node.style.position = "absolute";
    node.style.top = parseInt(Math.random() * 800) + "px";
    node.style.left = parseInt(Math.random() * 800) + "px";
    document.body.appendChild(node);
}
The only difference between the two code snippets is the placement of the document.body.appendChild() line. But if you actually test them, the second snippet is much faster, under both IE and Firefox. The performance difference has nothing to do with your algorithms - you just shuffled one line of code around; it has to do with how the browser renders the randomly placed DIV nodes. Ever wondered why your hacked-together web application spends half a minute running JavaScript after all the files are loaded?

So, unless you're already a programming god or don't mind spending lots of time solving bugs that you shouldn't have solved; you really, really should use some of these AJAX frameworks if you're making anything fairly sophisticated with AJAX.

Friday, May 11, 2007

Feeling sick today...

My throat felt a little dry after eating some Bolognese spaghetti (it's just a $25 dish at fast food restaurants, not expensive stuff) for lunch yesterday. I thought that was normal, coz the spaghetti was somewhat spicy, and the soup was spicy too. I went on to attend lessons and meetings as usual. I slept at 6pm that day (yes, you read that correctly, 6pm, I have crazy sleeping times).

I got up at 2am today, and that little dryness in my throat had turned into pain. Oops? Just what had I done wrong? I certainly ain't overworking myself these days. But I have a presentation to do today, and that sucks. :(

Thursday, May 10, 2007

Server goes down for a day

There's no electricity to my home today due to a routine checkup. Server will get back online tomorrow.

Sunday, May 6, 2007

Delayed execution idea scrapped, but the optimizations stayed

After some more thought, the delayed execution idea turned out to be stupid. It requires web developers to change their code to get the benefits, and it sometimes breaks your application.

What was this delayed execution stuff about, originally? It was actually a trick to get around browser inefficiencies in rendering DOM nodes with CSS attributes.

Consider the following two code snippets:

// placing 5000 "Hello World" messages in random positions
for(var i=0;i<5000;i++)
{
    var node = document.createElement("div");
    node.appendChild(document.createTextNode("Hello World!"));
    document.body.appendChild(node);
    node.style.position = "absolute";
    node.style.top = parseInt(Math.random() * 800) + "px";
    node.style.left = parseInt(Math.random() * 800) + "px";
}

// placing 5000 "Hello World" messages in random positions
for(var i=0;i<5000;i++)
{
    var node = document.createElement("div");
    node.appendChild(document.createTextNode("Hello World!"));
    node.style.position = "absolute";
    node.style.top = parseInt(Math.random() * 800) + "px";
    node.style.left = parseInt(Math.random() * 800) + "px";
    document.body.appendChild(node);
}

Both code snippets place 5000 randomly positioned "Hello World!" messages in the browser window. The two code snippets differ only in the placement of the document.body.appendChild() line. Running the first code snippet in Firefox can take 1 minute or more, but running the second one takes only a few seconds. The second code snippet provides a more than 10x speedup compared to the first code snippet.

A similar phenomenon can be observed in Internet Explorer too, but only with much more complicated logic, so we won't go over that here. Anyway, the moral of the story is: modifying CSS attributes (especially positioning attributes) is harmful once the DOM node is already visible.

So what did the scrapped delayed execution idea have to do with these browser weirdnesses? It was meant to help with batch widget creation and batch CSS style manipulation - e.g. when you're creating 100 widgets in a single pass. It speeds up widget creation or CSS style manipulation by making a common ancestor DOM node of the widgets being manipulated/created invisible before executing the performance-sensitive code, and making the ancestor visible again after execution.

Sounds like a hack? Yes, it is a hack. It sometimes breaks your application code, and it requires you to change your application code to use it. But as shown in the videos, it worked.

Now the hack is scrapped, before it was even released - because we've found a more consistent way of implementing the same optimization in WT Toolkit, without resorting to hacks.

So what do we have for 0.3.3 now:

  1. Massively increased widget creation performance in Internet Explorer, without needing the developer to change a single line of code.
  2. No performance improvement in Firefox if you don't change your application code... Oops?! But hey, that would be the same if we implemented the delayed execution hack.

But what if you want to make your WT Toolkit application run faster in Firefox? Just pass null as the parentWidget argument to the widget constructor as much as possible, and add the widget to the document tree only after you've done all the CSS manipulations.

Say, if you have

var n = new wtButton(myParent, "Yes!");
n.setAbsolutePosition(x, y);

Then, the optimized version would be

var n = new wtButton(null, "Yes!");
n.setAbsolutePosition(x, y);

Actually, you can perform the manual optimization with WT Toolkit 0.3.2 too.

Saturday, May 5, 2007

The art of presentations

I watched a total of four presentations and gave one myself last Friday. Of the four I saw, three were group presentations done for course projects, and the other was a solo presentation by an engineer in IELM311.

In the group presentations I watched, there was one presenter who was extremely remarkable - remarkably bad and unnatural. A good presentation feels like an old friend talking to you, even though you've never met the presenter before. This guy spoke "perfect" English through the whole presentation, more perfect than native speakers - there was not even the slightest pause. He just kept talking and talking and talking, jumping around mechanically as if those were gestures, with a smile always so wide on his face that he looked unhinged.

But isn't this the stuff our English teachers taught? Of course, nobody taught you to deliver your gestures mechanically, yet there's always somebody who takes those lessons too far.

The presenter on Friday reminded me of another presenter I saw at a public speaking competition back in Form 6 - a speaker from another top secondary school who acted exactly like him. That presenter also spoke perfect English - with appropriate pauses this time, even. But there was something very unnatural about him: his body was swinging like a pendulum the whole time during his presentation. Looking at him made you feel like you were at a rave party. The judge (who was a foreigner) gave him a very low grade as a result.

What is a good presentation? I've seen good, exciting presentations where the presenter didn't even speak good English (e.g. Tam Wai Ho's presentation in IELM311). The quality that separates good presenters from mediocre ones is the ability to make the audience feel comfortable and keep them thinking instead of falling asleep. When you're watching a product presentation and thinking, "Hey, this product seems amazing - what uses do I have for it? How did they do it? Are there any modifications I'd need if I were to buy it?", then you're looking at a good presenter. This quality cannot be emulated simply by speaking good English (you can even do without that) or by packing tons of gestures into your presentation, as your English teacher would have taught you. So how do the good presenters do it? I wish I knew. But understanding the audience should be the first step, since a good presentation directs the thoughts of the audience.

Where are the good presentations? Apple has them.

Thursday, May 3, 2007

WT Toolkit FYP Poster on display in HKUST Academic Concourse

If you come to HKUST, you can find our poster at the "Software Technologies" section of the CS FYP poster displays, in front of Lecture Theatres A and B.

Wednesday, May 2, 2007

Information wants to be free: 09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-c0


So, the DeCSS debacle all over again. Somebody print that on a t-shirt.

Monday, April 30, 2007

Why is Google Bot not crawling WT Toolkit website?

To monitor traffic to WT Toolkit's website, I sneaked in some PHP code that logs information about incoming visitors, starting from 24th April - just 6 days ago.

At this early stage of the project, I wasn't too concerned about human visitors (there aren't many, honestly); I was concerned about the search engine bots. The log file indicated that Googlebot would visit my site daily, but it stopped at the main page and did not crawl further. So every day there was a single isolated log entry of Googlebot visiting the main page once and doing nothing else.

2007-04-25 22:05:58 Mozilla/5.0 (compatible; Googlebot/2.1; + /xoops/modules/wtHome/ ref=
That doesn't make sense - there are plenty of plain links on my front page that any search engine crawler should be able to follow. But these isolated log entries repeated every day; Google just didn't crawl my project website. What's worse, searching for "" on Google still gives me the "Generated Javascript Documentation" result, which indicates that Google completely ignored the new project website despite the fact that it has seen the main page a few times already.

While there's some fancy Javascript trickery on my project website (like the project logo), most of the site is written in traditional PHP/HTML that search engine crawlers can easily understand. The project website looks perfectly legible even with Javascript disabled. What could possibly go wrong here?

I found a tool today that claims to be able to simulate what Googlebot sees on your website.

"Be The Bot"

So I entered "" into the tool, and surprise! It says Googlebot sees a completely empty page there.

How could that happen? Immediately I thought of the redirecting index.php I put in the root directory of WT Toolkit's project website. It only had one line of PHP code (three lines if you count the PHP opening and closing tags):
header("location: xoops/");
I put it there because I installed XOOPS (the CMS behind WT Toolkit's project website) under the xoops directory rather than the root directory, for convenience. Going into "xoops/" would give you yet another redirection, which gets you to the "Home" module's URL "/xoops/modules/wtHome/".

Was Googlebot not able to process the redirection? It seemed able to follow redirections - otherwise it wouldn't have been visiting "/xoops/modules/wtHome/" in the log file. Be The Bot's simulation also left the same log entry in my site log file, however.

So I entered the URL without redirections to Be The Bot:

This time, it displayed the project website correctly, albeit without the images.

Something was definitely wrong there. The log file indicated that Be The Bot was redirected to "/xoops/modules/wtHome" successfully, yet it couldn't retrieve the HTML correctly. Without the redirection, the correct HTML content was retrieved. XOOPS might be part of the problem here, but I'm not sure.

Anyway, this means I have to restructure the project website a bit so that the main page can be retrieved without redirection. This is not difficult... Done. No redirections for the main page now.

Let's see whether Google can crawl it correctly tomorrow, or in a few days.

WT Toolkit 0.3.3 Performance Optimizations

A problem that has plagued WT Toolkit ever since its birth is performance. WT Toolkit 0.1.x and 0.2.x felt slow all the time because of the garbage collector running in the background. 0.3.0 eliminated the background garbage collector, yet kept the cleanup automatic so that the programmer doesn't have to care about lapsed listeners (well, most of the time).

But as of 0.3.2, our performance was still bad compared to other popular toolkits like Dojo Toolkit and Qooxdoo. Widget creation latency grows linearly - and steeply - with the number of on-screen widgets. The effect isn't very noticeable under Firefox, but WT Toolkit 0.3.2 definitely felt slow under Internet Explorer 6 or 7.

Well... not anymore for the upcoming WT Toolkit 0.3.3! Even though I've already submitted my FYP final report, new work has begun on performance optimizations! Yeah, baby!

How much have we optimized? Let's see what a little trick called "delayed execution" (available in WT Toolkit 0.3.3) can do...

Before optimizations: (video)

After optimizations: (video)

As a result of the work on performance optimizations, I left the work on the WT Toolkit website to Marco. He couldn't complete it by 29th April because he had other academic work to do at the time. Anyway, we're making steady progress on WT Toolkit's website; we'll be seeing more and more amazing things as time goes on. :-)

Sunday, April 29, 2007


Group norms really aren't meant to be challenged

Fine, I'm a slacker. I have nothing to say.

Friday, April 27, 2007

FYP Poster for WT Toolkit

Drawn and color-printed last night with GIMP and Inkscape. Marco pasted the individual A4 sheets onto the poster board and handed it in to the CSE department.

Wednesday, April 25, 2007

Plans for WT Toolkit

Now that we're fairly feature-complete, it's time for some publicity.

27/4/2007 - FYP Poster
29/4/2007 - Completed WT Toolkit Website
1/5/2007 - Submit WT Toolkit to Ajaxian
1/5/2007 - Submit WT Toolkit to freshmeat
1/5/2007 - Submit WT Toolkit to Open Directory Project
13/5/2007 - Visual programming demo for WT Toolkit
19/5/2007 - FYP Code CD (Not sure what contents are needed, probably a Linux LiveCD)
21/5/2007 - FYP Presentation (schedule - we are group DE3)

Monday, April 23, 2007

Watched Michael and John's FYT pre-presentation today

Why did I go to the presentation? Because I talked to Michael about his research this morning and found it interesting.

The topic was implementing an efficient DHT on an ad-hoc mobile network. Efficient DHTs for fixed-line broadband Internet already exist - Chord and Pastry, for example - and everybody is using them, knowingly or unknowingly. Michael's research is about making DHTs efficient on ad-hoc mobile networks, which is much harder than implementing DHTs on top of our everyday IP network. Some difficulties include:

1. Message routing. Mobile nodes do not and should not have fixed routes like our desktop computers. Although routing on the physical network can be partially solved by protocols like AODV, you still have to make sure the hops on the DHT's overlay network are efficient. E.g. even assuming you've got perfect data routing in the physical network, it's still useless if one of the DHT hops goes to another country with a 12-hour timezone difference - your message will hop across many, many nodes in the physical network for just one DHT hop.

2. Bandwidth overhead. (??) I don't know how bad the problem is since I haven't seen the simulations myself. Probable causes I've heard are AODV-style flooding and Bloom filter inefficiencies. Gnutella-style implementations were mentioned for the audience to point and laugh at, I guess.

One of the related papers to M and J's work:

Now, what is Michael and John's proposed solution? They proposed a DHT organized in a tree-like fashion, instead of the ring/skip-list style seen in Chord or Pastry. The root node in the tree is called a "landmark", which should have a fixed location but no extra hardware resource requirements compared to other nodes. Their algorithm takes care of the physical routing as well, so there's no need for AODV flooding or playing with Dijkstra's algorithm as in SrcRR. No AODV, no route request flooding, less bandwidth overhead. Bloom filters are used for narrowing/selecting paths in the tree - very intuitive and easy to understand (just a simple trick with bits, with the hard probability maths done for you 30 years ago), despite the seemingly cryptic name.
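For readers who haven't met them, a Bloom filter really is just that bit trick: k hash functions set k bits per inserted key, and a lookup that finds any of those bits unset means "definitely absent", while all bits set means "probably present". A minimal generic sketch (this is my own illustration, not Michael and John's implementation; the sizes and hash function are arbitrary choices):

```javascript
// Generic Bloom filter sketch. False positives are possible,
// false negatives are not.
class BloomFilter {
  constructor(bits = 1024, k = 3) {
    this.bits = new Uint8Array(bits);  // the bit array (one byte per bit,
    this.k = k;                        // for simplicity) and hash count
  }
  // Simple FNV-1a-style hash, salted with the hash-function index.
  hash(key, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.bits.length;
  }
  add(key) {
    for (let i = 0; i < this.k; i++) this.bits[this.hash(key, i)] = 1;
  }
  mightContain(key) {
    for (let i = 0; i < this.k; i++) {
      if (!this.bits[this.hash(key, i)]) return false; // definitely absent
    }
    return true; // probably present
  }
}
```

In a DHT tree, each branch can carry a filter summarizing the keys reachable below it, so a lookup only descends branches whose filter says "probably present".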

Prof. Gary Chan asked lots of questions during the presentation; he had a very sharp sense for things that seemed "strange" or inefficient. The object duplication algorithm (put in there to make p2p swarming possible) in John's presentation was one of the quirks Gary spotted - the algorithm seemed like a placeholder, though I guessed it shouldn't be too hard to correct.

So what I've got from the seminar... let's see
1. A revision of some old algorithms (Bloom filters... I had almost completely forgotten them, having never used them once in the past few years), some new ones, and some new problems.
2. The 40 minutes of presentation time I've got for my FYP is preciously short. Michael and John's presentation went on for about 1.5 hours, and they still had to skip some details.
3. I need to keep my audience interested by doing demonstrations, with both WT Toolkit and WT Toolkit's competitors.

Regular Expressions - how good theory is ignored in popular software

You'd think the regular expression implementations in Java, Perl, Python, PHP, Ruby and PCRE (a C library) would have been refined many times over and would thus be highly optimized? Think again.

The title of the article is "Regular Expression Matching Can Be Simple And Fast", but what's more interesting is the subtitle - "(but is slow in Java, Perl, PHP, Python, Ruby, ...)". Slow? How slow? Look at the first graph in the article: for some pattern-matching inputs, Perl 5.8.7's built-in regular expression matching is millions of times slower than a 40-year-old algorithm.

How can that happen? Well... it could be argued that the expression used in the example is a pathological case. But is it a pathological problem in theory, i.e. not belonging to P, or belonging to P with a very large exponent? Obviously not - otherwise the 40-year-old algorithm wouldn't be able to perform the matching quickly either.

What actually happened here was this: all the popular programming language developers (Java, Python, Perl, PHP, etc.) copied or borrowed their implementation from a popular extended regular expression matching algorithm that was known to be "fast enough", but not known to be provably fast. 40+ years of finite automata theory went into the trash bin when programmers (including the very guy who invented the correct algorithm 40 years ago!) needed to release software fast and neglected to spend time thinking about the mathematics behind it.
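The fast 40-year-old approach is Thompson's NFA simulation: instead of backtracking through one possibility at a time, track the whole set of automaton states you could be in, advancing the set once per input character. Here is a sketch specialized to the article's pathological pattern - n optional a's followed by n literal a's (like a?a?a?aaa) - where the function name and token encoding are my own:

```javascript
// Linear-time match of the pattern a?^n a^n via Thompson-style state sets.
// A backtracking engine takes exponential time on this pattern; tracking
// the set of positions we could be at keeps every step O(n).
function matchPathological(n, input) {
  // tokens[i] is what we must match next when "before" position i
  const tokens = Array(n).fill('a?').concat(Array(n).fill('a'));
  // Epsilon-closure: from before an 'a?' we may also skip over it.
  const closure = (set) => {
    const out = new Set(set);
    for (const s of out) {            // Set iteration visits added values
      if (s < tokens.length && tokens[s] === 'a?') out.add(s + 1);
    }
    return out;
  };
  let states = closure(new Set([0]));
  for (const ch of input) {
    const next = new Set();
    for (const s of states) {
      // Both 'a' and 'a?' consume an 'a' and advance one position.
      if (s < tokens.length && ch === 'a') next.add(s + 1);
    }
    states = closure(next);
    if (states.size === 0) return false; // no live states: definite mismatch
  }
  return states.has(tokens.length);      // full match iff we reached the end
}
```

With n = 3 the pattern accepts between 3 and 6 a's; the running time stays proportional to n times the input length, no matter how adversarial the input is.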

The regular expression engine the article's author described is only a very simple one, however. Can it be extended to process modern extended regular expressions without descending into the same performance hell as Perl, PCRE, Python, etc.? The author gave some justification that it can, but he was very light on the details. Even if he has missed some detail that makes his proposal infeasible, though, it still stands that the regex engines we use every day are far from optimal.

Biometrics a fad?

How secure is it to use your fingerprint as an authentication token? Much research has been done on it, so it must be secure, right?

But wait a moment... you leave your fingerprints everywhere, every day. It's pretty much public information. And using public information as a secret key sounds like a dumb idea, doesn't it?

Yup... it's dumb. Anybody can crack a fingerprint scanner with a printer, transparency slides, PCB etching tools, and some moldable plastic. It is, at its heart, security by obscurity. And it's remarkable how much bullshit went into that "unbreakable door lock" in the video. Using moisture as an authentication condition?! Oh come on, is moisture really so scarce or secret on Earth? So what's next? Iris scanners? Your iris pattern can be captured everywhere - in 3D, even... it might be a little more difficult to capture and reproduce, but it's public information nonetheless. If what they're betting on is the resolution of cameras (which will definitely improve as time goes on), then they're relying on security by obscurity.

It's remarkable how far snake-oil technologies can make it into the market, government institutions, and even academia.

By the way, the video rocks! It feels like reading an early issue of Phrack magazine (most of the hacks don't work anymore, of course - but wait, the fork bomb still works) or one of the classical papers/theses (like Chord's). Easy to read, concrete procedures, concrete results, and profound implications.

WT Toolkit listed under

Just saw this when I was searching on Google. It's good to know there are people who know my project exists, and that there are other people doing the same thing as me.


Among the projects listed, the only other ones I can recognize are CK-ERP and RMSS. CK-ERP's author, C.K. Wu, has worked on his project for many years and has posted many advertisements in local newsgroups. Sadly, there are rarely any public replies to him. There should be quite a number of people talking to him privately though, as ERP systems are generally very expensive and have a major impact on business.