Monday, December 31, 2007
Thursday, December 27, 2007
So now I'm having to do the same research all over again. But this time the problem is harder: for WT Toolkit, I could simply review the existing literature on the nature of the IE memory leaks, build the toolkit in a way that can be mathematically proven not to leak under certain circumstances (the "circumstances" have to be flexible enough to keep the toolkit practical, of course), and then only develop web applications within those allowed circumstances. FCKeditor, on the other hand, was not built on a toolkit with such provisions.
Saturday, December 22, 2007
Saturday, December 8, 2007
How many times have you encountered this: you want to write a blog post - it may be on Blogger, it may be on Xanga, it may be on Wordpress - but the HTML editor provided by the blog site doesn't do what you want. Say you want to draw a table with Xanga's HTML editor - which button is that? Sorry, there's no such thing. How about making indented list items in Blogger? No can do. Want to add a Flash animation to your blog post to spice things up a bit? Then you need to resort to editing the HTML source - a tedious, painful, and error-prone process for non-professionals.
WriteArea is a new Firefox plugin that lets you write your blog posts, or anything else requiring HTML, with FCKeditor - the most popular web-based rich text editor in the world. It converts any HTML text area on any web site into a full-featured FCKeditor editing dialog. With it, you can create tables in your Xanga posts, make indented list items in Blogger, add Flash animations to your blog posts without fiddling with arcane HTML code, write in subscript and superscript, and edit the properties of tables and images through right-click menus... WriteArea is a little plugin that gives you a whole world of possibilities in web authoring.
WriteArea can be activated from any text area box on any website. For blog sites, that is usually the text area provided by the "Edit Source" or "Edit HTML" feature. To give you a rough idea of how WriteArea works, here's an example of how I wrote this post with WriteArea in Blogger.
Step 1: Switch to "Edit HTML" in Blogger, and activate the plugin by right-clicking on the text area.
Step 2: Write your message in the WriteArea dialog that pops up.
Step 3: Click "Save" in the WriteArea dialog to save the HTML code into the blog site's text area. If you want to edit your post again, just right-click on the text area and activate WriteArea again.
Step 4: Publish your blog post when you're done editing it.
WriteArea can be downloaded here. Since the plugin is new, it is currently sandboxed and available only to registered users of http://addons.mozilla.org. After registering and logging into the Mozilla plugin site, you'll also need to enable showing sandboxed plugins in your user preferences before downloading it. A sandboxed plugin is also a beta, so it's likely to have some bugs. So be a responsible open source software user: report any bugs you find, and write a favorable review if you think it's useful.
Tuesday, September 18, 2007
It is not a computer science lecture, but interestingly, it tells me a lot about how far the infotech industry has come - CPUs, compilers, operating systems, networking, 3D graphics, artificial intelligence; accounting packages, customer relationship management (CRM) systems, enterprise resource planning (ERP), electronic communication networks (ECN), quant funds, Google... it all started from people playing with boxes without keyboards, monitors, or even printers. The substance behind the tech IPO crazes, tech bubbles, and Web 2.0 is not just hard work. Hard work alone cannot produce these things.
"Do you want to spend the rest of your life selling sugared water, or do you want a chance to change the world?"
And Apple's 1984 advertisement... it may look like just another annoying advertisement in the eyes of the public. But anyone who's read George Orwell knows that whoever is behind that advertisement has a far greater goal than just earning money.
It would be a damned shame if I ever forget about these things.
Episode 1, Part 1
Episode 1, Part 2
Episode 1, Part 3
Episode 1, Part 4
Episode 1, Part 5
Episode 1, Part 6
Episode 2, Part 1
Episode 2, Part 2
Episode 2, Part 3
Episode 2, Part 4
Episode 2, Part 5
Episode 2, Part 6
Episode 3, Part 1
Episode 3, Part 2
Episode 3, Part 3
Episode 3, Part 4
Episode 3, Part 5
Episode 3, Part 6
Thursday, September 6, 2007
Bought it yesterday, although the server chassis had to be booked 2 to 3 weeks in advance.
The server will be co-located in one of HKNet's data centers next week.
Chassis: Tyan GT20 B5191 barebone
CPU: Intel Core 2 Quad Q6600
RAM: DDR2 1GB
Harddisk: Seagate 7200.10 250GB
Thursday, August 30, 2007
This computer graphics research has been greatly hyped lately thanks to a recent report on Slashdot. It is based on a simple idea, but it achieves amazing effects. Not only can it resize images with minimal information loss or perceptible distortion, it can also be used to achieve effects similar (though not identical) to image inpainting.
The paper can be read here,
Video demonstration of the algorithm, and... next one is the best
Third party implementation of the algorithm that you can download and hack.
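For the curious, the core resizing step can be sketched in a few lines. This is my own rough reconstruction - assuming the technique is the "seam carving" dynamic programming approach from the paper - not code from the implementation linked above. Given a per-pixel energy map (e.g. gradient magnitude), find the connected vertical path of pixels with the least total energy; deleting that path shrinks the image by one column with minimal perceptible change.

```javascript
// Find the minimum-energy vertical seam through an energy map.
// energy is a 2D array: energy[y][x] = "importance" of pixel (x, y).
// Returns seam[y] = the column to delete in row y (adjacent rows differ by at most 1).
function findMinSeam(energy) {
    var h = energy.length, w = energy[0].length;
    // cost[y][x] = cheapest total energy of any seam from the top row down to (x, y)
    var cost = energy.map(function (row) { return row.slice(); });
    for (var y = 1; y < h; y++) {
        for (var x = 0; x < w; x++) {
            var best = cost[y - 1][x];
            if (x > 0 && cost[y - 1][x - 1] < best) best = cost[y - 1][x - 1];
            if (x < w - 1 && cost[y - 1][x + 1] < best) best = cost[y - 1][x + 1];
            cost[y][x] += best;
        }
    }
    // backtrack from the cheapest cell in the bottom row
    var seam = new Array(h);
    var bx = 0;
    for (var i = 1; i < w; i++) if (cost[h - 1][i] < cost[h - 1][bx]) bx = i;
    seam[h - 1] = bx;
    for (var y2 = h - 2; y2 >= 0; y2--) {
        var px = seam[y2 + 1];
        bx = px;
        if (px > 0 && cost[y2][px - 1] < cost[y2][bx]) bx = px - 1;
        if (px < w - 1 && cost[y2][px + 1] < cost[y2][bx]) bx = px + 1;
        seam[y2] = bx;
    }
    return seam;
}
```

Remove one seam per shrink step, recompute the energy, and repeat - that's the "simple idea" that produces the amazing demos.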
Why not use Drip instead? While Drip is useful for catching memory leaks, there are many cases where Drip does not work. For example, Drip 0.5 does not catch memory leaks in FCKeditor.
How about lapsed listeners? The paper discusses lapsed listeners in section 6.2, but I can't find any solution for them mentioned in the paper. Perhaps I've overlooked it.
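To make the term concrete, here's a minimal illustration of the lapsed listener problem (my own toy example, not the paper's): a long-lived event subject holds a strong reference to every listener ever registered, so listeners that are never unsubscribed - and everything they close over - can never be garbage collected.

```javascript
// A bare-bones observer pattern.
function Subject() { this.listeners = []; }
Subject.prototype.subscribe = function (fn) { this.listeners.push(fn); };
Subject.prototype.fire = function (evt) {
    for (var i = 0; i < this.listeners.length; i++) this.listeners[i](evt);
};

// The subject lives as long as the page does.
var appLifetimeSubject = new Subject();

function openDialog() {
    var bigWidgetState = new Array(100000).join("x"); // pretend: a heavy widget
    appLifetimeSubject.subscribe(function () { return bigWidgetState.length; });
    // The dialog is "closed" when this function returns, but nothing ever
    // unsubscribes the listener - so bigWidgetState stays reachable (and
    // uncollectable) for the lifetime of the application. That's the leak.
}
```

Open and close the dialog a few hundred times and the heap only grows - exactly the kind of slow leak that's invisible in a quick test and painful in a long-running web app.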
The paper also talks about remote, multi-client performance profiling and many other nice things that could be of tremendous help in modern web application development. While products like Firebug and Tito Web Studio already provide profiling and basic debugger support, they are browser-specific, and the performance data applies only to the developer's workstation. A web application can always run dramatically differently on computers the developer never anticipated, and that's where the paper's approach can help.
You can read the paper here.
Monday, August 20, 2007
Linux kernel running Windows XP (well, the installer, at least) via Intel Vanderpool technology. Hmm... the irony.
The installer runs noticeably slower than under VMware, though. Also, there are some show-stopper usability bugs in Fedora Core 7's Virtualization Manager which stopped me from fully installing Windows XP from the VM console... forcing me to call the VM engine (QEMU/KVM) from the command line to do the second stage of the WinXP installation.
Update: just found the reason for the slowdown. While installing Windows XP on QEMU/KVM, the user must press F5 (instead of F7 as indicated by KVM's FAQ) at the beginning of the installation and choose "Standard PC" as the computer type. The problem is documented here.
Friday, August 10, 2007
Wednesday, June 27, 2007
I've just been hired as a core developer of the FCKeditor project, currently one of the top 10 projects in SourceForge.net. It's a very popular piece of software in the Web 2.0 universe.
Alas, very few people in Hong Kong understand what I'm doing (and have been doing). The business culture of Hong Kong simply doesn't appreciate new, cutting-edge technology, seeing it as just "toys".
Sunday, June 24, 2007
I passed the "self-demo" part but screwed up the final presentation part. The HKUST team that went on to the final competition in China is the Fung Shui Design System team. The whole failure was very funny - none of the usual explanations apply to it.
First off, I was not stressed at all - the most common explanation. When was the last time you saw me nervous about anything? It's not that I don't care, but I always have the mindset that it's useless to struggle at the last minute - so why not relax and see how things turn out? I was very tired from listening to people talk all day, though. To me, talking and listening are much more tiresome than reading, writing, or even thinking.
Then, the presentation went wrong from the beginning (the judges were not looking for demonstrations in the second part), and there was the difficulty of explaining AJAX, the hundreds of current problems with AJAX app development, the toolkit itself, the theories and algorithms inside it, and its business value - all within 10 minutes, to a panel that hadn't been looking at this specific area recently.
So at the end... the judges looked very puzzled. Which was kind of expected, seeing that I used 50 minutes for the FYP presentation at HKUST, and even with 50 minutes I was skipping lots of details. Same with that 115-page FYP report - many details were missing. Current problems, others' solutions, algorithms, human factors, business... there were simply too many things to cover in 10 minutes.
And to add further insult to the whole screwy presentation, the judges asked me about the practical uses of the project - so the whole time they hadn't gotten anything. I happened to answer by giving them a demo of how to program with it, but they were expecting me to explain how businesses could use it. EditGrid, PCMS, FCKeditor, Google Maps, Google Office... it's all in my brain. That TnC company even did the web page for iProA, and they are living off AJAX technologies. I had all the answers and more, but sorry, time's up. GG.
Now, the exact same thing doesn't happen in Hong Kong. But I can't help noticing that local college graduates seem to have a "single measure of fitness" mindset toward job seeking, as if it were the HKCEE or HKALE. The real world doesn't work like that - the "fitness" in the employer's mind can be completely different from what you have in mind. They could be looking for someone stupid, cheap, and uncomplaining.
Sunday, June 10, 2007
I, along with the other CSE Best FYP Award recipients, attended an award ceremony at the President's Lodge on the 8th of June. I was honored to have lunch with some of the most famous people at HKUST, such as President Chu and Prof. Roland Chin.
I drank a little bit too much wine at the lunch, though. You can see my face is slightly red in the photo, and I was feeling somewhat dizzy.
Monday, May 28, 2007
31 May 09:30 - HUMA099G final exam (LTA)
1 June 12:45 - CPEG appreciation lunch (LG7)
1 June 15:00 - Meeting at TnC Ltd. Office
4 June 17:00 - Deadline for submitting Pan-PRD competition materials
8 June 12:30 - Best FYP Award Presentation
1H June - Submit industrial training logbook (long overdue)
Sunday, May 20, 2007
Thursday, May 17, 2007
Date: 19th May, 2007
Time: 17:20 - 18:00 HKT (+0800)
Venue: Hong Kong University of Science and Technology, Room 4480
Wednesday, May 16, 2007
Tuesday, May 15, 2007
Exploring in the Weblog Space by Detecting Informative and Affective Articles
The paper describes a method that classifies blogs along a spectrum between "informative" and "affective". Informative blogs, like Alex Russell's, dispense useful information that interests their readers. Affective blogs are diaries describing things that mostly interest the author only. High-quality blogs (i.e. those that people want to read) are usually informative.
The method shouldn't be treated as an absolute measure of blog quality, however. Let's take a look at a random paragraph from Joel on Software, a popular informative blog:
That's why I'm incredibly honored that they invited me to write a guest editorial about recruiting and internships in this month's issue. Thanks to professional editing, it feels a little bit polished compared to my usual style. I don't think I would write, "Ah, college." I do remember writing, "Get me a frosty cold orange juice, hand-squeezed, and make it snappy!"
Highlighted in red is one of the top feature phrases indicating affective blogs, as described by the paper. Guess how the algorithm would classify the above paragraph, and Joel's blog entries in general? I don't have the software on hand, so I can't test it and get real data, but the above paragraph contains a word ranked as a top representative feature in the affective category, and none from the informative category. And that's not just an isolated example:
Microsoft finally put Lookout back up for download, but they sure weren't happy about it. ... The story has a happy ending.
A number of years ago a programmer friend of mine worked for a company...
... it wouldn't be such a bad thing to take Air France and change planes at CDG.
Among other things, this week I've been working on the new office design with our architect, Roy Leone [flash site].
Microsoft did the only thing that made sense...
I've been nattering on about this topic for well over 5000 words and I don't really feel like we're getting anywhere.
Thanks to professional editing, it feels a little bit polished compared to my usual style.
I had a chance to visit 7 World Trade Center today...
The "like" example above assumes that there's no word sense disambiguation (or anything similar) in their algorithm, since the "like" in the paper and the "like" in my example have different meanings. But hey, the paper didn't mention WSD at all.
On the other hand, the only informative features I could find in Joel's blog today are "project" and "report". They appear much less frequently than the affective features do.
Joel's blog, however, is widely regarded as highly informative by software engineers. It's just that Joel prefers to write his blog entries in an informal and personal style. Anyway, this is still an interesting read on how computers can attempt to "understand" and filter information these days.
Monday, May 14, 2007
Friday, May 11, 2007
I got up at 2am today, and that little dryness I had in my throat had turned into pain. Oops? Just what did I do wrong? I certainly ain't overworking myself these days. But I have a presentation to do today - that sucks. :(
Thursday, May 10, 2007
Sunday, May 6, 2007
What was this delayed execution stuff about, originally? It's actually a trick to get around browser inefficiencies in rendering DOM nodes with CSS attributes.
Consider the following two code snippets:
// placing 5000 "Hello World" messages in random positions
// (slow: each node is appended to the document before it is styled)
for (var i = 0; i < 5000; i++) {
    var node = document.createElement("div");
    node.appendChild(document.createTextNode("Hello World!"));
    document.body.appendChild(node);
    node.style.position = "absolute";
    node.style.left = parseInt(Math.random() * 800) + "px";
    node.style.top = parseInt(Math.random() * 800) + "px";
}

// placing 5000 "Hello World" messages in random positions
// (fast: each node is fully styled before being appended)
for (var i = 0; i < 5000; i++) {
    var node = document.createElement("div");
    node.appendChild(document.createTextNode("Hello World!"));
    node.style.position = "absolute";
    node.style.left = parseInt(Math.random() * 800) + "px";
    node.style.top = parseInt(Math.random() * 800) + "px";
    document.body.appendChild(node);
}
Both code snippets place 5000 randomly positioned "Hello World!" messages in the browser window; they differ only in the placement of the document.body.appendChild() line. Running the first snippet in Firefox can take a minute or more, while the second takes only a few seconds - a more than 10x speedup.
A similar phenomenon can be observed in Internet Explorer too, though only with much more complicated logic, so we won't go over that. Anyway, the moral of the story is: modifying certain CSS attributes (especially positioning attributes) is harmful after the DOM node is already visible.
So what did the scrapped delayed execution idea have to do with these browser weirdnesses? The delayed execution idea was meant to help with batch widget creation and batch CSS style manipulation - e.g. when you're creating 100 widgets in a single pass. It speeds up widget creation or CSS style manipulation by making a common ancestor DOM node of the widgets being manipulated/created invisible before executing the performance-sensitive code, and making the ancestor visible again afterwards.
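In rough JavaScript, the hack would have looked something like this. This is only my illustrative sketch - the function name and the exact mechanism are made up, not WT Toolkit's real API:

```javascript
// Hide a common ancestor, run the batch DOM work, then show the ancestor
// again. While the ancestor is display:none, its subtree is out of the
// layout, so style changes on descendants are cheap; the browser does a
// single reflow when the ancestor reappears.
function runDelayed(ancestor, batchFn) {
    var oldDisplay = ancestor.style.display;
    ancestor.style.display = "none"; // detach subtree from layout
    try {
        batchFn(); // create/manipulate many widgets under `ancestor`
    } finally {
        ancestor.style.display = oldDisplay; // one reflow here
    }
}
```

The catch, as described below, is that the application has to opt in: it must route its batch work through a wrapper like this, and code that expects the subtree to stay visible mid-batch will break.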
Sounds like a hack? Yes, it is a hack. It sometimes breaks your application code, and it requires you to change your application code to use it. But as shown in the videos, it worked.
Now the hack has been scrapped, before it was even released - because we've found a more consistent way of implementing the same optimization in WT Toolkit, without resorting to hacks.
So what do we have for 0.3.3 now:
- Massively increased widget creation performance in Internet Explorer, without the developer needing to change a single line of code.
- No performance improvement in Firefox if you don't change your application code... Oops?! But hey, it would be the same if we had implemented the delayed execution hack.
Say, if you have
var n = new wtButton(myParent, "Yes!");
Then, the optimized version would be
var n = new wtButton(null, "Yes!");
Actually, you can perform the manual optimization with WT Toolkit 0.3.2 too.
Saturday, May 5, 2007
In the group presentations I watched, there was one presenter who was extremely remarkable - remarkably bad and unnatural. Good presentations feel like an old friend talking to you, even if you've never met the presenter before. This guy... he spoke "perfect" English for the whole presentation, more perfect than a native speaker's - there was not even the slightest pause anywhere in it. He just kept talking and talking and talking, jumping around mechanically as if that counted as gesturing, with a smile always so wide on his face that he looked schizophrenic.
But isn't this stuff what our English teachers taught? Of course, nobody taught you to deliver your gestures mechanically, yet there's always somebody who takes those lessons too far.
The presenter on Friday reminded me of another presenter I saw at a public speaking competition back in form 6 - a presenter from another top secondary school who acted exactly like him. That presenter also spoke perfect English - with appropriate pauses this time, even. But there was something very unnatural about him: his body swung like a pendulum the whole time during his presentation. Watching him made you feel like you were at a rave party. The judge (a foreigner) gave him a very low grade as a result.
What is a good presentation? I've seen good, exciting presentations where the presenter didn't even speak good English (e.g. Tam Wai Ho's presentation in IELM311). The quality that separates good presenters from mediocre ones is the ability to make the audience feel comfortable and keep them thinking instead of falling asleep. When you're watching a product presentation and thinking, "Hey, this product seems amazing - what uses do I have for it? How did they do it? Would I need any modifications if I were to buy it?", then you're looking at a good presenter. This quality cannot be emulated by simply speaking good English (you can even do without that) or packing your presentation with gestures, as your English teacher might have taught you. How do the good presenters do it? I wish I knew. But understanding the audience should be the first step, since a good presentation directs the thoughts of the audience.
Where are the good presentations? Apple has them.
Thursday, May 3, 2007
Wednesday, May 2, 2007
So, the DeCSS debacle all over again. Somebody print that on a t-shirt.
Monday, April 30, 2007
At this early stage of the project, I wasn't too concerned about human visitors (there aren't many of those, honestly) - I was concerned about the search engine bots. The log files indicated that Googlebot would visit my site daily, but it stopped at the main page and did not crawl further. So every day there was an isolated Googlebot log entry: one visit to the main page and nothing else.
I found a tool today that claims to be able to simulate what Googlebot sees on your website.
"Be The Bot"
So I entered "http://wt-toolkit.sourceforge.net" into the tool, and surprise! It says Googlebot sees a completely empty page there.
How could that happen? Immediately I thought of the redirecting index.php I put up in the root directory of WT Toolkit's project website. It only had one line of PHP code (three lines if you count the PHP opening and closing brackets) - essentially a header("Location: ...") redirect into the xoops/ directory. I put it there because I installed XOOPS (the CMS behind WT Toolkit's project website) under the xoops directory rather than the root directory. I did that for convenience. Going inside "xoops/" would give you yet another redirection, which gets you to the "Home" module's URL, "/xoops/modules/wtHome/".
Was Googlebot unable to process the redirection? It seems able to follow redirections - otherwise "/xoops/modules/wtHome/" wouldn't appear in the log file. Be The Bot's simulation also left the same log entries in my site's log file, however.
So I entered the URL without redirections to Be The Bot: http://wt-toolkit.sourceforge.net/xoops/modules/wtHome/
This time, it displayed the project website correctly, albeit without the images.
Something was definitely wrong there. The log file indicated that Be The Bot was redirected to "/xoops/modules/wtHome/" successfully, yet it couldn't retrieve the HTML correctly. Without the redirection, the correct HTML content was retrieved. XOOPS might be part of the problem here, but I'm not sure.
Anyway, this means I have to restructure the project web site a bit so that the main page can be retrieved without redirection. This is not difficult... Done. No redirections for the main page now.
Let's see if Google could crawl it correctly tomorrow or a few days later.
But as of 0.3.2, our performance is still bad compared to other popular toolkits like Dojo Toolkit and Qooxdoo. Widget creation latency grows linearly - and steeply - with the number of on-screen widgets. The effect isn't very noticeable under Firefox, but WT Toolkit 0.3.2 definitely felt slow under Internet Explorer 6 and 7.
Well... not anymore for the upcoming WT Toolkit 0.3.3! Even though I've already submitted my FYP final report, new work has begun on performance optimizations! Yeah, baby!
How much have we optimized? Let's see what a little trick called "delayed execution" (available in WT Toolkit 0.3.3) can do...
Because of the work on performance optimization, I left the work on the WT Toolkit website to Marco. He couldn't complete it by 29th April because he had other academic work at the time. But anyway, we're making steady progress on WT Toolkit's website, and we'll be seeing more and more amazing things as time goes on. :-)
Sunday, April 29, 2007
Friday, April 27, 2007
Wednesday, April 25, 2007
27/4/2007 - FYP Poster
29/4/2007 - Completed WT Toolkit Website
1/5/2007 - Submit WT Toolkit to Ajaxian
1/5/2007 - Submit WT Toolkit to freshmeat
1/5/2007 - Submit WT Toolkit to Open Directory Project
13/5/2007 - Visual programming demo for WT Toolkit
19/5/2007 - FYP Code CD (Not sure what contents are needed, probably a Linux LiveCD)
21/5/2007 - FYP Presentation (schedule - we are group DE3)
Monday, April 23, 2007
The topic was implementing an efficient DHT on an ad-hoc mobile network. Efficient DHTs for fixed-line broadband Internet already exist - Chord and Pastry, for example - and everybody is using them, knowingly or unknowingly. Michael's research is about making DHTs efficient on ad-hoc mobile networks, which is much harder than implementing DHTs on top of our everyday IP network. Some difficulties include:
1. Message routing. Mobile nodes do not, and should not, have fixed routes like our desktop computers. Although routing on the physical network can be partially solved by protocols like AODV, you still have to make sure the hops on the DHT's overlay network are efficient. E.g., even assuming perfect data routing in the physical network, it's still useless if one of the DHT hops goes to another country with a 12-hour timezone difference - your message will hop across many, many physical nodes for just one DHT hop.
2. Bandwidth overhead. (??) I don't know how bad the problem is, since I haven't seen the simulations myself. Probable causes I've heard are AODV-style flooding and Bloom filter inefficiencies. Gnutella-style implementations were mentioned for the audience to point and laugh at, I guess.
One of the related papers to M and J's work:
Now, what is Michael and John's proposed solution? They proposed a DHT organized in a tree-like fashion, instead of the ring/skip-list style seen in Chord or Pastry. The root node of the tree is called a "landmark"; it should have a fixed location but needs no extra hardware resources compared to other nodes. Their algorithm takes care of the physical routing as well, so there's no need for AODV flooding or playing with Dijkstra's algorithm as in SrcRR. No AODV, no route request flooding, less bandwidth overhead. Bloom filters are used for narrowing/selecting paths in the tree, which is very intuitive and easy to understand (just a simple trick with bits, with the hard probability maths done for you 30 years ago), despite the seemingly cryptic name.
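For those who, like me, had half forgotten them: a Bloom filter is just a bit array plus k hash functions. Here's a minimal toy sketch of my own (nothing to do with Michael and John's actual code; a real DHT would size m and k for a target false-positive rate, and use better hash functions than this seeded polynomial hash):

```javascript
// A toy Bloom filter: m bits, k seeded hash functions.
function BloomFilter(m, k) {
    this.bits = new Array(m).fill(false);
    this.m = m;
    this.k = k;
}
// Simple seeded string hash (illustrative only).
BloomFilter.prototype.hash = function (str, seed) {
    var h = seed;
    for (var i = 0; i < str.length; i++) {
        h = (h * 31 + str.charCodeAt(i)) >>> 0; // keep it in 32 bits
    }
    return h % this.m;
};
BloomFilter.prototype.add = function (str) {
    for (var s = 0; s < this.k; s++) this.bits[this.hash(str, s + 1)] = true;
};
BloomFilter.prototype.mayContain = function (str) {
    for (var s = 0; s < this.k; s++) {
        if (!this.bits[this.hash(str, s + 1)]) return false; // definitely absent
    }
    return true; // possibly present: false positives allowed, false negatives never
};
```

The one-sided error is exactly why it suits path selection in a tree: a "no" answer lets you prune a subtree safely, while a "yes" only costs you an occasional wasted descent.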
Prof. Gary Chan asked lots of questions during the presentation; he has a very sharp sense for things that seem "strange" or inefficient. The object duplication algorithm (put in there to make p2p swarming possible) in John's presentation was one of the quirks Gary spotted - the algorithm seemed like a placeholder, though I guess it shouldn't be too hard to correct.
So what did I get from the seminar? Let's see:
1. A revision of some old algorithms (Bloom filters... I had almost completely forgotten them, never having used them once in the past few years), some new ones, and some new problems.
2. The 40 minutes of presentation time I've got for my FYP is preciously short. Michael and John's presentation ran for about 1.5 hours, and they were still missing some details.
3. I need to keep my audience interested by doing demonstrations, with both WT Toolkit and WT Toolkit's competitors.
The title of the article is "Regular Expression Matching Can Be Simple And Fast", but what's more interesting is the subtitle: "(but is slow in Java, Perl, PHP, Python, Ruby, ...)". Slow? How slow? Look at the first graph of the article: for some pattern matching inputs, Perl 5.8.7's built-in regular expression matching is millions of times slower than a 40-year-old algorithm.
How can that happen? Well... it could be argued that the expression used in the example is a pathological case. But is it a pathological problem in theory - i.e. not in P, or in P with a very large exponent? Obviously not; otherwise the 40-year-old algorithm wouldn't be able to perform the matching quickly either.
What actually happened was this: all the popular programming language developers (Java, Python, Perl, PHP, etc.) copied or borrowed their implementation from a popular extended regular expression matching algorithm that was known to be "fast enough", but not known to be provably fast. 40+ years of finite automata theory went into the trash bin because programmers (including the guy who invented the correct algorithm 40 years ago!) needed to release software fast and neglected to spend time thinking about the mathematics behind it.
The regular expression engine the article's author described is only a very simple one, however. Can it be extended to process modern extended regular expressions without descending into the same performance hell as Perl, PCRE, Python, etc.? The author gave some justification that it could, but he was very light on the details. Even if he has missed some detail that makes his proposal infeasible, it still stands that the regex engines we use every day are far from optimal.
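To see why the old approach wins, here's a toy sketch (mine, not the article's code) of the state-set idea, hard-coded to the article's benchmark pattern a?^n a^n. A backtracking engine tries up to 2^n ways of expanding the optional a?'s; tracking the set of reachable pattern positions instead keeps matching polynomial in n.

```javascript
// Match the pattern a?^n a^n (n optional "a"s, then n mandatory "a"s)
// against text, in O(n * text.length) time, by simulating all pattern
// positions at once instead of backtracking.
function matchPathological(n, text) {
    var total = 2 * n; // pattern slots: 1..n are "a?", n+1..2n are "a"
    // states[i] === true  <=>  the first i pattern slots can consume the input so far
    var states = new Array(total + 1).fill(false);
    states[0] = true;
    for (var i = 0; i < n; i++) if (states[i]) states[i + 1] = true; // skip optionals
    for (var c = 0; c < text.length; c++) {
        if (text[c] !== "a") return false; // the pattern only ever matches "a"
        var next = new Array(total + 1).fill(false);
        for (var j = 0; j < total; j++) if (states[j]) next[j + 1] = true; // consume one "a"
        for (var k = 0; k < n; k++) if (next[k]) next[k + 1] = true; // skip optionals again
        states = next;
    }
    return states[total]; // did we reach the end of the pattern?
}
```

Generalize this "set of reachable NFA states" trick to arbitrary regular expressions and you have essentially the Thompson construction the article advocates.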
But wait a moment... you leave your fingerprints everywhere, every day. They're pretty much public information. And using public information as a secret key sounds like a dumb idea, doesn't it?
Yup... it's dumb. Anybody can crack a fingerprint scanner with a printer, transparency slides, PCB etching tools, and some moldable plastic. It's security by obscurity at heart. And it's remarkable how much bullshit went into that "unbreakable door lock" in the video. Using moisture as an authentication condition?! Oh come on, is moisture really so scarce or secret on Earth? So what's next? Iris scanners? Your iris pattern can be captured anywhere - in 3D, even. It might be a little harder to capture and reproduce, but it's public information nonetheless. If what they're betting on is the resolution of cameras (which will only improve as time goes on), then they're relying on security by obscurity.
It's remarkable how far snake oil technologies can make it into the market, government institutions, and even academia.
By the way, the video rocks! It feels like reading an early issue of Phrack magazine (most of the hacks don't work anymore, of course - but wait, the fork bomb still works) or some of the classic papers and theses (like Chord's). Easy to read, concrete procedures, concrete results, and profound implications.
Just saw this while searching on Google. It's good to know there are people who know my project exists, and that there are other people doing the same thing as me.
Among the projects, the only other ones I recognize are CK-ERP and RMSS. CK-ERP's author, C.K. Wu, has worked on his project for many years. He has posted many advertisements in local newsgroups, but sadly there are rarely any public replies to him. There should have been quite a number of people talking to him privately, though, as ERP systems are generally very expensive and have a major impact on business.