At an early stage of a project, I wasn't too concerned about human visitors (there aren't many, honestly); I was concerned about the search engine bots. The log file showed that Googlebot visited my site daily, but it stopped at the main page and did not crawl any further. So every day there was a single, isolated Googlebot entry in the log: one visit to the main page, and nothing else.
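To check for the same pattern in your own logs, a few lines of Python are enough to count which paths Googlebot requested. This is a minimal sketch, assuming the Apache "combined" log format; `count_googlebot_paths` is a hypothetical helper, and matching on the User-Agent string is only a heuristic, since any client can claim to be Googlebot in its User-Agent header.

```python
import re
from collections import Counter

# Assumed Apache "combined" format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)


def count_googlebot_paths(lines):
    """Count (path, status) pairs requested by clients whose User-Agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        path, status, agent = m.groups()
        if "Googlebot" in agent:  # heuristic only: the User-Agent can be spoofed
            counts[(path, int(status))] += 1
    return counts
```

Passing an open log file (any iterable of lines works) and printing `most_common()` on the result shows at a glance whether Googlebot ever got past "/".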
Today I found a tool that claims to simulate what Googlebot sees on your website:
"Be The Bot"
So I entered "http://wt-toolkit.sourceforge.net" into the tool, and surprise! It says Googlebot sees a completely empty page there.
How could that happen? I immediately thought of the redirecting index.php I had put in the root directory of WT Toolkit's project website. It had only one line of PHP code (three lines if you count the PHP opening and closing tags):
<?php
I put it there because I had installed XOOPS (the CMS behind WT Toolkit's project website) in the xoops directory rather than the root directory, for convenience. Going into "xoops/" triggers yet another redirection, which takes you to the "Home" module's URL, "/xoops/modules/wtHome/". So a visitor to the root URL goes through two redirections before reaching the actual main page.
Was Googlebot unable to process the redirections? It seems able to follow them; otherwise it wouldn't show up requesting "/xoops/modules/wtHome/" in the log file. However, Be The Bot's simulation left the same log entry in my site's log file, and it still reported an empty page, so a log entry alone doesn't prove the bot actually got the content.
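Be The Bot is a black box, so one rough way to check the same thing yourself is to request the root URL and follow each redirect by hand, noting the status code at every hop. Below is a minimal sketch of that idea using only the Python standard library. The names `fetch_once` and `trace_redirects` are hypothetical, and this only approximates a crawler: it does not run JavaScript, ignores robots.txt, and is not how Googlebot is actually implemented.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_once(url, timeout=10):
    """Send one GET request WITHOUT following redirects; return (status, Location, body)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers={"User-Agent": "redirect-trace-sketch"})
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, resp.getheader("Location"), body
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Follow 3xx responses hop by hop; return ([(url, status), ...], final body)."""
    hops = []
    for _ in range(max_hops):
        status, location, body = fetch(url)
        hops.append((url, status))
        if 300 <= status < 400 and location:
            url = urljoin(url, location)  # Location may be relative, e.g. "xoops/"
            continue
        return hops, body
    raise RuntimeError(f"more than {max_hops} redirects starting from {hops[0][0]}")
```

With the two-step setup described above, calling `trace_redirects("http://wt-toolkit.sourceforge.net/")` would be expected to report two redirect hops before the final response, and comparing the length of the returned body with a direct request for "/xoops/modules/wtHome/" is one way to spot the empty-page symptom. The `fetch` parameter exists so the redirect-following logic can be exercised against canned responses without touching the network.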
So I gave Be The Bot the final URL directly, with no redirections involved: http://wt-toolkit.sourceforge.net/xoops/modules/wtHome/
This time, it displayed the project website correctly, albeit without the images.
Something was definitely wrong there. The log file indicates that Be The Bot was successfully redirected to "/xoops/modules/wtHome", yet it couldn't retrieve the HTML; without the redirections, it retrieved the correct HTML content. XOOPS might be part of the problem, but I'm not sure.
Anyway, this means I have to restructure the project website a bit so that the main page can be retrieved without any redirection. This is not difficult... Done. The main page no longer redirects.
Let's see whether Google can crawl it correctly tomorrow or in a few days.