Part of Bill's incredibly stupid web diary. Read some more today, yerhear!
Those of you with half-decent web servers will know of the gem of information known as the server logs: all the information you need about who is doing what with your web site. My own ISP, Force 9, neatly packages the logs up every day and leaves them on the server.
An entry in the log looks a bit like this...
snipped.ufl.edu - - [19/Dec/2002:01:24:04 +0000] "GET /docs/dist.html HTTP/1.1" 200 24254 "http://dmoz.org/Computers/Computer_Science/Distributed_Computing/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)" www.bacchae.co.uk
So, on the 19th of December, someone from the University of Florida downloaded my primer on distributed computing using Microsoft Internet Explorer 6.0, coming from a link from the Open Directory Project.
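For anyone who would rather pull those fields out with a script, here is a minimal Python sketch. It assumes every line follows the same layout as the sample above (the usual "combined" log format with the site's host name tacked on the end), which I have only checked against that one entry. The names `LOG_PATTERN` and `parse_line` are made up for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the single sample entry: combined log format
# followed by an optional trailing virtual-host field.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
    r'(?: (?P<vhost>\S+))?$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    # Timestamps look like 19/Dec/2002:01:24:04 +0000
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

SAMPLE = (
    'snipped.ufl.edu - - [19/Dec/2002:01:24:04 +0000] '
    '"GET /docs/dist.html HTTP/1.1" 200 24254 '
    '"http://dmoz.org/Computers/Computer_Science/Distributed_Computing/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)" '
    'www.bacchae.co.uk'
)

entry = parse_line(SAMPLE)
print(entry["host"], entry["request"], entry["status"], entry["referrer"])
```

Keeping the whole timestamp inside the square brackets as one field also sidesteps the problem of the date being split in two.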
Simple, eh?
I was pleasantly surprised to see that Microsoft Excel can (almost) import these log files correctly. (It decided to split the date up into two fields.) This allowed me to easily query the logs and see what's going on, which brought up a few surprises. Anyway, here are a few nuggets of information I found, looking at the week from the 15th of December to the 22nd of December, 2002.
My diary is getting a good readership, but not from search engines. Accesses from people's browser bookmarks are in charge here, although a few came in from the time I mentioned it in my kuro5hin diary. One referral came from wilwheaton.net, but I have no idea how.
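The same sort of tally can be done outside Excel. The sketch below counts hits per referring host. It assumes the log layout of the sample entry above, and it lumps every hit with an empty or "-" referrer under "(none)". Treating those as bookmark visits is my own guess, since typed-in addresses and browsers that hide the referrer look the same. The function name and the second log line are invented for the example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Quoted request, status, size, then the quoted referrer field.
REFERRER = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"')

def referrer_hosts(lines):
    """Count hits per referring host; missing referrers go under '(none)'."""
    counts = Counter()
    for line in lines:
        m = REFERRER.search(line)
        if not m:
            continue  # skip lines that do not look like log entries
        ref = m.group("referrer")
        if ref in ("", "-"):
            counts["(none)"] += 1
        else:
            counts[urlsplit(ref).hostname or ref] += 1
    return counts

# First line is the sample entry; the second is made up for illustration.
LINES = [
    'snipped.ufl.edu - - [19/Dec/2002:01:24:04 +0000] '
    '"GET /docs/dist.html HTTP/1.1" 200 24254 '
    '"http://dmoz.org/Computers/Computer_Science/Distributed_Computing/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)" '
    'www.bacchae.co.uk',
    'reader.example - - [19/Dec/2002:09:00:00 +0000] '
    '"GET /diary/arc/ HTTP/1.1" 200 5120 "-" "Mozilla/4.0" www.bacchae.co.uk',
]

for host, hits in referrer_hosts(LINES).most_common():
    print(host, hits)
```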
So where were all the search engines? Well, it seems that both Google and AltaVista are still pointing to my old diary files. Before I started this diary as you see here, I used to manually edit one and keep the entries in the "/docs" directory, the same place I keep my well-linked-to distributed computing primer. Now all my diary entries have moved to "/diary/arc", and the old "/docs" files are not linked from anywhere. Yet three weeks and 20 Googlebot scans later, people are still finding my old diary entries. Fortunately, I left the old documents where they were, so visitors still have something to read.
My two articles on the collapse of ITV Digital are usually found by people looking for technical information or porn: "sky digital frequencies" and "how to get porn on sky digital". Sorry lads.
Top of the 404 chart are the non-existent "robots.txt" and "favicon.ico". Perhaps I should write some. A few incredibly evil people tried looking for "formmail.pl". Ner ner.
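A 404 chart like this one is easy enough to build with a script as well. This is only a sketch: it assumes the same log layout as the sample entry, and the log lines in it are invented purely to show the counting, not taken from my real logs. `top_missing` is a made-up name.

```python
import re
from collections import Counter

# Quoted request (method, path, protocol) followed by the status code.
REQUEST_STATUS = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

def top_missing(lines, n=5):
    """Return the n most requested paths that answered 404."""
    counts = Counter()
    for line in lines:
        m = REQUEST_STATUS.search(line)
        if m and m.group("status") == "404":
            counts[m.group("path")] += 1
    return counts.most_common(n)

# Hypothetical entries for illustration only.
SAMPLE_LINES = [
    'a.example - - [15/Dec/2002:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 404 210 "-" "bot" www.bacchae.co.uk',
    'b.example - - [15/Dec/2002:10:05:00 +0000] "GET /robots.txt HTTP/1.0" 404 210 "-" "bot" www.bacchae.co.uk',
    'c.example - - [16/Dec/2002:11:00:00 +0000] "GET /favicon.ico HTTP/1.1" 404 212 "-" "Mozilla/4.0" www.bacchae.co.uk',
    'd.example - - [16/Dec/2002:12:00:00 +0000] "GET /docs/dist.html HTTP/1.1" 200 24254 "-" "Mozilla/4.0" www.bacchae.co.uk',
]

for path, hits in top_missing(SAMPLE_LINES):
    print(hits, path)
```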
Ah well. Readership is reasonable, but something is up with search engines. Best I could hope for I 'spose.