Theo Todman's Web Page - Notes Pages
Website Documentation
Website Generator Documentation - Visitor Stats
(Text as at 28/09/2022 10:24:58)
Introduction
- This document covers the following functions performed by clicking buttons on the front screen:-
- Visitor Stats (cmdVisitorStats_Click)
- To see the Code, click on the procedure name above.
Functional Notes
- Background: my ISP (namesco.co.uk) uses Webalizer to provide monthly stats of hits against my site. Superficially, these look rather rosy: 5.7k or so hits a day over the 81 months up to 2024-12 (8.9k or so hits a day over the 12 months up to 2024-12). However, I suspected that most of these hits are down to robots, mostly indexers, given that there are about 125k pages on my site as at 11/09/2023. So, I wrote some routines to check this out, and to display the results.
- Tables
- The stats data has to be imported into the following tables in the slave database Site_Hits.accdb:-
→ Hits_Pages, and
→ Hits_Bots
- The first table (Hits_Pages) contains statistics (Hits and Kbytes), by year and month, on all the pages on my website that have been visited. The URL of the page starts at the root directory (so, for example ‘/Fodor_Modularity.pdf’).
- The second table (Hits_Bots) contains statistics, by year and month, on the URLs[1] or IP addresses[2] that have accessed my site. The high-volume ones are probably robots and indexers, though it’s difficult to tell. It’s possible to Google the IP addresses, which I’ve done for the highest-volume addresses. It’s best to omit the last byte when doing so, as IP addresses are usually claimed in groups covering 0-255 in the final byte. See below for further investigations and functions.
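The idea of dropping the last byte of an IP address can be sketched in a few lines of Python. This is an illustration only, not the generator's own code (which is in cmdVisitorStats_Click); the function names and the sample rows are hypothetical, and the addresses are documentation-range examples with invented hit counts.

```python
from collections import defaultdict


def prefix24(address):
    """Return the first three bytes of a dotted IPv4 address, or None if it is not one."""
    parts = address.split(".")
    is_ipv4 = len(parts) == 4 and all(
        p.isascii() and p.isdigit() and int(p) <= 255 for p in parts
    )
    return ".".join(parts[:3]) if is_ipv4 else None


def hits_by_block(rows):
    """Sum hits per three-byte block; host names are kept under their own name."""
    totals = defaultdict(int)
    for site, hits in rows:
        totals[prefix24(site) or site] += hits
    return dict(totals)


# Invented sample rows of (site, hits), standing in for rows of Hits_Bots
sample = [
    ("192.0.2.10", 500),
    ("192.0.2.77", 300),
    ("crawler.example.com", 120),
    ("198.51.100.4", 40),
]
blocks = hits_by_block(sample)
```

Grouping in this way shows the total traffic from a block of addresses that would otherwise be spread over several low-volume rows.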
- Process
- To acquire this data, I need (at the end of each month, or in bulk periodically) to go to the namesco.co.uk Control Panel, select the “Visitor Stats” option below “FTP Users” in “Hosting Settings”, and click on the required month[3] in the “Summary by Month” table. Then:
- Scroll down to “Top 30 of nn,nnn Total URLs” and at the bottom of this table, click on “View All URLs”. Then copy this page into a text file named “Webalizer_yymm.txt” (for the appropriate year and month).
- Next, scroll further down to “Top 30 of m,mmm Total Sites”. At the bottom of this table, click on “View All Sites”. Then copy this page into a text file named “Webalizer_Bots_yymm.txt” (for the appropriate year and month).
- These text files should be placed in the directory “C:\Theo's Files\Websites\Visitor_Stats”.
- Import the files using the button on the front screen. Once imported, to save unnecessary repeat-processing time, the files should be moved to the sub-directory “Processed”.
- Web Output: The import routine also outputs the web-pages:-
→ List of Pages accessed, in descending order
→ List of suspected Robot URLs
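The file-naming convention and the move to “Processed” described above can be sketched as follows. This is a minimal Python illustration, not the actual import routine: the names classify and process_folder, and the import_file callback, are hypothetical, and the sketch assumes that “yy” always means a year from 2000 onwards.

```python
import re
import shutil
from pathlib import Path

# Webalizer_yymm.txt feeds Hits_Pages; Webalizer_Bots_yymm.txt feeds Hits_Bots
PATTERN = re.compile(r"^Webalizer_(Bots_)?(\d{2})(\d{2})\.txt$")


def classify(filename):
    """Return (table, year, month) for a stats file name, or None if it does not match."""
    m = PATTERN.match(filename)
    if not m:
        return None
    month = int(m.group(3))
    if not 1 <= month <= 12:
        return None
    table = "Hits_Bots" if m.group(1) else "Hits_Pages"
    return table, 2000 + int(m.group(2)), month  # assumption: yy is 20yy


def process_folder(folder, import_file):
    """Import each matching file, then move it to the Processed sub-directory."""
    folder = Path(folder)
    processed = folder / "Processed"
    processed.mkdir(exist_ok=True)
    done = []
    for path in sorted(folder.glob("Webalizer_*.txt")):
        info = classify(path.name)
        if info is None:
            continue  # e.g. Webalizer_Searches_yymm.txt is left where it is
        import_file(path, *info)  # hypothetical callback that loads the table
        shutil.move(str(path), str(processed / path.name))
        done.append(path.name)
    return done
```

Because a file is only moved after its import has run, a failed import leaves the file in place to be retried.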
- Statistics not Collected: Webalizer contains a lot of other statistics that I don’t currently use, including:-
- Daily and Hourly totals – i.e. by particular days of the month and hours of the day – analogous to the monthly ones.
- Stats of Files, Pages, Visits and Sites – which I don’t record – in addition to Hits and Kbytes which I do.
- Top 10 entry and exit pages.
- Referrers[4].
- Search strings: there are very few of these – of the order of 5-10 a month – and I don’t know how they are recorded. I’ve saved these (going right back to April 2018) in case they prove interesting. Saved as Webalizer_Searches_yymm.txt.
- User agents.
- Countries (top 30; currently 108 in total – including .net, .com and the like).
Technical Notes
In-Page Footnotes:
Footnote 1:
Footnote 2:
Footnote 3:
- Stats are only displayed for 12 months.
- So, if I don’t collect the stats within a year of them being recorded, they might be lost.
- This is particularly a potential problem for those stats I decide not to record.
- However, I’ve found that by overtyping the “yymm” in the URL, the stats are actually still there! They go back to April 2018, when my ISP was taken over and moved to a new platform.
- In fact, if doing a ‘catch-up’ it’s most efficient to use this over-typing method after using the web-page for the first month.
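For a catch-up, the list of “yymm” values to overtype can be generated rather than worked out by hand. The following Python sketch assumes only the “yymm” format described above; the function name is hypothetical, and the URL itself is not reproduced here.

```python
def yymm_range(start_year, start_month, end_year, end_month):
    """List the yymm tokens for every month from start to end, inclusive."""
    tokens = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        tokens.append(f"{year % 100:02d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return tokens


tokens = yymm_range(2018, 4, 2018, 7)  # ['1804', '1805', '1806', '1807']
```

Calling yymm_range(2018, 4, 2024, 12) gives 81 tokens, which agrees with the 81 months quoted in the Background.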
Footnote 4:
- Theoretically these could be interesting, but the vast majority of the referring pages (over 1,000 different pages a month) are within my own site, which may just reflect people wandering about it.
- The usage of these referring pages varies hugely. The largest number is about 80k / month for “Direct Request”, which the Webalizer documentation suggests will be bookmarks or keyed in URLs, but I suspect robots.
- Over 1k / month come from Google, which is strange, given the paucity of Search Strings.
- It may be worth importing these pages and stats and analysing them further.
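Should the referrers ever be imported, a first analysis might sort them into the groups discussed above. The Python sketch below is only a suggestion of how that could look: the function names are hypothetical, own_host stands for the site's own host name (example.org is a placeholder), and it assumes that each referrer is either a full URL or a label containing “Direct Request”.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_referrer(referrer, own_host):
    """Label a referrer as 'direct', 'internal', 'google' or 'other'."""
    text = referrer.strip()
    if "direct request" in text.lower():
        return "direct"
    host = (urlparse(text).hostname or "").lower()
    if host == own_host or host.endswith("." + own_host):
        return "internal"
    if "google" in host.split("."):
        return "google"
    return "other"


def hits_by_group(rows, own_host):
    """Sum hits per referrer group from (referrer, hits) rows."""
    totals = Counter()
    for referrer, hits in rows:
        totals[classify_referrer(referrer, own_host)] += hits
    return totals


# Invented sample rows; example.org is a placeholder for the site's host name
sample = [
    ("- (Direct Request)", 80),
    ("https://www.example.org/Notes/a.htm", 12),
    ("https://www.google.co.uk/", 5),
    ("https://bing.com/search", 1),
]
groups = hits_by_group(sample, "example.org")
```

Totals by group would show at a glance how much of the referred traffic is internal, and so whether the remainder is worth a closer look.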
Text Colour Conventions
- Blue: Text by me; © Theo Todman, 2025