Theo Todman's Web Page - Notes Pages
Status: Web-Tools (2020 - June)
(Text as at 03/07/2020 22:09:07)
(For earlier versions of this Note, see the table at the end)
Rationale for this Project
- This Project was alluded to briefly in a footnote on research methodology in my original Research Proposal1 under the head Research - Internet Technology2. When last at Birkbeck, I wrote a more extensive paper3 defending the Project and describing its rationale. Now that my PhD is in suspense, I have decided to take this Project further. There’s a lot to do: still quite a few items on the “wish list”. It is fairly critical as an enabler for my research, so I need to get a move on as I want it all out of the way before I re-start4 formal research.
- For documentation on my website (currently password protected) follow the links below:-
- Functional5 Documentation6.
- Technical7 Documentation.
- Other Websites8
- I’ve created and continue to maintain a small website for a music group Julie and I attend – the Enigma Ensemble.
- I established the Hutton Bridge Club Website in 11Q4 using the standard Bridgewebs service, but with a couple of competitions using my own routines. This was handed over in 15Q3, but I’ve taken it on again as of March 2020, not that there’s currently much to do. It needs a spring clean, but I’m waiting to see whether the club (and its members) survive the coronavirus pandemic.
- In 16Q3 I created the Mountnessing Bridge Club Archive website, using the vast bulk of the pages from their legacy site, as the club had moved to Bridgewebs and lost its historical data. As of March 2020 I’ve taken over the aforementioned Mountnessing Bridge Club website itself.
- Sometime around 2005, I created a website for Dr. Sophie Botros, one of my supervisors at Birkbeck, but we then lost touch and it was maintained (very badly) by some desktop support outfit. In 15Q2 I took it back on again and spruced it up a bit, and maintained it periodically until 19Q3, when it was taken on by a professional outfit, Bookswarm. The “Sophie Botros” link in this bullet is now to their version of the site.
- I created and / or ran a multitude of other bridge websites, but as of January 2018 I have either handed them over or mothballed them9:-
- In 15Q1, I took over the support and development of the Essex Contract Bridge Association (ECBA) website, which also uses Bridgewebs, but is very much larger. I wrote a lot of code10 to make this job less tedious. The site was handed over in 17Q4.
- For several years, I collected data11 on bridge activity in the Billericay/Brentwood area (initially needed for a project to set up a new consolidated club) by “scraping” data off web pages, consolidating it into a database and modelling it in various ways.
- I used this data to generate websites with a multitude of ladders for small clubs (Essex Bridge Results). These are now mothballed.
- I created and maintained a new website for the First Class Bridge Academy, giving it “small clubs” ladders (Bernie's Ladders Archive) as these were easy to maintain with little intervention.
- I created a website for displaying the textual and grammatical analyses and appendices of Pete’s PhD on the Acts of the Apostles. It exists in two versions: Acts: Live Site and Acts: Test Site.
Summary of Progress during April - June 2020
Website (Total Hours = 126.5)
- I spent 138.25 hours in 20Q2 on this Project, or related work (332.5 hours YTD, where for "YTD" - Year to Date - I mean the (academic) year that commenced in October 2019). That's 134.6% of the planned effort (116.6% YTD). Overall, 19% of my Project effort in the Quarter was directed towards this project (making 15.9% YTD) - as against 14.8% planned (12.6% YTD).
- In 20Q2 I made quite a lot of progress documenting and improving the Cross-Referencing algorithms12 and – as it turned out a related matter – improving the run times of the full Website regeneration routines. Consequently, I overspent my time budget by over a third. Time well spent, though.
- Completed items included:-
- Own Website:
- Enhanced the Functor processing to allow parameters. Previously, I had to create a new Functor ID each time for a very similar requirement. The main problem with this old method was that these Functors (where they call another Functor, or subroutine from a row in Functor) didn't feed through into the Documenter subroutines properly.
- Truncated Cross_Reference_Changes table (from over 1M rows to under 50k) which should be self-maintaining, but isn't. This seems to have improved the regeneration times of Notes, Archived Notes and some other objects now that insertions into this table are quicker. Needs a permanent solution as part of my Cross-Referencing project.
- Enabled "alternate names" for identically-named authors (to avoid middle-initial = X):-
- Set up and populated Author_Name_Display
- Ensured Author_Name_Display appears on the Author page.
- Determined why the monthly regeneration process for Authors had ballooned from under 4 hours in January 2020 to over 17 hours in April 2020. Fixed on 11/04/20 – time more than halved to 8.25 hours – by re-engineering the 'menu' process that counts the items linked to further down the page.
- Further reduced the monthly regeneration process time for Authors from 8.25 hours to 13 minutes! Fixed by materialising and indexing a 'view' to table Authors_Cited_By_All_List.
- Determined why the monthly regeneration process for Book Summaries had ballooned from 1.1 hours in January 2020 to 3.5 hours in April 2020. Partly fixed by materialising and indexing the view (now table) Book_Citings_List_New, so now takes 39 minutes. Further improvement required, given that Paper Summaries only takes 5 minutes.
- Determined why the monthly regeneration process for Book-Paper Abstracts had doubled from 2.4 hours in January 2020 to 4.8 hours in April 2020. Successfully reduced to just under one hour by materialising and indexing BookPaperAbstracts_List.
- Created various generic Functors to help Document Cross-Referencing by producing cross-tabs and lists from queries. Then the associated stats can be regenerated each time the documentation13 is regenerated (as TEMP): Functor_21, Functor_22, Functor_23, Number_Format.
- As revealed by Spider: Sundry uncategorised. Refs failing. 30 items. Fixed manually.
- Determined why the monthly regeneration process for Paper Abstracts had ballooned from just over 6 hours in January 2020 to over 17 hours in April 2020. Reduced to just under 4 hours as a result of materialising and indexing the view (now table) Paper_Citings_List_New. Further improvements required.
- Determined why the monthly regeneration process for Paper Summaries had ballooned from just over 3.5 hours in January 2020 to around 11.5 hours in April 2020. Fixed by materialising and indexing the view (now table) Paper_Citings_List_New, so now takes 6 minutes!
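- Several of the fixes above are the same pattern: snapshot ("materialise") a slow view into a real table, then index it, so per-page lookups during regeneration become cheap. Below is a minimal sketch of the idea in Python/SQLite – the real generator is MS Access/VBA, and the table, view and column names here are simplified stand-ins for views like Paper_Citings_List_New:

```python
import sqlite3

# Illustrative only: the real generator is MS Access/VBA; the names below
# are simplified stand-ins for views like Paper_Citings_List_New.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Citations (Paper_ID INTEGER, Citing_ID INTEGER);
    INSERT INTO Citations VALUES (1, 10), (1, 11), (2, 10);

    -- The slow 'view': re-evaluated from scratch on every page lookup.
    CREATE VIEW Paper_Citings_View AS
        SELECT Paper_ID, COUNT(*) AS Citings FROM Citations GROUP BY Paper_ID;
""")

# Materialise: snapshot the view into a real table, then index it.
# The snapshot must be re-built whenever Citations changes.
con.executescript("""
    DROP TABLE IF EXISTS Paper_Citings_List;
    CREATE TABLE Paper_Citings_List AS SELECT * FROM Paper_Citings_View;
    CREATE INDEX idx_pcl ON Paper_Citings_List (Paper_ID);
""")

print(con.execute(
    "SELECT Citings FROM Paper_Citings_List WHERE Paper_ID = 1").fetchone()[0])
# prints 2
```

The trade-off, as noted elsewhere on this page, is that the re-materialisation itself costs time and database space on every recalculation.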
- Full Website Regeneration took 60 hours in April 2020. This has now been substantially fixed, as of end June 2020, in that regeneration now takes just under 11 hours. While further improvements are possible, the best use of time is to avoid the need for regular regeneration altogether by completing my Cross-referencing project.
- Corrective work was split out into the items requiring improvement, most of which have now been implemented.
- History and analysis:-
- This 'ballooning' was on my new laptop, which has solid-state disks – regeneration was taking 36 hours on my old laptop, then halved when I got the replacement, but then nearly doubled!
- I investigated why this was so, and improved performance mainly by putting in a trace – using a timer (GetTickCount, found on the web) that allows logging of elapsed time in milliseconds via StartTimer and EndTimer – and finding the 'pinch points'.
- I suspected that a new release of MS Access had caused the problem; maybe a lost index or something similar. I couldn't find any evidence of this, but a number of queries were taking a second or so to execute – small times that multiplied up drastically given the number of pages on my site (over 100k). Quite why a cliff-edge had been reached, I know not!
- Three areas had been particularly impacted: Authors, Paper Abstracts & Paper Summaries, which had all tripled or worse. These are covered as separate developments under 'Authors' and 'Papers'.
- Book Summaries and Book-Paper Abstracts had also at least doubled, but this has had less of an impact as they took much less time in the first place.
- Other regeneration processes - in particular 'Notes' - didn't appear to have been impacted.
- Wrote Check_Database_Size to check that the size of the database isn't at risk of breaching the 2GB limit. Required because of the need to re-materialise various views that have been instituted to speed up the full website regeneration routines, and which need to be run each time cmdRecalculate_Click is run, which is often.
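- A minimal sketch of what a Check_Database_Size-style guard does, using thresholds like those mentioned later for the Spider run. The real routine is VBA inside Access; the function name is the site's, but the path, thresholds and messages here are my own illustrations:

```python
import os

# Hedged sketch of a Check_Database_Size-style guard; thresholds and
# messages are illustrative, not the real VBA routine's.
LIMIT = 2 * 1024**3          # Access's hard 2 GB file-size limit
WARN  = int(1.6 * 1024**3)   # start warning here
STOP  = int(1.8 * 1024**3)   # refuse to continue here

def check_database_size(path):
    size = os.path.getsize(path)
    if size >= STOP:
        raise RuntimeError(f"Database {size / 1024**3:.2f} GB: stop and compact now")
    if size >= WARN:
        print(f"Warning: database at {size / 1024**3:.2f} GB of the 2 GB limit")
    return size

# Demo against a tiny scratch file rather than a real .accdb:
with open("demo.db", "wb") as f:
    f.write(b"\0" * 1024)
print(check_database_size("demo.db"))  # prints 1024
os.remove("demo.db")
```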
- As revealed by Spider: WebLinks_Tester_Full_Map.htm (etc). Refs failing. 184 items. res://ieframe.dll/ in Returned_URL.
- WebLinks_Tester_Full & WebLinks_Tester_Full_Map: Reformatted Jump Table to 30 columns and multiple rows.
- WebLinks_Tester: If the URLs of WebRefs are changed between Spider runs, the URL mapping fails, and the WebRefs appear at the end of the last page, wrongly categorised. Fixed to ensure they appear on their own page, added to the jump table as 'WebRef Missing' and added an explanation.
- Webrefs_Update: Improved the processing of this sub, which controls IE to check that the URLs corresponding to WebRefs are still valid.
- Improved the recovery processing after IE fails or becomes detached. Now seems to work perfectly!
- Improved processing for 404 (page not found) errors – detect them where the URL returned differs from that requested (so the error is correctly categorised) and also where it is the same (so the error is detected at all) – by using the GetElementsByTagName method to look through Title and H1 elements (where they exist).
- Note that there is still an issue for pdfs where the above methods don't work: though the page returned is an HTML or XML page, the changed URL still claims to be a pdf, so I can't parse it as HTML in case it's a real pdf, where the process would fail.
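- The two checks just described can be sketched as follows. This is an illustrative Python stand-in, not the real Webrefs_Update code (which drives Internet Explorer from VBA and calls GetElementsByTagName); the error phrases and return values are my own assumptions:

```python
import re

# Sketch of the two soft-404 checks described above; phrases and
# categories are illustrative stand-ins for the real VBA checker's.
ERROR_PHRASES = ("404", "page not found", "not found")

def classify(requested_url, returned_url, html):
    # Scan <title> and <h1> text for tell-tale 'not found' wording.
    texts = re.findall(r"<(title|h1)[^>]*>(.*?)</\1>", html, re.I | re.S)
    soft_404 = any(p in t.lower() for _, t in texts for p in ERROR_PHRASES)
    if returned_url.rstrip("/") != requested_url.rstrip("/"):
        # URL changed: categorise as a redirect, flagging soft 404s explicitly.
        return "redirected-404" if soft_404 else "redirected"
    return "404" if soft_404 else "ok"

print(classify("http://example.com/a", "http://example.com/a",
               "<html><title>404 Page Not Found</title></html>"))  # prints 404
```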
- Amended WebRef 'Name' links to #Off-Page_Link_WxxxW style so referencing from other pages is possible (useful for Aeon)
- Other Websites:
- Full details for 20Q2 are given below14:-
Website Others (Total Hours = 11.75)
- Website - Development (Total Hours = 93.25)
- Improvements to CreatePaperCitingsWebPages (0.75 hours)
- Website - Generator - Amend WebRef 'Name' link to #Off-Page_Link_WxxxW style (1.5 hours)
- Website - Generator - Automate Aeon Page output - Add total 'unread' count + 'read' narrative (0.75 hours)
- Website - Generator - Compact / Repair re-open '2Gb' problem - set MaxLocksPerFile to 1,000,000 (from 9,500) (0.25 hours)
- Website - Generator - Correct Functor_16 to remove hyperlinks from Title (1.75 hours)
- Website - Generator - Correct Quarterly Reporting for YTD % Planned (0.5 hours)
- Website - Generator - Correct Subtotalling of Aeon Webref items in Summary task List reports (1.25 hours)
- Website - Generator - Document Cross-Referencing (17 hours)
- Website - Generator - Document Cross-Referencing: Create Functor_21 (2.75 hours)
- Website - Generator - Document Cross-Referencing: Create Functor_22 (0.5 hours)
- Website - Generator - Document Cross-Referencing: Create Functor_23 (2 hours)
- Website - Generator - Document Cross-Referencing: Create Number_Format (1 hour)
- Website - Generator - Document Functors (2.5 hours)
- Website - Generator - Enhance Functor processing to allow parameters (2.25 hours)
- Website - Generator - Fix various bugs in the Spider_WebLinks_Tester set of subroutines (2.75 hours)
- Website - Generator - Fixes re Broken Links revealed by Spider (7.5 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Authors (11.5 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Book Summaries (1.5 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Book-Paper Abstracts (2.5 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Check on database size (1.25 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Cross-Referencing (2.25 hours)
- Website - Generator - Full Website Regen - Performance Improvements - General planning (1.5 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Notes Archive (2.75 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Paper Abstracts (2.5 hours)
- Website - Generator - Full Website Regen - Performance Improvements - Paper Summaries (3.75 hours)
- Website - Generator - Investigate & Fix WebRefs_Update checker for 404 check not working (4.25 hours)
- Website - Generator - Investigate & Fix WebRefs_Update checker for restarts not working (1.5 hours)
- Website - Generator - Investigate 2Gb query message on Compact & Repair (1.25 hours)
- Website - Generator - Modify Functor_17 to segregate Audio Files to Music Page (2 hours)
- Website - Generator - Site Map (8.25 hours)
- Website - Generator - Spider - Monitor performance & Main database size (0.5 hours)
- Website - Update 'Photos' Page with Family history photos for Vicky (1 hour)
→ See "Software Development - Website - Development" (93.25 hours)
- Website - Education (Total Hours = 1.5)
- Website - Infrastructure (Total Hours = 12.75)
- Email - Mailbox compromised: changed passwords (1 hour)
- Email - Mailbox not compacting (0.75 hours)
- Google - Passwords Compromised? (0.25 hours)
- Microsoft Windows 10 / MS Office - Releases, Bugs & Periodic Re-boots (4.5 hours)
- Naomi's BT connection in new flat (1 hour)
- PC Backups / OneDrive (1.25 hours)
- Re-installing PdfElement (0.25 hours)
- Set up Bluejeans Conferencing system for Quiz (1.75 hours)
- Sky Q Fixes (1.5 hours)
- Ubisoft Password Compromised? (0.5 hours)
→ See "Admin - Website - Admin & Maintenance" (12.75 hours)
- Website - Maintenance (Total Hours = 19)
- 20Q1 Status Reports (0.75 hours)
- Updated my Home page (0.75 hours)
- Website - Generator - WebRefs - Manual / Automatic URL Checks & Fixes (6.75 hours)
- Website - Periodic Full Regeneration (8.25 hours)
- Website - Run Web Spider (2.5 hours)
→ See "Admin - Website - Admin & Maintenance" (19 hours)
- Website Others - Enigma Ensemble
- Website Others - Hutton DBC Maintenance
- Website Others - Mountnessing DBC Maintenance
Plans for the Near Future
The Plan below is taken automatically from the Priority 1 items on my Development Log, as published in my Outstanding Developments15 Report. I’ve again increased the weekly allocation marginally to 10 hours. This is to allow further work on my Cross-Referencing project.
- Own Website: Priority 1 Items By Category:-
- Compact and Repair Problems
- On compacting and repairing my main database I sometimes get the error "The query cannot be completed. Either the size of the query result is larger than the maximum size of a database (2 GB), or there is not enough temporary storage space on the disk to store the query result".
- It happens 3 times while the database is re-opening.
- There is lots of space, and the database is only 600Mb (and the error started when it was under 500Mb).
- This mostly happens after I've run long processes, so I usually close the database, re-open it and then try the compact and repair. Usually this works, but not always. But I then try again and the message disappears.
- I strongly suspect that this is MS Access itself re-indexing tables and blowing up a temporary database, but I can’t find any evidence for this on-line – or help, other than suggestions to split databases and do other sensible things. That the error occurs while the database is re-opening, with no temporary file visible, is very strange.
- 17/04/20 - set MaxLocksPerFile to 1,000,000 (from the default 9,500). Sadly, it doesn't seem to have made a difference.
- Complete XRef-re-engineering project:-
- Ensure all links and link-pages use the new XRef table, and pension off the old tables.
- Look into writing out specific object-identifiers, and linking thereto for Citations, rather than paragraph references. An issue is multiple instances of the same object in a document.
- Check all link-types still work and fix any errors.
- Complete the auto-triggering of regeneration of “associated” link pages.
- Fix update bug in Convert_Webrefs.
- Fix Bug whereby PaperSummary pages seem to have “Works-” and “Books/Papers-” Citings that refer to the same link-pages.
- Document the process!
- Review effectiveness of hyperlinking method in the light of PhD and Philosophy of Religion experience.
- Where possible, use ID rather than NAME for in-page hyperlinks
- Investigate Record-count discrepancies:-
- How do website files work as far as counts are concerned?
- Why aren't they recorded in Backup_History, nor the fact that the website was backed up?
- Different counts depending on whether new or old laptop is backed up. Investigate 63k discrepancy - lower on new laptop.
- Review architecture to improve performance; Need to document first
- Further improve the time to regenerate Book Summaries. Now takes about 39 minutes, but should be under 5 minutes!
- Investigate whether multiple Subject/Topic/Subtopic usage leads anywhere (ie. is just the first (of 3) actually used?). Fix anything amiss.
- Reformat the PaperCitings pages:-
- Do the same for BookCitings
- Include only useful information on the detail pages; but if there are multiple links from the same object, include them on the same line as 'extra links' as in Authors' Citations (copy the code).
- Include counts on the summary page.
- Develop auto-reconciliation routines vs EBU results download
- Investigate the error reports from the Documenter, especially unused variables & queries.
- Provide Functional Documentation for Website Generator (using Notes)
- "Sitepoint (Learnable) - Sitepoint Learnable Web Development Courses": Membership cancelled, but plan what to do with the eBooks in my possession.
- Read "PC Pro - Computing in the Real World".
- Read "White (Ron) & Downs (Timothy Edward) - How Computers Work: The Evolution of Technology".
- iCloud for Windows: Re-install & solve 'The upload folder for iCloud Photos is missing' problem. Try on new Laptop.
- Add "Note Alternates" to Note pages.
- Add option in Auto-Reference Notes to automatically ignore words containing certain strings that include the key-word (eg. ignore 'grace' and 'trace' when indexing 'race')
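- One simple way to get most of this effect is whole-word matching, so that 'grace' and 'trace' are never flagged when indexing 'race'. The Note above actually proposes a configurable ignore-list of containing strings, so the sketch below (Python; the function name is my own invention, not part of the Auto-Reference code) is only an illustration of the idea:

```python
import re

# Illustrative whole-word matcher: 'race' matches only as a word, so
# 'grace' and 'trace' are never flagged. The real Auto-Reference Notes
# proposal is a configurable ignore-list; this is a simplified stand-in.
def find_keyword(keyword, text):
    return [m.start() for m in
            re.finditer(r"\b" + re.escape(keyword) + r"\b", text, re.I)]

text = "A trace of grace won the race."
print(find_keyword("race", text))  # prints [25]
```

An explicit ignore-list remains useful for cases word boundaries can't express, such as ignoring 'racecourse' while still indexing 'race'... though that particular compound is my own example.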
- Add option in Auto-Reference Notes to only confirm new items (leaving previously-flagged items untouched)
- Allow the option to concatenate Notes in the Printed version (ie. linearly embed them essay-style), rather than treating the hyperlinks as footnotes – but still keep the hyperlink & cross-referencing in place.
- For use as "disclaimers" - eg. for "Plug Notes".
- For Thesis / essays: the difficulty here is the need for linking passages to make the text run smoothly.
- As part of the Cross-Referencing project, check out the consistent treatment of Note 87516, which should be universally ignored. Recently, links to it appeared on Book-Summaries, Book_Paper_Abstracts and Note_Book_Links, as a Note referencing a Book. The critical item was a row on the Note_Book_Links table.
- Determine why very long printable notes (eg. Level 3+ for Note 17017) are being truncated. Probably suppress them in any case, as they take far too long to load.
- Fix bugs in multi-level footnoting in Printable Notes – the referencing is going wrong.
- Investigate Note_Links: Section references seem to be incorrect
- Printable Notes: fix the bug whereby the “private” flag is round the wrong way.
- Split Aeon Page18 into multiple sub-pages (either by topic or by priority)
- Suppress the publication of the Printable versions of Temp Notes
- The monthly regeneration process for Paper Abstracts still takes just over 5 hours. Problem is with Cross_Reference_Deletions and Cross_Reference_Additions. Cannot be fixed until the cross-referencing project is fully complete and documented.
- Develop software & procedure to make adding more content to the photos pages easier to undertake.
- Timeline software: Add photos for Holidays & Family History
- Determine why Recalculation & Changed Book/Papers produce unneeded regeneration.
- Analyse the results of the data collection exercise and design a plan of campaign to fix broken Internal links and prevent recurrence.
- Correct the code so the problems discovered by the Spider don’t recur.
- Delete 'orphan pages' that are never linked to, ie. use the Spider to prune redundant pages19 automatically where possible.
- Fix the historical data where errors are uncovered by the Spider. An easier task now the site has a full-regen function.
- The size of the main database bloats to over 1.6Gb during the spider run, so is approaching the 2Gb limit.
- Use Check_Database_Size, with a parameter, to monitor the size – output a message along the lines of those reporting the compact / repair of the Slave database.
- Put a check in to STOP if over 1.8Gb.
- Determine a solution as the limit is pushed. Some tables are “local” for performance reasons and are later copied to the Slave … maybe move them?
- The Spider was generating WebRefs. Procedurally, this ought not to have been possible. I've re-opened the case!
- The major problem turned out to be that unprocessed20 URLs got added to the end of the last WebLinks_Tester_Brief page, which then got Spidered. I've stopped this happening, so hopefully the problem will not recur. The fix was made in 18Q2.
- However, 4 other creations appeared - dated 18/05/18 - from the run of 10/07/18. The creation date was from the previous spider run, but the IDs show that they were produced in the latest run.
- Quarterly Project Reports: Correct Functor_08. The Project Planned YTD % keeps having to be bodged!
- Look into Sistrix Smart21. Errors and warnings itemised are:-
- Duplicate content: seems to be variants on theotodman.com
- Title Tags: Empty, too long, identical
- Page Not Found
- Filesize in excess of 1Mb
- Meta-Description: Empty
- Few words on Page
- H1: Not used, used multiple times per page, identical across pages
- Pictures: Alt attribute missing
- Other Websites: Priority 1 Items By Category:-
- Documentation & Bug-fixes: Phase 2
- Re-document the procedures in the light of recent changes.
- Resolve issues generated / revealed by the spider.
- Investigate - and fix where possible - broken links.
- Improve WebRefs checker (Webrefs_Update) further to check for Error 403 "Forbidden". This will often involve finding a way of checking pdfs where the returned page is in fact HTML or XML (see DevLog Ref 379).
- Investigate defunct items. Populate Defunct_Explanation in WebRefs_Table and include in relevant WebLinks_Tester reports. Consider use of FairUse (Link (Fair Use)) for documents no longer available that I'd downloaded.
- Reformat WebLinks_Tester.htm, WebLinks_Tester_Map.htm, WebLinks_Tester_Full.htm & WebLinks_Tester_Full_Map.htm
- Clarify 'truncated': Display, not link
- Allow more space for 'link returned', 'issue' and 'display text'
- The 'As Above' lines waste space. Only for Notes Archive? Consolidate onto a single second line.
- Reformat WebLinks_Tester_Brief: Allow more space for 'link returned', 'issue' and 'display text'
Summary of Progress to Date
This is hived off to various separate documents, which have now been harmonised and / or consolidated:-
- Summary of Progress to Date22.
- Outstanding Developments23,
- Functional Documentation24,
- A summary of time expended across the years developing my website25 is at "Software Development - Website - Development".
In-Page Footnotes:
Footnote 4:
- Well, in a sense, I’ve missed the boat as I’m now putting effort into my research, though in an informal basis, so will need to continue with both projects in parallel.
- This was always likely to be necessary, as new features will always arise in use. It’s a prototype methodology, after all.
- This is very tedious to produce and consequently is both incomplete and out of date.
- This is much more fun, as it’s a purely technical task.
- I’ve written a vastly-improved general-purpose technical documenter for MS Access.
- It’s a shame to abandon the “mini websites” with all their ladders, as it’s rather well done.
- However, I couldn’t waste time on these after I’d abandoned bridge.
- In particular, for the ECBA “Victor Ludorum” competition.
- I cannot hand any of this code over, so the tedium will return, though not to me!
- I had agreed to share this data sometime early in 2018
- But will wait until asked again, as I doubt it’ll be of any real use to anyone.
- Note that where fixes or small enhancements are made to a previously “completed” development, I don’t announce it again against the list of “completed” items above, though the work appears in the full list for the quarter.
- Note that Backup_Prune_Ctrl deletes (relevant) pages that weren't regenerated in the last full site-regen, but this isn't the same thing.
- These are URLs that were used in web pages but hadn't yet been converted to the +WnnnW+ format, so appeared at the end with no WebRef ID.
- See Sistrix
- This used to be called Optimizr, see Optimizr (which now auto-forwards to Sistrix).
- A quick look doesn’t show it to be an obvious scam, but I need to double-check.
- An unsolicited analysis of my site turned up monthly from Optimizr from January 2015 to October 2017, listing a large number of “problems” that I think I know about, but which are in the queue to address.
- It restarted in February 2018, under the Sistrix name (this seems to have been associated with Optimizr since November 2015).
- The free version of this software is restricted to 1,000 pages, which is a very small proportion of my Site, though I may be able to point it to different base-URLs.
- But I do need to address the problems validly itemised, and a sub-set is still useful.
- As distinct from developing other peoples’ websites – time which is also recorded against this project, but not against this task.
Table of the Previous 12 Versions of this Note: (of 77)
Summary of Note Links from this Page
Summary of Note Links to this Page
Authors, Books & Papers Citing this Note
Website - Development
Text Colour Conventions
- Black: Printable Text by me; © Theo Todman, 2020
- Blue: Text by me; © Theo Todman, 2020