Theo Todman's Web Page - Notes Pages

Status Reports

Status: Web-Tools (2020 - June)

(Text as at 03/07/2020 22:09:07)

(For earlier versions of this Note, see the table at the end)

Rationale for this Project

Summary of Progress during April - June 2020
  1. I spent 138.25 hours in 20Q2 on this Project, or related work (332.5 hours YTD, where by "YTD" (Year to Date) I mean the (academic) year that commenced in October 2019). That's 134.6% of the planned effort (116.6% YTD). Overall, 19% of my Project effort in the Quarter was directed towards this project (making 15.9% YTD), as against 14.8% planned (12.6% YTD).
  2. In 20Q2 I made quite a lot of progress documenting and improving the Cross-Referencing algorithms and (as it turned out, a related matter) improving the run times of the full Website regeneration routines. Consequently, I overspent my time budget by over a third. Time well spent, though.
  3. Completed items included:-
    1. Own Website:
      • Architecture
        1. Enhanced the Functor processing to allow parameters. Previously, I had to create a new Functor ID each time for a very similar requirement. The main problem with this old method was that these near-duplicate Functors (where they call another Functor, or a subroutine, from a row in Functor) didn't feed through properly into the Documenter subroutines.
        2. Truncated the Cross_Reference_Changes table (from over 1M rows to under 50k); it should be self-maintaining, but isn't. This seems to have improved the regeneration times of Notes, Archived Notes and some other objects, now that insertions into this table are quicker. Needs a permanent solution as part of my Cross-Referencing project.
      • Authors
        1. Enabled "alternate names" for identically-named authors (to avoid middle-initial = X):-
          1. Set up and populated Author_Name_Display
          2. Ensured Author_Name_Display appears on the Author page.
        2. Determined why the monthly regeneration process for Authors had ballooned from under 4 hours in January 2020 to over 17 hours in April 2020. Fixed on 11/04/20 (time more than halved, to 8.25 hours) by re-engineering the 'menu' process that counts the items linked to further down the page.
        3. Further reduced the monthly regeneration process time for Authors from 8.25 hours to 13 minutes! Fixed by materialising a 'view' as the table Authors_Cited_By_All_List, and indexing it.
      • Books
        1. Determined why the monthly regeneration process for Book Summaries had ballooned from 1.1 hours in January 2020 to 3.5 hours in April 2020. Partly fixed by materialising and indexing the view (now table) Book_Citings_List_New, so it now takes 39 minutes. Further improvement is required, given that Paper Summaries only takes 5 minutes.
      • Books/Papers
        1. Determined why the monthly regeneration process for Book-Paper Abstracts had doubled from 2.4 hours in January 2020 to 4.8 hours in April 2020. Successfully reduced to just under one hour by materialising and indexing BookPaperAbstracts_List.
      • Documenter
        1. Created various generic Functors (Functor_21, Functor_22, Functor_23 and Number_Format) to help document Cross-Referencing by producing cross-tabs and lists from queries. The associated statistics can then be regenerated each time the documentation is regenerated (as TEMP).
      • Notes
        1. As revealed by Spider: Sundry uncategorised. Refs failing. 30 items. Fixed manually.
      • Papers
        1. Determined why the monthly regeneration process for Paper Abstracts had ballooned from just over 6 hours in January 2020 to over 17 hours in April 2020. Reduced to just under 4 hours as a result of materialising and indexing the view (now table) Paper_Citings_List_New. Further improvements required.
        2. Determined why the monthly regeneration process for Paper Summaries had ballooned from just over 3.5 hours in January 2020 to around 11.5 hours in April 2020. Fixed by materialising and indexing the view (now table) Paper_Citings_List_New, so it now takes 6 minutes!
      • Process
        1. Full Website Regeneration took 60 hours in April 2020. This has now been substantially fixed, as of end June 2020, in that regeneration now takes just under 11 hours. While further improvements are possible, the best use of time is to avoid the need for regular regeneration altogether by completing my Cross-referencing project.
        2. Corrective work was split out into the individual items requiring improvement, most of which have now been implemented.
        3. History and analysis:-
          1. This 'ballooning' was on my new laptop, which has solid-state disks. Regeneration had been taking 36 hours on my old laptop, halved when I got the replacement, but then nearly doubled!
          2. I investigated the cause, and improved performance mainly by putting in a trace to find the 'pinch points'. The trace uses a timer found on the web (based on GetTickCount) that logs elapsed time in milliseconds, via StartTimer and EndTimer.
          3. I suspected that a new release of MS Access had caused the problem; maybe a lost index or something similar. I couldn't find any evidence of this, but a number of queries were taking a second or so to execute, and these small times multiplied up drastically across the number of pages on my site (over 100k). Quite why a cliff-edge had been reached, I know not!
          4. Three areas had been particularly impacted: Authors, Paper Abstracts & Paper Summaries, which had all tripled or worse. These are covered as separate developments under 'Authors' and 'Papers'.
          5. Book Summaries and Book-Paper Abstracts had also at least doubled, but this has had less of an impact as they took much less time in the first place.
          6. Other regeneration processes - in particular 'Notes' - didn't appear to have been impacted.
        4. Wrote Check_Database_Size to check that the size of the database isn't at risk of breaching the 2GB limit. This is required because the various views that have been materialised to speed up the full website regeneration routines need to be re-materialised each time cmdRecalculate_Click is run, which is often.
      • WebRefs
        1. As revealed by Spider: WebLinks_Tester_Full_Map.htm (etc). Refs failing. 184 items. res://ieframe.dll/ in Returned_URL.
        2. WebLinks_Tester_Full & WebLinks_Tester_Full_Map: Reformatted the Jump Table to 30 columns and multiple rows.
        3. WebLinks_Tester: If the URLs of WebRefs are changed between Spider runs, the URL mapping fails, and the WebRefs appear, wrongly categorised, at the end of the last page. Fixed so that they appear on their own page, are added to the jump table as 'WebRef Missing', and are accompanied by an explanation.
        4. Webrefs_Update: Improved the processing of this subroutine, which controls IE to check that the URLs corresponding to WebRefs are still valid.
          1. Improved the recovery processing after IE fails or becomes detached. Now seems to work perfectly!
          2. Improved the processing of 404 (page not found) errors by using the GetElementsByTagName method to look through the Title and H1 elements (where they exist). This detects them both where the URL returned differs from that requested (so the error is correctly categorised) and where it is the same (so the error is detected at all).
          3. Note that there is still an issue with pdfs, where the above methods don't work: although the page returned is an HTML or XML page, the changed URL still claims to be a pdf, so I can't inspect it in case it really is a pdf, in which case the process fails.
        5. Amended WebRef 'Name' links to the #Off-Page_Link_WxxxW style, so that they can be referenced from other pages (useful for Aeon).
    2. Other Websites:
      • Nothing to Report.
  4. Full details for 20Q2 are given below:-
Website (Total Hours = 126.5)
  1. Website - Development (Total Hours = 93.25)
    • Improvements to CreatePaperCitingsWebPages (0.75 hours)
    • Website - Generator - Amend WebRef 'Name' link to #Off-Page_Link_WxxxW style (1.5 hours)
    • Website - Generator - Automate Aeon Page output - Add total 'unread' count + 'read' narrative (0.75 hours)
    • Website - Generator - Compact / Repair re-open '2Gb' problem - set MaxLocksPerFile to 1,000,000 (from 9,500) (0.25 hours)
    • Website - Generator - Correct Functor_16 to remove hyperlinks from Title (1.75 hours)
    • Website - Generator - Correct Quarterly Reporting for YTD % Planned (0.5 hours)
    • Website - Generator - Correct Subtotalling of Aeon Webref items in Summary task List reports (1.25 hours)
    • Website - Generator - Document Cross-Referencing (17 hours)
    • Website - Generator - Document Cross-Referencing: Create Functor_21 (2.75 hours)
    • Website - Generator - Document Cross-Referencing: Create Functor_22 (0.5 hours)
    • Website - Generator - Document Cross-Referencing: Create Functor_23 (2 hours)
    • Website - Generator - Document Cross-Referencing: Create Number_Format (1 hour)
    • Website - Generator - Document Functors (2.5 hours)
    • Website - Generator - Enhance Functor processing to allow parameters (2.25 hours)
    • Website - Generator - Fix various bugs in the Spider_WebLinks_Tester set of subroutines (2.75 hours)
    • Website - Generator - Fixes re Broken Links revealed by Spider (7.5 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Authors (11.5 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Book Summaries (1.5 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Book-Paper Abstracts (2.5 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Check on database size (1.25 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Cross-Referencing (2.25 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - General planning (1.5 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Notes Archive (2.75 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Paper Abstracts (2.5 hours)
    • Website - Generator - Full Website Regen - Performance Improvements - Paper Summaries (3.75 hours)
    • Website - Generator - Investigate & Fix WebRefs_Update checker for 404 check not working (4.25 hours)
    • Website - Generator - Investigate & Fix WebRefs_Update checker for restarts not working (1.5 hours)
    • Website - Generator - Investigate 2Gb query message on Compact & Repair (1.25 hours)
    • Website - Generator - Modify Functor_17 to segregate Audio Files to Music Page (2 hours)
    • Website - Generator - Site Map (8.25 hours)
    • Website - Generator - Spider - Monitor performance & Main database size (0.5 hours)
    • Website - Update 'Photos' Page with Family history photos for Vicky (1 hour)
      → See "Software Development - Website - Development" (93.25 hours)
  2. Website - Education (Total Hours = 1.5)
  3. Website - Infrastructure (Total Hours = 12.75)
    • Email - Mailbox compromised: changed passwords (1 hour)
    • Email - Mailbox not compacting (0.75 hours)
    • Google - Passwords Compromised? (0.25 hours)
    • Microsoft Windows 10 / MS Office - Releases, Bugs & Periodic Re-boots (4.5 hours)
    • Naomi's BT connection in new flat (1 hour)
    • PC Backups / OneDrive (1.25 hours)
    • Re-installing PdfElement (0.25 hours)
    • Set up Bluejeans Conferencing system for Quiz (1.75 hours)
    • Sky Q Fixes (1.5 hours)
    • Ubisoft Password Compromised? (0.5 hours)
      → See "Admin - Website - Admin & Maintenance" (12.75 hours)
  4. Website - Maintenance (Total Hours = 19)
    • 20Q1 Status Reports (0.75 hours)
    • Updated my Home page (0.75 hours)
    • Website - Generator - WebRefs - Manual / Automatic URL Checks & Fixes (6.75 hours)
    • Website - Periodic Full Regeneration (8.25 hours)
    • Website - Run Web Spider (2.5 hours)
      → See "Admin - Website - Admin & Maintenance" (19 hours)
Website Others (Total Hours = 11.75)
  1. Website Others - Enigma Ensemble
  2. Website Others - Hutton DBC Maintenance
  3. Website Others - Mountnessing DBC Maintenance
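The parameterised Functor processing mentioned under 'Architecture' can be illustrated with a minimal Python sketch; the site itself is built in MS Access, and this is not its code. Everything here is an assumption for illustration: the `ID|name=value` call syntax, the `REGISTRY` dictionary and the `Functor_XTAB` ID are hypothetical. The point it shows is that one registered function taking parameters can replace a family of near-identical Functor IDs.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical registry mapping a Functor ID to one generic, parameterised function.
REGISTRY: dict[str, Callable[..., str]] = {}


def functor(functor_id: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator that registers a function under a Functor ID."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[functor_id] = fn
        return fn
    return register


def parse_call(text: str) -> tuple[str, dict[str, str]]:
    """Split an assumed 'ID|name=value|name=value' string into an ID and parameters."""
    functor_id, *pairs = text.split("|")
    params = dict(pair.split("=", 1) for pair in pairs)
    return functor_id.strip(), params


def run(text: str) -> str:
    """Look up the Functor and call it with the parsed parameters."""
    functor_id, params = parse_call(text)
    return REGISTRY[functor_id](**params)


@functor("Functor_XTAB")  # hypothetical ID, not one of the site's real Functors
def cross_tab(query: str, title: str = "Untitled") -> str:
    # A real version would run `query` and render a cross-tab; this just echoes it.
    return f"<h3>{title}</h3><!-- cross-tab of {query} -->"
```

For example, `run("Functor_XTAB|query=qry_XRef_Counts|title=Link counts")` returns `<h3>Link counts</h3><!-- cross-tab of qry_XRef_Counts -->`. Since every call goes through one registered function, a documenter could enumerate `REGISTRY` rather than chase near-duplicate IDs.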
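Several of the speed-ups above (Authors_Cited_By_All_List, Book_Citings_List_New, BookPaperAbstracts_List, Paper_Citings_List_New) came from materialising and indexing a view. A minimal sketch of that technique in Python with SQLite, rather than MS Access, follows; the tables `works` and `citations`, the view `paper_citings_view` and the helper `materialise_view` are hypothetical. A view is re-evaluated every time it is queried, whereas a materialised copy with an index on the lookup column can be probed cheaply once per page.

```python
import sqlite3


def materialise_view(conn: sqlite3.Connection, view: str, table: str, key: str) -> None:
    """Snapshot a view into a real table and index the lookup column.

    Identifiers are interpolated into the SQL, so they must be trusted names.
    """
    cur = conn.cursor()
    cur.execute(f"DROP TABLE IF EXISTS {table}")  # also drops the old index
    cur.execute(f"CREATE TABLE {table} AS SELECT * FROM {view}")
    cur.execute(f"CREATE INDEX idx_{table}_{key} ON {table} ({key})")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE works (work_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE citations (citing_id INTEGER, cited_id INTEGER);
        INSERT INTO works VALUES (1, 'Paper A'), (2, 'Paper B');
        INSERT INTO citations VALUES (1, 2), (2, 1), (2, 2);
        CREATE VIEW paper_citings_view AS
            SELECT c.cited_id AS work_id, w.title AS citing_title
            FROM citations AS c JOIN works AS w ON w.work_id = c.citing_id;
    """)
    materialise_view(conn, "paper_citings_view", "paper_citings_mat", "work_id")
    # Each page now does an indexed lookup instead of re-evaluating the join.
    rows = conn.execute(
        "SELECT citing_title FROM paper_citings_mat WHERE work_id = ? ORDER BY citing_title",
        (2,),
    ).fetchall()
    print(rows)  # [('Paper A',), ('Paper B',)]
```

The price is that the snapshot goes stale as soon as the underlying tables change, which is why such tables have to be re-materialised whenever the data is recalculated, and why the extra copies push up the database size.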
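The trace used to find the 'pinch points' can be sketched in Python. This is an analogue of the StartTimer / EndTimer pair built on GetTickCount, not that code; the class `PinchPointTimer` and the step labels are hypothetical. It accumulates total elapsed time per labelled step, because what matters is the product of per-call time and call count: a query taking about a second, run once for each of over 100,000 pages, adds roughly 100,000 seconds, or about 28 hours.

```python
from __future__ import annotations

import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator


class PinchPointTimer:
    """Accumulates elapsed milliseconds per labelled step, like a StartTimer/EndTimer pair."""

    def __init__(self) -> None:
        self.total_ms: defaultdict[str, float] = defaultdict(float)
        self.calls: defaultdict[str, int] = defaultdict(int)

    @contextmanager
    def step(self, label: str) -> Iterator[None]:
        start = time.perf_counter()  # high-resolution clock for short durations
        try:
            yield
        finally:
            self.total_ms[label] += (time.perf_counter() - start) * 1000.0
            self.calls[label] += 1

    def worst(self, n: int = 5) -> list[tuple[str, float, int]]:
        """Return the n labels with the largest total time: the pinch points."""
        ranked = sorted(self.total_ms.items(), key=lambda kv: kv[1], reverse=True)
        return [(label, ms, self.calls[label]) for label, ms in ranked[:n]]


if __name__ == "__main__":
    timer = PinchPointTimer()
    for _page in range(3):
        with timer.step("count_menu_items"):  # hypothetical slow query
            time.sleep(0.01)
        with timer.step("write_html"):
            pass
    for label, ms, calls in timer.worst():
        print(f"{label}: {ms:.1f} ms over {calls} calls")
```

Ranking by total rather than per-call time is the design choice here: a fast query called on every page can outweigh a slow one called once.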
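The 404 check in Webrefs_Update can be approximated outside the browser. The following Python sketch uses the standard library's `html.parser` in place of IE's GetElementsByTagName, and assumes the page's HTML and the returned URL are already in hand; the phrase list and the helper names are hypothetical. It scans the Title and H1 text for tell-tale phrases, then uses the URL comparison to decide whether the error was redirected.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical phrases; real "soft 404" pages word things in many ways.
NOT_FOUND_PHRASES = ("404", "not found", "page cannot be found", "no longer available")


class TitleH1Collector(HTMLParser):
    """Collects the text of every <title> and <h1> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self._open: str | None = None  # which of title/h1 we are inside, if any
        self.texts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._open = tag
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self.texts[-1] += data


def looks_like_404(html: str) -> bool:
    parser = TitleH1Collector()
    parser.feed(html)
    parser.close()
    return any(p in text.lower() for text in parser.texts for p in NOT_FOUND_PHRASES)


def classify(requested_url: str, returned_url: str, html: str) -> str:
    """Distinguish a redirected 404 from one served at the requested URL."""
    if not looks_like_404(html):
        return "ok"
    return "404 (redirected)" if returned_url != requested_url else "404"
```

A bare "404" phrase can produce false positives (a page titled "Top 404 Tips", for instance), so the phrase list would need tuning against pages known to be live.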

Plans for the Near Future

The Plan below is taken automatically from the Priority 1 items on my Development Log, as published in my Outstanding Developments Report. I’ve again increased the weekly allocation marginally, to 10 hours, to allow further work on my Cross-Referencing project.
  1. Own Website: Priority 1 Items By Category:-
    • Architecture
      1. Compact and Repair Problems
        1. On compacting and repairing my main database I sometimes get the error "The query cannot be completed. Either the size of the query result is larger than the maximum size of a database (2 GB), or there is not enough temporary storage space on the disk to store the query result".
        2. The error appears three times while the database is re-opening.
        3. There is plenty of disk space, and the database is only 600Mb (the error started when it was under 500Mb).
        4. This mostly happens after I've run long processes, so I usually close the database, re-open it and then try the compact and repair. Usually this works, but not always; when it doesn't, I try again and the message disappears.
        5. I strongly suspect that this is MS Access itself re-indexing tables, and blowing up a temporary database, but I can’t find any evidence for this online, nor any help other than suggestions to split databases and do other sensible things. That the error occurs while the database is re-opening, with no temporary file visible, is very strange.
        6. 17/04/20 - set MaxLocksPerFile to 1,000,000 (from the default 9,500). Sadly, it doesn't seem to have made a difference.
      2. Complete XRef-re-engineering project:-
        1. Ensure all links and link-pages use the new XRef table, and pension off the old tables.
        2. Look into writing out specific object-identifiers, and linking to them for Citations, rather than to paragraph references. One issue is that the same object can occur several times in a document.
        3. Check all link-types still work and fix any errors.
        4. Complete the auto-triggering of regeneration of “associated” link pages.
        5. Fix update bug in Convert_Webrefs.
        6. Fix Bug whereby PaperSummary pages seem to have “Works-” and “Books/Papers-” Citings that refer to the same link-pages.
        7. Document the process!
      3. Review effectiveness of hyperlinking method in the light of PhD and Philosophy of Religion experience.
      4. Where possible, use ID rather than NAME for in-page hyperlinks
    • Backups
      1. Investigate Record-count discrepancies:-
        1. How do website files work as far as counts are concerned?
        2. Why aren't they recorded in Backup_History, and why isn't the fact that the website was backed up recorded either?
        3. Different counts depending on whether new or old laptop is backed up. Investigate 63k discrepancy - lower on new laptop.
      2. Review the architecture to improve performance; need to document it first.
    • Books
      1. Further improve the time to regenerate Book Summaries. Now takes about 39 minutes, but should be under 5 minutes!
    • Books/Papers
      1. Investigate whether the multiple Subject/Topic/Subtopic usage leads anywhere (ie. whether only the first of the three is actually used). Fix anything amiss.
      2. Reformat the PaperCitings pages:-
        1. Include only useful information on the detail pages; but if there are multiple links from the same object, include them on the same line as 'extra links' as in Authors' Citations (copy the code).
        2. Include counts on the summary page.
        3. Document!
        Do the same for BookCitings
    • Bridge
      1. Develop auto-reconciliation routines vs EBU results download
    • Documenter
      1. Investigate the error reports from the Documenter, especially unused variables & queries.
      2. Provide Functional Documentation for Website Generator (using Notes)
    • Education
      1. "Sitepoint (Learnable) - Sitepoint Learnable Web Development Courses": Membership cancelled, but plan what to do with the eBooks in my possession.
      2. Read "PC Pro - Computing in the Real World".
      3. Read "White (Ron) & Downs (Timothy Edward) - How Computers Work: The Evolution of Technology".
    • Infrastructure
      1. iCloud for Windows: Re-install & solve 'The upload folder for iCloud Photos is missing' problem. Try on new Laptop.
    • Notes
      1. Add "Note Alternates" to Note pages.
      2. Add option in Auto-Reference Notes to automatically ignore words containing certain strings that include the key-word (eg. ignore 'grace' and 'trace' when indexing 'race')
      3. Add option in Auto-Reference Notes to only confirm new items (leaving previously-flagged items untouched)
      4. Allow the option to concatenate Notes in the Printed version (ie. linearly embed them essay-style), rather than treating the hyperlinks as footnotes – but still keep the hyperlink & cross-referencing in place.
        1. For use as "disclaimers" - eg. for "Plug Notes".
        2. For Thesis / essays: the difficulty here is the need for linking passages to make the text run smoothly.
      5. As part of the Cross-Referencing project, check the consistent treatment of Note 875, which should be universally ignored. Recently, links to it appeared on Book-Summaries, Book_Paper_Abstracts and Note_Book_Links, as a Note referencing a Book. The critical item was a row on the Note_Book_Links table.
      6. Determine why very long printable notes (eg. Level 3+ for Note 170) are being truncated. Probably suppress them in any case, as they take far too long to load.
      7. Fix bugs in multi-level footnoting in Printable Notes – the referencing is going wrong.
      8. Investigate Note_Links: Section references seem to be incorrect
      9. Printable Notes: fix the bug whereby the “private” flag is round the wrong way.
      10. Split the Aeon Page into multiple sub-pages (either by topic or by priority)
      11. Suppress the publication of the Printable versions of Temp Notes
    • Papers
      1. The monthly regeneration process for Paper Abstracts still takes just over 5 hours. Problem is with Cross_Reference_Deletions and Cross_Reference_Additions. Cannot be fixed until the cross-referencing project is fully complete and documented.
    • Photos
      1. Develop software & procedure to make adding more content to the photos pages easier to undertake.
      2. Timeline software: Add photos for Holidays & Family History
    • Process
      1. Determine why Recalculation & Changed Book/Papers produce unneeded regeneration.
    • Spider
      1. Analyse the results of the data collection exercise and design a plan of campaign to fix broken Internal links and prevent recurrence.
        1. Correct the code so the problems discovered by the Spider don’t recur.
        2. Delete 'orphan pages' that are never linked to, ie. use the Spider to prune redundant pages automatically where possible.
        3. Fix the historical data where errors are uncovered by the Spider. An easier task now the site has a full-regen function.
      2. The size of the main database bloats to over 1.6Gb during the spider run, so is approaching the 2Gb limit.
        1. Use Check_Database_Size, with a parameter, to monitor the size, outputting a message along the lines of those reporting the compact / repair of the Slave database.
        2. Put in a check to STOP if the size is over 1.8Gb.
        3. Determine a solution as the limit is pushed. Some tables are “local” for performance reasons and are later copied to the Slave … maybe move them?
      3. The Spider was generating WebRefs. Procedurally, this ought not to have been possible.
        1. The major problem turned out to be that unprocessed URLs got added to the end of the last WebLinks_Tester_Brief page, which then got Spidered. I've stopped this happening, so hopefully the problem will not recur. The fix was made in 18Q2.
        2. However, 4 other creations appeared - dated 18/05/18 - from the run of 10/07/18. The creation date was from the previous spider run, but the IDs show that they were produced in the latest run.
        I've re-opened the case!
    • Status
      1. Quarterly Project Reports: Correct Functor_08. The Project Planned YTD % keeps having to be bodged!
    • Technology
      1. Look into Sistrix Smart. Errors and warnings itemised are:-
        1. Duplicate content: seems to be variants on
        2. Title Tags: Empty, too long, identical
        3. Page Not Found
        4. Filesize in excess of 1Mb
        5. Meta-Description: Empty
        6. Few words on Page
        7. H1: Not used, used multiple times per page, identical across pages
        8. Pictures: Alt attribute missing
    • WebRefs
      1. Documentation & Bug-fixes: Phase 2
        1. Re-document the procedures in the light of recent changes.
        2. Resolve issues generated / revealed by the spider.
        3. Investigate - and fix where possible - broken links.
      2. Improve the WebRefs checker (Webrefs_Update) further to check for Error 403 "Forbidden". This will often involve finding a way of checking pdfs where the returned page is in fact HTML or XML (see DevLog Ref 379).
      3. Investigate defunct items. Populate Defunct_Explanation in WebRefs_Table and include in relevant WebLinks_Tester reports. Consider use of FairUse (Link (Fair Use)) for documents no longer available that I'd downloaded.
      4. Reformat WebLinks_Tester.htm, WebLinks_Tester_Map.htm, WebLinks_Tester_Full.htm & WebLinks_Tester_Full_Map.htm
        1. Clarify 'truncated': Display, not link
        2. Allow more space for 'link returned', 'issue' and 'display text'
        3. The 'As Above' lines waste space. Only for Notes Archive? Consolidate onto a single second line.
      5. Reformat WebLinks_Tester_Brief: Allow more space for 'link returned', 'issue' and 'display text'
  2. Other Websites: Priority 1 Items By Category:-
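The Plan above is generated from the Priority 1 items on the Development Log. A minimal Python sketch of that grouping step follows; the `DevLogItem` fields, and the choice to sort categories alphabetically and number items within each, are assumptions based only on how the published list looks, not on the actual Access query.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DevLogItem:
    """Hypothetical shape of one Development Log row."""
    category: str
    priority: int
    description: str


def priority_plan(items: list[DevLogItem], priority: int = 1) -> str:
    """Render the items at one priority as a bulleted, numbered plan by category."""
    by_category: defaultdict[str, list[str]] = defaultdict(list)
    for item in items:
        if item.priority == priority:
            by_category[item.category].append(item.description)
    lines: list[str] = []
    for category in sorted(by_category):  # categories in alphabetical order
        lines.append(f"• {category}")
        for number, description in enumerate(by_category[category], start=1):
            lines.append(f"  {number}. {description}")
    return "\n".join(lines)
```

Items keep their Development Log order within each category, so the log itself controls the sequence in which they are listed.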
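The proposed Auto-Reference Notes option (ignoring 'grace' and 'trace' when indexing 'race') could work roughly as in this Python sketch, which assumes plain-text input and a hypothetical `candidate_words` helper. It finds every word containing the key-word and drops those on an ignore list.

```python
from __future__ import annotations

import re

# A "word" here is a run of letters, optionally with one internal apostrophe.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def candidate_words(text: str, keyword: str, ignore: set[str]) -> list[tuple[int, str]]:
    """Return (offset, word) for each word containing keyword, minus ignored words.

    Matching is case-insensitive; `ignore` should hold lower-case words.
    """
    key = keyword.lower()
    hits: list[tuple[int, str]] = []
    for match in WORD.finditer(text):
        word = match.group().lower()
        if key in word and word not in ignore:
            hits.append((match.start(), match.group()))
    return hits
```

For example, `candidate_words("Grace and a trace of race, racism and races.", "race", {"grace", "trace"})` returns `[(21, 'race'), (38, 'races')]`. A whole-word regular expression such as `\brace\b` would skip 'grace' and 'trace' without any list, but it would also miss 'races', which is one argument for the ignore-list approach.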
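Finding 'orphan pages' is essentially a reachability question over the Spider's link map. This Python sketch assumes the Spider's results can be loaded as a set of page names and a dictionary of outbound links, and that the home page is the crawl's starting point; the function name is hypothetical. It returns the pages a breadth-first crawl never reaches, which is slightly broader than 'never linked to', since it also catches clusters of pages that link only to each other.

```python
from __future__ import annotations

from collections import deque


def unreachable_pages(all_pages: set[str], links: dict[str, set[str]], start: str) -> set[str]:
    """Pages that a breadth-first crawl from `start` never reaches.

    This includes pages with no inbound links at all, and 'islands' of pages
    that only link to each other.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            # Ignore links to pages that don't exist on disk (broken links).
            if target in all_pages and target not in seen:
                seen.add(target)
                queue.append(target)
    return all_pages - seen
```

Pages reached only through external entry points (bookmarked pages, say) would show up as unreachable too, so the output is a list of candidates for pruning rather than a deletion list.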
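The proposed size checks for the Spider run might look like this Python sketch, which stands in for Check_Database_Size rather than reproducing it. The 1.8Gb stop comes from the plan above; the 1.6Gb warning level, the `DatabaseTooLarge` exception and the message format are assumptions. Gb is taken here as 1024³ bytes.

```python
from __future__ import annotations

import os

GB = 1024 ** 3
SIZE_LIMIT = 2 * GB          # the 2GB database ceiling
STOP_AT = int(1.8 * GB)      # the proposed hard stop
WARN_AT = int(1.6 * GB)      # assumed warning level: the size reached during Spider runs


class DatabaseTooLarge(RuntimeError):
    """Raised to STOP processing before the database nears the limit."""


def check_database_size(path: str, context: str = "",
                        warn_at: int = WARN_AT, stop_at: int = STOP_AT) -> str:
    """Return a status message for the file at `path`, or raise if it is too big."""
    size = os.path.getsize(path)
    message = (f"{context}: {os.path.basename(path)} is {size / GB:.2f}GB "
               f"({100 * size / SIZE_LIMIT:.1f}% of the 2GB limit)")
    if size >= stop_at:
        raise DatabaseTooLarge(message + " - STOP")
    if size >= warn_at:
        return message + " - WARNING"
    return message
```

Raising an exception, rather than returning a flag, means a long-running loop stops even if a caller forgets to check the result.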
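Several of the Sistrix items are single-page checks that could be run against the generated HTML before upload. This Python sketch uses the standard library's `html.parser`; the 60-character title limit and the `PageAudit` / `audit` names are assumptions, and the checks are only those in the list that need a single page.

```python
from __future__ import annotations

from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Gathers the facts needed for the single-page checks."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""
        self.h1_count = 0
        self.img_without_alt = 0
        self.meta_description: str | None = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and "alt" not in attributes:  # alt="" is allowed (decorative)
            self.img_without_alt += 1
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.meta_description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit(html: str, max_title: int = 60) -> list[str]:
    page = PageAudit()
    page.feed(html)
    page.close()
    issues: list[str] = []
    title = page.title.strip()
    if not title:
        issues.append("title empty")
    elif len(title) > max_title:
        issues.append("title too long")
    if page.h1_count == 0:
        issues.append("h1 not used")
    elif page.h1_count > 1:
        issues.append("h1 used multiple times")
    if page.img_without_alt:
        issues.append(f"{page.img_without_alt} image(s) missing alt")
    if not page.meta_description:
        issues.append("meta description empty")
    return issues
```

Cross-page items, such as identical titles or H1s and duplicate content, would need the per-page results compared across the whole site.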
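For the pdf problem in Webrefs_Update (a URL that still claims to be a pdf but returns HTML or XML), one option, assuming the raw bytes can be fetched outside IE, is to look at the start of the content rather than at the URL, since a real PDF begins with the `%PDF-` header. The helper below is a hypothetical Python sketch, not part of the existing checker.

```python
def sniff_pdf_response(url: str, body: bytes) -> str:
    """Say what a '.pdf' URL actually returned, judged by its first bytes."""
    # Drop any fragment and query string before checking the extension.
    path = url.split("#", 1)[0].split("?", 1)[0].lower()
    if not path.endswith(".pdf"):
        return "not a pdf url"
    head = body[:1024]
    if b"%PDF-" in head:  # many readers accept the header anywhere in the first 1024 bytes
        return "pdf"
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html", b"<?xml")):
        return "html/xml instead of pdf"
    return "unknown"
```

An "html/xml instead of pdf" result could then be passed to the existing Title / H1 checks, while a genuine "pdf" is left alone, so the process never tries to parse a real pdf as a page.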

Summary of Progress to Date

This is hived off to various separate documents, which have now been harmonised and / or consolidated:-
  1. Summary of Progress to Date.
  2. Outstanding Developments.
  3. Functional Documentation.
  4. A summary of time expended across the years developing my website is at "Software Development - Website - Development".

Table of the Previous 12 Versions of this Note: (of 77)

Date Length Title
04/04/2020 00:14:24 19563 Status: Web-Tools (2020 - March)
19/01/2020 23:41:17 19103 Status: Web-Tools (2019 - December)
10/10/2019 23:58:34 18052 Status: Web-Tools (2019 - September)
14/07/2019 20:29:46 16642 Status: Web-Tools (2019 - June)
05/04/2019 10:36:29 16128 Status: Web-Tools (2019 - March)
06/01/2019 23:36:58 18445 Status: Web-Tools (2018 - December)
10/10/2018 16:43:41 22079 Status: Web-Tools (2018 - September)
06/07/2018 18:56:10 16773 Status: Web-Tools (2018 - June)
05/04/2018 10:48:00 16588 Status: Web-Tools (2018 - March)
05/01/2018 00:11:31 13295 Status: Web-Tools (2017 - December)
09/10/2017 23:25:26 11848 Status: Web-Tools (2017 - September)
20/07/2017 14:34:05 11297 Status: Web-Tools (2017 - June)

Note last updated: 03/07/2020 22:09:07
Reference for this Topic: 520 (Status: Web-Tools (2020 - June))
Parent Topic: Status: Summary (2020 - June)

Summary of Note Links from this Page

Aeon Papers CT Introduction Internet Technology and Philosophy Research - Internet Technology Research - Proposal
Test Note - Auto-XRef Website - Outstanding Developments (2020 - August) Website - Progress to Date (2020 - August) Website Generator Documentation - Control Page Website Generator Documentation - Cross-Referencing
Websites Maintained by Theo Todman        

Summary of Note Links to this Page

Internet Technology and Philosophy King's Maths Questions Simon - T1S1T1 Status: Consciousness Studies (2020 - June) Status: Priority Task List (2020: August)
Status: Summary (2020 - June), 2 Status: Summary Task List (2020: July - August) Status: Summary Task List (YTD: 19Q4 - 20Q3) Theo Todman's Philosophy Page Theo Todman's Website Maintainance History
Website - Progress to Date (2020 - August), 2, 3, 4, 5 Website Generator Documentation - Control Page      

Authors, Books & Papers Citing this Note

Author Title Medium Extra Links Read?
Software Development Website - Development Paper Low Quality Abstract    

Text Colour Conventions

  1. Black: Printable Text by me; © Theo Todman, 2020
  2. Blue: Text by me; © Theo Todman, 2020

© Theo Todman, June 2007 - August 2020. Please address any comments on this page to output: