Theo Todman's Web Page - Notes Pages


Website Documentation

Website Generator Documentation - Cross-Referencing

(Text as at 04/10/2020 00:27:22)

(For earlier versions of this Note, see the table at the end)


Introduction


The Cross_Reference Table Itself
Detailed Processing
  1. Cross_Reference Deletions
  2. Cross_Reference Additions
  3. Cross_Reference_Changes
    • Following the first round of investigation and documentation, I’ve decided to delete all rows from this table that are more than 40 days old, or that predate the last Website Regen (as determined by query Website_Regen_Last_Run_Start) if that is earlier. This is done by the Sub Cross_Reference_Changes_Prune (which uses Cross_Reference_Zapper). It is a temporary expedient until I introduce changes for non-Notes (Notes are already fully implemented). I’ve done this to see whether it improves performance, which does seem to be the case.
    • The record-counts now appear in the following table (provided by Functor_21 using query Cross_Reference_Changes_By_Type):-
       
      Type_Calling ↓ / Type_Called → |      A |      B |  N |      P |      W |  TOTAL
      B                              |      5 |    241 |    |        |        |    246
      N                              | 19,771 | 14,837 | 50 | 17,714 | 17,927 | 70,299
      N_A                            |        |        |    |      2 |        |      2
      P                              |    143 |     14 |    |    448 |     75 |    680
      TOTAL                          | 19,919 | 15,092 | 50 | 18,164 | 18,002 | 71,227

    • Key:-
      • A = Author
      • B = Book
      • I = Image
      • N = Note
      • N_A = Archived Note
      • P = Paper
      • W = WebRef
      • Calling types are in the first column; called types are the remaining column headings.
      • Note that Images and WebRefs, by their nature, can be called, but cannot call.
    • This table (according to Functor_23, option 3) has 71,227 rows, as of 04/10/2020, split by month (using Functor_22, Cross_Reference_Changes_By_Month):-
      • 2020_08: 5,529
      • 2020_09: 46,209
      • 2020_10: 19,489
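
The two breakdowns above (by calling/called type, and by month) are plain grouped counts. The following is a minimal Python sketch of that grouping, not the Access queries themselves. The row layout (calling type, called type, timestamp) and the sample rows are assumptions made for illustration; the Note does not give the table's real column names.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (Type_Calling, Type_Called, timestamp).
# These values are invented; they are not taken from the real table.
rows = [
    ("N", "A", datetime(2020, 9, 3)),
    ("N", "P", datetime(2020, 9, 17)),
    ("N", "W", datetime(2020, 10, 1)),
    ("P", "A", datetime(2020, 8, 30)),
    ("B", "B", datetime(2020, 10, 2)),
]

def counts_by_type(rows):
    """Count rows per (calling type, called type) pair."""
    return Counter((calling, called) for calling, called, _ in rows)

def counts_by_month(rows):
    """Count rows per month, keyed in the YYYY_MM style used above."""
    return Counter(ts.strftime("%Y_%m") for _, _, ts in rows)

by_type = counts_by_type(rows)
by_month = counts_by_month(rows)
```

Both breakdowns count the same rows, so each set of counts must add up to the same overall total, as it does in the figures above (71,227).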
    • Rows are added using two complex queries, but before describing them it’s worth explaining the surrounding process. The table Cross_Reference_Zapper is populated with all the cross-references from the changed calling objects held in Cross_Reference, prior to the new ones being added in. They are then removed from the Cross_Reference table, ready for the new cross-references to be loaded. By the time we get to adding rows to Cross_Reference_Changes, the changes to Cross_Reference have already been applied, but comparison with Cross_Reference_Zapper tells us which pages to regenerate, based on both deleted and added cross-references.
    • So, the queries are:-
      1. Cross_Reference_Changes_Deletions_Add is run first. If anything that was deleted hasn’t been replaced, the called pages have to be regenerated.
      2. Cross_Reference_Changes_Additions_Add is run second. It is slow because of an inner join to the query Cross_Reference_Latest (which is a summation query on Cross_Reference_Zapper) and an outer join to the table Cross_Reference_Zapper (for which, see below).
    • Something very cunning is going on here! Pages have to be regenerated whenever objects that call them have references either added or deleted, hence the two queries. Also, there needs to be some conflict avoidance.
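
The idea behind the two queries can be sketched as a comparison of two sets of references. This is a hedged Python illustration, not the queries' actual SQL. The function name, the (calling, called) pair layout and the sample identifiers are all hypothetical, with the "old" set standing in for Cross_Reference_Zapper and the "new" set for the refreshed Cross_Reference rows.

```python
def pages_to_regenerate(old_refs, new_refs):
    """Return the called objects whose pages need regenerating.

    old_refs and new_refs are sets of (calling, called) pairs for the
    changed calling objects, before and after the update respectively.
    """
    deleted = old_refs - new_refs  # references removed and not replaced
    added = new_refs - old_refs    # references newly introduced
    # A called page is affected if it lost or gained an incoming reference.
    return {called for _, called in deleted | added}

# Hypothetical example: Note 1 keeps its link to Paper 10, drops its link
# to Author 5, and gains a link to Book 7.
old = {("N:1", "P:10"), ("N:1", "A:5")}
new = {("N:1", "P:10"), ("N:1", "B:7")}
targets = pages_to_regenerate(old, new)
```

Here `targets` contains Author 5 and Book 7 but not Paper 10, whose incoming reference is unchanged. Because the result is a set, a called page affected by both a deletion and an addition appears only once.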
    • In order to improve the run-times of a full website regeneration (where variable Full_Regen is set to True), I’ve removed the updates of Cross_Reference_Changes (but not – of course – of Cross_Reference) from all places where they are invoked. Improvements (as determined by Functor_23, options 4 – 8) have been:-
      1. CreateAbstractWebPages (Paper Abstracts: run time has reduced from 8.17 hours to 1.62 hours on 01/10/2020)
      2. CreateAuthorsWebPages (Authors: Had already reduced to 16 minutes; now 12 minutes on 01/10/2020).
      3. CreateBookPaperAbstractsWebPages (Book/Paper Abstracts: run time reduced from 72 minutes to 13 minutes on 01/10/2020).
      4. Notes_Text_Format
        → Notes: run time reduced from 3.62 hours to 1.12 hours on 01/10/2020.
        → Notes Archived: run time reduced from 2.32 hours to 36 minutes on 01/10/2020.
      This is a sensible move because – on a full re-gen – all pages are being regenerated in any case.
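
The change just described amounts to a guard around the change-logging step. The sketch below shows that shape in Python under stated assumptions: the function, its parameters and the in-memory stand-ins for the two tables are hypothetical, and only the names Full_Regen, Cross_Reference and Cross_Reference_Changes come from the text.

```python
def record_cross_references(new_refs, cross_reference, changes, full_regen):
    """Apply new references, logging changes only on a partial regen.

    cross_reference is a set standing in for the Cross_Reference table;
    changes is a list standing in for Cross_Reference_Changes.
    """
    cross_reference.update(new_refs)      # Cross_Reference is always updated
    if not full_regen:
        changes.extend(sorted(new_refs))  # only needed when pages are regenerated selectively
    return cross_reference, changes

refs = {("N:1", "P:10")}
full_xref, full_changes = record_cross_references(refs, set(), [], full_regen=True)
part_xref, part_changes = record_cross_references(refs, set(), [], full_regen=False)
```

On the full run the change log stays empty while the cross-reference set is still updated; on the partial run both are written.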
    • Rows are deleted by cmdRecalculate_Click using SQL driven by table Page_Regen, but only for Called_Type of “N”. So, the table only contains a few very recent rows of this type, but multitudes of rows for the others, as is shown in the table above. I need to explain why this is the case: it looks like deletions may just have been forgotten.
    • So, what is the table actually used for? Most usages are either diagnostic or maintenance, and the only serious one seems to be Page_Regen_GEN, also invoked by cmdRecalculate_Click.
    • I suspect a fault in that this function regenerates the wrong pages. So, we might be on to something here! However, most pages (i.e. authors, book and paper summaries) are regenerated by the badly-named cmdPaperSummaries_Click.
    • On investigation, using query Page_Regen_GEN_Test (a non-updating version of Page_Regen_GEN), there were (before Cross_Reference was truncated to the latest 40 days) 21.1k rows output to Page_Regen, including 4 to Author ID=0 and 2 to Image ID=0 (but these represented over 100k and 3k rows, respectively). I’m not sure of the purpose of including Images, since they don’t have pages to regenerate (WebRefs are already excluded for that reason).
    • Table Page_Regen is then used 4 times in cmdRecalculate_Click:-
      → to warn how many Notes will be regenerated
      → to delete all its rows
      → to regenerate all its rows, as above
      → to regenerate all “called” Notes based on the rows just created.
    • No queries use the table other than in the circumstances just listed. So, it seems that the table is not used other than to regenerate Notes implicated in changes to other objects (including Notes).
    • Hence, it looks like the functions envisaged for the Cross_Reference_Changes table have not been fully implemented, and that it can be truncated until they have been!
    • Note that it’s not straightforward to fully implement regeneration of the “impacted” pages, as some are cross-references … more on this later.
    • I now delete all rows more than 40 days old in cmdRecalculate_Click.
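
The pruning rule described above can be sketched as follows. This is one reading of the rule, assuming the cut-off is the earlier of "40 days ago" and the start of the last Website Regen; the function names, the `timestamp` key and the sample dates are hypothetical and do not reflect the actual Sub Cross_Reference_Changes_Prune.

```python
from datetime import datetime, timedelta

def prune_cutoff(now, last_regen_start, max_age_days=40):
    """Return the earlier of (now - max_age_days) and the last regen start."""
    return min(now - timedelta(days=max_age_days), last_regen_start)

def prune(rows, now, last_regen_start):
    """Keep only rows stamped on or after the cut-off."""
    cutoff = prune_cutoff(now, last_regen_start)
    return [row for row in rows if row["timestamp"] >= cutoff]

# Hypothetical rows and dates, for illustration only.
rows = [
    {"id": 1, "timestamp": datetime(2020, 8, 20)},
    {"id": 2, "timestamp": datetime(2020, 9, 1)},
]
now = datetime(2020, 10, 4)
kept_recent_regen = prune(rows, now, last_regen_start=datetime(2020, 10, 1))
kept_old_regen = prune(rows, now, last_regen_start=datetime(2020, 8, 1))
```

With a recent regen the 40-day limit applies and only the second row survives; with a regen older than 40 days, everything since that regen is kept.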
  4. Cross_Reference_Zapper

Use of Links in Cross-Reference Pages
Improvements and Rationalisation Required
Performance Improvements








Table of the Previous 2 Versions of this Note:

Date Length Title
03/07/2020 22:09:07 17661 Website Generator Documentation - Cross-Referencing
27/06/2020 00:15:50 15964 Website Generator Documentation - Cross-Referencing



Note last updated Reference for this Topic Parent Topic
04/10/2020 00:27:22 1300 (Website Generator Documentation - Cross-Referencing) None

Summary of Note Links from this Page

Test Note - Auto-XRef        





Summary of Note Links to this Page

Website - Progress to Date (2020 - October), 2 Website Generator Documentation - Functors, 2, 3, 4      









© Theo Todman, June 2007 - Oct 2020. Please address any comments on this page to theo@theotodman.com.