Theo Todman's Web Page - Notes Pages
Website Documentation
Website Generator Documentation - Cross-Referencing
(Text as at 04/10/2020 00:27:22)
(For earlier versions of this Note, see the table at the end)
Introduction- This document covers1 the use of the table Cross_Reference, both in how it is maintained, and what it is used for.
- In the course of writing this Note I will hopefully complete item 193 in my Development Log, ie. “Complete XRef-re-engineering project”:-
- Ensure all links and link-pages use the new XRef table, and pension off the old tables.
- Look into writing out specific object-identifiers, and linking thereto for Citations, rather than paragraph references. An issue is multiple instances of the same object in a document.
- Check all link-types still work and fix any errors.
- Complete the auto-triggering of regeneration of “associated” link pages.
- Fix update bug in Convert_Webrefs.
- Fix Bug whereby PaperSummary pages seem to have “Works-” and “Books/Papers-” Citings that refer to the same link-pages.
- Document the process!
- This is probably one of the most complex parts of my website generator, but the cross-referencing function is really the reason I embarked on this project in the first place.
The Cross_Reference Table Itself- The record-counts appear in the following table (provided by Functor_21 using query Cross_Reference_By_Type):-
Type_Calling ↓ | Type_Called → | A | B | I | N | N_A | P | W | TOTAL |
---|
A | | 36 | 36 | | 236 | | 34 | 359 | 701 |
---|
B | | 622 | 975 | | 1,746 | | 726 | 832 | 4,901 |
---|
N | | 1,775 | 2,819 | 2,287 | 6,717 | | 9,191 | 3,153 | 25,942 |
---|
N_A | | 4,683 | 22,313 | 7,910 | 1,322 | 24,729 | 47,519 | 12,019 | 120,495 |
---|
P | | 3,642 | 5,084 | 26 | 20,118 | | 12,010 | 6,345 | 47,225 |
---|
TOTAL | | 10,758 | 31,227 | 10,223 | 30,139 | 24,729 | 69,480 | 22,708 | 199,264 |
---|
|
- Key:-
- A = Author
- B = Book
- I = Image
- N = Note
- N_A2 = Archived Note
- P = Paper
- W = WebRef
- “Calling” types are in the first column, “called” types are the other column headings.
- What is meant be “calling” is that references to objects of the “called” type appear in objects of the “calling” type.
- Note that Images and WebRefs, by their nature, can be called, but cannot call.
- The table ought to be self-regulating, and maybe it is. However, there are (as of 04/10/2020 00:26:57, using Functor_22 & query Cross_Reference_By_Year) the following counts of records on the table with timestamps in the years below:-
- 2015: 936
- 2016: 68
- 2017: 3
- 2018: 766
- 2019: 1
- 2020: 197,490
- Since the website is fully regenerated on a monthly basis, cross-references prior to the latest regeneration should not exist, and require investigation. Presumably, any such are either the result of errors during development or of interventions in the system. There are (according to Functor_23, option 1, using query Dud_Cross_References_This_Year) 4110 records for 2020 prior to the October regeneration, which is worrying. Investigate! I have – this is down to an accidental duplicate Paper that I deleted. There needs to be a method for hoovering these up!
- In addition to the Cross_Reference table there are the following related tables:-
- Cross_Reference_Changes
- Cross_Reference_Zapper
- The Cross_Reference table is maintained by the following generic Routines & Queries:-
- Two of these queries are particularly time-consuming – in that they take about a second, which multiplies up drastically when run thousands of times. They are both Append queries, discussed later:
- A potential issue is the table’s primary key – an ID which is a Long integer with a maximum value of 2,147,483,647. It is needed to enable the deletion of records. These are replaced wholesale, so this ID is romping up. Using Functor_23 and query Cross_Reference_MaxID, as of 04/10/2020 00:26:58 it is 15,804,697 - but it's taken 5 years to get this far (though the number of links has been rising). So, it looks as though it won’t be under pressure for a while! It would be possible to reset it, or – given I have a 64-bit system – change the data type to LongLong, which has a maximum value of 9,223,372,036,854,775,807.
Detailed Processing- Cross_Reference Deletions
- Cross_Reference Additions
- Cross_Reference_Changes
- Following the first round of investigation and documentation, I’ve decided to delete all rows from this table more than 40 days old (or prior to the last Website Regen (as determined by query Website_Regen_Last_Run_Start) if this is earlier), using the Sub Cross_Reference_Changes_Prune (which uses Cross_Reference_Zapper). This is a temporary expedient until I introduce changes for non-Notes (Notes are already fully implemented). I’ve done this to see if it improves performance, which does seem to be the case.
- The record-counts now appear in the following table (provided by Functor_21 using query Cross_Reference_Changes_By_Type):-
Type_Calling ↓ | Type_Called → | A | B | N | P | W | TOTAL |
---|
B | | 5 | 241 | | | | 246 |
---|
N | | 19,771 | 14,837 | 50 | 17,714 | 17,927 | 70,299 |
---|
N_A | | | | | 2 | | 2 |
---|
P | | 143 | 14 | | 448 | 75 | 680 |
---|
TOTAL | | 19,919 | 15,092 | 50 | 18,164 | 18,002 | 71,227 |
---|
|
- Key:-
- A = Author
- B = Book
- I = Image
- N = Note
- N_A5 = Archived Note
- P = Paper
- W = WebRef
- Calling types are in the first column, called types are the other column headings
- Note that Images and WebRefs, by their nature, can be called, but cannot call.
- This table (according to Functor_23, option 3) has 71,227 rows, as of 04/10/2020, split by month (using Functor_22, Cross_Reference_Changes_By_Month):-
- 2020_08: 5,529
- 2020_09: 46,209
- 2020_10: 19,489
- Rows are added using two complex queries, but before describing them it’s worth describing what’s been going on. The table Cross_Reference_Zapper is populated with all the cross-references from the changed calling objects held in Cross_Reference, prior to the new ones being added in. They are removed from the Cross_Reference table ready for these new cross-references to be loaded. By the time we get to adding rows to Cross_Reference_Changes, the changes to Cross_Reference have already been applied, but comparison with Cross_Reference_Zapper tell us which pages to regenerate based on both deleted and added cross-references.
- So, the queries are:-
- Cross_Reference_Changes_Deletions_Add is run first. If anything that was deleted hasn’t been replaced, the called pages have to be regenerated.
- Cross_Reference_Changes_Additions_Add which is slow because of an inner join to the query Cross_Reference_Latest (which is a summation query on Cross_Reference_Zapper) and an outer join to the table Cross_Reference_Zapper (for which, see below).
- Something very cunning is going on here! Pages have to be regenerated whenever objects that call them have references either added or deleted, hence the two queries. Also, there needs to be some conflict avoidance.
- In order to improve the run-times of a full website regeneration (where variable Full_Regen is set to True), I’ve removed the updates of Cross_Reference_Changes (but not – of course – of Cross_Reference) from all places where they are invoked. Improvements (as determined by Functor_23, options 4 – 8) have been:-
- CreateAbstractWebPages (Paper Abstracts: run time has reduced from 8.17 hours to 1.62 hours on 01/10/2020)
- CreateAuthorsWebPages (Authors: Had already reduced to 16 minutes; now 12 minutes on 01/10/2020).
- CreateBookPaperAbstractsWebPages (Book/Paper Abstracts: run time reduced from 72 minutes to 13 minutes on 01/10/2020).
- Notes_Text_Format
→ Notes: run time reduced from 3.62 hours to 1.12 hours on 01/10/2020.
→ Notes Archived: run time reduced from 2.32 hours to 36 minutes on 01/10/2020.
This is a sensible move because – on a full re-gen – all pages are being regenerated in any case. - Rows are deleted by cmdRecalculate_Click using SQL driven by table Page_Regen, but only for Called_Type of “N”. So, the table only contains a few very recent rows of this type, but multitudes of rows for others, as is shown in the table above. I need to explain why this is the case: if looks like deletions may just have been forgotten.
- So, what is the table actually used for? Most usages are either diagnostic or maintenance, and the only serious one seems to be Page_Regen_GEN, also invoked by cmdRecalculate_Click.
- I suspect a fault in that this function regenerates the wrong pages. So, we might be on to something here! However, most pages – ie. authors, book and paper summaries – are regenerated by the badly-named cmdPaperSummaries_Click.
- On investigation, using query Page_Regen_GEN_Test, a non-updating version of Page_Regen_GEN, there were (before Cross_Reference was truncated to the latest 40-days) 21.1k rows output to Page_Regen, including 4 to Author ID=0 and 2 to Image ID=0 (but these represented over 100k and 3k rows, respectively). Not sure to the purpose of including Images since they don’t have pages to regenerate (WebRefs are already excluded for that reason).
- Table Page_Regen is then used 4 times in cmdRecalculate_Click:-
→ to warn how many Notes with be regenerated
→ to delete all its rows
→ to regenerate all its rows, as above
→ to regenerate all “called” Notes based on the rows just created. - No queries use the table other in the circumstances just listed. So, it seems that the table is not used other than to regenerate Notes implicated in changes to other objects (including Notes).
- Hence, it looks like the functions envisaged for the Cross_Reference_Changes table have not been fully implemented, and that it can be truncated until they have been!
- Note that it’s not straightforward to fully implement regeneration of the “impacted” pages, as some are cross-references … more on this later.
- I now delete all rows more than 40 old days in cmdRecalculate_Click.
- Cross_Reference_Zapper
Use of Links in Cross-Reference Pages- There are several consolidated sets of pages that are produced that show the links between various (classes of) objects on my website.
- In general, there’s an alphabetic jump table with links to Objects with titles or Authors starting with the letter in question. These jump tables link to Alphabetical lists which themselves link to individual Object-specific pages.
- The Object-Specific pages are themselves linked to from the Summary or Abstract pages for these Objects.
- These pages and routines have grown up over time and were rushed out when needed using a variety of routines using inconsistent methods with a lot of code duplication. The whole lot needs sorting out! In particular, many routines use individual cross-reference tables noted in the next section, rather than Cross_Reference.
- For now, I just list the jump tables and will add comments and links to the generators in due course:-
→ Authors Summary
→ Book Citations
→ Books-to-Notes Link Pages
→ Paper Citations
→ Papers-to-Notes Link Pages
Improvements and Rationalisation Required- Once the process has been fully documented, it needs to be improved to make the links more stable between site-regenerations. Ideally, the in-page links should contain the ID of the called object, so that the return link is exact. There might be an issue where the same object is referenced multiple times within a document. So, there needs to be an incremental number as well. However, maybe the problem would go away if the impacted pages were all regenerated together. Also, I think the issue varies between the different pairs of objects.
- The above-mentioned table was designed to be a generic replacement for – or, maybe, a supplement to6 – various cross-object tables:-
→ Author_Book_Links
→ Author_Book_Links_Temp
→ Author_Paper_Links
→ Author_Paper_Links_Temp
→ Book_Book_Links
→ Book_Note_Links
→ Book_Paper_Links
→ Book_Paper_Solitons
→ Book_Paper_Solitons_Zapper
→ Book_Papers
→ Book_Papers_Full
→ Note_Book_Links
→ Note_Book_Links_Temp
→ Note_Links
→ Note_Link_Zapper
→ Note_Link_Zapper2
→ Note_Link_Zapper3
→ Note_Links_Archived
→ Note_Paper_Links
→ Note_Paper_Links_Temp
→ Paper_Book_Links
→ Paper_Book_XRef_Temp
→ Paper_Books
→ Paper_Note_Links
→ Paper_Paper_Links - The majority of the above tables should probably have been retired by now! However, I need to be careful as some may be used as temporary tables, ultimately populated from Cross_Reference.
Performance Improvements- Further, I note here that there are some very complex queries that dig out the links when pages are being generated. During a full regeneration, these queries (that take a second or so to run) mount up when run thousands of times and bloat the site-regeneration time to around 11 hours. One of the main purposes of this documentation is to determine where the pinch-points are, to avoid unnecessary processing and to “materialise” certain “views” (ie. convert queries to tables).
- The materialised views currently developed bloat the size of the database, so a more subtle method may need to be developed in due course. They are:-
→ Authors_Cited_By_All_List
→ Authors_Summary_Temp
→ Authors_Summary_Temp_Count
→ BookPaperAbstracts_List
→ Book_Citings_List_New
→ Notes_Cited_By_All_List
→ Paper_Citings_List_New
In-Page Footnotes:
Footnote 1: - Or will do, once it has been completed!
Footnote 2: - The Type_Calling and Type_Called are actually “N”, just as for Live Notes, but I’ve added the “_A” to distinguish the two types where the corresponding Calling or Called Timestamps are non-zero.
Footnote 4: Footnote 5: - The Type_Calling and Type_Called are actually “N”, just as for Live Notes, but I’ve added the “_A” to distinguish the two types where the corresponding Calling or Called Timestamps are non-zero.
- For some reason, all these rows and columns have disappeared.
- There were rows (but – unsurprisingly – no columns) in the two earliest versions of this Note.
- Investigate in due course – it may not actually be an issue.
Footnote 6: - It’s use is that it is two-way: it shows those objects that have been impacted by changes to other objects, in that the links to it have changed.
- This functionality has only been implemented for Notes, it seems.
Printable Version:
Table of the Previous 2 Versions of this Note:
Note last updated |
Reference for this Topic |
Parent Topic |
04/10/2020 00:27:22 |
1300 (Website Generator Documentation - Cross-Referencing) |
None |
Summary of Note Links from this Page
To access information, click on one of the links in the table above.
Summary of Note Links to this Page
To access information, click on one of the links in the table above.
Text Colour Conventions
- Black: Printable Text by me; © Theo Todman, 2021
- Blue: Text by me; © Theo Todman, 2021