Theo Todman's Web Page - Notes Pages
Animadversions
Languages on Ling: Comparative Database Summary Page
(Work In Progress: output at 01/09/2023 08:42:03)
Introduction
- After a considerable struggle, I’ve managed to find a way of scraping the data from the Ling website (Ling: Learn) so that:-
- I can quickly review what I’ve learnt of the various languages I’ve been studying,
- I can compare the vocabularies and phraseology of related languages, which is both interesting in its own right, and will help sort out confusion.
- I can look up various words to see how theoretically unrelated languages borrow from one another.
- So far, I’ve written the code to extract from the Ling Review sections both Vocabulary and Dialogues.
- Most of the languages don’t have grammar sections, and when they do, they are of non-standard formats which would be difficult to extract, so I’ve ignored them for these purposes.
- Great thanks to Ling for the data (which has been borrowed without permission). Treat this as an advertisement for their excellent Site and App, both of which provide many features not reproduced here: in particular the progress tests and the pronunciation, which are essential for actually getting a proper feel for the language. Also, providing all 50 lessons for a language in one big splodge (as this site will eventually do) is no help for actually learning anything without first having gone through the gradualist programme that Ling provides.
- I’ve created nine further pages in order to make the data more manageable:-
- Languages on Ling: Vocabulary (Latin Scripts): Lesson Order
- Languages on Ling: Vocabulary (Non-Latin Scripts): Lesson Order
- Languages on Ling: Dialogue (Latin Scripts): Lesson Order
- Languages on Ling: Dialogue (Non-Latin Scripts): Lesson Order
- Languages on Ling: Vocabulary CrossTab: Lesson Order
- Languages on Ling: Vocabulary CrossTab: Category Order
- Languages on Ling: Vocabulary Phrase CrossTab: Lesson Order
- Languages on Ling: Vocabulary Phrase CrossTab: Category Order
- Languages on Ling: Dialogue CrossTab: Lesson Order
- These pages are under development, as is the actual importing of the data.
- I’ve also produced some restrictions of the above to make it easier to compare Ukrainian with Russian and Polish (I may do the same with other language groupings later).
- Languages on Ling: Ukrainian / Russian / Polish Vocabulary CrossTab: Lesson Order
- Languages on Ling: Ukrainian / Russian / Polish Vocabulary CrossTab: Category Order
- Languages on Ling: Ukrainian / Russian / Polish Vocabulary Phrase CrossTab: Lesson Order
- Languages on Ling: Ukrainian / Russian / Polish Vocabulary Phrase CrossTab: Category Order
- Languages on Ling: Ukrainian / Russian / Polish Dialogue CrossTab: Lesson Order
- I have to bear in mind that all this is meant simply to make time spent walking the dog more intellectually interesting, so many avenues that cannot be taken while dog walking cannot be followed up without encroaching on time needed for my other projects.
- For instance, there’s lots of vocabulary in the Examples and the Dialogues that’s not otherwise introduced, so won’t appear on the Vocabulary pages. I don’t expect to find the time to port it over.
Technical Note to Self (see Footnote 15)
- The importation of the data – while greatly facilitated by the routines I’ve developed – still requires some rather tedious manual intervention.
- The pages I’m after don’t have URLs that define their content uniquely – the page seems to be generated in real time by JavaScript using selections on the pages themselves, and I don’t know how to automate these selections. While I use ADODB.Stream to access the data, I can’t attach this to the on-line page itself, but only to a copied-and-pasted copy in a local text file. Thankfully – otherwise I’d have given up – the JavaScript places a whole Lesson on the page (even though on-line you have to click through item by item), so a single Ctrl+A / Ctrl+C grabs the lot.
- So, for each of the 50 Lessons for each of the 24 Languages I’m currently studying, I have to click to the relevant Ling Review page, select and copy it all – for Vocabulary and Dialogue in turn – and paste it into a text file in the “C:\Theo's Files\Languages\Ling” directory, named so as to indicate its Language, Lesson and content-type. Don’t forget to press “save”!!
- Once these text files are in the right directory, importation is trivial, using the following routines:-
→ Import_Ling_Vocabulary, and
→ Import_Ling_Dialogue
- For English, the process checks whether there are any non-empty rows for that Lesson in Ling_Vocabulary_English (or Ling_Dialogue_English). If there are, it assumes that English has already been loaded for another Language and leaves them alone. Otherwise, it deletes any rows for that Lesson (these will be empty) and inserts the English from the text file.
- As for the Language tables (Ling_Vocabulary_Langauge (see Footnote 16), or Ling_Dialogue_Language), all rows for the relevant Lesson are deleted before the text file is imported.
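- The English and Language rules above can be sketched as follows. This is an illustration in Python with an in-memory SQLite database, not the actual Import_Ling_Vocabulary routine. The table names are those given above; the column names (Language, Lesson, Vocab_ID, Item), the single shared Language table and the helper names are assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Ling_Vocabulary_English "
        "(Lesson INTEGER, Vocab_ID INTEGER, Item TEXT)"
    )
    conn.execute(
        "CREATE TABLE Ling_Vocabulary_Langauge "
        "(Language TEXT, Lesson INTEGER, Vocab_ID INTEGER, Item TEXT)"
    )
    return conn


def import_lesson(conn, language, lesson, english_items, language_items):
    """Load one Lesson; return True if English was already present."""
    cur = conn.cursor()
    # English: look for non-empty rows already held for this Lesson.
    cur.execute(
        "SELECT COUNT(*) FROM Ling_Vocabulary_English "
        "WHERE Lesson = ? AND TRIM(COALESCE(Item, '')) <> ''",
        (lesson,),
    )
    english_loaded = cur.fetchone()[0] > 0
    if not english_loaded:
        # Any rows present are empty: clear them, then insert the English.
        cur.execute(
            "DELETE FROM Ling_Vocabulary_English WHERE Lesson = ?", (lesson,)
        )
        cur.executemany(
            "INSERT INTO Ling_Vocabulary_English (Lesson, Vocab_ID, Item) "
            "VALUES (?, ?, ?)",
            [(lesson, i, text) for i, text in enumerate(english_items, start=1)],
        )
    # Language table: always delete this Lesson's rows before importing.
    cur.execute(
        "DELETE FROM Ling_Vocabulary_Langauge WHERE Language = ? AND Lesson = ?",
        (language, lesson),
    )
    cur.executemany(
        "INSERT INTO Ling_Vocabulary_Langauge (Language, Lesson, Vocab_ID, Item) "
        "VALUES (?, ?, ?, ?)",
        [(language, lesson, i, text)
         for i, text in enumerate(language_items, start=1)],
    )
    conn.commit()
    return english_loaded
```

- Under these assumptions, re-running the import for a Language replaces its rows rather than duplicating them, while the English rows are written only once per Lesson.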
- There is one caveat for the Vocabulary pages. Usually there is a vocabulary item followed by an example, but not always.
- To get round this, I check the length of the putative Vocabulary item (in English). If it is more than 20 characters long, the routine outputs a message and pauses for a decision. I then check the debug window: if it indicates a missing “example”, so that Vocabulary and Examples are out of step, I stop the import and edit the text file, adding a fake “example” (three lines containing only “skip” (see Footnote 17), or two such lines for those Languages with Latin scripts requiring no transliteration).
- Initially, the Vocabulary is loaded into an in-memory array, so if one of these problems arises, just terminate the import, fix the text file, and re-run. No database updates need removing.
- Currently the Lessons with such issues are:-
- Lesson 3 (two missing Examples: 1 & 7)
- Lesson 5 (one missing Example: 11)
- Lesson 6 (one missing Example: 19)
- Lesson 7 (one missing Example: 11)
- Lesson 9 (three missing Examples: 5, 6 & 17; see Footnote 18)
- Lesson 12 (one missing Example: 15)
- Lesson 13 (two missing Examples: 1 & 14)
- Lesson 16 (two missing Examples: 1 & 23)
- Lesson 18 (one missing Example: 9)
- Lesson 22 (False alert occurs at item 6 … just continue)
- Lesson 23 (one missing Example: 23)
- Lesson 27 (two missing Examples: 1 & 16)
- Lesson 28 (one missing Example: 16)
- Lesson 29 (three missing Examples: 7, 13 & 19)
- Lesson 30 (two missing Examples: 7 & 13)
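- The out-of-step check described above can be sketched in Python as below. This is only an illustration, not the author's routine: it assumes the pasted file alternates Vocabulary entries and Example entries, each entry occupying three lines (or two for Latin scripts), with the English on the first line of each entry. The function name and return shape are hypothetical.

```python
MAX_VOCAB_LEN = 20  # threshold stated in the text


def pair_vocabulary(lines, lines_per_entry=3):
    """Pair entries as (vocabulary, example) and flag suspicious items.

    Returns (pairs, suspects), where suspects holds the 1-based numbers of
    items whose English text is longer than MAX_VOCAB_LEN characters, which
    suggests an Example has been read as a Vocabulary item.
    """
    # Cut the flat list of lines into fixed-size entries.
    entries = [
        lines[i:i + lines_per_entry]
        for i in range(0, len(lines), lines_per_entry)
    ]
    pairs, suspects = [], []
    # Walk the entries two at a time; a trailing unpaired entry is ignored.
    for number, k in enumerate(range(0, len(entries) - 1, 2), start=1):
        vocab, example = entries[k], entries[k + 1]
        if len(vocab[0]) > MAX_VOCAB_LEN:  # English assumed on first line
            suspects.append(number)
        pairs.append((vocab, example))
    return pairs, suspects
```

- Nothing is written to the database here, which matches the point that a failed run can simply be abandoned. Note that two consecutive missing Examples leave the pairing in step, so a length check of this kind cannot catch them (compare Footnote 18).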
- So, there are 2,200 of these manual copies and file-name edits (see Footnote 19) to do (50 Lessons × 22 Languages × 2 content-types). I have to admit to being hopeless at repetitive tasks that require attention: my mind wanders off, I make mistakes, and the whole thing takes many times as long as it should. So, I intend to do this in small chunks over the next few years.
- A key design issue is that all the Languages on Ling have the same set of Lessons, Vocabulary and Dialogue – otherwise the inter-Language comparisons wouldn’t make sense. As far as I can tell, this is currently the case. I imagine it’ll always be so, but it’s possible that Ling will evolve so that the database underlying the App will change over time. To limit any impact of this possibility, I’ve decided to proceed with the copying of all my chosen languages in step, even though I’m not studying them all at the same rate. Then at least they’ll be consistent with one another (or nearly so), even if the earlier Lessons no longer match the live Ling database.
- My first aim – now complete – was to get the first 5 lessons for all 22 Languages (see Footnote 20) in. This took around 7 hours. So, it’ll take around 63 further hours for the remaining 45 lessons. I’ll probably do this in 5-lesson chunks as and when I get that far in the lessons.
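- The figures quoted above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the time estimate assumes the remaining Lessons take the same time per Lesson as the first five.

```python
LESSONS = 50        # Lessons per Language
CONTENT_TYPES = 2   # Vocabulary and Dialogue


def copies_needed(languages):
    """Number of manual copy-and-paste files for the given Language count."""
    return LESSONS * languages * CONTENT_TYPES


def remaining_hours(hours_so_far, lessons_done):
    """Scale the time already spent to the Lessons still outstanding."""
    return hours_so_far * (LESSONS - lessons_done) / lessons_done


print(copies_needed(22))      # 2200
print(remaining_hours(7, 5))  # 63.0
```

- With the 24 Languages now being studied, the same count gives 2,400 copies.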
- Progress on this effort can be gleaned from the headers of:-
- Languages on Ling: Vocabulary (Latin Scripts): Lesson Order
- Languages on Ling: Vocabulary (Non-Latin Scripts): Lesson Order
In-Page Footnotes:
Footnote 15:
- These instructions should be moved to a new Website Documentation Note on Ling and Language.
Footnote 16:
- I’ll need to correct this spelling sometime!
Footnote 17:
- Rather irritatingly, I mistyped this as “snip” in the following files:-
→ Italian – Lessons 12, 13
→ Japanese – Lessons 12, 13
→ Thai – Lessons 16, 18
- This led to “snip” appearing as the phrase (English, Language & Transliteration (where relevant)).
- I fixed the tables, and the import files themselves, though I didn’t re-import them.
Footnote 18:
- Originally I thought there was only one – I missed out ‘meal’ and ‘menu’ as they are consecutive. So, ‘menu’ was taken as an example for ‘meal’, and didn’t appear in the vocabulary.
- This was a bit tricky and tedious to fix. I updated the English and Language tables directly for those items already entered, rather than updating the text files and re-importing.
- To ensure future imports of other languages (the next was Hebrew) work correctly, I had to adjust the Vocab_IDs so the new item appears in the right sequence.
- I marked the import text files as ‘dud’ in case I ever need to re-import them.
Footnote 19:
- As this was tedious and error-prone, I wrote a program (Default_Text_Files) to set these up and park them in directory “C:\Theo's Files\Languages\Ling\zDummyFiles”.
- I move these to the “C:\Theo's Files\Languages\Ling” directory prior to the cutting and pasting.
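- A rough idea of what such a set-up program does, sketched in Python rather than as the actual Default_Text_Files routine. The file-naming pattern is hypothetical (the note only says that the name indicates Language, Lesson and content-type), as are the function name and its parameters.

```python
from pathlib import Path

CONTENT_TYPES = ("Vocabulary", "Dialogue")


def default_text_files(staging_dir, languages, lessons=range(1, 51)):
    """Create empty placeholder files; return the paths newly created."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    created = []
    for language in languages:
        for lesson in lessons:
            for content_type in CONTENT_TYPES:
                # Hypothetical naming pattern, for illustration only.
                name = f"{language}_Lesson{lesson:02d}_{content_type}.txt"
                path = staging / name
                if not path.exists():  # never overwrite a file already there
                    path.touch()
                    created.append(path)
    return created
```

- Creating the empty files up front means only the pasting remains to be done by hand, and skipping files that already exist makes the routine safe to re-run.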
Footnote 20:
- This list has now expanded to 24 languages with the addition of Ukrainian and Polish.
Table of the Previous 5 Versions of this Note:
Note last updated | Reading List for this Topic | Parent Topic
01/09/2023 08:42:03 | None available | None
Authors, Books & Papers Citing this Note
Author | Title | Medium | Extra Links | Read?
Todman (Theo) | Brief Thoughts on Language & Languages | Paper | 2, 3, 4 | Yes
Text Colour Conventions
- Blue: Text by me; © Theo Todman, 2023