Large language model
Source: Wikipedia; Extract taken 28 June 2023
Paper - Abstract



Contents

  1. Introduction
  2. Training
  3. Application to downstream tasks
  4. Properties
  5. History
  6. Architecture
  7. Compression
  8. Evaluation
  9. Interpretation
  10. Wider impact
  11. List of large language models
  12. Further reading

Introduction
  1. A large language model (LLM) is a language model consisting of an artificial neural network with many parameters (tens of millions to billions), trained on large quantities of unlabelled text, up to trillions of tokens, using self-supervised or semi-supervised learning on parallel hardware. LLMs emerged around 2018 with the advent of the transformer architecture, which requires less training time than older long short-term memory (LSTM) models because it parallelizes well, making it practical to train on large datasets such as the Wikipedia corpus and Common Crawl. They are general-purpose models that excel at a wide range of tasks. Both the level of proficiency a language model exhibits on a given task and the breadth of tasks it can handle depend less on the model's design than on the size of the training corpus, the number of parameters, and the available computing power. This has shifted natural language processing research away from the previous paradigm of training specialized supervised models for specific tasks.
  2. Because the training corpus and the number of parameters are so extensive, LLMs implicitly learn the syntax and semantics of human language along the way, despite being trained only on simple tasks such as predicting the next word in a sentence (see the sketch after this list). In acquiring general "knowledge" about the world, inherent in the statements contained in the training corpus, they also acquire the inaccuracies and biases of those statements.
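
As a rough illustration of the self-supervised next-word objective described in item 2 above, the sketch below (in Python, using PyTorch) builds training pairs from a toy corpus and fits a deliberately tiny model. The corpus, the vocabulary, and the embedding-plus-linear "model" are illustrative assumptions standing in for a real transformer with billions of parameters; only the training objective matches the description above.

    # A minimal sketch of the self-supervised next-token objective.
    # Corpus, vocabulary, and model are hypothetical toy stand-ins.
    import torch
    import torch.nn as nn

    # Toy "unlabelled" corpus: the labels are derived from the text itself.
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    vocab = {word: i for i, word in enumerate(sorted(set(corpus)))}
    ids = torch.tensor([vocab[w] for w in corpus])

    # Self-supervision: inputs are the tokens, targets are the same tokens
    # shifted one position to the left (predict the next word).
    inputs, targets = ids[:-1], ids[1:]

    # A deliberately tiny model (embedding + linear head) standing in for a
    # transformer; only the training objective is the point of this sketch.
    model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(100):
        logits = model(inputs)  # shape: (sequence length, vocabulary size)
        loss = nn.functional.cross_entropy(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f"final loss: {loss.item():.3f}")

Each input position's label is simply the token that follows it, so no human annotation is needed; this is what makes the training "self-supervised".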

Comment:

See Wikipedia: Large language model.

Text Colour Conventions (see disclaimer)

  1. Blue: Text by me; © Theo Todman, 2023
  2. Mauve: Text by correspondent(s) or other author(s); © the author(s)



© Theo Todman, June 2007 - Sept 2023. Please address any comments on this page to theo@theotodman.com.