Contents
- Introduction
- Training
- Application to downstream tasks
- Properties
- History
- Architecture
- Compression
- Evaluation
- Interpretation
- Wider impact
- List of large language models
- Further reading
Introduction
- A large language model (LLM) is a language model consisting of an artificial neural network with many parameters (tens of millions to billions), trained on large quantities of unlabeled text containing up to trillions of tokens using self-supervised or semi-supervised learning. LLMs emerged around 2018 with the advent of the transformer architecture, which requires less training time than older long short-term memory (LSTM) models because it parallelizes well, making it practical to train on very large datasets such as the Wikipedia corpus and Common Crawl. They are general-purpose models that excel at a wide range of tasks. The proficiency a language model exhibits at a given task, and the breadth of tasks it can handle, depend less on the model's design than on the size of the training corpus, the number of parameters, and the computational power available. This has shifted the focus of natural language processing research away from the earlier paradigm of training specialized supervised models for specific tasks.
- Because the training corpus and the number of parameters are so extensive, LLMs, despite being trained only on simple tasks such as predicting the next word in a sentence (sketched below), implicitly learn the syntax and semantics of human language along the way. They also acquire general "knowledge" about the world inherent in the statements contained in the training corpus, but with it they absorb the inaccuracies and biases present in those statements.
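To make the self-supervised next-word objective concrete, here is a minimal sketch (not from the source, and nothing like a production LLM): a tiny bigram softmax model in NumPy, trained by gradient descent on a cross-entropy loss, where the "labels" are simply the next word in the raw text. The corpus, variable names, and hyper-parameters are illustrative only.

```python
import numpy as np

# Toy corpus: the training "labels" are just the next word in the text itself,
# which is what makes the objective self-supervised (no human annotation).
corpus = "the cat sat on the mat the dog sat on the rug".split()

vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Build (current word, next word) training pairs directly from the raw text.
xs = np.array([stoi[w] for w in corpus[:-1]])
ys = np.array([stoi[w] for w in corpus[1:]])

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))   # logits for p(next | current) = softmax(W[current])

lr = 0.5
for step in range(200):
    logits = W[xs]                                  # (N, V) copy via fancy indexing
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(ys)), ys]).mean()   # cross-entropy on next word

    # Gradient of the cross-entropy w.r.t. the logits, accumulated into W.
    grad = probs
    grad[np.arange(len(ys)), ys] -= 1.0
    grad /= len(ys)
    np.add.at(W, xs, -lr * grad)

print(f"final loss: {loss:.3f}")
row = W[stoi["the"]]
p = np.exp(row - row.max())
p /= p.sum()
print("p(next | 'the'):", {w: round(float(p[stoi[w]]), 2) for w in vocab})
```

A real LLM replaces this bigram lookup table with a transformer conditioned on a long context window and billions of parameters, but the objective is the same: minimize the cross-entropy of predicting the next token.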
Comment:
See Wikipedia: Large language model.
Text Colour Conventions (see disclaimer)
- Blue: Text by me; © Theo Todman, 2023
- Mauve: Text by correspondent(s) or other author(s); © the author(s)