
1.1. Definition

First of all, what is a language model?

Language Model

A language model is a model that learns patterns from language data.

Remember that this definition does not imply any of the virtues we may want from language models, e.g., being factual, responsible, or explainable.
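To make "learns patterns from language data" concrete, here is a minimal sketch of one of the simplest possible language models: a bigram model that just counts which word tends to follow which. This toy example is my own illustration, not taken from any of the papers cited below.

```python
from collections import Counter, defaultdict

# A minimal bigram language model: it "learns patterns" by counting
# which word follows which word in the training text.
def train_bigram_model(text: str) -> dict:
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Normalize the counts into conditional probabilities P(next | prev).
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }

model = train_bigram_model("the cat sat on the mat the cat ate")
print(model["the"])  # {'cat': 0.67, 'mat': 0.33} (approximately)
```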

Ok, so what is a large language model? Defining an LLM is about as impossible as defining jazz music 😉 (ok, perhaps a bit easier). Let me fail in this way:

Large Language Model

A large language model is a language model that is large enough, usually with more than a billion parameters, to demonstrate zero-shot and few-shot abilities.
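To illustrate what "zero-shot" and "few-shot" mean in practice, here is a minimal sketch of the two prompting styles; the English-to-French demonstrations echo the examples used in the GPT-3 paper, and the prompts would be fed to whatever text-completion model you have at hand.

```python
# Zero-shot: the task is described, but no demonstrations are given.
zero_shot_prompt = "Translate English to French:\ncheese =>"

# Few-shot: a handful of demonstrations precede the actual query.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "plush giraffe => girafe en peluche\n"
    "cheese =>"
)

# Either prompt is sent to an LLM as plain text; a large enough model can
# often complete both, while a smaller model typically needs fine-tuning.
print(zero_shot_prompt)
print(few_shot_prompt)
```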

See the table below and note the trend.

| Name | Number of parameters | Birth year |
|------|----------------------|------------|
| BERT (medium) | 0.047B | 2018 |
| BERT (base) | 0.110B | 2018 |
| BERT (large) | 0.340B | 2018 |
| BERT (xlarge) | 1.270B | 2018 |
| Some Google Translate models (LSTMs) | 0.160 - 0.380B | 2016-2020 (estimate) |
| GPT-1 | 0.117B | 2018 |
| GPT-2 | 1.5B | 2019 |
| GPT-3 | 1.3 - 175B | 2020 |
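If you want to reproduce a number like the 0.110B for BERT (base) yourself, here is a minimal sketch using the Hugging Face transformers library, assuming transformers and PyTorch are installed and the bert-base-uncased checkpoint is available.

```python
from transformers import AutoModel

# Download the BERT (base) checkpoint and count its trainable parameters.
model = AutoModel.from_pretrained("bert-base-uncased")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.3f}B parameters")  # roughly 0.110B
```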

BERT (xlarge) [DCLT18] is over 1B parameters, but it does not demonstrate any zero-shot or few-shot abilities, as it remains a word and document embedding model.

GPT-1 [RNS+18] is way under 1B and did not claim any few-shot or zero-shot generalizability (its title is Improving language understanding by generative pre-training).
At 1.5B parameters, GPT-2 [RWC+19] was suddenly so powerful that OpenAI famously said, “Due to our concerns about malicious applications of the technology, we are not releasing the trained model”. And what was the title? Language models are unsupervised multitask learners.

A year later, GPT-3 [BMR+20] was presented in a paper titled Language Models are Few-Shot Learners.