AI - LLM - What word comes next

Seed text → Predict next word

Probability distribution of possible next words.
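
A minimal sketch of picking the next word from a probability distribution. The words and probabilities below are made up for illustration, and `pick_next_word` is a hypothetical helper name, not part of any library. Sampling is only one way to choose; always taking the most likely word is another.

```python
import random

def pick_next_word(distribution, rng=random):
    """Sample one word, where `distribution` maps word -> probability."""
    words = list(distribution.keys())
    weights = list(distribution.values())
    # random.choices draws with replacement according to the given weights.
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical distribution for the seed text "the cat sat on the".
next_word_distribution = {"mat": 0.6, "rug": 0.3, "moon": 0.1}

chosen = pick_next_word(next_word_distribution)
most_likely = max(next_word_distribution, key=next_word_distribution.get)
print("sampled:", chosen, "| most likely:", most_likely)
```

Sampling gives varied output from the same seed text; taking the most likely word gives the same output every time.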

Read input (does not have to be purely text; could be any input, including sound or images).

Helpful if input and output are real numbers, as calculus and gradients can then be used.

Lookup Table

Create a lookup table, associating each word with other words in the same context, such as in the same sentence.
- Encode the position of a word in the context.
- Are there nouns at specific positions?
- For each word, are there any adjectives in certain positions, in front of the word?
Determine the probability of each word in the list of possible words being the next word.
Use back-propagation to adjust weights.
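
A rough sketch of the lookup-table idea, assuming plain counting in place of learned weights (so there is no back-propagation here, and position is reduced to "the word directly after"). The corpus and the function names `build_lookup_table` and `next_word_probabilities` are made up for illustration.

```python
from collections import Counter, defaultdict

def build_lookup_table(sentences):
    """For each word, count which words directly follow it in a sentence."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            table[current][following] += 1
    return table

def next_word_probabilities(table, word):
    """Turn the raw counts for `word` into probabilities that sum to one."""
    counts = table.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # word never seen with a following word
    return {candidate: n / total for candidate, n in counts.items()}

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
table = build_lookup_table(corpus)
probs_after_the = next_word_probabilities(table, "the")
print(probs_after_the)
```

In a trained network the counts would be replaced by weights that back-propagation adjusts, but the output has the same shape: one probability per possible next word.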

Weights = which words are associated with other words.

Bigger numbers mean a stronger association; they matter more.
Smaller numbers mean the words are less related; they matter less.
Use the dot product, as it is cheap to compute.
- Multiply matching elements: v1 * w1 = v1w1
- Add up the products: v1w1 + v2w2 + v3w3 = dot product
  - Measures whether the two vectors point in the same direction.
    - Positive value means they point the same way (aligned).
    - Zero means they are perpendicular (unrelated).
    - Negative value means they point in opposite directions.
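
A small sketch of the dot product and its three sign cases. The vectors are made-up three-element examples, and `dot` is a hand-written helper rather than a library call.

```python
def dot(v, w):
    """Multiply matching elements and add up the products."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(vi * wi for vi, wi in zip(v, w))

# Hypothetical vectors, for illustration only.
v = [1.0, 2.0, 3.0]

aligned = dot(v, [2.0, 4.0, 6.0])                       # same direction
perpendicular = dot([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # at right angles
opposite = dot(v, [-1.0, -2.0, -3.0])                   # reversed direction

print(aligned, perpendicular, opposite)
```

Only multiplications and additions are needed, which is why it is cheap.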

Activation

Use softmax
- Converts the raw scores so they are all between zero and one and add up to one, so they can be read as probabilities.
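
A minimal softmax sketch using only the standard library. The scores are made up, and subtracting the largest score before exponentiating is an added numerical-stability step (it does not change the result).

```python
import math

def softmax(scores):
    """Map raw scores to values in (0, 1) that add up to one."""
    highest = max(scores)
    # Shifting by the largest score avoids overflow in math.exp.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate words.
scores = [2.0, 1.0, -1.0]
probabilities = softmax(scores)
print(probabilities, sum(probabilities))
```

The order is preserved: the highest score gets the highest probability.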

Aim for a low cost when predicting the next word.

Cost formula: cost = -log(probability), where probability is the value the model gave to the correct next word.
- A low cost, close to zero, is where the probability is close to one.
- A low probability (i.e. not predicting the next word) increases the cost very steeply.
There is a cost per word, but the same applies to the entire network.
- The cost of the entire network is the sum or average of the per-word costs.
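
A short sketch of the cost calculation, assuming the natural logarithm (the notes do not say which base). The probabilities are made up, and `word_cost` and `network_cost` are hypothetical helper names.

```python
import math

def word_cost(probability):
    """Cost for one word: -log of the probability given to the correct word."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability)

def network_cost(probabilities, average=True):
    """Sum or average of the per-word costs."""
    if not probabilities:
        raise ValueError("need at least one probability")
    costs = [word_cost(p) for p in probabilities]
    total = sum(costs)
    return total / len(costs) if average else total

# Hypothetical probabilities the model gave to the correct word at each step.
correct_word_probabilities = [0.9, 0.5, 0.01]
for p in correct_word_probabilities:
    print(p, round(word_cost(p), 3))
print("average cost:", round(network_cost(correct_word_probabilities), 3))
```

The single badly predicted word (0.01) dominates the average, which shows how steeply the cost rises as the probability falls.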