====== AI - LLM - What word comes next ======

Seed text -> Predict next word

  * Probability distribution of possible next words.

----

Read input (it does not have to be purely text; it could be any input, including sound or images).

  * It helps if the input and output are real numbers, because calculus and gradients can then be applied.
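
As a rough illustration of why real-valued inputs and outputs help, the sketch below estimates a gradient numerically and uses it to nudge a single weight. The one-weight model, the names ''loss'' and ''numerical_gradient'', and the numbers are all assumptions made up for this example, not part of these notes.

```python
def loss(w, x=2.0, target=6.0):
    """Squared error of a hypothetical one-weight model: prediction = w * x."""
    return (w * x - target) ** 2


def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of the slope of f at w."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w = 1.0              # arbitrary starting weight
learning_rate = 0.1  # assumed step size

for _ in range(20):
    # Step against the slope, so the loss goes down.
    w -= learning_rate * numerical_gradient(loss, w)
```

With these made-up values the weight settles close to 3.0, the value at which ''w * x'' equals the target.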


Lookup Table

  * Create a lookup table that associates each word with the other words in the same context, such as the same sentence.
    * Encode the position of a word in the context.
    * Are there nouns at specific positions?
    * For each word, are there adjectives at certain positions in front of it?
  * Determine, for each word in the list of possible words, the probability of it being the next word.
  * Use back-propagation to adjust weights.
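
The lookup-table idea can be sketched with plain counting. This is a deliberately simplified assumption: it only records which word directly follows another, and it turns counts into probabilities instead of learning weights by back-propagation. The function names and the tiny corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def build_lookup_table(sentences):
    """For each word, count the words that directly follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            table[current_word][next_word] += 1
    return table


def next_word_probabilities(table, word):
    """Turn the follower counts for one word into probabilities."""
    counts = table.get(word, Counter())
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}


corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
table = build_lookup_table(corpus)
probs = next_word_probabilities(table, "the")
```

Here "cat" follows "the" in 2 of 6 cases, so it gets a probability of one third; a word that was never seen returns an empty dictionary.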


Weights = which words are associated with other words.

  * Bigger numbers matter more.
  * Smaller numbers matter less; the words are largely unrelated.
  * Use the dot product, as it is cheap to compute.
    * v1 * w1 = v1w1
    * v1w1 + v2w2 + v3w3 = dot product
      * Measures whether the two vectors point in the same direction.
        * Positive value means they point the same way (aligned).
        * Zero means they are perpendicular (unrelated).
        * Negative value means they point in opposite directions.
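
A minimal sketch of the dot product and its three sign cases. The vectors are made-up examples rather than real word weights.

```python
def dot(v, w):
    """Multiply matching components and add up the results."""
    return sum(vi * wi for vi, wi in zip(v, w))


aligned = dot([1.0, 2.0], [2.0, 4.0])          # same direction
perpendicular = dot([1.0, 0.0], [0.0, 3.0])    # at right angles
opposite = dot([1.0, 2.0], [-1.0, -2.0])       # opposite direction
```

The three results are positive, zero and negative respectively, matching the cases listed above.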

Activation

  * Use softmax
    * Converts the scores so that each one is between zero and one, and they all add up to one.
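
A small softmax sketch, assuming the inputs are raw scores such as dot products. Subtracting the largest score before exponentiating is a common numerical-stability step added here; it does not change the result.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities that sum to one."""
    largest = max(scores)
    # Shifting by the largest score avoids overflow in exp().
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, -1.0])
```

The highest score keeps the highest probability, and even a negative score maps to a small positive probability.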


Aim for a low cost when predicting the next word.

  * The cost formula is: cost = -log(probability), where probability is the one given to the correct next word.
    * The cost is low, close to zero, when the probability is close to one.
    * A low probability (i.e. failing to predict the next word) increases the cost very steeply.
  * There is a cost per word, but the same applies to the entire network.
    * The cost of the entire network is the sum or average of the per-word costs.
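
The cost formula can be sketched as follows, assuming the natural logarithm and taking the average over words (the sum would work the same way). The function names are hypothetical.

```python
import math


def word_cost(probability_of_correct_word):
    """Cost for one prediction: -log(probability)."""
    return -math.log(probability_of_correct_word)


def average_cost(probabilities):
    """Average the per-word costs over a whole sequence."""
    return sum(word_cost(p) for p in probabilities) / len(probabilities)


confident = word_cost(0.99)   # close to zero
unsure = word_cost(0.5)
wrong = word_cost(0.01)       # much larger
overall = average_cost([0.99, 0.5, 0.01])
```

The cost is near zero for a confident correct prediction and grows quickly as the probability of the correct word approaches zero.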


----