====== AI - LLM - What word comes next ======

Seed text -> Predict next word
  * Probability distribution of possible next words.

----

Read input (does not have to be purely text; could be any input, including sound or images).
  * Helpful if input and output are real numbers, as calculus and gradients can then be used.

Lookup Table
  * Create a lookup table, associating each word with other words in the same context, such as in the same sentence.
  * Encode the position of a word in the context.
    * Are there nouns at specific positions?
    * For each word, are there any adjectives in certain positions, in front of the word?
  * Determine the probability of each word in the list of possible words being the next word.
  * Use back-propagation to adjust weights.

Weights = which words are associated with other words.
  * Bigger numbers matter more.
  * Smaller numbers are unrelated; matter less.
  * Use the dot product, as it is cheap to compute.
    * Multiply matching components: v1 * w1 = v1w1
    * Add the products: v1w1 + v2w2 + v3w3 = dot product
    * Measures whether the two vectors point in the same direction.
      * A positive value means they point the same way (aligned).
      * Zero means they are perpendicular (unrelated).
      * A negative value means they point in opposite directions.

Activation
  * Use softmax
    * Converts the scores so that all are between zero and one, and add up to one.

Aim for low cost in determining the next word.
  * The cost formula is: cost = -log(probability).
  * A low cost, close to zero, is where the probability is close to one.
  * A low probability (i.e. failing to predict the next word) increases the cost very steeply.
  * There is a cost per word, but the same applies to the entire network.
    * The cost of the entire network is the sum or average of the per-word costs.

----
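
The dot product, softmax and cost steps in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-number vectors, the candidate words and the function names (''dot'', ''softmax'', ''cost'', ''average_cost'') are all made up for the example, and a real model would learn its vectors through back-propagation rather than have them written by hand.

```python
import math


def dot(v, w):
    """Dot product: v1*w1 + v2*w2 + v3*w3 + ..."""
    return sum(vi * wi for vi, wi in zip(v, w))


def softmax(scores):
    """Convert raw scores to values between zero and one that add up to one."""
    highest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cost(probability):
    """Cost of one prediction: -log(probability of the correct word)."""
    return -math.log(probability)


def average_cost(probabilities):
    """Cost over many words: the average of the per-word costs."""
    return sum(cost(p) for p in probabilities) / len(probabilities)


# Hypothetical vectors, chosen by hand purely for illustration.
context = [1.0, 2.0, 0.0]
candidates = {
    "cat": [1.0, 2.0, 0.0],    # points the same way as the context
    "dog": [-2.0, 1.0, 5.0],   # perpendicular to the context
    "car": [-1.0, -2.0, 0.0],  # points the opposite way
}

# Score each candidate word against the context, then normalise.
scores = {word: dot(context, vec) for word, vec in candidates.items()}
probs = dict(zip(scores, softmax(list(scores.values()))))

for word in candidates:
    print(word, scores[word], round(probs[word], 5), round(cost(probs[word]), 5))
```

The aligned word gets a positive score, the perpendicular word a score of zero and the opposite word a negative score. After softmax the aligned word takes almost all of the probability, so its cost is close to zero, while the cost of the opposite word is far higher.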