What Are Transformer AI/ML Models and How Do They Work?
Great blog post on how transformers work. A transformer is built from several blocks, each with its own function, that work together to understand text and generate the next word. These blocks are the following:
Tokenizer : Turns words into tokens.
Embedding : Turns tokens into numbers (vectors).
Positional encoding : Adds order to the words in the text.
Transformer block : Guesses the next word. It is formed by an attention block and a feedforward block.
Attention : Adds context to the text.
Feedforward : A neural network inside the transformer block that helps guess the next word.
Softmax : Turns the scores into probabilities in order to sample the next word.
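The pipeline above can be sketched end to end in a few lines of numpy. This is a minimal, illustrative toy with random (untrained) weights, a made-up five-word vocabulary, and a single attention head; all names, sizes, and weight matrices here are assumptions for demonstration, not the blog's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tokenizer: turn words into tokens (integer ids) via a toy vocabulary.
vocab = ["the", "cat", "sat", "on", "mat"]
token_ids = {w: i for i, w in enumerate(vocab)}

d_model = 8  # embedding size (illustrative)
seq = ["the", "cat", "sat"]
ids = np.array([token_ids[w] for w in seq])

# Embedding: turn token ids into vectors.
embedding = rng.normal(size=(len(vocab), d_model))
x = embedding[ids]  # shape (seq_len, d_model)

# Positional encoding: add sinusoids so word order is represented.
pos = np.arange(len(seq))[:, None]
i = np.arange(d_model)[None, :]
angle = pos / (10000 ** (2 * (i // 2) / d_model))
x = x + np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Attention: each position mixes in context from earlier positions.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d_model)
# Causal mask: a word may only attend to itself and earlier words.
mask = np.triu(np.ones_like(scores), k=1).astype(bool)
scores[mask] = -np.inf
attn_out = softmax(scores) @ v

# Feedforward: a small per-position neural network.
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
h = np.maximum(attn_out @ W1, 0) @ W2  # ReLU in between

# Softmax: turn scores over the vocabulary into probabilities,
# from which the next word can be sampled or picked greedily.
logits = h @ embedding.T  # shape (seq_len, vocab_size)
probs = softmax(logits)
next_word = vocab[int(np.argmax(probs[-1]))]
```

With trained weights, `probs[-1]` would put high probability on a plausible continuation; here the weights are random, so the output word is meaningless, but every shape and step matches the block structure listed above.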
And, finally, post-training, which refines the model after its initial training.