* '''4. Decoding:''' Begin with a start-of-sequence token. At each step, the model passes its hidden state through an output layer and a softmax function, producing a normalized probability distribution over the next token for each candidate sequence. Beam Search keeps the highest-scoring paths to continue following and prunes the rest.
* '''5. Finalize Output:''' Select the beam with the highest probability as the final translation, then convert the tokenized output back into a readable sentence (restoring punctuation, capitalization, etc.).
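The decoding steps above can be sketched as a minimal beam search. This is an illustrative toy, not a real translation model: `next_token_logprobs` is a hypothetical stand-in for a decoder's softmax output, and the tiny vocabulary, beam width, and probabilities are all assumptions chosen for clarity.

```python
import math

# Hypothetical stand-in for a decoder's softmax output: given a prefix,
# return a log-probability for each candidate next token. A real model
# would condition on the full prefix and source sentence.
VOCAB = ["<eos>", "a", "b"]

def next_token_logprobs(prefix):
    # Assumed toy distribution: prefer ending once the sequence is long.
    probs = [0.8, 0.1, 0.1] if len(prefix) >= 3 else [0.1, 0.6, 0.3]
    return [math.log(p) for p in probs]

def beam_search(beam_width=2, max_len=5):
    # Each hypothesis is (tokens, cumulative log-probability).
    beams = [(["<sos>"], 0.0)]       # step 4: begin with start-of-sequence
    finished = []
    for _ in range(max_len):
        # Expand every live beam by every vocabulary token.
        candidates = []
        for tokens, score in beams:
            for tok, lp in zip(VOCAB, next_token_logprobs(tokens)):
                candidates.append((tokens + [tok], score + lp))
        # Keep only the top `beam_width` paths; prune the rest.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == "<eos>":
                finished.append((tokens, score))   # hypothesis is complete
            else:
                beams.append((tokens, score))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    finished.extend(beams)
    # Step 5: select the highest-scoring hypothesis as the final output.
    return max(finished, key=lambda c: c[1])

tokens, score = beam_search()
print(tokens, score)   # best hypothesis ends in <eos>
```

Scores are summed log-probabilities (equivalent to multiplying probabilities), which is the usual choice to avoid numerical underflow on long sequences; production decoders typically also apply a length normalization so short hypotheses are not unfairly favored.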
Conditional Probability Notes: