It can't internally revise. Generation produces a distribution over the next token, and sometimes the wrong answer gets sampled.
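A minimal sketch of that failure mode, using made-up probabilities and only Python's standard library: even when the model puts most of its probability mass on the correct token, stochastic sampling still picks a wrong one some fraction of the time.

```python
import random

random.seed(0)

# Toy next-token distribution; the numbers are illustrative, not from any
# real model. "4" is the correct answer and gets most of the probability.
probs = {"4": 0.80, "5": 0.15, "3": 0.05}

tokens, weights = zip(*probs.items())
samples = [random.choices(tokens, weights=weights)[0] for _ in range(1000)]
wrong = sum(t != "4" for t in samples)
print(f"wrong answer sampled {wrong}/1000 times")  # roughly 20% of the time
```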
There is no "backspace" token, although it would be cool and fancy if we had that.
The more interesting question is why it revises its mistakes at all. The answer is that the training data contains examples of fixing your own mistakes, plus some RL to bring that effect out more strongly.
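To make that concrete, here is a hypothetical example of what such a self-correction sample could look like; the format and wording are assumptions for illustration, not anyone's actual training data.

```python
# Hypothetical self-correction training sample: the text itself demonstrates
# catching and fixing a mistake, which is the behavior the model learns to
# imitate (and which RL can then reinforce).
sample = (
    "Q: What is 13 * 7?\n"
    "A: 13 * 7 = 81. Wait, that's wrong: 13 * 7 = 91. The answer is 91."
)
```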
There is no "backspace" token, although it would be cool and fancy if we had that.
The more interesting thing is why does it revise its mistakes. The answer to that is having training examples of fixing your own mistakes in the training data plus some RL to bring out that effect more.