
rcxdude 11 days ago \| on: Why do LLMs freak out over the seahorse emoji?

More or less. It was a string given its own token by the tokeniser because of the above, but it did not appear in the training data. Thus it basically had no meaning for the LLM. (I think there are some theories that the parts of the network associated with such tokens may have been repurposed for something else, and that's why the presence of the token in the input messed the models up so much.)
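To make the "in the vocabulary but never in the training data" situation concrete, here is a rough toy sketch. Everything in it is an assumption for illustration: the greedy longest-match `tokenize` is a stand-in for a real BPE tokeniser, and the vocabulary, the corpus, and the `RareForumHandle` string are all made up.

```python
from collections import Counter


def tokenize(text, vocab):
    """Greedy longest-match tokeniser over a fixed vocabulary (toy stand-in for BPE)."""
    tokens = []
    max_len = max(len(t) for t in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Nothing in the vocabulary matches: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def unused_tokens(vocab, corpus):
    """Return vocabulary entries that never occur when the corpus is tokenised."""
    counts = Counter()
    for doc in corpus:
        counts.update(tokenize(doc, vocab))
    return sorted(t for t in vocab if counts[t] == 0)


# Hypothetical vocabulary: "RareForumHandle" got its own token when the
# tokeniser was built, but the (hypothetical) training corpus never contains it.
VOCAB = {"the", " ", "cat", "sat", "RareForumHandle"}
TRAINING_CORPUS = ["the cat sat", "the cat"]

never_seen = unused_tokens(VOCAB, TRAINING_CORPUS)
print(never_seen)
```

A token that shows up in `never_seen` would get no gradient updates from this corpus, so whatever embedding it ends up with is effectively arbitrary. That is the kind of token the comment is describing.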

gpt-oss has similar bad tokens.