I would have thought the cause is that it has statistically learned that something like a seahorse emoji should exist, so it generates the tokens for "Yes, it exists, ..." — but when it gets to actually outputting the emoji token, the emoji doesn't exist. It still must output something, so it emits the statistically closest match. The next token is then generated with the context of that mistake, and it falls into this loop.
You are describing the same thing, but at different levels of explanation. Llamasushi's explanation is "mechanistic / representational", while yours is "behavioral / statistical".
If we have a pipeline `training => internal representation => behavior`, your explanation argues that the given training setup would always result in this behavior, no matter the internal representation. Llamasushi explains how the concrete learned representation leads to this behavior.
I guess the question is: what do we mean by "internal representation"?
I would think that, due to the training data, it has stored the likelihood of a given thing existing as an emoji as something like:
1. how appealing seahorses are to humans in general; it would learn this sentiment from a massive amount of text.
2. it would learn, from a massive amount of text, that emojis mostly represent things that are very appealing to humans.
3. for some of the more obvious emojis it might have learned that this one definitely exists, but it couldn't store that info for all ~4,000 emojis.
4. for many emojis, "does it exist?" falls back to a shortcut: how appealing the concept is versus how frequently something that appealing gets made into an emoji. Seahorse perhaps hits 99.9% likelihood there due to its strong appeal. In 99.9% of such cases the LLM would be right to answer "Yes, it ...", but there will always be 1 case out of 1,000 where it's wrong.
With this compression it's able to answer "Yes, it exists ..." correctly 999 times out of 1,000.
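The compression tradeoff above can be sketched as a toy calculation; every number here is made up purely for illustration, not taken from any real model:

```python
# Toy sketch of the "shortcut instead of lookup table" idea:
# rather than memorizing existence for all ~4,000 emojis, the model
# keeps an appeal-based probability and always answers confidently
# when that probability is high enough.
appeal_based_p = 0.999  # hypothetical: "a concept this appealing is 99.9% likely to have an emoji"
cases = 1000            # imagine 1,000 similarly appealing concepts

# Always answering a confident "Yes, it exists" when the shortcut fires:
expected_correct = round(appeal_based_p * cases)
expected_wrong = cases - expected_correct

print(f"right {expected_correct} times out of {cases}")  # right 999 times
print(f"wrong {expected_wrong} time(s), like the seahorse")
```

So the shortcut buys enormous storage savings at the cost of a rare confident miss.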
It could be more accurate if it said "A seahorse would have a lot of appeal for people, so it's very likely it exists as an emoji, since emojis are usually made for very high-appeal concepts first — but I know nothing for 100%, so it could be it was never made".
But in 999 cases, "Yes, it exists..." is the more straightforward and appreciated answer. The one time it's wrong costs fewer brownie points than the 999 short, confident answers gain over 1,000 technically accurate but unconfident ones.
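That brownie-point argument is an expected-value comparison, which can be made concrete with hypothetical reward numbers (none of these values come from any real training setup; they just show the shape of the tradeoff):

```python
# Hypothetical rewards for the two answering strategies.
p_right = 0.999                # the appeal shortcut is right this often

reward_confident_right = 1.0   # "Yes, it exists" and it does
reward_confident_wrong = -5.0  # confidently wrong (the seahorse case)
reward_hedged = 0.7            # always technically accurate, but less appreciated

# Expected brownie points per answer for each strategy.
expected_confident = p_right * reward_confident_right + (1 - p_right) * reward_confident_wrong
expected_hedged = reward_hedged  # the hedged answer scores the same either way

print(f"confident: {expected_confident:.3f}")  # ~0.994
print(f"hedged:    {expected_hedged:.3f}")     # 0.700
```

Under these made-up numbers the confident strategy wins on average even though it is occasionally badly wrong, which is the point being made here: unless confidently-wrong answers are penalized very heavily, hedging everything loses.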
But even the above sentence might not be the full truth, since the model might not be correct about why it has associated a seahorse emoji with being so likely to exist; it would just be speculating about its own reasoning. So maybe the more accurate answer would be "I expect a seahorse emoji likely exists, maybe because of how appealing seahorses are to people and because emojis are usually about appealing things".