
Except they know it's wrong as soon as they say it and keep trying and trying again to correct themselves.

If normal hallucination is being confidently wrong, this is like a stage hypnotist getting someone to forget the number 4 and then count their fingers.



Arguably it's "hallucinating" at the point where it says "Yes, it exists". Hallucination here means the weights statistically indicating that something is probably true when it's not. To me, an LLM can be thought of as a compressed, probability-based database: you take the whole truth of the world and compress all of its facts into probabilities. Some truth gets lost in the compression, because you don't have the storage to keep absolutely all information about the world with 100% accuracy. Hallucination is the truth that gets lost.
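
As a toy picture of that lossy compression (purely illustrative: the facts, numbers and threshold below are made up, and no real model stores an explicit table like this), you can think of "recall" as thresholding a stored probability:

    # Toy "facts compressed into probabilities" table (illustrative only).
    compressed_facts = {
        ("Paris", "capital of France"): 0.999,  # genuinely true
        ("seahorse", "has an emoji"):   0.97,   # merely plausible, stored as near-certain
    }

    def recall(subject, attribute, threshold=0.9):
        p = compressed_facts.get((subject, attribute), 0.0)
        # Anything above the threshold gets asserted confidently, true or not.
        return "Yes, it exists" if p >= threshold else "I'm not sure"

    print(recall("seahorse", "has an emoji"))  # -> "Yes, it exists"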

In this case:

1. The stored weights statistically indicate that a seahorse emoji is quite certain to exist. Through the training data it probably has something like Emoji + Seahorse -> 99% probability, arrived at through various channels: it has existed on some other platform, people have talked about it enough, or a seahorse is something you would expect to exist given its other attributes/characteristics. There are ~4k emojis, and storing all 4k explicitly takes a lot of space; it is cheaper to store this information by attributes: how likely humankind would have been to develop a certain emoji and what the demand for that type of emoji is, and a seahorse seems like something that would have been done within the first 1000 of these. Perhaps it's an anomaly in the sense that it's something humans would statistically have been expected to develop early, but for some reason it was skipped or went unnoticed.

2. The tokens that follow should be "Yes, it exists".

3. It should then output the emoji to show it exists, but since there is no correct emoji, the best available answers are the ones closest in meaning, e.g. just a horse, or something related to the sea. It outputs one of those, because the previous tokens indicate it was supposed to output something.

4. The next token is generated with the context that it previously said the emoji exists, but the emoji it actually output is a horse instead, which doesn't make sense (see the sketch after this list).

5. Here it goes into this tirade.
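
Roughly, the loop in steps 2-4 looks like the sketch below. This is a conceptual toy, not any real model's API; dummy_next_token is a hypothetical stand-in that just replays the failure mode.

    # Minimal autoregressive decoding loop (conceptual sketch).
    def generate(next_token, prompt, max_tokens=20):
        context = list(prompt)
        for _ in range(max_tokens):
            tok = next_token(context)  # each step conditions on everything emitted so far,
            context.append(tok)        # including "Yes, it exists" and the wrong emoji
            if tok == "<eos>":
                break
        return context

    # Hypothetical stand-in for the model: claims the emoji exists, emits the closest
    # emoji it actually has (a horse), then tries to repair the contradiction.
    def dummy_next_token(context):
        script = ["Yes,", "it", "exists:", "🐴", "wait,", "that's", "not", "it", "...", "<eos>"]
        i = len(context) - 1
        return script[i] if i < len(script) else "<eos>"

    print(" ".join(generate(dummy_next_token, ["Is there a seahorse emoji?"])))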

But I really dislike thinking of this as "hallucinating", because to me hallucination is a sensory-processing error. This is more like imperfect memory recall (people remembering facts slightly incorrectly, etc.): whatever happens when people are supposed to tell something detailed about an event in their life and they have been trained not to say "I don't remember for sure".

What did you eat for lunch 5 weeks ago on Wednesday?

You are rewarded for saying "I ate chicken with rice", but not "I don't remember for sure right now, but I frequently eat chicken with rice midweek, so probably chicken with rice."

You are not hallucinating; you are just getting brownie points for concise, confident answers whenever they cross a certain likelihood of being true. Because maybe you eat chicken with rice on 99%+ of Wednesdays.
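
With made-up numbers (the 99% from above, plus a hypothetical grader that gives full credit for a confident correct answer and partial credit for a hedge), the incentive is easy to see:

    # Expected reward of confident guessing vs. hedging (made-up scoring).
    p_correct = 0.99         # chicken with rice on 99% of Wednesdays
    confident_right = 1.0    # full credit for a confident, correct answer
    confident_wrong = 0.0    # no credit for a confident, wrong answer
    hedged = 0.3             # partial credit for "probably chicken with rice, not sure"

    expected_confident = p_correct * confident_right + (1 - p_correct) * confident_wrong
    print(expected_confident, ">", hedged)  # 0.99 > 0.3: confident bluffing wins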

When asked about the capital of France, you would surely sound dumb if you said "I'm not really sure, but I've been trained to associate Paris really, really closely with being the capital of France."

"Hallucination" happens on the sweet spot where the statistical threshold seems as if it should be obvious truth, but in some cases there's overlap of obvious truth vs something that seems like obvious truth, but is actually not.

Some have instead called it "confabulation", but I think that is also not 100% accurate, since confabulation implies a stricter kind of memory malfunction. I think the most accurate description is a probability-based database whose output has been rewarded for sounding as intelligent as possible. The same kind of thing happens in job interviews, group meetings, and high-pressure social situations where people think they have to sound confident: they bluff that they know something while actually making probability-based guesses underneath.

Confabulation suggests some clear error in how the data was stored or how the retrieval pathway got messed up. This is rather probability-based bluffing, because you get rewarded for confident answers.


When I ask ChatGPT how to solve a tricky coding problem, it occasionally invents APIs that sound plausible but don't exist. I think that is what people mean when they talk about hallucinating. When you tell the model that the API doesn't exist, it apologises and tries again.

I think this is the same thing that is happening with the seahorse. The only difference is that the model detects the incorrect encoding on its own, so it starts trying to correct itself without you complaining first.


Neat demonstration of simple self-awareness.

Associating the capital of France with a niche emoji doesn't seem similar at all - France is a huge, powerful country and French a commonly spoken language.

Would anyone really think you sounded dumb for saying "I am not really sure - I think there is a seahorse emoji but it's not commonly used" ?


>"Yes, it exists"

AAAAAAUUUGH!!!!!! (covers ears)

https://www.youtube.com/watch?v=0e2kaQqxmQ0&t=279s


> Except they know it's wrong as soon as they say it and keep trying and trying again to correct themselves.

But it doesn't realize that it can't write it, because it can't learn from this experience as it doesn't have introspection the way humans do. A human who can no longer move their finger won't say "here, I can move my finger:" over and over and never learn that they can't move it now; after a few times they will figure out they no longer can do that.

I feel this sort of self-reflection is necessary to be able to match human-level intelligence.


> because it can't learn from this experience as it doesn't have introspection the way humans do.

A frozen model version doesn't; what happens between versions certainly includes learning from user feedback on the responses as well as from the chat transcripts themselves.

Until we know how human introspection works, I'd only say that Transformers probably do all of these things differently than we do.

> A human who can no longer move their finger won't say "here, I can move my finger:" over and over and never learn that they can't move it now; after a few times they will figure out they no longer can do that.

Humans are (like other mammals) a mess: https://en.wikipedia.org/wiki/Phantom_limb


Humans do that; you need to read some Oliver Sacks: hemispheric blindness, people who don't accept that one of their arms is their arm and think it's someone else's arm, or phantom limbs where missing limbs still hurt.



