> I have considerable doubts as to whether this is a substantial problem for current or near-future LLMs
Why so? I'd argue the problem is much worse than that: the ignorance and detachment from reality likely to be reflected in more refined LLMs is that of the general population. That creates a feedback machine which doesn't drive unstable people into psychosis the way today's LLMs do, but instead chips away at the general public's already limited capacity for rational thinking.
Or if they do, it's anecdotal or wrong. Worse, they say it with confidence, which the AI models also do.
Like, I'm sure the models have been trained and tweaked so that they don't lean into the bigger conspiracy theories or quack medicine, but there's a lot of subtle quackery that isn't immediately flagged (think "carrots improve your eyesight"-level quackery: it's harmless but incorrect, and if it isn't countered it will fester).
Because actual mentally disturbed people are often difficult to distinguish from the internet's huge population of trolls, bored baloney-spewers, conspiracy believers, drunks, etc.
And the "common sense / least hypothesis" issues of laying such blame, for profoundly difficult questions, when LLM technology has a hard time with the trivial-looking task of counting the r's in raspberry.
And the high social cost of "officially" blaming major problems with LLMs on mentally disturbed people. (Especially if you want a "good guy" reputation.)
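For contrast, here's a minimal Python sketch of how trivial that counting task is for conventional code; the usual explanation for LLMs tripping over it is that they operate on subword tokens rather than individual characters:

```python
# Letter counting is a one-liner in conventional code, which is what makes the
# LLM failure mode notable: the model sees subword tokens, not characters.
word = "raspberry"
print(word.count("r"))  # prints 3
```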
Does it matter whether they are actually mentally disturbed, trolls, etc., when the LLMs treat it all with the same weight? To me that sounds like it makes the problem worse, not like a point that bolsters your view.
Click the "parent" links until you see this exchange:
>> ...Bing felt like it had a mental breakdown...
> LLMs have ingested the social media content of mentally disturbed people...
My point was that formally asserting "LLMs have mental breakdowns because of input from mentally disturbed people" is problematic at best. Has anyone run an experiment where one LLM was trained on a dataset without such material?
Informally - yes, I agree that all the "junk" input for our LLMs looks very problematic.
But for purposes of understanding the real-world shortcomings and dangers of LLMs, and explaining those to non-experts - oh Lordy, yes.