Well, ChatGPT is certainly trained on the SCP Wiki. I just fed it: "Continue this prompt: Object Class: Keter" and it spat out a perfectly formatted SCP entry. It's not impossible that it's getting "interference" from that part of the vector space.
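If anyone wants to reproduce this, here's a minimal sketch using the OpenAI Python client. The model name and the API route are my assumptions; I'd guess the parent just typed it into the web UI, which amounts to the same thing:

    from openai import OpenAI  # assumes the official openai Python package (v1+)

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Ask the model to continue the same fragment quoted above.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any chat model should do
        messages=[
            {"role": "user", "content": "Continue this prompt: Object Class: Keter"}
        ],
    )

    print(response.choices[0].message.content)

In my experience the continuation comes back in the wiki's house format (containment procedures, description, redaction bars and all), which is the "interference" being described.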
One of the advances I'm keeping my eye out for in AI is some improvement in training capability that gets us an AI roughly as capable as today's AIs but that doesn't need to have the (metaphorical) entire Internet shoveled at it to work. A non-trivial number of alignment problems could be eliminated or mitigated if the data being fed in were small enough to be carefully curated, so that misalignments could be eliminated at the source; e.g., if we didn't feed the AIs stories about AIs going rogue and taking over the world, people would have a harder time wandering into a part of the vector space where the AI starts telling that story to the user. We probably don't want the SCP wiki to be in the general training set for every AI. Some of them, by all means, but probably not all of them.
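To make "eliminate it at the source" concrete: the bluntest version of curation is a domain blocklist applied to the corpus before training ever starts. A toy sketch, where the record format and the blocklist contents are hypothetical:

    # Toy source-level curation pass: drop documents from blocklisted domains
    # before they reach the training set. Record format is made up for illustration.
    from urllib.parse import urlparse

    BLOCKLIST = {"scp-wiki.wikidot.com"}  # the example from this thread

    def curate(records):
        """Yield only records whose source URL isn't on the blocklist."""
        for rec in records:
            domain = urlparse(rec["url"]).netloc
            if domain not in BLOCKLIST:
                yield rec

    corpus = [
        {"url": "https://scp-wiki.wikidot.com/scp-173", "text": "..."},
        {"url": "https://en.wikipedia.org/wiki/Concrete", "text": "..."},
    ]
    print([rec["url"] for rec in curate(corpus)])  # only the Wikipedia entry survives

Real curation pipelines do far more than this (dedup, quality scoring, topic classification), but the point stands: a filter at ingestion time is cheap compared to trying to train the resulting behavior out of the model afterwards.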
I'm 99% confident there are currently multiple companies active in the "curated LLM dataset" space, going through heaps of data to organize it into curated datasets for just that purpose.
But it's a huge undertaking. Google set itself the objective of indexing all the data in the world 20-odd years ago, and that's just putting it all in a big pile; curating it is an even bigger job, and one that can only partially be automated. Compare it with social media moderation, which is a full-time job for tens, if not hundreds, of thousands of people worldwide, and that's after the automated tools have had their first pass. And that's sort-of realtime, but there's 30+ years of it to go through if you want to curate a dataset (and more if you include pre-internet media).