When hallucinations become books
On explosive growth in LLM hallucinations, the long tail on Amazon, and the hazards of low-cost creators getting inside information about what fake texts to generate
A rather dire story in Scientific American reports an estimate from the Library of Virginia that “15 percent of emailed reference questions it receives are now ChatGPT-generated, and some include hallucinated citations for both published works and unique primary source documents”. These hallucinations send research librarians on wild goose chases, to say nothing of misleading the many people who never bother to verify that the primary sources exist.
■ Mar Hicks, who writes about the history of technology, laments that “I’ve already gotten multiple emails from people asking me for fake articles and books that I’ve supposedly written, because chatbots have told them fake references”. Being optimistic about the future of human intelligence requires taking seriously what the worst-case scenario looks like if these patterns continue.
■ It’s easy to conjure the following path: First, a real person with an honest question submits it to a large language model (Gemini, ChatGPT, Llama, Grok, or any of the others).
■ Next, that LLM hallucinates a reference. (Without sufficient safeguards, this seems to happen quite a lot.) The real person, believing the reference to exist, searches for it.
■ Here’s where the situation goes truly sideways: If the LLM is tied to a profit-seeking firm, that firm might capture the request and measure it as a demand signal. Without appropriate safeguards in place, a purely profit-seeking firm might then synthesize a book-length text to cash in on the apparent demand. (This is not merely a forecast: AI-generated books already show up on Amazon.) It’s a perfect long-tail play: the costs of production are close to zero, and synthesized texts can be sold at prices far below anything human writers could match.
■ Finally, the synthesized text makes its way into circulation, generating profits for people who disregard the consequences of contaminating the world’s body of knowledge, while crowding out the work of real human thinkers. This is a huge problem if readers don’t place a premium on the quality of the publisher. And what are the odds that users who depend on LLMs for first-order research will do that?
■ Market signals are tremendously useful: A good publisher would be extremely eager to have demand data of the type described here, so they could commission real authors to satisfy the demand.
■ But the widespread abuse of LLMs should raise serious reservations about the firms behind them having that same data: the downside consequences for the scholarly integrity of real knowledge are downright dire. Rarely are right answers self-evident in the face of systemically complex problems like this one.
■ We must grapple with it immediately and head-on: We still face persistent and deadly damage traceable to just one example of malicious fabrication, unleashed on the world in 1903. Contemporary technologies make it possible to amplify the same kind of malice at an incomprehensible scale.



