Artificial intelligence and organic source material
On election results, John Stuart Mill, and the inverse tragedy of the commons that underlies the training of artificial intelligence
A change to the terms of service on the platform formerly known as Twitter has taken effect, granting the company permission (in its words) to "analyze text and other information you provide and to otherwise provide, promote, and improve the Services, including, for example, for use with and training of our machine learning and artificial intelligence models", a permission not explicitly discussed in the previous edition of the terms.
■ That change roughly coincides with the results of a Presidential election in which X/Twitter's owner has played a prominent and possibly manipulative role, and the unpopularity of that role has triggered a significant exodus from X/Twitter to the competing platform Bluesky. It's unlikely that anyone will successfully tease out how much of the exodus is due to the owner's behavior and how much is due to the policy change, but both certainly contribute.
■ Content-scraping any platform in order to train artificial intelligence is sketchy behavior at best. It's already been done by lots of rivals in the artificial-intelligence marketplace -- X/Twitter is merely one among several. But X/Twitter holds a captive data set, which makes its AI training seem particularly targeted at its own incumbent users.
■ We've never before grappled with what happens when the whole world has equal access to the same publishing platforms. Not long ago, everyone knew who had published copyright-protected content and who had not: If you didn't have a book on a library shelf or an article in a newspaper or magazine, then your thoughts had never reached a domain that could have been used for training.
■ The instinct to object to having one's content used to train someone else's computer model is understandable. But there's an uncomfortable inversion to the problem: If all people of goodwill withdraw their ideas from the data sets, then what's left behind will be a concentrated collection of the bad. What would a library be if John Stuart Mill were omitted while Karl Marx remained?
■ It's a terrible Catch-22. Opting in (to allowing one's content to be used for AI training) means implicitly rewarding bad behavior. Opting out means that what's left behind has a higher concentration of the bad. And it's become evident that artificial intelligence tools are here to stay.
■ Thus what we have is an inverted tragedy of the commons: What matters to artificial intelligence is not how much of the commons is "consumed" (since the whole point is that AI is an effectively inexhaustible source of output), but how much it can draw from raw inputs that are, on balance, good for society. That those inputs are intermediated by actors who may themselves be of dubious character makes the whole matter even more complicated. It is a complex dynamic without an easy solution.