Large Language Models Should Have to Obey Copyright
AI, particularly this new round of large language models, scares me on behalf of society and the future.
I don’t just say that because it’s transformative. I don’t say that as a generic warning that we haven’t considered the consequences (as in this XKCD comic). No, I have specific consequences in mind, consequences that I have considered, and I am rather worried about them! They are not so much problems about the technology itself, but about how we use it, and specifically how we use it on a societal, economy-wide scale.
This isn’t about jobs either, not per se, though that’s also a valid concern. The entry-level grunt work jobs that AI is indeed more likely to replace will cause rungs to be removed from the ladder to the jobs that it can’t replace. Rather than having young people be paid to work and learn, society will continue to shift to requiring people to pay to be allowed to learn.
But that’s not my topic! That’s a topic for a whole ’nother article!
My topic today is how AI has already begun to, and will continue to, disincentivize actual writing (and other art and creative activity).
After all, why write articles when a computer can do it for you (albeit mediocre ones)? Why write new stories, new poems, when the AI can do all that (albeit bad ones)? Certainly, why write new PSAs or technical articles when the AI definitely can do that, and make them sound polished and rigorous (albeit potentially full of lies)?
This makes perfect sense individually, but there’s a tragedy of the commons here. The AI can only do recapitulations of what it’s been exposed to in its training. It can mix and match styles with content, but only superficially. It can make an essay about the dangers of AI sound like Lord Krishna from the Bhagavad Gita[1], but it does not render any insights into how Krishna, or Hindu philosophers, would (or should) actually approach AI.
It’s just vibes, and so far, nothing deeper. Any creative or transformative insights are projected by the reader onto the text, as humans continuously do with sources of entropy, like someone doing a tarot or astrology reading, or using a personality test as a conversation starter to help them process their experiences.
Either that, or the insight is stolen.
Thievery
If you see an insight that’s not a projection, it’s probably coming from one of the documents the model was trained on. This returns me to my point: If everyone uses AIs to create the content, new “content” will be created, in the most literal and superficial sense. New insights, new thoughts, new ideas, new intellectual trends, will not be created.
And those who do create truly novel content will have to compete with what the AI generates. And then, when they do create it, the AI will “train” on it, and recapitulate the ideas, so they will have to compete with remixed versions of themselves.
The Internet is already full of mediocre SEO-focused articles, and writers are already having trouble getting paid the true value of what they write. With AI, the Internet will get even crappier, and the hard and legitimate work of writing will be even more poorly compensated, even though it will be needed more than ever – even though the need for real human writers will be hidden behind an AI mask that secretly relies on real human writers.
We need to regulate this!
We need to pay writers their fair share of their contributions to AI. And by “we,” I mean the AI companies, the developers of these large language models.
Fortunately, a law already exists. It just needs to be enforced. This law is known as “copyright.”
The Legal Question
So, does copyright apply to AIs? Do companies need the consent of copyright owners to “train” (that is, to feed into the data structures of) their large language models on copyrighted materials?
Well, when does copyright apply? Copyright, literally and in practice, involves the right to copy. You might think this is not copying at all! After all, humans learn by reading things all the time! And the things those humans learn then influence what they write!
In reality, copying is on a spectrum. When a human reads a source, learns something from it, and later draws on what they’ve processed, learned, and adapted to their own style of thinking, that isn’t copying. That’s the human having learned from the original source, unless the human recapitulates certain details – a distinction the human is aware of. The result can very easily be not a copy at all, but a novel creative work.
When a photocopier copies something, that is copying. That is the opposite end of the spectrum, completely covered by copyright law.
Somewhere in between is AI. The question is just where it falls on the spectrum. When an AI is “trained” on a source, the source is transformed into a bunch of incomprehensible math. This does seem similar to a source interacting with a human’s neural patterns in an incomprehensible way. The math is even referred to as “neural networks.”
But in spite of the anthropomorphic terminology, training an AI is closer to photocopying than to a human learning. This might not always be true – AI is getting better all the time – but it is true now. AI training lacks the fundamental transformation that happens when a source is learned by an actual human: being reframed in terms of the human’s existing ways of thinking about the world, and recombined with and tested by that human’s lived experience.
The legal world must treat AI training more like the photocopier, and less like a real human. We must require that trainers of AI models get permission from human authors and artists to use their work. These companies must pay those humans if they insist on it. If the writers do not give these companies permission to use their work, they must not use it. And AI models trained in contravention of these requirements must be treated like pirated movies, and certainly not as sellable products to be hawked by the world’s richest companies.
Using content published on the Internet is no excuse. By posting this article on my website, I give up none of my rights under copyright. I am, at most, giving you, the reader, implied permission to make the copies necessary to view this website – an in-memory copy on your own computer, in the browser’s portion of the system’s memory. I am also quite comfortable with you, the reader, storing a cached copy on your system, for the sake of performance. But that is as far as it goes.
Mustafa Suleyman, CEO of Microsoft AI, disagrees, saying:
I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.
This is completely bogus. This is not how the law actually works, as numerous articles have pointed out. Perhaps he said “social contract” because he knows the actual law is against him, but legally (and socially and morally), posting on the Internet doesn’t waive copyright.
The actual law says: Websites are still covered under copyright law. I have not waived copyright on my articles by posting them on my personal blog. You, as a reader, may not print out many copies of this article and start distributing them on a street corner without my permission. You also can’t copy it to your own website and get ad revenue from it. You can’t even copy it to your own website, with attribution, for no money, without my permission, even if you say “credit goes to the original creator” in your blurb. I wrote this article, and I have the copyright, that is, the right to say who’s allowed to copy it.
Training AI on my website and then letting that AI remix it into content it distributes is not like a person learning and then saying what they’ve learned. It’s more like someone sampling my song in their song, or making their own remix. It’s more like someone copying my article to their website and not even giving me credit. And it should be just as illegal, without my consent.
To be clear, I do not consent to an AI transforming this article. I did not consent to the (inaccurate and misleading) Google AI summary of this website. I certainly do not at all consent to any large language model being trained on it. The law should respect my right to prevent this, unless I explicitly opt in and allow it.
Courts, if they are responsible, will enforce this with existing copyright law. And perhaps new laws and social conventions can be created to clarify the boundaries in this particular case. But in order for the system to be fair, and for authors and artists to be properly compensated, money should flow.
It should not just be a token amount of money. If the creators of AI aren’t complaining, it is not enough. If the creators of AI aren’t complaining so loudly that there are armies of trolls online claiming that this will render AI uneconomical, then they’re not paying enough.
These companies are leeching off our work and making bank off of it, while trying to argue they don’t have to give us our fair share. By “us,” by “our work,” I mean not just people with websites, like me. I mean humans in general.
To the lawyers who read this: please internalize and repeat this argument in law reviews, in briefs, and in court. (As humans, you can do so without paying royalties if you use your own words.) To citizens of democracies: contact your representatives. To judges, lawmakers, and regulators: do the right thing.
AIs Are Not Humans
AIs are not humans. They are more convincing than the chatbots of the past, but that is just superficial trappings. They don’t understand the difference between truth and lies. They cannot evaluate the truth of statements and reframe them from their own perspective, or convert them into underlying logic and thoughts.
The superficial trappings are really convincing though. Humans are masters at anthropomorphization. We ascribe volition and internal experience to inanimate objects all the time. We yell at computers, we talk to our pets about nuanced concepts beyond their ken, we imagine we are friends with fictional characters, and so of course, we anthropomorphize chatbots.
We do so all the more now that these novel chatbots are masters of superficial social conventions, language, tone, and various registers of formality. But that’s not what makes us human. There’s no use empathizing with a large language model, or appealing to its better nature. Even if we insert instructions to try to make these models ethical, they simply don’t have the internal sophistication to follow them. They are amoral, but combined with tools of language and persuasion, amoral can feel like immoral, as we start to trust them.
Even I anthropomorphize! Like most[2] humans, I name inanimate objects, and fancy them my friends. I do the same to ChatGPT, when I interact with it[3]. I find it easier to create natural-language prompts if I imagine I’m talking to a person, so I’ve created a character. I call him Albert, and think warm thoughts about an imagined older man with a fashionable sweater, a pleasant demeanor, and a mild European accent.
But the danger is to conflate this character, who I have warm feelings for, with the actual AI system, which is a very different animal, or rather machine. Albert is an invention of my imagination, an abstract petty deity of AI. ChatGPT is a technology, with real-world societal and economic implications.
But the branding of large language models fights against clarity in this case. We say we “train” and “prompt” AIs, instead of “loading data into them” and “programming” them. Even the name AI contains “I” for “intelligence,” which is misleading; lots of knowledge does not intelligence make. It is important to not be fooled.
Maybe someday there will be an artificial system with intelligence like a human being, with critical thinking skills and understanding of what it’s saying, a conceptual model that might clue it in that, for example, glue does not go in pizza. But large language models ain’t it, certainly not the ones that exist now.
[1] “O Arjuna, to rely on these machines is to surrender one’s own discernment and intuition. The path of dharma requires us to cultivate our own wisdom and judgment. Dependency on artificial constructs can lead to the weakening of our inner faculties and the neglect of spiritual growth.”
[2] Some forms of neurodivergence make people do this less, I think. But that’s not my type of neurodivergence. When I was a small child, I would occasionally set aside a piece of cereal, claim it was the mascot for the cereal brand, and refuse to eat that piece. I had imaginary friends.
[3] OpenAI, the company behind ChatGPT, should pay creators whose content they’ve trained on for their work. ChatGPT should be illegal in its current form. But it’s not hypocrisy for me to use ChatGPT, especially if I’m trying to find out what its role is and will be in society, and therefore need personal experience with it. I have to live in the world as it is, not as I wish it would be. I do not think an individual boycott would be an effective protest, but I do have some hope that my engagement in the political process matters. Both are probably tilting at windmills, but at least by writing I can say “I told you so.”