If GPT-4 is asked to emulate the writing style of Carmen Machado, Margaret Atwood, or Alexander Chee, it can produce a reasonably accurate rendition, thanks to the extensive consumption of their works during the training process. However, these authors, along with thousands of others, are dissatisfied with this situation.
More than 8,500 authors of fiction, non-fiction, and poetry have come together to express their concerns in an open letter. The letter specifically calls out the technology companies responsible for large language models such as ChatGPT, Bard, LLaMA, and others, accusing them of using the authors’ writings without permission or compensation.
The authors assert that these technologies imitate and reproduce their language, stories, style, and ideas. They highlight the fact that millions of copyrighted books, articles, essays, and poetry serve as the “fuel” for AI systems, without any payment rendered for these literary contributions.
Although the AI systems can quote and mimic these authors, the developers have not adequately explained where the works came from. Were the models trained on scraped samples from bookstores and reviews? Did the developers borrow every book from libraries? Or did they simply acquire materials from illegal archives like Libgen?
One thing is clear: the developers did not obtain proper licenses from publishers, which is undoubtedly the preferred, and arguably the only legal and ethical, approach. As the authors emphasize in their letter:
“Not only does the recent Supreme Court decision in Warhol v. Goldsmith make clear that the high commerciality of your use argues against fair use, but no court would excuse copying illegally sourced works as fair use. As a result of embedding our writings in your systems, generative AI threatens to damage our profession by flooding the market with mediocre, machine-written books, stories, and journalism based on our work.”
Certainly, we have witnessed this phenomenon firsthand. Recently, several poorly crafted AI-generated works have surged in popularity on Amazon’s YA best-seller lists. Publishers are facing an overwhelming influx of generated content, and even this website (and soon, this very post) is scraped for material that is repurposed for SEO bait.
These unscrupulous individuals are utilizing tools, APIs, and agents developed by companies like OpenAI and Meta, which could themselves be considered unscrupulous actors in this context. After all, who else would knowingly appropriate millions of works to fuel a new commercial product? (Well, Google might fit that description too, but their search indexing differs significantly from AI ingestion, and Google Books had at least claimed the purpose of being a dedicated index.)
Due to the complexities and slim profit margins of large-scale publishing, fewer authors are able to sustain a livelihood from writing. This predicament is particularly dire for newer authors, especially young writers and those from under-represented communities, as highlighted in the open letter.
In response, the letter calls upon the companies to take the following actions:
• Obtain permission for use of our copyrighted material in your generative AI programs.
• Compensate writers fairly for the past and ongoing use of our works in your generative AI programs.
• Compensate writers fairly for the use of our works in AI output, whether or not the outputs are infringing under current law.
No explicit legal action is being threatened. Mary Rasenberger, the CEO of the Authors Guild and one of the signatories, said that lawsuits are costly and time-consuming. However, the harmful impact of AI on authors is already evident.
It remains uncertain which company will be the first to admit, “Yes, we constructed our AI using stolen works, and we apologize for it. We will take responsibility and make amends.” Currently, there appears to be little motivation for any company to do so. The general public is largely unaware of, or indifferent to, the fact that large language models (LLMs) are developed through what could be deemed unlawful means, and that they may contain and reproduce copyrighted works without permission. While there is some resistance when generated images imitate an artist’s distinctive style, the subtler harm caused by using all of George Saunders’ or Diana Gabaldon’s books as “fuel” for an AI may not elicit as much action, even though many authors are prepared to challenge this practice.