The push to expand repressive copyright laws because machines can learn from human produced text, code and art is going to hurt us all in the long run.
People usually say contemporary media sucks because of commercial pressures, but those commercial pressures and conditions wouldn't exist without the expansion of copyright.
Yes, giant studios are struggling to introduce new ideas like 1993's Jurassic Park. But that doesn't mean Shane Carruth (of Primer fame) can't. And he could have if Jurassic Park had been released any time between 1790 and 1900.
Our stilted media landscape is directly downstream of prior legislation expanding copyright.
Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.
Even though Steamboat Willie has entered the public domain, Disney has been going after folks using the IP: https://mickeyblog.com/2025/07/17/disney-is-suing-a-hong-kon... The "infringement" in this case was a diamond-encrusted, Steamboat Willie-style Mickey pendant.
Questionable taste aside, I think it's good for society if people are able to make diamond encrusted miniature sculptures of characters from a 1928 movie in 2025. But Disney clearly disagrees.
Disney (and other giant corps) will use every tool in their belt to go after anyone who comes close to their money makers. There has been a long history of tension between artists and media corps. But that's water under the bridge now. AI art is apparently so bad that artists are willing to hand them the keys to their castle.
> Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.
Legal doctrines like the "Abstraction-Filtration-Comparison test," "total concept and feel," "comprehensive non-literal similarity," and "sequence, structure and organization" have systematically ascended the abstraction ladder. Copyright no longer protects just concrete expression; it reaches abstractions and styles.
The ugly part is the asymmetry at play - a copyright holder can pick and choose the level of abstraction on which to claim infringement, while a new author cannot possibly avoid all similarities on all levels of abstraction for all past works. The accuser can pick and choose how to frame infringement, the accused has to defend from all possible directions.
> Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.
This made me wonder about an alternate future timeline where IP law is eventually so broad and media megacorporations are so large that almost any permutation of ideas, concepts or characters could be claimed by one of these companies as theirs, based on some combination of stylistic similarities and using a concept similar to what they have in their endless stash of IP. I wonder what a world like that would look like. Would all expression be suppressed and reduced to the non-law-abiding fringes and the few remaining exceptions? Would the media companies mercifully carve out a thin slice of non-offensive, corporate-friendly, narrow ideas that could be used by anyone, putting them in control of how we express ourselves? Or would IP violation become so common that paying an "IP tax" be completely streamlined and normalized?
The worst thing is that none of this seems like the insane ramblings that it would've probably been several decades ago. Considering the incentives of companies like Disney, IP lawyers and pro-copyright lawmakers, this could be a future we get to after a long while.
> People usually say contemporary media sucks because of commercial pressures, but those commercial pressures and conditions wouldn't exist without the expansion of copyright.
I think this is a pretty bold assertion. Copyright protection exists because of what you call "commercial pressures", and what I would call "the desire of content producers to pay their bills". Sure, it leads to self-reinforcing pathologies that seek to expand the scope of the protections, but for every Disney, there are millions of small-scale creators who get to make a living because there are at least some legal hindrances to third parties selling copies of their music, books, and so forth.
I don't think we can assume that if copyright did not exist, we'd live in a utopia where all the same content is still available and we get some additional liberties to write Mickey Mouse erotica. More likely, we'd see a significant drop in certain types of creative activity, because in the absence of royalties, you need a wealthy patron to pay your bills, and wealthy patrons are in short supply. I'd also wager that media empires would still be built, just structured around barriers less pleasant than copyright. A Disney-operated cinema with metal detectors and patdowns for all guests. Maybe a contract you need to sign to enter, too.
> there are millions of small-scale creators who get to make a living because there are at least some legal hindrances to third parties selling copies of their music, books, and so forth.
They may be benefiting from copyright’s existence, but with rare exceptions they are not benefiting from its expansion, which is the topic in what you were responding to. And its expansion probably harms them.
Abolishing copyright altogether is almost never what is being proposed, though some will suggest tearing it all down to replace it with something much more restricted, often more like the Statute of Anne.
There is a weird interaction here. I am not an IP lawyer and do not know how to resolve it, but I will try to explain.
Steamboat Willie (the work) is in the public domain, so you are able to distribute it, and modified copies of it, freely. The weird interaction is where and how that conflicts with Mickey Mouse, which is still a registered trademark of Disney, and items bearing that mark are protected as such.
So I think that to legally distribute a variant of Steamboat Willie, you would also have to show that your use of Mickey Mouse does not infringe on Disney's trademark, i.e. that your product cannot be confused with a Disney product. Put a big "Not a Disney product" on the back?
> AI art is apparently so bad that artists are willing to hand them the keys to their castle.
Because, believe it or not, a lot of artists benefit from Disney and other giants: by getting hired by them directly, building social media followings with fanart of their IP, taking questionable commissions of their characters, etc.
Is this a fair and healthy relationship? Perhaps not. But it's infinitely better than what "AI artists" brought to human artists.
Of course Disney is not artists' friend, and we all know what will happen: artists will end up being squeezed from both sides anyway, by AI and by big IP holders (who deploy their own AIs while suing open-weights models).
Corporates can't have it both ways. The Hollywood corporates lobbied intensively to extend copyright to as long as 75+ years (if I recall right) because that's what would benefit them. Many have protested this. Some tech corporates (namely search and AI companies) now feel encumbered by it, and even indulge in piracy to circumvent copyright (without any meaningful consequences), and we are now supposed to feel sorry for them? Are any of these tech corporates also lobbying for changes to copyright laws? (I don't believe so, as many of them are now trying to become media moguls themselves!)
> The push to expand repressive copyright laws because machines can learn from human produced text, code and art is going to hurt us all in the long run.
Exactly. I always thought it was hilarious that, ever since LLMs and image generators like Stable Diffusion came online a few years ago, HN suddenly seemed to shift from the hacker ethos, of moving fast and breaking things, and using whatever you could for your goals, to one of being an intense copyright hawk, all because computers could now "learn."
While Chinese models train on all Western cultural output, our own models are restricted. And in the corporate world the models of choice for finetuning are DeepSeek and Qwen, wonder why.
The implication here, of course, is that if we allow AI to be taken down by copyright, then it could also take down Wikipedia. I am not even sure this is close to being true, despite the article trying to suggest otherwise.
Perhaps a section on what the differences are might be helpful. For example, what role does style play in the summary? I don't think that the Wikipedia summary is in the style of George R.R. Martin.
I'm confused. There's an entire paragraph in the article where the author compares the two summaries and finds that they differ only in their structuring. I can't find any part of the article saying that the LLM summary was written "in the style of George R.R. Martin", as far as I understand both summaries are conceptually very similar. That's the main problem. If the scope of substantial similarity to a novel is pushed down from hundreds of pages of writing to a summary that's a couple paragraphs long, then all these summaries are in potential danger. To my knowledge there's no criteria that lets you only find LLM summaries infringing without leaving an opening for the lawyers to expand the reach to target all summaries of copyrighted content.
Even if true, Wikipedia would escape via fair use and AI would not.
It is possible that the laws and judgements are inconsistent nonsense, but assuming they are not, the fact that Wikipedia has been around for decades suggests at least one key difference.
Just because Wikipedia has persisted for 20+ years doesn't mean that a key decision later down the line can't make it open season for all IP owners. AI-related lawsuits are a great opportunity for copyright owners to greatly shake up the status quo under the (fairly legitimate) guise of protecting themselves from LLM copying. Even if Wikipedia in particular could skirt it through fair use, the fact that hundred-word summaries would be found "similar" to full novels would represent a large encroachment of copyright, opening the door to many other lawsuits against entities who may not be as lucky as Wikipedia. Changing the answer to "Is something as brief as this notably similar to a full work?" from "What? Of course not" to "Well... do you have a fair use reason?" would mean that many people would need to start looking both ways and triple-checking whatever they create, summarize, or report on to avoid tipping off anyone hungry for settlement money.
Yes, I think the crux of the matter is what constitutes fair use and what doesn't. And I would say that a summary in an encyclopedia article about a copyrighted work is not only fair use, but also in the work's own interest, while being ingested and regurgitated by an LLM isn't, so... The article only mentions that twice, in passing.
I think an LLM summarizing a work would largely meet the traditional test.
I'm also not sure why an LLM summarizing the work wouldn't be in the interest of the work. It seems to me that it would, to the same extent a Wikipedia summary would be.
No, it’s more fundamental than fair use. Fair use is a defence to a copyright infringement. It’s an argument that, yes, I did violate the copyright of the creator, but my violation should be allowed.
The issue here is, what are creators allowed to own and control. It’s about the fundamental question of what copyright gives the creators control over.
Does creating a picture mean you own the right to its description? And you have the right to prevent anyone else from describing your picture?
Does creating a movie give you the right to its novelization? I think the answer would be yes. With that in mind, why wouldn't creating a picture give you the right to its description (subject to fair use)?
In general, I think fair use serves as a good balance for these types of questions.
Are you religious? If not, you should assume that your cognition is a product of your body, a magnificent machine.
I don't think LLMs are sapient, but your argument implies that creativity is something unique to humans and therefore no machine can ever have it. If the human body IS a machine, this is a contradiction.
Now, there's a very reasonable argument to be made concerning the purpose of copyright law, but "machines can't be creative" isn't it.
Creativity is not unique to humans, but legal rights protecting creativity are unique to humans (or human-represented organizations). Humans are always a special case in law.
Selling human livers and selling cow livers are never treated the same in terms of legality, even though the difference between your liver and that of a cow is much, much smaller than the difference between your brain and Stable Diffusion. I'm sure there isn't a single biochemical reaction that is unique to humans.
It was ruled that our copyright law does require that a human create the work, and that only a human can hold copyright. The monkey was not given copyright over the image it took.
Monkeys obviously can be creative. However, our law has decided that human creativity is protected by copyright, and human creativity is special within the law. I don't see any contradictions or arguments about sapience that are relevant here.
The issue becomes there's little to no way to tell the difference between the two.
Additionally, if human summaries aren't copyright infringement, you can train LLMs on things such as the Wikipedia summaries. In this situation, they're still able to output "mechanical" summaries - are those legal?
> The issue becomes there's little to no way to tell the difference between the two.
If you and I write the exact same sentence, but we can prove that we did not know each other or have inspiration from each other, we both get unique copyright over the same sentence.
It has never been possible to tell the copyright status of a work without knowing who made it and what they knew when they made it.
Also, the human produced summary is likely to have been produced by people who have read purchased books (i.e. legally distributed) whereas the algorithmic production of a summary has probably been fed dubious copies of books.
Also, there is a fair use gray area. Unlike Wikipedia, ClosedAI is a for-profit looking to make money from this stuff, and people using the generated text do it for profit.
The ruling never said summaries are infringing. It just said the authors’ claims about some AI outputs were "plausible" enough to get past a motion to dismiss, which is basically the lowest hurdle. The judge isn’t deciding what actually counts as infringement, just that the case can move forward. IMHO the title of the article is reading more into the opinion than what the judge actually decided.
The author already fully addressed this in the article. They just think that even the fact that this was allowed to move forward is a worrying sign:
> Judge Stein’s order doesn’t resolve the authors’ claims, not by a long shot. And he was careful to point out that he was only considering the plausibility of the infringement allegation and not any potential fair use defenses. Nonetheless, I think this is a troubling decision that sets the bar on substantial similarity far too low.
From what I understood, the case against OpenAI wasn't about the summarisation. It was the fact that the AI was trained on copyrighted work. In case of Wikipedia, the assumption is that someone purchased the book, read it, and then summarised it.
That doesn't really make sense. Just because you purchased a book does not mean the copyright goes away (for new works based on the book; for the physical book you bought, the doctrine of first sale gives you some rights, but only in that specific physical copy). If OpenAI pirated material, that would be a separate issue from whether the output of the LLM is infringing.
They’re sort of separate. In a sense you could say that the ChatGPT model is a lossily compressed version of its training corpus. We acknowledge that a jpeg of a copyrighted image is a violation. If the model can recite Harry Potter word for word, even imperfectly, this is evidence that the model itself is an encoding of the book (among other things).
You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc, but a transformer model is not human, and very philosophically and economically importantly, human brains can’t be copied and scaled.
They're very separate in terms of what seems to have happened in this case. This lawsuit isn't about memorization or LLMs being archival/compression software (imho, a very far reach) or anything like that. The plaintiffs took a bit of text that was generated by ChatGPT and accused OpenAI of violating their IP rights, using the output as proof. As far as I understand, the method by which ChatGPT arrived at the output, or how Game of Thrones is "stored" within it, is irrelevant: the authors allege that the output text itself is infringing regardless of circumstance, and therefore OpenAI should pay up. If it's eventually found that the short summary does indeed infringe the copyright of the full work, there is absolutely nothing preventing the authors (or someone else who could later refer to this case) from suing someone else who wrote a similar summary, with or without the use of AI.
> You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc
Also worth noting that, if a person performs a copyrighted work from memory - like a poem, a play, or a piece of music - that can still be a copyright violation. "I didn't copy anything, I just memorized it" isn't the get-out-of-jail-free card some people think it is.
I would guess that if there were a court case where a poet sued someone for commercially reciting his poetry for pay (say, with tickets sold specifically for it), they might very well win. So reciting poetry probably could be copyright infringement at a certain scale.
And since AI companies are commercial entities, I would lean toward the view that their doing this in general, even if not by repeating specific works, could be infringement too.
A jpeg of a copyrighted image can be copyright infringement, but isn't necessarily. A trained model can be copyright infringement, but isn't necessarily. A human reciting poetry can be copyright infringement, but isn't necessarily.
The means of reproduction are immaterial; what matters is whether a specific use is permitted or not. That a reproduction of a work is found to be infringing in one context doesn't mean it is always infringing in all contexts; conversely, that a reproduction is considered fair use doesn't mean all uses of that reproduction will be considered fair.
There is a world of difference between a corporation ingesting original works for the purpose of automatically generating derivative works at scale for profit, and a community of unpaid human volunteers working for a non-profit to maintain a public-benefit encyclopedia.
Saying that merely denying the motion to dismiss the claims that ChatGPT outputs infringed the rights of authors such as George R.R. Martin and David Baldacci is a "fundamental assault" on the idea-expression distinction as applied to works of fiction, and especially that it puts Wikipedia in the crosshairs, is beyond a stretch.
A motion to dismiss amounts to saying "come on, that's ridiculous". Denying the motion says "no, it's not ridiculous; we may still decide it's wrong, but it's not ridiculous".
If you consider it outrageous, as OP does (and I’m inclined to agree), the fact that the judge isn’t willing to laugh it out of court is an assault on your interpretation of copyright law.
Entertaining that the article about copyright-infringing similarity of AI-generated summaries is illustrated with a picture of an animated skeleton labelled "White Walker", which is neither what White Walkers are nor what they look like.
If a piece of information can be produced and consumed using general purpose computers, then there's no good reason for it to be made scarce under copyright. It should have a reciprocal copyleft license like AGPL or CC-BY-SA instead. This goes for digital drawings, writings, source code, AI model weights, and pretty much everything on the Internet. Forcing copyright on information that don't need to be scarce just creates extra problems.
--- a rant about using illustrations in articles ---
I open the article. I see an image of a skeleton staring at a tablet's back.
I am left to wonder what the author meant by that metaphor.
The text after the image states that "a white walker is reading Wikipedia", but I can plainly see that it's not a white walker, the tablet is turned away from him, and the styling of the site is parodying The New York Times.
After reading the article I still do not understand what this image is meant to communicate. And then I suddenly remember that alt text exists, and surely the text description would be useful to understand the intentions of the author. There isn't any alt description. And the bottom text is lying.
As I stare at the image, looking at the fused teeth and age lines on the skull, I suddenly understand that I might be the only person who has looked at the image for more than 2 seconds.
Which is shite, because images are such a good way to communicate complex ideas simply or to illustrate a point. I shouldn't be expected to just skip it entirely.
Yet all the time I spent trying to understand it was time wasted.
You don't have to put illustrations in if you don't know what you want them to mean! You especially don't have to put in illustrations with a "high effort" appearance, because people will assume that they're MEANT to illustrate something!
And if you choose to generate something, it doesn't sound like a bad idea to check whether that image conveys what you want to convey.
I love illustrations. I love pictures. There are great illustrations in my favorite manuals.
It's saddening that people nowadays choose to add illustrations that create confusion, and not even because better illustrations would be preferable, but because no illustration at all would be preferable to that.
This is my favorite article on HN since the one on solar panels in Africa. Love to see a subject matter expert making a case at the bleeding edge of their field.
Honestly, I always thought this was how it worked. A summary is by necessity a derivative of the thing being summarized, but it is also very, very clearly fair use. It's transformational, it's for an educational purpose, it contains only a tiny portion of the original work, and it does not compete with the original work. I can't imagine anything more fair use than that.
Look at that useless AI generated image "A white walker in a desolate field reading Wikipedia (an AI Image by Gemini)." It's not reading Wikipedia, it's staring at the back of a tablet.
Absurdity aside, in this specific example Wikipedia doesn't cite any sources.
The article (in English, anyway) is a summary of the plot of the book and there is not a footnote nor any external reference - and why would there be? It’s a summary of the plot, not a commentary or a critique of it. In this case, there’s no need to cite a source. https://en.wikipedia.org/wiki/A_Game_of_Thrones
That image caption says "A white walker in a desolate field reading Wikipedia", but the (backwards for some reason) Wikipedia article says "White Waleers". Forgive me for thinking this person might not have the necessary braincells to commentate on legal issues.
I like that the author saw a cartoon of a skeleton looking at the back of a tablet and thought “this is good enough to describe as a white walker reading Wikipedia”
"AI" keeps destroying free sources of information.
First it was Library Genesis and Z-Library, when Meta torrented 70TB of books and then pulled up the ladder; recently it was Anna's Archive and how they (Google and others) are coming for it, plus weird behavior around some other torrent sites; now Wikipedia is also being used as a tool to defend LLMs breaking any semblance of copyright "law" unpunished.
All these actions will have very bad repercussions once the bubble bursts; there will be a lot of explaining to do.
Yeah, it does matter, though the issue is not exactly just monetary profit. The fundamental problem is that OpenAI has made the GPT model weights artificially scarce, while at the same time claiming that other artificially scarce information, such as books, should not be scarce and should instead belong to the intellectual commons. I agree with the latter part, but they took from the commons and are claiming what they took as exclusively their own. That is just evil.
There would be no problem if they open-sourced everything including the model weights. That was their original mission which they have abandoned.
Another fundamental difference: OpenAI explicitly markets their tool as a replacement for the copyrighted material it was trained on. This is most explicit for image generation, but applies to text as well.
As a reminder, the 4 factors of "fair use" in the United States:
1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.
>> Every year, I ask students in my copyright class why the children’s versions of classic novels in Colting were found to be infringing but a Wikipedia summary of the plots of those same books probably wouldn’t be.
Not a lawyer, but the answer seems obviously to be that one is a commercial reproduction and the other is not. It seems like it would be a tougher question if the synopsis were in a set of Encyclopedia Britannica or something.
AI is clearly reproducing work for commercial purposes... ie reselling it in a new format. LLMs are compression engines. If I compress a movie into another format and sell DVDs of it, that's a pretty obvious violation of copyright law. If I publish every 24th frame of a movie in an illustrated book, that's a clear violation, even if I blur things or change the color scheme.
If I describe to someone, for free, what happened in a movie, I don't see how that's a violation. The premise here seems wrong.
Something else: Even a single condensation sold for profit only creates one new copyright itself. LLMs wash the material so that they can generate endless new copyrighted material that's derivative of the original. Doesn't that obliterate the idea of any copyright at all?
Good guess, but no. The most salient difference in that case is that an abridged children's version of a novel acts as a direct market substitute for the original, whereas a plot summary does not. (A secondary reason is that an abridged edition is likely to represent a much larger portion of the original work than would appear in a summary.)
Consider this: if I wanted to read A Game of Thrones, then I would read A Game of Thrones, not some bootleg LLM approximation. It is faster, more exact, and cheaper to infringe by copying; an LLM is a terrible tool for infringement, since it is slow, expensive, and doesn't actually reproduce perfectly. The fact that some people are using AI means they want something different, not the original.
Yes of course as a reader you would read the original. The major infringement isn't the LLM directly spitting out parts of the book to an end user (a reader). It occurs when the LLM injects parts of the book into a new text, without attribution, which some other writer will go on and sell thousands of copies of. The LLM acts as the washing machine.
The push to expand repressive copyright laws because machines can learn from human produced text, code and art is going to hurt us all in the long run.
People usually say contemporary media sucks because of commercial pressures, but those commercial pressures and conditions wouldn't exist without the expansion of copyright.
Yes, giant studios are struggling to introduce new ideas like 1993's Jurassic Park. But that doesn't mean Shane Carruth (of Primer fame) can't. And he could have if Jurassic Park had been released any time between 1790 and 1900.
Our stilted media landscape is directly downstream of prior legislation expanding copyright.
Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.
Even though Steamboat Willie has entered the public domain, Disney has been going after folks using the IP, https://mickeyblog.com/2025/07/17/disney-is-suing-a-hong-kon... / https://mickeyblog.com/2025/07/17/disney-is-suing-a-hong-kon...
The "infringement" in this case was a diamond encrusted Steamboat Willie style Mickey pendant.
Questionable taste aside, I think it's good for society if people are able to make diamond encrusted miniature sculptures of characters from a 1928 movie in 2025. But Disney clearly disagrees.
Disney (and other giant corps) will use every tool in their belt to go after anyone who comes close to their money makers. There has been a long history of tension between artists and media corps. But that's water under the bridge now. AI art is apparently so bad that artists are willing to hand them the keys to their castle.
> Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.
Legal doctrines like the "Abstraction-Filtration-Comparison test", "total concept and feel," "comprehensive non-literal similarity," and "sequence, structure and organization" have systematically ascended the abstraction ladder. Copyright no longer protects expression but abstractions and styles.
The ugly part is the asymmetry at play - a copyright holder can pick and choose the level of abstraction on which to claim infringement, while a new author cannot possibly avoid all similarities on all levels of abstraction for all past works. The accuser can pick and choose how to frame infringement, the accused has to defend from all possible directions.
> Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.
This made me wonder about an alternate future timeline where IP law is eventually so broad and media megacorporations are so large that almost any permutation of ideas, concepts or characters could be claimed by one of these companies as theirs, based on some combination of stylistic similarities and using a concept similar to what they have in their endless stash of IP. I wonder what a world like that would look like. Would all expression be suppressed and reduced to the non-law-abiding fringes and the few remaining exceptions? Would the media companies mercifully carve out a thin slice of non-offensive, corporate-friendly, narrow ideas that could be used by anyone, putting them in control of how we express ourselves? Or would IP violation become so common that paying an "IP tax" be completely streamlined and normalized?
The worst thing is that none of this seems like the insane ramblings that it would've probably been several decades ago. Considering the incentives of companies like Disney, IP lawyers and pro-copyright lawmakers, this could be a future we get to after a long while.
> People usually say contemporary media sucks because of commercial pressures, but those commercial pressures and conditions wouldn't exist without the expansion of copyright.
I think this is a pretty bold assertion. Copyright protection exists because of what you call "commercial pressures", and what I would call "the desire of content producers to pay their bills". Sure, it leads to self-reinforcing pathologies that seek to expand the scope of the protections, but for every Disney, there are millions of small-scale creators who get to make a living because there are at least some legal hindrances to third parties selling copies of their music, books, and so forth.
I don't think we can assume that if copyright did not exist, we'd live in a utopia where all the same content is still available and we get some additional liberties to write Mickey Mouse erotica. More likely, we'd see a significant drop in certain types of creative activity, because in the absence of royalties, you need a wealthy patron to pay your bills, and wealthy patrons are in short supply. I'd also wager that media empires would still be built, just structured around barriers less pleasant than copyright. A Disney-operated cinema with metal detectors and patdowns for all guests. Maybe a contract you need to sign to enter, too.
> there are millions of small-scale creators who get to make a living because there are at least some legal hindrances to third parties selling copies of their music, books, and so forth.
They may be benefiting from copyright’s existence, but with rare exceptions they are not benefiting from its expansion, which is the topic in what you were responding to. And its expansion probably harms them.
Abolishing copyright altogether is almost never what is being proposed, though some will suggest tearing it all down to replace it with something much more restricted, often more like the Statute of Anne.
There is a weird interaction here, and I am not an IP lawyer and do not know how to resolve it, but I will try to explain.
Steamboat Willie (the work) is in the public domain, and you are able to distribute it and modified copies of it freely. The weird interaction is where and how that conflicts with Mickey Mouse, which is still a registered trademark of Disney, and items bearing that mark are protected as such.
So I think that to legally distribute a variant of Steamboat Willie, you would also have to show that your use of Mickey Mouse does not infringe on Disney's trademark. You have to show that your product cannot be confused with a Disney product. Put a big "Not a Disney product" label on the back?
> AI art is apparently so bad that artists are willing to hand them the keys to their castle.
Because - believe it or not - a lot of artists benefit from Disney and other giants. By directly getting hired by them, building social media followers with fanart of their IP, taking questionable commissions of their characters, etc.
Is this a fair and healthy relationship? Perhaps not. But it's infinitely better than what "AI artists" brought to human artists.
Of course Disney is not artists' friend and we all know what will happen: artists will end up being squeezed from both sides, AI and big IP holders (who deploy their own AIs while suing open weights), anyway.
Corporates can't have it both ways. The Hollywood corporates lobbied intensively to extend copyright for as long as 75+ years (if I recall right) because that's what would benefit them, and many have protested this. Some tech corporates (namely search and AI companies) now feel encumbered by it, and even indulge in piracy to circumvent copyright (without any meaningful consequences), and we are now supposed to feel sorry for them? Are any of these tech corporates also lobbying for changes to copyright laws? (I don't believe so, as many of them are now also trying to become media moguls themselves!)
> The push to expand repressive copyright laws because machines can learn from human produced text, code and art is going to hurt us all in the long run.
Exactly. I always thought it was hilarious that, ever since LLMs and image generators like Stable Diffusion came online a few years ago, HN suddenly seemed to shift from the hacker ethos, of moving fast and breaking things, and using whatever you could for your goals, to one of being an intense copyright hawk, all because computers could now "learn."
It's a moot point, at least as far as AI is concerned, because nobody in China gives a mouse's behind about any of this.
Nor should they.
While Chinese models train on all Western cultural output, our own models are restricted. And in the corporate world the models of choice for finetuning are DeepSeek and Qwen, wonder why.
The implication here, of course, is that if we allow AI to be taken down by copyright, then it could also take down Wikipedia. I am not even sure this is close to being true, despite the article trying to suggest otherwise.
Perhaps a section on what the differences are might be helpful. For example, what role does style play in the summary? I don't think that the Wikipedia summary is in the style of George R. R. Martin.
I'm confused. There's an entire paragraph in the article where the author compares the two summaries and finds that they differ only in their structuring. I can't find any part of the article saying that the LLM summary was written "in the style of George R.R. Martin", as far as I understand both summaries are conceptually very similar. That's the main problem. If the scope of substantial similarity to a novel is pushed down from hundreds of pages of writing to a summary that's a couple paragraphs long, then all these summaries are in potential danger. To my knowledge there's no criteria that lets you only find LLM summaries infringing without leaving an opening for the lawyers to expand the reach to target all summaries of copyrighted content.
Even if that were true, wiki would escape via fair use and AI would not. It is possible that the laws and judgements are inconsistent nonsense, but assuming they are not, the fact that wiki has been around for decades suggests at least one key difference.
Just because Wikipedia has persisted for 20+ years doesn't mean that a key decision later down the line can't make it into an open season for all IP owners. AI-related lawsuits are a great opportunity for copyright owners to greatly shake up the status quo under the (fairly legitimate) guise of protecting themselves from LLM copying. Even if Wikipedia in particular could skirt it through fair use, the fact that hundred-word long summaries would be found "similar" to full novels would represent a large encroachment of copyright that would allow many other lawsuits to open up with entities who may not be as lucky as Wikipedia. Changing the answer to "Is something as brief as this notably similar to a full work?" from "what? Of course not" to "well... do you have a fair use reason?" would mean that many people will need to start looking both ways and triple-checking whatever they create/summarize/report on as to avoid tipping off anyone hungry for some settlement money.
The llm summary is probably largely based on the Wikipedia summary…
Yes, I think the crux of the matter is what constitutes fair use and what doesn't. And I would say that a summary in an encyclopedia article about a copyrighted work is not only fair use, but also in the work's own interest, while being ingested and regurgitated by an LLM isn't, so... The article only mentions that twice, in passing.
What are you basing that on?
Fair use in usa is based on 4 factors - https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors
I think LLM summarizing a work would largely meet the traditional test.
I'm also not sure why an LLM summarizing the work wouldn't be in the interest of the work. It seems to me that it would, to the same extent a Wikipedia summary would be.
No, it’s more fundamental than fair use. Fair use is a defence to a copyright infringement. It’s an argument that, yes, I did violate the copyright of the creator, but my violation should be allowed.
The issue here is, what are creators allowed to own and control. It’s about the fundamental question of what copyright gives the creators control over.
Does creating a picture mean you own the right to its description? And you have the right to prevent anyone else from describing your picture?
Does creating a movie give you the right to its novelization? I think the answer would be yes. With that in mind, why wouldn't creating a picture give you the right to its description (subject to fair use)?
In general I think fair use serves as a good balance for these types of questions.
> Does creating a movie give you the right to its novelization?
Go to imdb and see some of plot summaries there. Is that what you call "novelization"? Are you going to sue that site?
> Is that what you call "novelization"?
No, because that's not what the definition of novelization is. By novelization I mean when someone makes a novel version of a movie.
The point I'm trying to make here is that we regularly expect translations from one medium to another to be copyrighted, so why is this different?
> Are you going to sue that site?
No, because I believe it to be a pretty textbook example of fair use.
There's no reason the LLMs summary would not be considered "in the work's own interest" if Wikipedia's summary is.
To me the key difference is that Wikipedia summaries are written by a human, and so creativity imbues them with new copyright.
OpenAI outputs are an algorithm compressing text.
A jpeg thumbnail of an image is smaller but copyright-wise identical.
An OpenAI summary is a mechanically generated smaller version, so new creative copyright does not have a chance to enter in
Are you religious? If not, you should assume that your cognition is a product of your body, a magnificent machine.
I don't think LLMs are sapient, but your argument implies that creativity is something unique to humans and therefore no machine can ever have it. If the human body IS a machine, this is a contradiction.
Now, there's a very reasonable argument to be made concerning the purpose of copyright law, but "machines can't be creative" isn't it.
Creativity is not unique to humans, but legal rights to protect creativity is unique to humans (or human-represented organizations). Humans are always special case in law.
Selling human livers and selling cow livers are never treated the same in terms of legality. Even the difference between your liver and that of a cow is much, much smaller than the difference between your brain and Stable Diffusion. I'm sure there isn't a single biochemical reaction that is unique to humans.
I am not religious, but our legal system does treat the human brain, and products thereof, as unique.
Remember the monkey selfie thing? https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
It was ruled that our Copyright Law does require that a Human create the work, and that only a Human can hold copyright. The monkey was not given copyright over the image it took.
Monkeys obviously can be creative. However, our law has decided that human creativity is protected by copyright, and human creativity is special within the law. I don't see any contradictions or arguments about sapience that are relevant here.
The issue becomes there's little to no way to tell the difference between the two.
Additionally, if human summaries aren't copyright infringement, you can train LLMs on things such as the Wikipedia summaries. In this situation, they're still able to output "mechanical" summaries - are those legal?
> The issue becomes there's little to no way to tell the difference between the two.
If you and I write the exact same sentence, but we can prove that we did not know each other or have inspiration from each other, we both get unique copyright over the same sentence.
It has never been possible to tell the copyright status of a work without knowing who made it and what they knew when they made it.
I don't think that matters. A new copyright would just mean there are now multiple copyrights. It does not eliminate the original.
Also, the human produced summary is likely to have been produced by people who have read purchased books (i.e. legally distributed) whereas the algorithmic production of a summary has probably been fed dubious copies of books.
To add to your points, Wikipedia also generally cites its sources, whereas LLMs do not. I believe this is a significant distinction.
This.
Also there is fair use gray area. Unlike Wikipedia, ClosedAI is for profit to make money from this stuff and people using generated text do it for profit.
So if OpenAI stayed a non-profit, they'd be okay?
yes?
The ruling never said summaries are infringing. It just said the authors’ claims about some AI outputs were "plausible" enough to get past a motion to dismiss, which is basically the lowest hurdle. The judge isn’t deciding what actually counts as infringement, just that the case can move forward. IMHO the title of the article is reading more into the opinion than what the judge actually decided.
The author already fully addressed this in the article. They just think that even the fact that this was allowed to move forward is a worrying sign:
> Judge Stein’s order doesn’t resolve the authors’ claims, not by a long shot. And he was careful to point out that he was only considering the plausibility of the infringement allegation and not any potential fair use defenses. Nonetheless, I think this is a troubling decision that sets the bar on substantial similarity far too low.
From what I understood, the case against OpenAI wasn't about the summarisation. It was the fact that the AI was trained on copyrighted work. In case of Wikipedia, the assumption is that someone purchased the book, read it, and then summarised it.
That doesn't really make sense. Just because you purchased a book does not mean the copyright goes away for new works based on the book. (For the physical book you bought, the doctrine of first sale gives you some rights, but only in that specific physical copy.) If OpenAI pirated material, that would be a separate issue from whether the output of the LLM is infringing.
There are separate issues.
One is a large volume of pirated content used to train models.
Another is models reproducing copyrighted materials when given prompts.
In other words there's the input issue and the output issue and those two issues are separate.
They’re sort of separate. In a sense you could say that the ChatGPT model is a lossily compressed version of its training corpus. We acknowledge that a jpeg of a copyrighted image is a violation. If the model can recite Harry Potter word for word, even imperfectly, this is evidence that the model itself is an encoding of the book (among other things).
You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc, but a transformer model is not human, and very philosophically and economically importantly, human brains can’t be copied and scaled.
They're very separate in terms of what seems to have happened in this case. This lawsuit isn't about memory or LLMs being archival/compression software (imho, a very far reach) or anything like that. The plaintiffs took a bit of text that was generated by ChatGPT and accused OpenAI of violating their IP rights, using the output as proof. As far as I understand, the method at which ChatGPT arrived to the output or how Game of Thrones is "stored" within it is irrelevant, the authors allege that the output text itself is infringing regardless of circumstance and therefore OpenAI should pay up. If it's eventually found that the short summary is indeed infringing on the copyright of the full work, there is absolutely nothing preventing the authors (or someone else who could later refer to this case) from suing someone else who wrote a similar summary, with or without the use of AI.
> You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc
Also worth noting that, if a person performs a copyrighted work from memory - like a poem, a play, or a piece of music - that can still be a copyright violation. "I didn't copy anything, I just memorized it" isn't the get-out-of-jail-free card some people think it is.
I would guess that if a poet sued someone who recites his poetry commercially, that is, for pay (say, tickets sold specifically for the recital), the poet might very well win. So reciting poetry probably could be copyright infringement at a certain scale.
And since AI companies are commercial entities, I would lean toward the view that what they are doing in general, even apart from reproducing specific works, could be infringement too.
A jpeg of a copyrighted image can be copyright infringement, but isn't necessarily. A trained model can be copyright infringement, but isn't necessarily. A human reciting poetry can be copyright infringement, but isn't necessarily.
The means of reproduction are immaterial; what matters is whether a specific use is permitted or not. That a reproduction of a work is found to be infringing in one context doesn't mean it is always infringing in all contexts; conversely, that a reproduction is considered fair use doesn't mean all uses of that reproduction will be considered fair.
I think we have no evidence that someone bought the book and summarized it. And what if an AI bought the book and summarized it; is it fine now?
Yes. Anthropic won that one.
https://authorsguild.org/advocacy/artificial-intelligence/wh...
There is a world of difference between a corporation ingesting original works for the purpose of automatically generating derivative works at scale for profit, and a community of unpaid human volunteers working for a non-profit to maintain a public-benefit encyclopedia.
Saying that merely denying the motion to dismiss the claims that ChatGPT outputs infringed the rights of authors such as George R.R. Martin and David Baldacci is a "fundamental assault" on the idea-expression distinction as applied to works of fiction, and especially that it puts Wikipedia in the crosshairs, is beyond a stretch.
A motion to dismiss amounts to saying "come on, that's ridiculous". Denying the motion says "no, it's not ridiculous; we may still decide it's wrong, but it's not ridiculous".
If you consider it outrageous, as OP does (and I’m inclined to agree), the fact that the judge isn’t willing to laugh it out of court is an assault on your interpretation of copyright law.
Entertaining that the article about copyright-infringing similarity of AI-generated summaries is illustrated with a picture of an animated skeleton labelled "White Walker", which is neither what White Walkers are nor what they look like.
If a piece of information can be produced and consumed using general purpose computers, then there's no good reason for it to be made scarce under copyright. It should have a reciprocal copyleft license like AGPL or CC-BY-SA instead. This goes for digital drawings, writings, source code, AI model weights, and pretty much everything on the Internet. Forcing copyright on information that doesn't need to be scarce just creates extra problems.
--- a rant about using illustration in articles ---
I open the article. I see an image of a skeleton staring at a tablet's back. I am left to wonder what the author meant by that metaphor. The text after the image states that "a white walker is reading Wikipedia", but I can plainly see that it's not a white walker, the tablet is turned away from him, and the styling of the site is parodying the New York Times.
After reading the article I still do not understand what this image is meant to communicate. And then I suddenly remember that alt text exists, and surely the text description would be useful for understanding the intentions of the author. There isn't any alt description. And the bottom text is lying.
As I stare at the image, looking at the fused teeth and age lines on the skull, I suddenly understand that I might be the only person who has looked at the image for more than 2 seconds.
Which is shite, because images are such a good way to communicate complex ideas simply or to illustrate a point. I shouldn't be expected to just skip it entirely. Yet all the time I spent trying to understand it was time wasted.
You don't have to put illustrations in if you don't know what you want them to mean! You especially don't have to put in illustrations that have a "high effort" appearance, because people will assume they are MEANT to illustrate something! And if you choose to generate something, it doesn't sound like a bad idea to check whether that image conveys what you want to convey.
I love illustrations. I love pictures. There are great illustrations in my favorite manuals. It's saddening that people nowadays choose to add illustrations that create confusion, not even because better illustrations would be preferable, but because no illustration would be preferable to that.
It's as if the author thinks illustrations are just background noise, a garnish that you have to put in but serves no other purpose.
This is my favorite article on HN since the one on solar panels in Africa. Love to see a subject matter expert making a case at the bleeding edge of their field.
Honestly, I always thought this was how it always worked. A summary is by necessity a derivative of the thing being summarized, but it is also very, very clearly fair use. It's transformational, it's for an educational purpose, it contains only a tiny portion of the original work, and it does not compete with the original work. I can't imagine anything more fair use than that.
Personally, I'm not worried.
Look at that useless AI generated image "A white walker in a desolate field reading Wikipedia (an AI Image by Gemini)." It's not reading Wikipedia, it's staring at the back of a tablet.
The high seas are going to be crowded soon.
This is going to make anyone who does a college assignment explaining the general plot of a novel liable to copyright infringement. That’s absurd.
Wikipedia is careful to cite their sources. Is OpenAI as careful?
Absurdity aside, in this specific example Wikipedia doesn't cite any sources.
The article (in English, anyway) is a summary of the plot of the book and there is not a footnote nor any external reference - and why would there be? It’s a summary of the plot, not a commentary or a critique of it. In this case, there’s no need to cite a source. https://en.wikipedia.org/wiki/A_Game_of_Thrones
That image caption says "A white walker in a desolate field reading Wikipedia", but the (backwards for some reason) Wikipedia article says "White Waleers". Forgive me for thinking this person might not have the necessary braincells to commentate on legal issues.
I like that the author saw a cartoon of a skeleton looking at the back of a tablet and thought “this is good enough to describe as a white walker reading Wikipedia”
"AI" keeps destroying free sources of information.
First it was Library Genesis and Z-Library when Meta torrented 70TB of books and then pulled up the ladder; recently it was Anna's Archive and how they are coming for it (Google and others), plus weird behaviors with some other torrent sites; now Wikipedia is also being used as a tool to defend LLMs breaking any semblance of copyright "law" unpunished.
All these actions will end up with very bad repercussions once the bubble bursts; there will be a lot of explaining to do.
For those of us who hate both intellectual property and OpenAI, it's hard to pick a side on this one. Hopefully there is a way both sides can lose.
One fundamental difference: Wikipedia is not a for-profit corporation. OpenAI is. That probably matters.
Yeah, it does matter, though the issue is not exactly just monetary profit. The fundamental problem is OpenAI has made the GPT model weights artificially scarce. But at the same time they claim that other artificially scarce information such as books should not be scarce and instead belong to the intellectual commons. The latter part which I agree with, but they took from the commons and are claiming what they took as exclusively their own. That is just evil.
There would be no problem if they open-sourced everything including the model weights. That was their original mission which they have abandoned.
Another fundamental difference: OpenAI explicitly markets their tool as a replacement for the copyrighted material it was trained on. This is most explicit for image generation, but applies to text as well.
As a reminder, the 4 factors of "fair use" in the United States:
1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.
I've never heard that non-profits can violate intellectual property laws. Otherwise, that might give advantages to Sci-hub, shadow libraries, etc.
The "Fair Use" doctrine has four major pillars that a sibling comment enumerated and you can officially find here: https://www.copyright.gov/fair-use/
One of them is the purpose or character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes.
OpenAI is a nonprofit.
Non-profit OpenAI ("OpenAI Foundation") holds a 26% interest in for-profit OpenAI ("OpenAI Group PBC").
https://www.cnbc.com/2025/10/28/open-ai-for-profit-microsoft...
Not since Oct 28.
Not-for-profit does not mean no salaries for executives; they still have highly inflated salaries.
Not-for-profit just means there are no dividends to owners, but they can very well pay huge salaries. So "non-profit" is actually a very bad name.
It should be called a non-dividend company.
It should be called exactly what it is called, because that is the correct term for benefits accrued to an owner.
>> Every year, I ask students in my copyright class why the children’s versions of classic novels in Colting were found to be infringing but a Wikipedia summary of the plots of those same books probably wouldn’t be.
Not a lawyer, but the answer seems obviously to be that one is a commercial reproduction and the other is not. It seems like it would be a tougher question if the synopsis were in a set of Encyclopedia Britannica or something.
AI is clearly reproducing work for commercial purposes, i.e. reselling it in a new format. LLMs are compression engines. If I compress a movie into another format and sell DVDs of it, that's a pretty obvious violation of copyright law. If I publish every 24th frame of a movie in an illustrated book, that's a clear violation, even if I blur things or change the color scheme.
If I describe to someone, for free, what happened in a movie, I don't see how that's a violation. The premise here seems wrong.
Something else: Even a single condensation sold for profit only creates one new copyright itself. LLMs wash the material so that they can generate endless new copyrighted material that's derivative of the original. Doesn't that obliterate the idea of any copyright at all?
Good guess, but no. The most salient difference in that case is that an abridged children's version of a novel acts as a direct market substitute for the original, whereas a plot summary does not. (A secondary reason is that an abridged edition is likely to represent a much larger portion of the original work than would appear in a summary.)
For further reading, see: https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors
Consider this: if I wanted to read A Game of Thrones, then I would read A Game of Thrones, not some bootleg LLM approximation. It is faster, more exact, and cheaper to infringe by copying; an LLM is a terrible tool for infringement, being slow, expensive, and unable to reproduce the work perfectly. The fact that some people are using AI means they want something different, not the original.
Yes of course as a reader you would read the original. The major infringement isn't the LLM directly spitting out parts of the book to an end user (a reader). It occurs when the LLM injects parts of the book into a new text, without attribution, which some other writer will go on and sell thousands of copies of. The LLM acts as the washing machine.