Artificial Intelligence & American Copyright Law: Analyzing the Copyright Office’s AI Report

Copyright Office’s AI Report: The Good, The Bad, and The Controversial

The Copyright Office just dropped Part 3 of its AI report, which aims to address how copyright law applies to artificial intelligence. The thing that’s got everyone talking is that the report was supposed to tackle infringement issues head on, but instead teased us by saying that answer will come in a “Part 4” expected at a later date. Let’s dive into what was actually discussed.

Legal Theory: A Case by Case Basis

The report’s central thesis is a pretty straightforward legal theory: there should be no blanket rule on whether training AI on copyrighted content constitutes infringement or fair use. Everything gets case-by-case treatment, which is either realistic or frustrating depending on where you sit. Most lawyers like clear, bright-line rules backed by years of precedent, but when building legal frameworks for emerging technologies, the bright-line approach is easier said than done.

The report acknowledges that scraping content for training data is different from generating outputs, and those are different from outputs that get used commercially. Each stage implicates different exclusive rights, and each deserves separate analysis. So what’s actually useful here is the recognition that AI development involves multiple stages, each with its own copyright implications.

This multi-stage approach makes sense, but it also means more complexity for everyone involved. Tech companies can’t just assume that fair use covers everything they’re doing, and content creators can’t assume it covers nothing. The devil is in the details.

Transformative Use Gets Complicated

The report reaffirms that various uses of copyrighted works in AI training are “likely to be transformative,” but then immediately complicates things by noting that transformative doesn’t automatically mean fair. The fairness analysis depends on what works were used, where they came from, what purpose they served, and what controls exist on outputs.

This nuanced approach is probably correct legally, but it’s also a nightmare for anyone trying to build AI systems at scale. You can’t just slap a “transformative use” label on everything and call it a day. The source of the material matters: whether the content was pirated or legally obtained can factor into the analysis. Purpose matters too, since commercial use and research use will likely yield different results. And control and mitigation matter, because developing guardrails that prevent direct copying or market substitution is paramount.

Nothing too revolutionary here, but the emphasis on these factors signals that the Copyright Office is taking a more sophisticated approach than some of the simplistic takes we’ve seen on this matter. That should be reassuring, since a one-size-fits-all approach at such an early stage of AI development could stifle innovation. However, if things are left too uncontrolled, copyrighted works may face widespread infringement.

The Fourth Factor Controversy

Here’s where things get interesting and controversial. The report takes an expansive view of the fourth fair use factor: the effect on the potential market for the copyrighted work. The fear is that AI-generated works flooding the market will cause market dilution, lost licensing opportunities, and broader economic harm.

The Office’s position is that the statute covers any “effect” on the potential market, which is a broad interpretation. But there’s a reason for that breadth: the Office is worried about the “speed and scale” at which AI systems can generate content, creating what it sees as a “serious risk of diluting markets” for similar works. Imagine an artist creates a new masterpiece, only for an AI model to ingest it and make the piece easily recreatable by anyone, diluting the value of the original. These things are happening in the market today.

This gets particularly thorny when it comes to style. The report acknowledges that copyright doesn’t protect style per se, but then argues that AI models generating “material stylistically similar to works in their training data” could still cause market harm. That’s a fascinating tension: you can’t copyright a style, but you might be able to claim market harm from AI systems that replicate it too effectively. It will be interesting to see how courts apply these rules.

This interpretation could be a game-changer, and not necessarily in a good way for AI developers. If every stylistic similarity becomes a potential market harm argument, the fair use analysis becomes much more restrictive than many in the tech industry have been assuming.

The Guardrails

One of the more practical takeaways from the report is its emphasis on “guardrails” as a way to reduce infringement risk. The message is clear: if you’re building AI systems, you better have robust controls in place to prevent direct copying, attribution failures, and market substitution.

This is where the rubber meets the road for AI companies. Technical safeguards, content filtering, attribution systems, and output controls aren’t just up to the discretion of engineers anymore; they’re becoming essential elements of any defensible fair use argument.

The report doesn’t specify exactly what guardrails are sufficient, which leaves everyone guessing. But the implication is clear: the more you can show you’re taking steps to prevent harmful outputs, the stronger your fair use position becomes. So, theoretically, a developer with enough guardrails in place may be able to mitigate damages if the model accidentally outputs copyrighted works.
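To make that concrete, here is a minimal, hypothetical sketch of one kind of guardrail: an output filter that blocks responses reproducing long verbatim spans from a set of protected reference texts. The report doesn’t prescribe any particular mechanism, so every name, text, and threshold below is invented purely for illustration.

```python
# Hypothetical output guardrail: block model responses that reproduce
# long verbatim spans from a reference corpus. Illustrative only; the
# report does not endorse any specific technique.

NGRAM_SIZE = 12  # flag any 12-word span copied verbatim (arbitrary choice)

# Stand-in for an index of protected works a deployer screens against.
BLOCKLIST_TEXTS = [
    "call me ishmael some years ago never mind how long precisely having little money",
]

def ngrams(text: str, n: int) -> set:
    """Return the set of n-word sequences in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def violates_copy_guardrail(candidate: str) -> bool:
    """True if the candidate output shares a long verbatim span
    with any protected reference text."""
    candidate_grams = ngrams(candidate, NGRAM_SIZE)
    return any(candidate_grams & ngrams(ref, NGRAM_SIZE) for ref in BLOCKLIST_TEXTS)

def guarded_output(model_output: str) -> str:
    """Sits between the model and the user, withholding flagged text."""
    if violates_copy_guardrail(model_output):
        return "[output withheld: matched protected text]"
    return model_output
```

A real deployment would pair filters like this with licensing checks, attribution, and far more robust matching than simple word-level overlap, but the legal logic is the same: documented, working controls are evidence that you took the risk of direct copying and market substitution seriously.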

RAG Gets Attention

The report also dives into Retrieval Augmented Generation (RAG), which is significant because RAG systems work differently from traditional training approaches. Instead of baking copyrighted content into model weights, RAG systems retrieve and reference content dynamically.

This creates different copyright implications: potentially more like traditional quotation and citation than wholesale copying. But it also creates new challenges around attribution, licensing, and fair use analysis. The report doesn’t resolve these issues, but it signals that the Copyright Office is paying attention to the technical details that matter.
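For readers unfamiliar with the mechanics, here is a minimal sketch of the retrieve-then-generate flow described above. The toy corpus, keyword scoring, and prompt format are all invented for illustration; production RAG systems use vector search and an actual language model. The key point is the same either way: the source text is fetched and quoted at answer time rather than memorized in model weights.

```python
# Toy retrieval-augmented generation (RAG) flow, for illustration only.
# Content is retrieved and cited per query instead of being baked into
# model weights during training.

DOCUMENTS = {
    "doc-1": "Fair use is an affirmative defense codified at 17 U.S.C. 107.",
    "doc-2": "Section 230 immunizes platforms for user-generated content.",
}

def retrieve(query: str, k: int = 1) -> list:
    """Toy keyword retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Quote and attribute retrieved passages at answer time."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does fair use mean?"))
```

Because each retrieved passage is identifiable per response, attribution and licensing can at least in principle be enforced at the output stage, which is exactly why the copyright analysis differs from training-time copying.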

Licensing

The report endorses voluntary licensing and extended collective licensing as potential solutions, while rejecting compulsory licensing schemes or new legislation “for now.” This is probably the most politically palatable position, but it doesn’t solve the practical problems.

Voluntary licensing sounds great in theory, but the transaction costs are enormous when you’re dealing with millions of works from thousands of rights holders. Extended collective licensing might work for some use cases, but it requires coordination that doesn’t currently exist in most creative industries.

The “for now” qualifier is doing a lot of work here. It suggests that if voluntary solutions don’t emerge, more aggressive interventions might be on the table later.

The Real Stakes

What makes this report particularly significant isn’t just what it says, but what it signals about the broader policy direction. The Copyright Office is clearly trying to thread the needle between protecting creators and enabling innovation, but the emphasis on expansive market harm analysis tilts toward the protection side.

For AI companies, this report is a warning shot. The days of assuming that everything falls under fair use are over. The need for licensing, guardrails, and careful legal analysis is becoming unavoidable.

For content creators, it’s a mixed bag. The report takes their concerns seriously and provides some theoretical protection, but it doesn’t offer the clear-cut prohibitions that some have been seeking.

The real test will come in the courts, where these theoretical frameworks meet practical disputes. But this report will likely influence how those cases get decided, making it required reading for anyone in the AI space.

As we can see, the intersection of AI and copyright law is only becoming more complex. The simple answers that everyone wants don’t exist, and this report makes that abundantly clear. The question now is whether the industry can adapt to this new reality or whether we’re heading for a collision that nobody really wants.

Section 230: From Jordan Belfort to Gonzalez, The Law That Made The Modern Internet

On May 24, 1995, Jordan Belfort’s brokerage firm Stratton Oakmont successfully sued Prodigy Services Company in a New York court for defamation. Little did anyone know that Stratton’s win over Prodigy would be the catalyst that changed the internet forever. The so-called Wolves of Wall Street had unknowingly set a dangerous precedent that threatened the tech industry.

Stratton Oakmont v. Prodigy Services

Prodigy was an online service which more or less mirrored modern-day social media sites, serving over 2 million people at its peak. Users could access a broad range of services such as news, weather updates, shopping, and bulletin boards. One of Prodigy’s most notorious bulletin boards was Money Talk, a popular forum where members discussed economics, finance, and stocks, similar to Reddit’s WallStreetBets forum. Prodigy also contracted with independent moderators to vet and participate in the board discussions, much like newspaper editors, but ones who engaged with their audience far more.

In 1994, two posts would subject Prodigy to legal liability. On October 23rd and 25th, an unidentified user posted on the Money Talk bulletin board, claiming that Stratton Oakmont was committing SEC violations and engaging in fraud in connection with an IPO it was handling (Solomon-Page’s). The poster claimed:

  • the Solomon-Page IPO was a “major criminal fraud” and “100% criminal fraud”
  • Daniel Porush was a “soon to be proven criminal”
  • Stratton was a “cult of brokers who either lie for a living or get fired.”

Ironically, many of these claims would turn out to be true; at the time, however, they were unsubstantiated, since there was no concrete evidence to back them up. After Stratton was made aware of the posts, the company and Daniel Porush (aka Jonah Hill’s character in the movie) commenced legal action against Prodigy for defamation over the libelous statements made on Money Talk.

In the United States, defamation claims are not plaintiff-friendly, due to the strong protections the 1st Amendment offers. In general, to succeed on a defamation claim, a plaintiff must prove four elements:

1) a false statement purporting to be fact;

2) publication or communication of that statement to a third person;

3) fault amounting to at least negligence; and

4) damages, or some harm caused to the reputation of the person or entity who is the subject of the statement.

In the suit brought by Stratton against Prodigy, the court focused on elements 2 and 3: namely, whether Prodigy was a publisher, and whether the moderators’ acts or omissions while editing the Money Talk bulletin board amounted to at least negligence. The court ruled in favor of Stratton Oakmont.

The court reasoned that an operator of an online message board is a publisher for purposes of defamation liability if the operator holds itself out as controlling the content of the board and implements that control through guidelines and screening programs. An entity that repeats or otherwise republishes a libelous post is subject to liability as if it had originally published it. By contrast, a party that merely disseminates others’ content is a distributor, what the court called a “passive conduit,” and faces no liability for libel absent a finding that it knew or had reason to know the distributed content contained defamatory statements.

Basically, if you had a content moderation system for user-generated content, your website was likely liable for defamation. Defenses such as the impracticability of moderating millions of user-generated posts had no merit. This ruling would shake the tech and internet industry, threatening to stunt and undo years of innovation.

To avoid a barrage of lawsuits, the tech industry successfully lobbied Congress to act after the Stratton ruling. In 1996, Congress passed the Communications Decency Act, which dealt with various internet-related issues. The provision most pertinent to us is Section 230(c).

Jeff Kosseff, one of the leading scholars on Section 230, describes it as “the twenty-six words that created the internet.” Those 26 words are:

 “No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.”

These words provide immunity to online platforms from being held liable for user-generated content. Basically, online platforms such as social media sites, forums, and search engines cannot be sued or prosecuted for what a user posts on their platforms, even if the posts themselves are defamatory, false, or harmful.

This immunity has been vital in enabling the growth of the internet and the rise of social media platforms. It has allowed these platforms to provide a space for free expression and to facilitate the exchange of information and ideas without fear of legal consequences, unlike Prodigy, which fell victim before Section 230’s protections existed. It has also allowed smaller and newer online platforms to compete with established ones without having to worry about legal liability. Without Section 230, the internet you and I know and love would not exist; I likely could not publish my articles without exposing myself to liability.

Recently, however, there have been attacks on and concerns over the immunity Section 230 provides, specifically that online platforms are essentially not responsible for any harmful content on their platforms, such as hate speech, harassment, and misinformation. Critics of the immunity argue that Section 230 has created an environment in which online hate speech and harassment can thrive. One recent case that puts forth such an argument, Gonzalez v. Google, has made it all the way to the Supreme Court.

Gonzalez v. Google & Its Implications

Gonzalez alleges that ISIS used YouTube (owned by Google) to recruit members into its terror cells and “communicate its [ISIS’s] desired messages,” which led to the horrific events that occurred in Paris in 2015. Nohemi Gonzalez, a US citizen, was killed during the ISIS terror attacks that gripped the world that year. ISIS would later claim full responsibility for the attacks that led to her untimely passing.

Gonzalez argues that because YouTube helped fuel “the rise of ISIS” by knowingly recommending ISIS videos to its users, Google is directly responsible for causing the Paris attack. The plaintiffs back up the claim that Google knew of such activity by alleging that “[d]espite extensive media coverage, complaints, legal warnings, congressional hearings, and other attention for providing online social media platform and communications services to ISIS, prior to the Paris attacks Google continued to provide those resources and services to ISIS and its affiliates, refusing to actively identify ISIS YouTube accounts and only reviewing accounts reported by other YouTube users.” Their argument is that Google’s algorithms fall outside the scope of Section 230 and therefore subject the company to liability.

Google contends that Section 230 fully immunizes it from such a suit based on judicial precedent and congressional intent, that its terms of service directly prohibit content that promotes terrorism, and that it actively blocked such content by hiring Middle Eastern content moderators who worked 24/7 to flag terroristic content. Before the case arrived at the Supreme Court, every lower court found in favor of Google.

Jess Miers, a prominent Section 230 scholar, notes that this case “tees up a contentious question for the Supreme Court: whether Section 230 — a law that empowers websites to host, display and moderate user content — covers algorithmic curation.” She points out that the vast majority of websites use non-neutral algorithms, and that if the Supreme Court were to side with Gonzalez, it would open the floodgates of litigation against online service providers that rely on algorithms to function. Beyond that, such a ruling could incentivize states to curate the internet to achieve their ideological ends, such as punishing websites for cracking down on misinformation or for providing information about abortion services. This would give states too much power and could lead to arbitrary curation of the information you see on the internet, a significant blow to consumers and, arguably, to Americans’ 1st Amendment interests.

Only time will tell what happens with Section 230, as the court is expected to make its ruling this summer. Hopefully, it makes the right decision.

Sources:

Stratton Oakmont, Inc. and Daniel Porush v. Prodigy Services Company, a Partnership of Joint Venture with IBM Corporation and Sears-Roebuck & Company, “John Doe” and “Mary Doe”, Supreme Court, Nassau County, New York, Trial IAS Part 34.

Jeff Kosseff: https://www.propublica.org/article/nsu-section-230

Jess Miers, “High Court Should Protect Section 230 In Google Case”: https://www.law360.com/articles/1567399

Briefs of both parties in the Supreme Court: Reynaldo Gonzalez, et al. v. Google LLC.