The BBC is threatening legal action against Perplexity AI, a US-based chatbot company, over its unauthorized verbatim reproduction of BBC content, which the broadcaster says constitutes copyright infringement and a breach of its terms of use. The action, the first of its kind for the BBC, stems from Perplexity’s alleged disregard for the BBC’s “robots.txt” directives and follows earlier BBC research that found inaccuracies and misrepresentations of BBC news in several popular AI chatbots, including Perplexity AI. The BBC’s legal letter demands that Perplexity stop using BBC content, delete existing material, and provide financial compensation. The Professional Publishers Association has also expressed deep concern about the broader issue of copyright infringement by AI platforms.
Read the original article here
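For context, “robots.txt” is the decades-old convention by which a publisher tells automated crawlers which parts of its site they may fetch. Below is a minimal sketch of the kind of directive at issue; the paths are illustrative rather than taken from the BBC’s actual file, and “PerplexityBot” is the user-agent name Perplexity publishes for its crawler:

```
# Illustrative robots.txt (hypothetical paths, not the BBC's real file).
# Each User-agent block addresses one crawler; Disallow lists the paths
# that crawler is asked not to fetch.

# Ask Perplexity's documented crawler to stay off the entire site:
User-agent: PerplexityBot
Disallow: /

# Ask all other crawlers to avoid a single section:
User-agent: *
Disallow: /premium/
```

Crucially, robots.txt is a request, not an enforcement mechanism: nothing technically stops a crawler from ignoring it, which is precisely what the BBC alleges Perplexity has done.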
The BBC’s threat of legal action against Perplexity AI highlights a simmering conflict at the heart of the burgeoning AI industry: the legality of using copyrighted material for model training. Perplexity’s defensive statement, which invokes Google’s alleged “illegal monopoly,” feels like a deflection, a desperate attempt to shift attention away from the core issue: the unauthorized use of BBC content. Their refusal to elaborate on the Google connection only deepens the suspicion of wrongdoing.
The underlying issue isn’t simply a matter of corporate squabbling; it’s a much larger problem. Many AI companies openly admit that their models’ success hinges on unrestricted access to online data, even when that data is copyrighted. This blatant disregard for intellectual property rights strikes at the very foundation of creative work and ownership. The assertion that these companies would fail without such free-for-all access reads as a justification for theft, a declaration that profit supersedes ethical considerations and legal frameworks. This shouldn’t be tolerated: these companies should not be allowed to profit from systematic theft.
The comparison to earlier forms of digital piracy, like MP3 downloading, is apt. While the methods have evolved, the underlying act of unauthorized appropriation remains the same. The difference now lies in the scale: we’re no longer talking about individual files, but about the wholesale appropriation of vast datasets encompassing billions of pieces of copyrighted work. The sheer volume of data involved, and the difficulty of tracing where specific pieces of information were drawn from, should not excuse these companies from their responsibilities.
Perplexity’s claim that they are simply doing what Google and Bing already do misses a crucial point: the appropriation of intellectual property remains appropriation regardless of who does it. While established search engines may summarize information from various websites, that doesn’t legitimize Perplexity’s actions. The argument is a false equivalence: what is tolerated from giants like Google and Bing does not automatically grant permission to all comers.
The “hammer” analogy used by Perplexity doesn’t hold water. A hammer is a genuinely versatile tool, but an AI model doesn’t simply utilize data; it replicates it, often with astonishing accuracy. The output isn’t a creative interpretation of source material; it’s a near-perfect reproduction, a digital clone. This isn’t building a house with a hammer; it’s producing counterfeit goods with a printing press. The analogy breaks down fundamentally: a hammer doesn’t inherently reproduce Mickey Mouse, but an LLM trained on copyrighted Disney material may very well do so. Holding companies responsible for the actions of their tools is therefore reasonable, especially when those tools are designed to reproduce copyrighted material.
The argument that AI companies shouldn’t be responsible for the actions of their users is also weak. A company that builds a tool whose very design facilitates infringement bears a significant portion of the responsibility, and the notion that it should be absolved because it never explicitly intended for its creation to steal is ludicrous. The root problem is that the data used to build these tools was never properly sourced or obtained.
Precedent already exists for legal action against AI companies that use copyrighted material for training without authorization. Lawsuits are piling up, involving major players such as Disney and Meta, demonstrating how untenable the AI industry’s current approach is. The BBC’s action may set a vital precedent of its own, paving the way for further challenges against less well-funded AI companies.
The transformative use doctrine, successfully invoked by Google in the Google Books litigation, might be raised in defense, with the argument that the model as a whole constitutes a new creative work. That defense crumbles, however, if the AI’s outputs directly compete with existing copyrighted works, amounting in essence to derivative works. The key lies in the purpose of the training data: if it is used to generate outputs that compete in the marketplace with the works of individual IP holders, the result is clear copyright infringement.
The issue goes beyond financial compensation; it strikes at the core of creativity and innovation. The current practices of many AI companies not only undermine established artists and creators but also damage the incentive structure that encourages the creation of new intellectual property. This isn’t merely a technological hurdle to overcome; it’s a fundamental ethical and legal challenge. The current model of scraping the internet to build AI systems needs to change, and legal actions like the one against Perplexity are an important first step toward that change. The future of AI hinges on finding a sustainable, legal, and ethical path, and on avoiding a future where AI-generated ransom videos become commonplace.
