Reddit sues Perplexity for scraping data to train AI system. This is where it all begins. From what I gather, it’s less about the principle of data scraping, and more about the simple fact that Perplexity didn’t pay for the privilege of using Reddit’s data. It’s like, you want to use our words, our salty basement dweller prose, to build your AI? Fine, but you gotta pay the toll. It’s a transaction, a business deal. The whole issue of AI training using the unfiltered ramblings of the internet… well, that’s another story altogether.

Reddit, in the wake of its IPO, has seemingly changed. There is a sense of disappointment in how the platform has evolved, a feeling that something has been lost in the pursuit of profit. And then there’s the question of what Perplexity actually *is*. Is it a search engine powered by AI, or is it something more? Does it create original content, or is it just cleverly repackaging existing information?

Consider the value of the content created on Reddit. Users build, curate, and moderate communities, generating a vast, readily available dataset. Reddit, however, has often been criticized for how it treats its user base. The irony of the situation is glaring: Reddit is now complaining about practices it has previously engaged in, while the users who wrote the content have received no compensation. It’s a bit hypocritical, wouldn’t you say?

Tech companies scraping data, the ethical questions that raises, and the potential for a flood of low-quality, AI-generated content… it all raises significant concerns. The idea of an AI trained on the unfiltered opinions and often-questionable content of Reddit feels… well, it feels like it could go wrong in a number of entertaining and possibly disastrous ways. There’s a real chance the AI will answer not with facts, but with “that’s what she said” memes.

The consensus here seems to be that Reddit is primarily after a settlement. It’s about getting a cut of the profits generated by these AI companies. Some also see this as simply how things go once a company has had its IPO and answers to public market interests. The data, after all, was built on the backs of its users.

The whole situation also highlights the evolving legal landscape surrounding data and AI. Whether a company can scrape data from a public platform and use it to train an AI is a complex question, and this lawsuit could set a precedent. The situation is not entirely new; the old adage about parties who cannot agree on a price and decide to pay lawyers to help them find it appears to be at play.

The user base is now at least acknowledging the impact and the potential consequences. One can already see how some subs are being designed for AI training. There is a lot of concern about this, including the risk of a hall of mirrors in which AI is trained on its own content.

The use of AI in potentially sensitive areas, like therapy, further complicates matters. The idea of an AI drawing on Reddit comments to provide therapy is… well, that’s a nightmare scenario if there ever was one. Training on that content could also lead to AI becoming racist, and the whole affair reinforces the sense that we are being watched.

The potential for AI to be terrible, to be filled with hallucinations and bad jokes, also comes up. Perhaps this is the best outcome: that AI is so awful, so full of errors, that it simply dies out.

There are also pragmatic steps users can take to protect themselves, such as using Reddit Redact to scramble old comments. It’s a reminder that anything posted online can be used, and that your own data privacy is at stake. Anyone who’s been on Reddit for a while has probably revealed more than they realize, and with AI, that data can be used in ways that are hard to predict.

There is a sense of detachment from the lawsuit, an acknowledgment that the comments of average users are not only what makes the platform what it is, but also what is now being used to train the very AI that may replace them. The real value is the content. And that value will become something very different if Reddit loses its case.