Will we try to prevent google (and other) scrapers?
The headline is pretty much a summary. “Google Says It will Scrape Everything You Post Online for AI” https://www.gizmodo.com.au/2023/07/google-says-it-will-scrape-everything-you-post-online-for-ai/
The first question is obviously: do we as a community on Lemmy even want to try to stop them from scraping our content here? If no, well, OK then.
If yes, how? I’m not sure that “preventing access” for unregistered users would really prevent this. Pretty sure Google has enough money and manpower to make it their mission to get around “can only be accessed by members” content.
I personally agree with you. If content is not supposed to be searchable, maybe don’t post it online. It is a different problem for writers, artists and possibly even journalists.
But I think it’s a fair debate, unfortunately one that was one reason (or the only one?) the whole Reddit API debacle started.
On the other hand, maybe Lemmy should let certain communities offer an “only for members” view?
The Reddit API thing started because Reddit thought they owned the content and could lock it behind a paywall for people who want training data. But that fundamentally isn’t the case, so that whole thing backfired.
If someone wants to own the content and restrict access, they have to distribute it on their own instead of using a public platform. Lemmy is the wrong tool for that.
I don’t think I agree with that. A public forum is a place for public discussion. I think the word public certainly implies that anything you post will no longer be your private content with restricted access. But you still own your content and should be able to choose whether someone uses it to train a model to mimic your writing/style. For example, if we say you don’t get to own any content on Lemmy, then we may as well shut down the various world building communities. People who post to those certainly want to own their content, especially when they work so hard on it.
AI and LLMs are still breaking ground, and the legality/ethics of training models on others’ creative works, then mimicking those works and claiming ownership of the result, are still being discussed. It’s different from a human being influenced by their favorite author’s writing style or art style, so there are a lot of questions in the ether about it.
In either case, I think it’s healthy to let the discussions take place and see which way the winds blow. I’m personally ready for people to realize that Skynet is not just around the corner. I’m also incredibly sick of everyone telling me that we’re not going to have jobs in 5 years.
My two cents: AI and LLMs have gained a lot of hype and will not be stopped, as they are very useful, at least in some use cases. However, if access is restricted by cost, only large megacorps will be able to train the most performant models and study how they work, eventually ending up in an oligopoly or monopoly. If access is free to everyone, open approaches can be developed much better, without ending up totally dependent on those megacorps. Because of this I support free data scraping for everyone!