Class-Action Lawsuit for Scraping Data without Permission
2023-07-05 11:14

I have mixed feelings about this class-action lawsuit against OpenAI and Microsoft, which claims that they "scraped 300 billion words from the internet" without either registering as a data broker or obtaining consent.

On the one hand, I want this to be a protected fair use of public data.

Within a few generations of models trained on the output of earlier models, text becomes garbage, as Gaussian distributions converge and may even become delta functions.
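
The convergence claim can be illustrated with a toy simulation. The sketch below is not taken from the lawsuit or from any cited study; it assumes a deliberately simplified setup in which each "generation" fits a Gaussian to a small sample drawn from the previous generation's fitted Gaussian. The function name `simulate_collapse` and its parameters are hypothetical, chosen only for this illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Toy model: refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation, starting
    with the original distribution (mean 0, standard deviation 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "human" distribution (an assumption)
    history = [sigma]
    for _ in range(generations):
        # Draw a finite sample from the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # Fit the next model by maximum likelihood on that sample alone.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 10, 50, 100, 200):
        print(f"generation {generation:3d}: sigma = {history[generation]:.3e}")
```

In this toy setting the fitted standard deviation tends to shrink from one generation to the next, because each finite sample underrepresents the tails, so the distribution narrows toward a spike. Real language models are far more complex, and this is only meant to make the intuition concrete.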

This will make it harder to train newer models by scraping the web, giving an advantage to firms which already did that, or which control access to human interfaces at scale.

We already see AI startups hammering the Internet Archive for training data.

What this means is that text from before last year (text that is known to be human-generated) will become increasingly valuable.


News URL

https://www.schneier.com/blog/archives/2023/07/class-action-lawsuit-for-scraping-data-without-permission.html