The Defender’s Big Brother NewsWatch brings you the latest headlines related to governments’ abuse of power, including attacks on democracy, civil liberties and use of mass surveillance. The views expressed in the excerpts from other news sources do not necessarily reflect the views of The Defender.
OpenAI stole “massive amounts of personal data” to train ChatGPT, a lawsuit alleges. The proposed class-action suit claims that Sam Altman’s company “secretly” harvested data to train its large language models so that its chatbot could replicate human language.
The lawsuit alleges that OpenAI crawled the web to amass huge amounts of data, including vast quantities taken from social media sites. For example, WebText2, OpenAI’s proprietary AI corpus of personal data, was built from huge amounts of data scraped from Reddit posts and the websites they linked to, the lawsuit claims.
The data accessed included “private information and private conversations, medical data, information about children — essentially every piece of data exchanged on the internet it could take — without notice to the owners or users of such data, much less with anyone’s permission,” per the lawsuit.
This amounted to “the negligent and otherwise illegal theft of personal data of millions of Americans who do not even use AI tools,” the lawsuit claims.
Microsoft, a major backer of OpenAI, was also named as a defendant.