This article was last updated on August 14, 2024
Stichting Brein takes large dataset of illegal material for AI training offline
Copyright organization Stichting Brein has taken offline a Dutch dataset (a collection of data) that was intended for training artificial intelligence (AI). According to the organization, this is the first time this has happened in the Netherlands.
Brein describes it as a “large dataset” that, according to the organization, consists of illegal copies of tens of thousands of books, millions of lines from news articles on websites such as Nu.nl, and subtitles of countless films and TV series taken from illegal sources. Director Bastiaan van Ramshorst also says he knows who the creator is, but cannot name that person for privacy reasons.
Use of the dataset
The dataset was intended for training a language model, known in jargon as a large language model. The creator of the dataset has promised Brein in writing not to use it anymore and has also provided information about who received it. The foundation is now checking whether the data has actually been used by AI models. If that is the case, the parties involved will be held accountable.
Copyright-infringing material is a major problem in AI training. Recent research indicated that works by Dutch image makers have very likely been used without their permission to train well-known AI image generators, including DALL-E and Midjourney.
In the US, a lawsuit is currently under way between The New York Times and OpenAI, the maker of ChatGPT. The newspaper accuses the company of using massive numbers of newspaper articles to train AI without permission. OpenAI believes that using the data is permitted.