NVIDIA the latest accused of improper scraping for AI training

Brendyn Lotz

6th August 2024

An explosive report documents allegations that NVIDIA has used YouTube and Netflix data to train AI without permission.
The company reportedly tried to evade protections put in place by YouTube so it could download 80 years' worth of video every day.
NVIDIA says that scraping data for training is an open legal issue it will address in future.

Copyright rules are made up. At least, that seems to be the attitude of practically every company working on generative artificial intelligence, including NVIDIA, according to recent allegations.

As alleged by 404 Media, NVIDIA told workers to download content from YouTube and Netflix to develop commercial AI projects. The data gleaned would be used to develop models for projects such as its Omniverse 3D world generator, self-driving cars and more under the Cosmos banner.

The company told Engadget that it was in compliance with the letter and spirit of copyright law, and highlighted that intellectual property law protects specific expressions and not facts, ideas, data or information. In fact, the trillion-dollar company went so far as to say that the practice of scraping data can be likened to a human learning.

However, 404 Media reports that NVIDIA employees were downloading 80 years' worth of video a day. The excuse that this is just like a human learning sort of falls flat on its face when the learning is more than a person could accomplish in a lifetime.

The report highlights how employees raised ethical and legal concerns about downloading data for training purposes, only for these concerns to be brushed off by upper management. Employees were told that the use of this data was an “open legal issue” that would be resolved in the future.

YouTube has reiterated its stance that improperly scraping data from its site is a violation of its terms of use. Netflix also told 404 Media it had no deal with NVIDIA that allowed the chipmaker to scrape its content for training purposes.

The company even used datasets that were expressly labelled for academic use to train a commercial product.

And it seems as if NVIDIA knew that what it was doing was wrong. The company reportedly set up virtual machines to evade detection while downloading videos from YouTube.

The report highlights the unapologetic approach AI companies have to scraping data improperly in pursuit of even higher profits. The attitude is clearly to ask for forgiveness rather than permission and, given the pallets of money these firms are making, settlements are more likely than any company being found in contravention of the law in court.

It’s not clear whether NVIDIA will face consequences on the back of this report, but we remain hopeful that eventually the legal system will reject claims that what is being done here is legal.

About Author

Brendyn Lotz

Brendyn Lotz writes news, reviews, and opinion pieces for Hypertext. His interests include SMEs, innovation on the African continent, cybersecurity, blockchain, games, geek culture and YouTube.