ChatGPT Creator Suspects China's DeepSeek AI Used OpenAI Data, Sparking Online Irony

OpenAI, the company behind ChatGPT, has raised concerns that China's DeepSeek AI models, which are significantly cheaper than Western alternatives, may have been developed using data from OpenAI. The allegation comes amid a sharp reaction to DeepSeek in the U.S. tech industry: President Donald Trump called DeepSeek a "wake-up call" after Nvidia lost roughly $600 billion in market value.

DeepSeek's debut triggered a sharp sell-off in shares of companies heavily invested in AI. Nvidia, whose GPUs are essential for running AI models, saw its share price fall a historic 16.86%. Other tech giants, including Microsoft, Meta Platforms, and Google's parent company Alphabet, fell between 2.1% and 4.2%, while AI server maker Dell Technologies dropped 8.7%.

DeepSeek says its R1 model, built on the open-source DeepSeek-V3, requires less computing power than rival models, and its training reportedly cost just $6 million. These claims have raised doubts about whether the huge sums U.S. tech companies are spending on AI are justified, unsettling investors. Meanwhile, DeepSeek's app quickly climbed to the top of the U.S. free-app download charts as users debated how well it performs.

Bloomberg reported that OpenAI and Microsoft are investigating whether DeepSeek used OpenAI's API to incorporate outputs from OpenAI's models into its own. OpenAI told Bloomberg that Chinese companies and others are trying to "distill" leading U.S. AI models, training their own systems on those models' outputs, a practice that violates OpenAI's terms of service.

OpenAI said it is working to protect its intellectual property and stressed the importance of collaborating with the U.S. government to keep advanced AI models from being exploited by competitors and adversaries. David Sacks, Trump's AI czar, told Fox News that there is significant evidence DeepSeek used distillation to extract knowledge from OpenAI's models, and predicted that leading U.S. AI companies would soon take steps to prevent the practice.

Critics have been quick to point out the irony of OpenAI's accusations, given the company's own history of training ChatGPT on copyrighted internet content. In January 2024, OpenAI argued in a submission to the UK's House of Lords that it is "impossible" to develop AI models like ChatGPT without using copyrighted materials, because copyright covers virtually every form of human expression. It added that limiting training data to public domain works would not produce AI systems that meet the needs of today's users.

The use of copyrighted material in AI training has become a contentious issue, as shown by lawsuits such as The New York Times' December 2023 suit against OpenAI and Microsoft over the "unlawful use" of its content. OpenAI defended its practices as "fair use" and dismissed the lawsuit as baseless. In a similar case filed in September 2023, 17 authors, including George R. R. Martin, accused OpenAI of "systematic theft on a mass scale."

Legal precedent has complicated the picture further. In August 2023, U.S. District Judge Beryl Howell upheld the U.S. Copyright Office's refusal to register an AI-generated artwork, ruling that AI-generated art cannot be copyrighted because copyright protection is inseparable from human creativity.

Image: DeepSeek is accused of using distillation to train a rival model on the outputs of OpenAI's models. Image credit: Andrey Rudakov/Bloomberg via Getty Images.