As leading artificial intelligence companies release increasingly capable AI systems, a new report is sounding the alarm over what it describes as lagging safety practices at some of those companies.
The Winter 2025 AI Safety Index, which examines the safety protocols of eight leading AI companies, found that their approaches “lack the concrete safeguards, independent oversight and credible long-term risk-management strategies that such powerful systems demand.”
Sabina Nong, an AI safety investigator at the nonprofit Future of Life Institute (FLI), said in an interview at the San Diego Alignment Workshop that the analysis revealed a divide in how organizations approach safety. FLI, which organized the report, works to address large-scale risks from technologies like nuclear weapons and AI.
“We see two clusters of companies in terms of their safety promises and practices,” Nong said. “Three companies are leading: Anthropic, OpenAI, Google DeepMind, in that order, and then five other companies are on the next tier.”
The lower tier of five companies includes xAI and Meta, along with the Chinese AI companies Z.ai, DeepSeek and Alibaba Cloud. Chinese models have been increasingly adopted in Silicon Valley as their capabilities have quickly advanced, and they are readily available because they are largely open source.
Anthropic, the highest-ranked company on the list, got a C+ grade, while Alibaba Cloud, the lowest-ranked, received a D-.
The index examined 35 safety indicators across six domains, including companies’ risk-assessment practices, information-sharing protocols, whistleblower protections and support for AI safety research.
Eight independent AI experts, including Massachusetts Institute of Technology professor Dylan Hadfield-Menell and Yi Zeng, a professor at the Chinese Academy of Sciences, graded companies’ fulfillment of the safety indicators.
FLI President Max Tegmark, an MIT professor, said the report provided clear evidence that AI companies are speeding toward a dangerous future, partly because of a lack of regulations around AI.
“The only reason that there are so many C’s and D’s and F’s in the report is because there are fewer regulations on AI than on making sandwiches,” Tegmark told NBC News, contrasting the absence of adequate AI laws with long-established food-safety rules.
The report recommended that AI companies share more information about their internal processes and assessments, use independent safety evaluators, step up efforts to prevent AI-related psychosis and other harms, and reduce lobbying, among other measures.
Tegmark, Nong and FLI are particularly concerned about the potential for AI systems to cause catastrophic harm, especially given calls from AI leaders like Sam Altman, the CEO of OpenAI, to build AI systems that are smarter than humans — also called artificial superintelligence.
“I don’t think companies are prepared for the existential risk of the superintelligent systems that they are about to create and are so ambitious to march towards,” Nong said.
An OpenAI spokesperson said in a statement: “Safety is core to how we build and deploy AI. We invest heavily in frontier safety research, build strong safeguards into our systems, and rigorously test our models, both internally and with independent experts. We share our safety frameworks, evaluations, and research to help advance industry standards, and we continuously strengthen our protections to prepare for future capabilities.”
The report, released Wednesday morning, comes on the heels of several boundary-pushing AI model launches. Google’s Gemini 3 model, released at the end of November, has set records for performance on a series of tests designed to measure AI systems’ capabilities.
In a statement, a Google representative said, “Our Frontier Safety Framework outlines specific protocols for identifying and mitigating severe risks from powerful frontier AI models before they manifest. As our models become more advanced, we continue to innovate on safety and governance at pace with capabilities.”
On Monday, one of China’s leading AI companies, DeepSeek, released a cutting-edge model that appears to match Gemini 3’s capabilities in several domains.
Though AI capability tests are increasingly criticized as flawed, partly because AI systems can become hyper-focused on passing a specific set of unrealistic challenges, record-breaking scores still signal how new models perform relative to their competitors.
Even though DeepSeek’s new model performs at or near the frontier of AI capabilities, Wednesday’s Safety Index report says DeepSeek fails on many key safety considerations.
The report scored DeepSeek second-to-last out of the eight companies on an overall safety metric. The report’s independent panel found that, unlike all leading American companies, DeepSeek does not publish any framework outlining its safety-minded evaluations or mitigations and does not disclose a whistleblowing policy that could help identify key risks from AI models.
Frameworks outlining company safety policies and testing mechanisms are now required for companies operating in California. Those frameworks can help companies avoid severe risks, like the potential for AI products to be used in cybersecurity attacks or bioweapon design.
The report classifies DeepSeek in the lower tier of safety-minded companies. “The lower tier companies continue to fall short on basic elements such as safety frameworks, governance structures, and comprehensive risk assessment,” the report says.
Tegmark said, “Second-tier companies have been completely obsessed by catching up to the technical frontier, but now that they have, they no longer have an excuse to not also prioritize safety.”
Advances in AI capabilities have recently grabbed headlines as AI systems are increasingly applied to consumer-facing products like OpenAI’s Sora video-generation app and Google’s Nano Banana image-generation model.
However, Wednesday’s report argues that the steady increase in capabilities is severely outpacing any expansion of safety-focused efforts. “This widening gap between capability and safety leaves the sector structurally unprepared for the risks it is actively creating,” it says.
This reporter is a Tarbell Fellow, funded through the Tarbell Center for AI Journalism, a nonprofit devoted to supporting the news coverage of artificial intelligence. The Tarbell Center has received funding from the Future of Life Institute, which is a subject of this article. The Tarbell Center had no input in NBC News’ reporting.
