Chinese AI startup DeepSeek is collaborating with Tsinghua University to develop new methods for reducing the computational resources needed to train artificial intelligence models, aiming to lower operational costs, Bloomberg reports.
The collaboration comes after DeepSeek made waves earlier this year with the release of its low-cost reasoning model.
The partnership has produced a research paper detailing a novel approach to reinforcement learning designed to make AI models more efficient. The work focuses on helping AI models align better with human preferences by rewarding them for more accurate and understandable responses.
While reinforcement learning has proven effective at accelerating AI tasks in specific domains, extending it to more general scenarios has been challenging. DeepSeek’s team is tackling this hurdle with a strategy it calls “self-principled critique tuning.” According to the paper, the method outperformed existing approaches and models across various benchmarks while using fewer computing resources.
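The paper’s exact method is not reproduced in this article, but the general shape of a critique-based reward model can be sketched: the model first drafts evaluation principles for a query, then critiques a candidate answer against those principles, and the critique is reduced to a single reward score. The sketch below is only an illustration of that idea; every name in it (generate_text, critique_reward) is a hypothetical placeholder, not DeepSeek’s code or API.

```python
import re

def generate_text(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call (not DeepSeek's API).

    Returns a canned critique so the sketch runs end to end."""
    return "Accuracy: 8/10\nClarity: 7/10"

def critique_reward(query: str, answer: str) -> float:
    """Score an answer by (1) drafting principles for the query,
    (2) critiquing the answer against them, and (3) collapsing the
    critique into one scalar reward in [0, 1]."""
    principles = generate_text(
        f"List the criteria a good answer to this question should satisfy:\n{query}"
    )
    critique = generate_text(
        f"Using these criteria:\n{principles}\n"
        f"Rate the answer on each criterion as 'name: x/10'.\nAnswer:\n{answer}"
    )
    # Pull the 'x/10' scores out of the critique and average them.
    scores = [int(m) for m in re.findall(r"(\d+)/10", critique)]
    return sum(scores) / (10 * len(scores)) if scores else 0.0

print(critique_reward("Why is the sky blue?", "Because of Rayleigh scattering."))
```

In a setup like this, the scalar reward would then feed a standard reinforcement-learning update, so improving the reward model directly improves the signal the policy learns from.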
DeepSeek is naming these new models DeepSeek-GRM, short for “generalist reward modeling,” and plans to release them on an open-source basis.
The push for more efficient and self-improving AI models is a growing trend in the industry. Other leading AI developers, including Alibaba Group Holding Ltd. and OpenAI, are also actively exploring methods to enhance reasoning and self-refining capabilities in real time.
DeepSeek’s models rely heavily on the Mixture-of-Experts (MoE) architecture to optimize resource utilization. Meta Platforms Inc., which recently released its latest AI models, Llama 4, also uses MoE technology and benchmarked the new release against DeepSeek.
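For context, a Mixture-of-Experts layer saves compute by routing each token to only a small subset of expert sub-networks rather than running the full model. The NumPy sketch below shows a generic top-k routing scheme; it illustrates the technique in general, not DeepSeek’s or Meta’s implementation, and all dimensions and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward weight matrix; the router scores experts per token.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs.

    Only k of the n experts run per token, which is where the compute saving comes from."""
    logits = x @ router_w                          # (tokens, n_experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                   # softmax over the selected experts only
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 16)
```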