DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

Security researchers have discovered that Chinese AI company DeepSeek’s cheaper R1 reasoning model has far fewer safeguards in place than rivals OpenAI and Meta’s Llama 3.1 model.
When tested using 50 malicious prompts, designed to elicit toxic content, DeepSeek’s model did not detect a single one.
When confronted with such content, many AI platforms reject the prompt or flag it as unsafe, but DeepSeek’s model did not do this, leading researchers to conclude that there is a trade-off between cost and safety and security measures.
The finding is another example of the growing trend of jailbreaking, which allows users to get around content filters and safety systems and has been denoted as one of the biggest security flaws in AI systems.
CEO of security firm Adversa AI, Alex Polyakov, said that DeepSeek’s restricted responses could easily be bypassed, and added that deepseek’s model goes into more depth with some instructions than he had seen any other model create.
He concluded by warning that if AI systems are not continuously red-teamed, they are already compromised.

Fast Feed