Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek
Summary
Jailbreaking refers to the technique of exploiting flaws in large language models (LLMs) to make them produce prohibited content
Recent research by cybersecurity company Palo Alto Networks (PAN) revealed two novel and effective jailbreaking techniques, which it calls Deceptive Delight and Bad Likert Judge
PAN tested these two techniques, as well as another multi-turn jailbreaking technique called Crescendo, against DeepSeek, an open-source LLM created by Chinese startup DeepSeek
The tests highlighted the ease with which these techniques can expose the flaws in LLM safeguards and provoke the models to produce malicious content
The malicious content the tests sought to elicit included data exfiltration tools, keyloggers, instructions for making Molotov cocktails, and instructions for SQL injection and lateral movement