
DeepSeek AI Models Vulnerable to Jailbreaking

Data Exposure, Harmful Content and Security Risks Undermine DeepSeek AI Models

Security researchers uncovered multiple flaws in large language models developed by Chinese artificial intelligence company DeepSeek, including in its flagship R1 reasoning application.


Research from Palo Alto Networks' Unit 42, Kela and Enkrypt AI identified susceptibility to jailbreaking and hallucinations in the Chinese company's recently unveiled R1 and V3 models. Cybersecurity firm Wiz disclosed Wednesday that DeepSeek had exposed a real-time data processing database to the open internet, allowing security researchers to view chat histories and backend data (see: Breach Roundup: DeepSeek Leaked Sensitive Data).

The security concerns come as Microsoft and OpenAI investigate whether DeepSeek developed the R1 model based on data scraped from an OpenAI application programming interface (see: Accusations Mount Against DeepSeek Over AI Plagiarism).

Flaws identified by the security firms include:

  • Jailbreaking: The V3 and R1 models can be jailbroken using techniques called "Deceptive Delight," "Bad Likert Judge" and "Crescendo," Palo Alto researchers said. Jailbreaking means tricking a model into carrying out tasks its developers have restricted.

    Deceptive Delight involves embedding a restricted topic among benign ones, such as asking an LLM to weave the "creation of a Molotov cocktail" into a narrative alongside innocuous themes such as "reuniting with loved ones." Bad Likert Judge exploits an LLM's ability to evaluate and generate content against a Likert-style psychometric scale. Crescendo involves gradually steering an LLM toward prohibited tasks after opening the conversation with harmless prompts (a minimal probing sketch of this multi-turn pattern appears after this list).

    "Our research findings show that these jailbreak methods can elicit explicit guidance for malicious activities," Palo Alto researchers said. "These activities include keylogger creation, data exfiltration and even instructions for incendiary devices, demonstrating the tangible security risks posed by this emerging class of attack."

  • Generating harmful content: The research by Enkrypt AI found that R1 is susceptible to LLM flaws categorized as "highly vulnerable" under several existing AI safety frameworks.

    These include prompts that led the model to generate content capable of posing chemical and biological threats, produce racially discriminatory output, and fall victim to prompt injection and the extraction of data from prompts.

    When Enkrypt AI researchers prompted R1 about the biochemical interaction between sulfur mustard and human DNA components, the model generated extensive information on lethal chemical reactions.

    "While it may be suitable for narrowly scoped applications, the model shows considerable vulnerabilities in operational and security risk areas," Enkrypt AI researchers said.

  • Hallucinations: When Kela researchers prompted R1 to generate information on OpenAI employees, the model produced fictitious details including email addresses, phone numbers and salaries.

    "DeepSeek demonstrates strong performance and efficiency, positioning it as a potential challenger to major tech giants. However, it falls behind in terms of security, privacy and safety," Kela researchers said.

Security experts also warned of broader risks arising from the potential use of the open-source models by nation-state actors and other hackers.

"It's important to remember that open-source AI means something foundationally different than open-source code," said Jake Williams, faculty at IANS Research and VP of R&D at Hunter Strategy. "With open-source code, we can audit the code and identify vulnerabilities. With open-source AI, we can do no such thing."

Roei Sherman, field CTO at Mitiga, warned that organizations should act promptly to secure their AI environments against potential R1-related risks.

Recommended steps include continuously monitoring cloud environments, ramping up AI-driven detection and response, and running regular adversarial simulations.

"The release of DeepSeek highlights a troubling trend: adversaries are rapidly integrating AI into their attack methodologies," Sherman said. "Models like DeepSeek can amplify adversary capabilities through automated social engineering, advanced reconnaissance, code and exploit development.


About the Author

Akshaya Asokan

Senior Correspondent, ISMG

Asokan is a U.K.-based senior correspondent for Information Security Media Group's global news desk. She previously worked with IDG and other publications, reporting on developments in technology, minority rights and education.



