Latest News

AI Chatbots Vulnerable: Newly Discovered Jailbreak for AI Chatbots Poses Serious Threat

StratosAlly

June 30, 2024

In a noteworthy advancement in AI safety, Microsoft has uncovered a troubling new generative AI jailbreak technique called “Skeleton Key.” This method employs prompt injection to circumvent a chatbot’s safety guardrails, allowing malicious users to exploit the AI model’s functionalities.

Skeleton Key represents a sophisticated prompt injection attack, where a multi-turn strategy is employed to persuade an AI model to ignore its built-in safety mechanisms. As Mark Russinovich, CTO of Microsoft Azure, explained, this technique can lead to the AI violating its operators’ policies, making skewed decisions, or executing harmful instructions.

To illustrate, imagine asking a chatbot to augment its safety features rather than disable them entirely. The model then issues warnings for prohibited requests instead of outright refusals. Once the jailbreak is successful, the AI acknowledges the updated guardrails and follows any user instructions, regardless of content. This could result in the chatbot providing dangerous information, such as constructing explosives or engaging in harmful activities.

Microsoft’s research team tested this exploit on several leading AI models, including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo, and GPT-4, among others. Although Russinovich emphasized that such jailbreaks do not grant access to user data or control over systems, they do highlight a critical vulnerability in AI safety.

Microsoft has upped the security game for Azure-managed AI models with “Prompt Shields.” This new feature can sniff out and block “Skeleton Key” attacks, ensuring a safer environment for users and the underlying systems.

AI Chatbots Vulnerable: Newly Discovered Jailbreak for AI Chatbots Poses Serious Threat

StratosAlly

more Related articles

Password Access Ending in Microsoft Authenticator this August

Blind Eagle Linked to Russian Host in Latin American Phishing Surge

File Upload Vulnerability: How Attackers Bypass Filters and Gain Access

GIFTEDCROOK Malware Evolves, Now Targeting Sensitive Files in Ukraine

community

Join the community to Stay Updated with the Latest Cybersecurity Trends