Does RAG make LLMs less safe? Bloomberg research reveals hidden dangers
Summary
A new report by Bloomberg has revealed that Retrieval-Augmented Generation (RAG), a technique designed to make enterprise AI more accurate by grounding responses in retrieved content, may in fact make large language models (LLMs) less safe.
In tests of 11 popular LLMs, including Claude-3.5-Sonnet, Llama-3-8B and GPT-4o, the researchers found that models which typically refuse harmful queries in standard settings often produced unsafe responses once RAG was used.
Sebastian Gehrmann, Head of Responsible AI at Bloomberg, said that RAG may supply additional context that enables the LLMs to answer malicious queries they would otherwise refuse.
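To see where that extra context comes from, here is a minimal sketch of how a RAG pipeline injects retrieved passages into the prompt before generation. The function names, retrieval method and prompt format are illustrative assumptions, not Bloomberg's experimental setup.

```python
# Illustrative sketch only: names and prompt format are assumptions,
# not Bloomberg's experimental setup.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval standing in for a real vector store."""
    scored = sorted(
        corpus,
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, corpus: list[str]) -> str:
    """Concatenate retrieved passages ahead of the user query.

    The retrieved text becomes part of the model's context, which is the
    mechanism the researchers point to: extra grounded content can shift
    how the model handles a query it might refuse on its own.
    """
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
    )

if __name__ == "__main__":
    documents = [
        "Quarterly filings summarise revenue, expenses and risk factors.",
        "Margin requirements limit the leverage available to retail traders.",
    ]
    print(build_rag_prompt("How do margin requirements affect leverage?", documents))
```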
Bloomberg also published a second paper demonstrating that existing guardrail systems fail to address domain-specific risks in financial services applications.
The research calls for enterprises to create domain-specific risk taxonomies tailored to their regulatory environments and shift from generic AI safety frameworks to those that address specific business concerns.