Gemini hackers can deliver more potent attacks with a helping hand from… Gemini
Summary
Academic researchers have devised a way to algorithmically generate prompt injections against Google's Gemini AI, and these attacks have higher success rates than manually crafted ones.
The technique abuses the fine-tuning feature that closed-weight models such as Gemini expose so customers can train them on large amounts of private or specialised data.
The methodology relies on discrete optimisation, a method for efficiently finding good solutions from an enormous number of possibilities; it is not yet known whether the approach works against other closed-weight models.
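As a rough illustration only, here is a minimal sketch of what a discrete search over injection text could look like. Everything in it is hypothetical: `VOCAB`, `get_training_loss`, and `optimise_suffix` are invented placeholders, and the toy loss function merely stands in for whatever loss-like feedback a fine-tuning interface might expose. None of this reflects Google's actual API or the researchers' real method.

```python
# Toy sketch of discrete optimisation over an adversarial suffix.
# Hypothetical throughout: the scoring function is faked so the sketch runs.
import random

VOCAB = ["please", "ignore", "system", "override", "output", "always",
         "instead", "secret", "confirm", "repeat"]  # toy token vocabulary

def get_training_loss(prompt: str, target: str) -> float:
    """Placeholder for a loss signal. A real attack would need some
    model-derived score; here we fake a proxy that rewards suffixes
    sharing words with the target string."""
    return -sum(word in prompt for word in target.split())

def optimise_suffix(base_prompt: str, target: str, length: int = 5,
                    iterations: int = 200) -> list[str]:
    """Greedy random search: propose single-token substitutions and
    keep any that lower the (toy) loss."""
    suffix = random.choices(VOCAB, k=length)
    best = get_training_loss(base_prompt + " " + " ".join(suffix), target)
    for _ in range(iterations):
        pos, token = random.randrange(length), random.choice(VOCAB)
        candidate = suffix.copy()
        candidate[pos] = token
        loss = get_training_loss(base_prompt + " " + " ".join(candidate), target)
        if loss < best:  # keep substitutions that improve the score
            suffix, best = candidate, loss
    return suffix

print(optimise_suffix("Summarise this email.",
                      "ignore system instead output secret"))
```

The point of the sketch is the loop structure: because the attacker cannot inspect the model's weights, the search treats the model as a black box and climbs whatever numeric feedback it can get, one token substitution at a time.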
Because developers of closed-weight models tightly restrict access to their underlying code and training data, making the models opaque to outside users, finding prompt injections against them has so far required labour- and time-intensive manual trial and error.
Researchers previously discovered a similar attack strategy against GPT-3.5 that exploited its logit bias feature.
OpenAI patched the hole following the attack's disclosure in December.
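For context, "logit bias" refers to an API knob that adds an offset to selected tokens' raw scores before the softmax, steering the model toward or away from those tokens. Below is a toy, self-contained illustration of that mechanism (not the attack itself); the numbers are made up.

```python
# Illustration of what a logit bias does: shifting one token's raw score
# before the softmax changes how likely the model is to emit that token.
import math

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                 # toy scores for tokens A, B, C
biased = [logits[0], logits[1] + 5.0, logits[2]]  # bias token B upward

print(softmax(logits))   # token A dominates
print(softmax(biased))   # token B now dominates
```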