Summary

  • Academic researchers have devised a way to automatically generate prompt injections against Google’s Gemini AI that succeed more often than manually crafted ones.
  • The technique abuses the fine-tuning feature of closed-weights models, which lets customers train them on large amounts of private or specialised data.
  • The methodology relies on discrete optimisation, a technique for finding efficient solutions among a vast number of possibilities in a computationally efficient way. It is not yet known whether the approach works against other closed-weights models.
  • Because developers of closed-weights models tightly restrict access to the underlying data and code, the models are opaque to outside users, and finding prompt injections against them has traditionally required labour- and time-intensive manual trial and error.
  • Researchers previously devised a similar attack strategy against GPT-3.5 that abused its logit bias feature.
  • Following its disclosure in December, OpenAI patched the hole.
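To give a flavour of the discrete-optimisation idea mentioned above, here is a toy sketch of a greedy search over a token sequence. Everything in it is invented for illustration: the `TOKENS` vocabulary, the hidden `TARGET`, and the `score` function, which stands in for the black-box success signal an attacker would extract from a model. It is not the researchers' actual method, only a minimal example of searching a discrete space by keeping mutations that score higher.

```python
import random

TOKENS = ["alpha", "bravo", "charlie", "delta", "echo"]
TARGET = ["delta", "alpha", "echo", "bravo"]  # hypothetical "best" sequence


def score(candidate):
    """Stand-in objective: fraction of positions matching the hidden target.

    In a real attack this would be some measurable signal from the
    black-box model, not a known answer.
    """
    return sum(a == b for a, b in zip(candidate, TARGET)) / len(TARGET)


def hill_climb(length=4, iterations=200, seed=0):
    """Greedy discrete optimisation: mutate one token at a time,
    keep the change only if the score improves."""
    rng = random.Random(seed)
    best = [rng.choice(TOKENS) for _ in range(length)]
    best_score = score(best)
    for _ in range(iterations):
        cand = list(best)
        cand[rng.randrange(length)] = rng.choice(TOKENS)
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score


if __name__ == "__main__":
    sequence, s = hill_climb()
    print(sequence, s)
```

The point of the sketch is only that a discrete search can converge on a high-scoring sequence without ever seeing inside the scoring function, which is what makes such methods attractive against opaque, closed-weights models.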

By Dan Goodin
