I Built a Tool to Hack AI Models — Here’s What It Uncovered
1 min read
Summary
AI and machine learning models need adequate security testing in order to avoid potentially harmful outcomes for users, but many developers do not know how to properly test these tools.
Lyndsey Scott built a model security testing suite to address these issues, evaluating model leakage, detecting jailbreaks, and auditing content filters.
The suite was built over a weekend and exposed vulnerabilities in three production systems within a week of deployment.
Scott explains that the problem with common testing methods is that they generally check only for the possibility of a security issue rather than actually assessing the potential harm caused by a model and putting it through a rigorous evaluation to stress-test it.
Her testing suite attempts to simulate real-world attackers in order to identify these issues and help developers to mitigate them.
Scott believes that building these capacities is increasingly important as ML and AI models are increasingly being used in critical areas such as finance and healthcare.