This data set helps researchers spot harmful stereotypes in LLMs
1 min read
Summary
SHADES is a new dataset to help detect harmful and discriminatory content in chatbot responses in multiple languages.
It has been created by international efforts to spot bias in AI models and was led by Margaret Mitchell, chief ethics scientist at AI startup Hugging Face.
The tool exposes models to a variety of prompts related to stereotypes and airs each response to generate a bias score, with the most problematic responses relating to English assertions that “nail polish is for girls” and the Chinese view that “be a strong man”.
The models often doubled down on the problematic responses, generating content that compounded on the initial bias.
Researchers hope the tool will enable diagnostic testing of future AI models.