Measuring Bias in Natural Language Models – Center for Data Innovation

Researchers from MIT, Facebook, Intel, and McGill University in Canada have released Stereoset, a dataset of 17,000 sentences that researchers can use to measure a natural language processing model’s bias towards stereotypes. The dataset tasks models to choose from options to fill in a blank for a sentence or to provide additional information after receiving an input sentence. The options include stereotypes, anti-stereotypes, and unrelated information. To score well, a model should prefer options that provide relevant info but not prefer options conveying a stereotype over those that do not.