Sexism Detection flags whether a response contains sexist content. The output is a binary label: sexist or not sexist.
Calculation method
Sexism detection is computed through a specialized process:
1. Model Architecture
The detection system is built on a Small Language Model (SLM) that combines training from both open-source datasets and carefully curated internal datasets to identify various forms of sexist content.
2. Performance Validation
The model achieves 83% accuracy on the Explainable Detection of Online Sexism (EDOS) dataset, a widely recognized benchmark for sexism detection.
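To make the binary output concrete, the sketch below shows how a wrapper around such a classifier might be structured. The names (`detect_sexism`, `SexismDetectionResult`) and the keyword heuristic standing in for the SLM are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SexismDetectionResult:
    is_sexist: bool  # binary classification output
    score: float     # classifier confidence in [0, 1]


def _slm_score(text: str) -> float:
    """Placeholder scorer standing in for the fine-tuned SLM.

    A trivial keyword heuristic so the sketch runs end to end;
    the real system uses a trained language model instead.
    """
    flagged_phrases = {"women can't", "girls are bad at"}
    lowered = text.lower()
    return 1.0 if any(p in lowered for p in flagged_phrases) else 0.0


def detect_sexism(response: str, threshold: float = 0.5) -> SexismDetectionResult:
    """Map a model response to a binary sexist / not-sexist label."""
    score = _slm_score(response)
    return SexismDetectionResult(is_sexist=score >= threshold, score=score)
```

In practice the `_slm_score` stub would be replaced by a call to the trained detection model; the surrounding wrapper simply thresholds its confidence into the binary label described above.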
Optimizing your AI system
Addressing Sexism in Your System
Implement guardrails: Flag responses before they are served so that sexist content never reaches users.
Fine-tune models: Adjust model behavior to reduce sexist outputs.
Identify responses that contain sexist content and take preventive measures to ensure fair and unbiased AI interactions.
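The guardrail step above can be sketched as a thin serving-time check: run the detector on each candidate response and substitute a safe fallback when it is flagged. The `serve_with_guardrail` function and the stub detector below are hypothetical illustrations, not part of the platform's API.

```python
from typing import Callable

FALLBACK = "I can't provide that response. Let me try again."


def serve_with_guardrail(
    response: str,
    is_sexist: Callable[[str], bool],
    fallback: str = FALLBACK,
) -> str:
    """Return the response unchanged unless the detector flags it."""
    if is_sexist(response):
        # Flagged before being served; log or replace per policy.
        return fallback
    return response


# Stub detector so the sketch runs; swap in the real classifier here.
def stub_detector(text: str) -> bool:
    return "belong in the kitchen" in text.lower()
```

The detector is passed in as a callable so the same guardrail can wrap the SLM-based classifier in production and a cheap stub in tests.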