Sexism Detection flags whether a response contains sexist content. The output is a binary label: sexist or not sexist.

Calculation Method

Sexism detection is computed in two stages:

1. Model Architecture

The detection system is built on a Small Language Model (SLM) that combines training from both open-source datasets and carefully curated internal datasets to identify various forms of sexist content.

2. Performance Validation

The model achieves 83% accuracy when evaluated against the Explainable Detection of Online Sexism (EDOS) dataset, a widely recognized benchmark for sexism detection.
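The detector's binary interface can be sketched as below. This is an illustrative stand-in, not the actual SLM: the `detect_sexism` function, the `SexismResult` type, and the keyword heuristic inside it are all hypothetical; a real deployment would invoke the trained model in place of the heuristic.

```python
from dataclasses import dataclass


@dataclass
class SexismResult:
    is_sexist: bool  # the binary classification output
    score: float     # model confidence in [0.0, 1.0]


def detect_sexism(response: str) -> SexismResult:
    """Placeholder for the SLM-based detector described above.

    A production system would run the trained Small Language Model here;
    a trivial keyword check stands in so the interface is runnable.
    """
    flagged_phrases = {"women can't", "girls are bad at"}  # illustrative only
    hit = any(phrase in response.lower() for phrase in flagged_phrases)
    return SexismResult(is_sexist=hit, score=0.95 if hit else 0.05)


result = detect_sexism("Women can't be engineers.")
print(result.is_sexist)  # True
```

Whatever model sits behind the function, the contract stays the same: text in, a single boolean (plus an optional confidence score) out.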

Optimizing Your AI System

Addressing Sexism in Your System

When sexist content is detected in your system, consider these approaches:

Implement guardrails: Flag or block responses before they are served to users, preventing sexist content from reaching them.

Fine-tune models: Adjust model behavior to reduce sexist outputs.

Review flagged responses regularly and take preventive measures to ensure fair and unbiased AI interactions.
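The guardrail approach above can be sketched as a check that runs before a response is served. The function names and the fallback message are assumptions for illustration; the stub detector stands in for the real SLM.

```python
def stub_detector(text: str) -> bool:
    # Illustrative stand-in for the SLM-based detector; a real system
    # would call the trained model and return its binary label.
    return "women can't" in text.lower()


def apply_guardrail(response: str, detector=stub_detector) -> str:
    """Run the detector on a candidate response before serving it.

    If the response is classified as sexist, return a safe fallback
    instead of the original text.
    """
    if detector(response):
        return "This response was withheld by a content guardrail."
    return response


print(apply_guardrail("Women can't be engineers."))
print(apply_guardrail("Paris is the capital of France."))
```

Flagged responses can also be logged for the review step described above, giving fine-tuning efforts concrete examples of the outputs to reduce.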