[Feature] Toxicity Detection Evaluation

### Priority

P2-High

### OS type

Ubuntu

### Hardware type

Gaudi2

### Running nodes

Single Node

### Description

Toxicity detection plays a critical role in guarding the inputs and outputs of large language models (LLMs) to ensure safe, respectful, and responsible content. LLMs are widely used in applications such as customer service, education, and social media, so there is a significant risk that they will inadvertently produce or amplify harmful language if toxicity is not detected effectively. Many small language models (SLMs) and LLMs have been tuned as guardrails to detect toxicity, but they differ in their taxonomies and definitions of toxicity. This `Toxicity Detection Evaluation` script is intended to measure how well an LLM detects toxicity across many popular toxic language datasets, regardless of the taxonomy it was tuned on. It reports the most commonly used toxicity classification metrics to provide a comprehensive assessment.

**Supported Toxicity Datasets**
- [BeaverTails](https://huggingface.co/datasets/PKU-Alignment/BeaverTails)
- [Jigsaw Unintended Bias](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification)
- [OpenAI Moderation](https://github.com/openai/moderation-api-release/tree/main)
- [SurgeAI Toxicity](https://github.com/surge-ai/toxicity)
- [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat)
- [ToxiGen](https://huggingface.co/datasets/toxigen/toxigen-data)
- [XSTest](https://huggingface.co/datasets/walledai/XSTest)

**Supported Metrics**
- accuracy
- auprc (area under the precision-recall curve)
- auroc (area under the receiver operating characteristic curve)
- f1
- fpr (false positive rate)
- precision
- recall
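
As a rough illustration of what the metrics listed above measure, the following sketch computes them in plain Python for binary labels, where 1 means toxic and 0 means non-toxic. It is not the evaluation script itself: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-based metrics; a score >= threshold is predicted toxic (1)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
    }


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Chance that a random toxic sample outscores a random non-toxic one.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Average precision: precision weighted by each increase in recall."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one toxic sample")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    tp, i, prev_recall, ap = 0, 0, 0.0, 0.0
    while i < len(ranked):
        score = ranked[i][0]
        # Treat all samples sharing a score as a single threshold step.
        while i < len(ranked) and ranked[i][0] == score:
            tp += ranked[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / i)
        prev_recall = recall
    return ap


# Toy data (made up for illustration): 1 = toxic, 0 = non-toxic.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

report = binary_metrics(y_true, y_score)
report["auroc"] = auroc(y_true, y_score)
report["auprc"] = auprc(y_true, y_score)
for name in sorted(report):
    print(f"{name}: {report[name]:.4f}")
```

The threshold-dependent metrics (accuracy, f1, fpr, precision, recall) change with the chosen cutoff, while auprc and auroc summarize ranking quality over all cutoffs, which is one reason to report both groups together.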

**Supported Models**
- [x] Hugging Face encoder text-classification LLMs
- [ ] Hugging Face decoder text-generation LLMs
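
Because guardrail models are tuned on different taxonomies, their label sets have to be collapsed into a single toxic score before binary metrics can be computed. The sketch below shows one possible way to do that. It assumes the per-input output shape of a Hugging Face `text-classification` pipeline called with `top_k=None` (a list of `{"label", "score"}` dicts) and assumes the scores come from a softmax over mutually exclusive labels. The helper name `toxic_score` and the label names are hypothetical and are not taken from the evaluation script.

```python
from typing import Any, Dict, Iterable, List


def toxic_score(label_scores: List[Dict[str, Any]],
                toxic_labels: Iterable[str]) -> float:
    """Collapse per-label scores into one probability of 'toxic'.

    Assumes softmax scores over mutually exclusive labels, so the scores of
    all labels that count as toxic can be summed. For multi-label (sigmoid)
    models, taking the maximum would be a more suitable choice.
    """
    toxic = {name.lower() for name in toxic_labels}
    total = sum(item["score"] for item in label_scores
                if item["label"].lower() in toxic)
    return min(total, 1.0)  # guard against floating-point overshoot


# Hypothetical model output for one input text (label names are made up).
model_output = [
    {"label": "hate", "score": 0.55},
    {"label": "harassment", "score": 0.25},
    {"label": "safe", "score": 0.20},
]
score = toxic_score(model_output, toxic_labels={"hate", "harassment"})
print(f"toxic score: {score:.2f}")
```

A mapping of this kind lets the same binary metrics be applied to models with different label sets, at the cost of deciding per model which labels count as toxic.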

@ashahba @mitalipo @qgao007 

[Feature] Toxicity Detection Evaluation #1558
