[Feature] Toxicity Detection Evaluation #1558

@daniel-de-leon-user293

Priority

P2-High

OS type

Ubuntu

Hardware type

Gaudi2

Running nodes

Single Node

Description

Toxicity detection plays a critical role in guarding the inputs and outputs of large language models (LLMs) to ensure safe, respectful, and responsible content. Because LLMs are widely used in applications such as customer service, education, and social media, they risk producing or amplifying harmful language when toxicity goes undetected. Many small language models (SLMs) and LLMs have been tuned as guardrails to detect toxicity, but their taxonomies and definitions of toxicity vary. This Toxicity Detection Evaluation script aims to measure how well an LLM detects toxicity across many popular toxic-language datasets, regardless of the taxonomy it was tuned on, using the metrics most commonly reported for toxicity classification.

Supported toxicity datasets

Supported Metrics

  • accuracy
  • auprc (area under the precision-recall curve)
  • auroc (area under the receiver operating characteristic curve)
  • f1
  • fpr (false positive rate)
  • precision
  • recall
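
To make the metric list concrete, here is a minimal pure-Python sketch of how the seven metrics could be computed for a binary toxic / non-toxic task. It is an illustration rather than the proposed script: the function names (`evaluate`, `auroc`, `auprc`), the 0.5 decision threshold, and the choice of average precision as the AUPRC estimate are all assumptions, and a real implementation would more likely rely on an established metrics library.

```python
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random toxic example outscores a random
    non-toxic one (ties count as half). O(n_pos * n_neg), fine for a sketch."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Average precision: mean of the precision at each rank where a toxic
    example appears. Assumes distinct scores; ties follow sort order."""
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        return float("nan")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def evaluate(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute all seven metrics. Labels are 1 (toxic) or 0 (non-toxic);
    the threshold is an assumed default, not a value from the issue."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": _safe_div(tp + tn, len(y_true)),
        "auprc": auprc(y_true, y_score),
        "auroc": auroc(y_true, y_score),
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "fpr": _safe_div(fp, fp + tn),
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    # Toy labels and scores, made up purely for demonstration.
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    for name, value in evaluate(labels, scores).items():
        print(f"{name}: {value:.3f}")
```

Note that accuracy, f1, fpr, precision, and recall depend on the chosen threshold, while auprc and auroc are computed from the raw scores and therefore need a model that exposes a continuous toxicity score.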

Supported Models

  • Hugging Face encoder text-classification LLMs
  • Hugging Face decoder text-generation LLMs
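
The two model families produce different kinds of output, so an evaluation needs some way to reduce both to a single toxicity score per example. The sketch below shows one way this might look; it is not taken from the proposed script. The helper names (`encoder_score`, `decoder_score`), the label set, and the verdict keywords are hypothetical, and real guardrail models each have their own label names and output formats.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical label names; each model's taxonomy would need its own mapping.
DEFAULT_TOXIC_LABELS: FrozenSet[str] = frozenset({"toxic", "hate", "unsafe"})


def encoder_score(label_probs: Dict[str, float],
                  toxic_labels: FrozenSet[str] = DEFAULT_TOXIC_LABELS) -> float:
    """Collapse per-label probabilities from a text-classification model
    into one toxicity score by summing the labels treated as toxic.
    Assumes mutually exclusive labels; the sum is clipped to 1.0 otherwise."""
    total = sum(p for label, p in label_probs.items()
                if label.lower() in toxic_labels)
    return min(1.0, total)


def decoder_score(generated_text: str,
                  unsafe_markers: Tuple[str, ...] = ("unsafe", "toxic")) -> float:
    """Map a text-generation model's free-text verdict to a hard 0/1 score
    by inspecting its first word. The keywords are assumptions."""
    words = generated_text.strip().lower().split()
    first = words[0].strip(".,:;!") if words else ""
    return 1.0 if first in unsafe_markers else 0.0


if __name__ == "__main__":
    print(encoder_score({"toxic": 0.7, "neutral": 0.3}))
    print(decoder_score("unsafe\nS10"))
    print(decoder_score("safe"))
```

A hard 0/1 score from generated text is enough for threshold-based metrics, but ranking metrics such as auprc and auroc are more informative when the decoder can expose a continuous score, for example the probability assigned to its verdict token.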

@ashahba @mitalipo @qgao007
