What is it about?
Data Augmentation for Toxicity Detection: We introduced a new approach that augments toxic text data by instruction fine-tuning the FLAN-T5 model. Using the Paraphrase Adversaries from Word Scrambling (PAWS) dataset and Reinforcement Learning from Human Feedback (RLHF), our method generates diverse, semantically similar paraphrases of toxic samples to enrich the training data.
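To make the idea concrete, here is a minimal sketch of paraphrase generation with an instruction-prompted FLAN-T5 model via the Hugging Face transformers library. It is not the paper's exact pipeline: the prompt wording, checkpoint choice, and decoding settings below are illustrative assumptions.

```python
# Sketch: sample several paraphrases of a text with FLAN-T5.
# Assumes the `transformers` and `torch` packages are installed.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # base checkpoint used here for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def paraphrase(text: str, n: int = 3) -> list[str]:
    """Return n sampled paraphrases of `text` that keep its meaning."""
    prompt = f"Paraphrase the following sentence while keeping its meaning: {text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,        # sampling yields diverse surface forms
        top_p=0.95,
        temperature=0.9,
        num_return_sequences=n,
        max_new_tokens=64,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```

In the paper's setting, the model is first instruction fine-tuned on paraphrase data and then further optimized with RLHF; the snippet above only shows the generation step.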
Why is it important?
The importance of this approach lies in its ability to improve toxicity detection systems by creating high-quality, varied training data. Here's why that matters:
- Augmented Sample Availability: Most toxicity datasets contain too few toxic samples, which can introduce bias and hinder accurate detection. By augmenting toxic samples, we enrich the dataset, reducing bias and improving model performance across diverse toxic expressions.
- Enhanced Detection Accuracy: Using PAWS and RLHF, we create more nuanced variations of toxic text, allowing models to better recognize subtle toxic language. This makes the detection system more accurate and robust in real-world scenarios where toxic expressions can be nuanced or rephrased.
- Increased Dataset Diversity: Toxic content varies widely in phrasing and expression. By generating semantically equivalent paraphrases, the model is exposed to a broader range of toxic language styles, helping it generalize better to new and unseen toxic inputs.
- Human-Feedback Optimization: RLHF refines the model by ensuring that the generated toxic variations meet a controlled toxicity threshold (see the sketch after this list). This control improves both the quality and reliability of the augmented data, making it safer and more ethical for real-world applications.
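The threshold-based control can be pictured as a reward signal that an RLHF loop would optimize. The sketch below is an assumption-laden illustration, not the paper's reward model: it uses the off-the-shelf Detoxify classifier and a fixed threshold to score whether a generated paraphrase preserves the intended level of toxicity.

```python
# Sketch: a toxicity-threshold reward of the kind an RLHF loop could maximize.
# Assumes the `detoxify` package is installed; classifier choice, threshold,
# and reward shaping are illustrative assumptions.
from detoxify import Detoxify

scorer = Detoxify("original")  # off-the-shelf toxicity classifier

def toxicity_reward(paraphrase: str, threshold: float = 0.5) -> float:
    """Positive reward when the paraphrase's toxicity reaches the target threshold,
    negative when the paraphrase drifts below it."""
    score = scorer.predict(paraphrase)["toxicity"]  # probability in [0, 1]
    return score - threshold
```

A policy-optimization step (e.g., PPO) would then update the paraphrasing model to favor outputs with higher reward, keeping the augmented samples within the controlled toxicity range.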
Read the Original
This page is a summary of: AugmenToxic: Leveraging Reinforcement Learning to Optimize LLM Instruction Fine-Tuning for Data Augmentation to Enhance Toxicity Detection, ACM Transactions on the Web, October 2024, ACM (Association for Computing Machinery).
DOI: 10.1145/3700791.