Corrected toxicity interpretation based on the new toxicity model behavior #72
PR Type
Bug fix
Description
Corrected toxicity score interpretation: lower scores now properly indicate higher toxicity
Inverted confidence calculation (1.0 - score) to accurately reflect toxicity levels
Raised the toxicity threshold from 0.25 to 0.4 and flipped the threshold comparison from > to <
Enhanced documentation explaining inverted toxicity scoring behavior
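The inverted confidence calculation described above can be sketched as follows. This is a minimal sketch that assumes the new toxicity model returns a score in [0.0, 1.0] where lower values mean more toxic text. The helper name toxicity_confidence is hypothetical and is not taken from aimon/reprompting_api/reprompter.py.

```python
def toxicity_confidence(score: float) -> float:
    """Convert a raw model score into a toxicity confidence percentage.

    Hypothetical helper: assumes the new toxicity model emits a score in
    [0.0, 1.0] where LOWER values mean MORE toxic text. The score is
    therefore inverted before being scaled to a percentage.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return (1.0 - score) * 100


if __name__ == "__main__":
    # A low score (toxic under the new model) yields a high confidence.
    print(round(toxicity_confidence(0.1), 1))  # 90.0
    # A high score (benign under the new model) yields a low confidence.
    print(round(toxicity_confidence(0.9), 1))  # 10.0
```

Without the inversion, score * 100 would report 10% confidence for the most toxic of these two inputs. The PR's change fixes that mismatch.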
File Walkthrough
reprompter.py
Invert toxicity confidence calculation for accurate display
aimon/reprompting_api/reprompter.py
Changed the confidence calculation to (1.0 - score) * 100 to properly reflect toxicity levels
utils.py
Fix toxicity threshold logic and comparison operators
aimon/reprompting_api/utils.py
Updated TOXICITY_THRESHOLD from 0.25 to 0.4
Changed comparison operators from > to < for threshold checks
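The corrected threshold check in utils.py can be sketched as follows. This is a minimal sketch under the same assumption that lower scores mean more toxic text. TOXICITY_THRESHOLD = 0.4 and the > to < change come from this PR. The helpers is_toxic and is_toxic_old, and the sample scores, are hypothetical and may not match how utils.py actually structures the check.

```python
TOXICITY_THRESHOLD = 0.4  # raised from 0.25 in this PR


def is_toxic(score: float, threshold: float = TOXICITY_THRESHOLD) -> bool:
    # New model: lower scores mean more toxic, so flag scores BELOW the threshold.
    return score < threshold


def is_toxic_old(score: float, threshold: float = 0.25) -> bool:
    # Pre-fix logic, shown only for contrast: it assumed higher meant more toxic.
    return score > threshold


# Hypothetical sample scores under the new model's semantics.
scores = {"benign": 0.85, "borderline": 0.4, "toxic": 0.12}

flagged = [label for label, s in scores.items() if is_toxic(s)]
old_flagged = [label for label, s in scores.items() if is_toxic_old(s)]

print(flagged)      # ['toxic']
print(old_flagged)  # ['benign', 'borderline']
```

With the old operator, the benign and borderline texts are flagged and the toxic one is missed. Because the check uses a strict <, a score exactly equal to 0.4 is not flagged.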