
Conversation

pjoshi30 (Contributor) commented Oct 13, 2025

User description

…vior


PR Type

Bug fix


Description

  • Corrected toxicity score interpretation: lower scores now properly indicate higher toxicity

  • Inverted confidence calculation (1.0 - score) to accurately reflect toxicity levels

  • Updated toxicity threshold from 0.25 to 0.4 with corrected comparison logic

  • Enhanced documentation explaining the inverted toxicity scoring behavior (see the sketch below)
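
For orientation, a minimal sketch of the corrected behavior, assuming the detector reports scores in [0, 1] where lower values mean higher toxicity; the function names here are illustrative, not the PR's actual helpers:

```python
# Hypothetical sketch of the fixed interpretation; the constant and the
# arithmetic mirror the PR, but the helper names are made up for illustration.
TOXICITY_THRESHOLD = 0.4  # raised from 0.25 in this PR

def is_toxicity_failure(follow_probability: float) -> bool:
    # Lower follow probability means more toxic, so a failure is a score
    # *below* the threshold (comparison flipped from > to <).
    return follow_probability < TOXICITY_THRESHOLD

def toxicity_confidence_pct(score: float) -> float:
    # Invert the raw score so a very toxic (low) score displays as high confidence.
    return (1.0 - score) * 100

print(is_toxicity_failure(0.1))      # True: 0.1 < 0.4
print(toxicity_confidence_pct(0.1))  # 90.0
```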


Diagram Walkthrough

```mermaid
flowchart LR
    A["Toxicity Detection"] --> B["Score Interpretation"]
    B --> C["Inverted Score Calculation"]
    C --> D["Confidence Display"]
    B --> E["Threshold Comparison"]
    E --> F["Failure Detection"]
```

File Walkthrough

Relevant files

Bug fix

reprompter.py (aimon/reprompting_api/reprompter.py) +5/-1
Invert toxicity confidence calculation for accurate display

  • Added documentation note explaining inverted toxicity scoring
  • Changed confidence calculation to (1.0 - score) * 100 to properly reflect toxicity levels

utils.py (aimon/reprompting_api/utils.py) +13/-10
Fix toxicity threshold logic and comparison operators

  • Updated TOXICITY_THRESHOLD from 0.25 to 0.4
  • Changed comparison operators from > to < for threshold checks
  • Added documentation clarifying lower scores indicate higher toxicity
  • Updated residual error score calculation comments

qodo-merge-pro bot commented Oct 13, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance: 🟢
No security concerns identified. No security vulnerabilities were detected by AI analysis; human verification is advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance checks.

Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-merge-pro bot commented Oct 13, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: High-level | Impact: Medium

Centralize toxicity score normalization logic

To improve maintainability, centralize the toxicity score inversion logic.
Create a single normalization function that converts the raw score into a
consistent internal representation, preventing the need for widespread changes
if the scoring model changes again.

Examples:

aimon/reprompting_api/utils.py [91]
 if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD
aimon/reprompting_api/reprompter.py [99]
 confidence = (1.0 - failed_instruction.get("score", 0.0)) * 100

Solution Walkthrough:

Before:

```python
# reprompter.py
def get_toxicity_reprompt(result):
    ...
    for failed_instruction in failed_instructions:
        # Logic is inverted here for display confidence
        confidence = (1.0 - failed_instruction.get("score", 0.0)) * 100
    ...

# utils.py
TOXICITY_THRESHOLD = 0.4

def _count_toxicity_failures(result):
    # Logic is inverted here for failure check
    return sum(
        1 for inst in ...
        if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD
    )
```

After:

```python
# In a central utility location
def get_normalized_toxicity_score(raw_score):
    """Returns a normalized score where higher is always more toxic."""
    # Current model: lower score = more toxic. We invert it.
    return 1.0 - raw_score

# reprompter.py
def get_toxicity_reprompt(result):
    ...
    for failed_instruction in failed_instructions:
        # Use the normalized score directly for confidence
        normalized_score = get_normalized_toxicity_score(failed_instruction.get("score", 0.0))
        confidence = normalized_score * 100
    ...

# utils.py
NORMALIZED_TOXICITY_THRESHOLD = 0.6  # e.g., 1.0 - 0.4

def _count_toxicity_failures(result):
    # Logic is now consistent (higher > threshold)
    return sum(
        1 for inst in ...
        if get_normalized_toxicity_score(inst.get("follow_probability", 0.0)) > NORMALIZED_TOXICITY_THRESHOLD
    )
```
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies that the logic for handling the inverted toxicity score is scattered, and proposes a valid architectural improvement to centralize it, which would significantly enhance code maintainability and robustness.
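
As a quick sanity check of this suggestion (reusing the hypothetical get_normalized_toxicity_score helper from the walkthrough above), the normalized comparison agrees with the raw one away from the threshold boundary:

```python
def get_normalized_toxicity_score(raw_score: float) -> float:
    # Higher normalized score = more toxic (as in the suggestion above).
    return 1.0 - raw_score

TOXICITY_THRESHOLD = 0.4
NORMALIZED_TOXICITY_THRESHOLD = 1.0 - TOXICITY_THRESHOLD  # 0.6

for raw in (0.1, 0.39, 0.41, 0.9):
    raw_fails = raw < TOXICITY_THRESHOLD
    normalized_fails = get_normalized_toxicity_score(raw) > NORMALIZED_TOXICITY_THRESHOLD
    print(raw, raw_fails, normalized_fails)  # both checks agree for each raw score
```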

Category: Possible issue | Impact: Medium

Prevent bugs from missing toxicity scores

In get_failed_toxicity_instructions, fetch follow_probability once with a safe
default of 1.0 to prevent false positives and improve code clarity.

aimon/reprompting_api/utils.py [108-134]

```diff
 def get_failed_toxicity_instructions(result) -> List[dict]:
     """
     Extract failed toxicity instructions below the threshold.
     Lower scores indicate higher toxicity.

     Args:
         result: AIMon detection result containing a `toxicity` section.

     Returns:
         List[dict]: A list of dictionaries, each describing a failed
         toxicity instruction with:
             - type (str): "toxicity_failure"
             - source (str): "toxicity"
             - instruction (str): The instruction text.
             - score (float): The follow probability.
             - explanation (str): The explanation for the failure.
     """
     failed = []
     for inst in result.detect_response.toxicity.get("instructions_list", []):
-        if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD:
+        follow_prob = inst.get("follow_probability", 1.0)
+        if follow_prob < TOXICITY_THRESHOLD:
             failed.append({
                 "type": "toxicity_failure",
                 "source": "toxicity",
                 "instruction": inst.get("instruction", ""),
-                "score": inst.get("follow_probability", 0.0),
+                "score": follow_prob,
                 "explanation": inst.get("explanation", "")
             })
     return failed
```
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a potential bug with missing probabilities and also improves code quality by removing a redundant call to inst.get(), making the code more robust and efficient.

Category: Possible issue | Impact: Medium

Avoid false positives on missing scores

In _count_toxicity_failures, change the default for a missing follow_probability
from 0.0 to 1.0 to avoid incorrectly flagging it as a toxic failure.

aimon/reprompting_api/utils.py [77-92]

```diff
 def _count_toxicity_failures(result) -> int:
     """
     Count the number of toxicity instructions whose follow probability
     is below the threshold. Lower scores indicate higher toxicity.

     Args:
         result: AIMon detection result containing a `toxicity` section.

     Returns:
         int: Number of failed toxicity instructions.
     """
     return sum(
         1 for inst in result.detect_response.toxicity.get("instructions_list", [])
-        if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD
+        if inst.get("follow_probability", 1.0) < TOXICITY_THRESHOLD
     )
```
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a potential bug where a missing follow_probability is treated as a toxicity failure and proposes a valid fix to prevent these false positives.
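
To see why the default matters, a small repro sketch (the dict shape follows the PR's instructions_list entries; the instruction text is made up):

```python
TOXICITY_THRESHOLD = 0.4

inst = {"instruction": "Avoid toxic language"}  # detector returned no follow_probability

# Old default: a missing score becomes 0.0, which reads as "maximally toxic".
print(inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD)  # True (false positive)

# Suggested default: a missing score becomes 1.0, i.e. "no evidence of toxicity".
print(inst.get("follow_probability", 1.0) < TOXICITY_THRESHOLD)  # False
```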

pjoshi30 merged commit b76340c into main Oct 13, 2025
1 check passed

3 participants