
Conversation

pjoshi30 (Contributor) commented Oct 13, 2025

User description

…vior


PR Type

Bug fix


Description

  • Corrected toxicity score interpretation: lower scores now properly indicate higher toxicity

  • Inverted confidence calculation (1.0 - score) to accurately reflect toxicity levels

  • Updated toxicity threshold from 0.25 to 0.4 with corrected comparison logic

  • Enhanced documentation explaining the inverted toxicity scoring behavior (see the sketch below)
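
For orientation, a minimal sketch of the corrected behavior, assuming the detector reports scores in [0, 1] where lower values mean higher toxicity; the function names here are illustrative, not the PR's actual helpers:

```python
# Hypothetical sketch of the fixed interpretation; the constant and the
# arithmetic mirror the PR, but the helper names are made up for illustration.
TOXICITY_THRESHOLD = 0.4  # raised from 0.25 in this PR

def is_toxicity_failure(follow_probability: float) -> bool:
    # Lower follow probability means more toxic, so a failure is a score
    # *below* the threshold (comparison flipped from > to <).
    return follow_probability < TOXICITY_THRESHOLD

def toxicity_confidence_pct(score: float) -> float:
    # Invert the raw score so a very toxic (low) score displays as high confidence.
    return (1.0 - score) * 100

print(is_toxicity_failure(0.1))      # True: 0.1 < 0.4
print(toxicity_confidence_pct(0.1))  # 90.0
```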


Diagram Walkthrough

```mermaid
flowchart LR
    A["Toxicity Detection"] --> B["Score Interpretation"]
    B --> C["Inverted Score Calculation"]
    C --> D["Confidence Display"]
    B --> E["Threshold Comparison"]
    E --> F["Failure Detection"]
```

File Walkthrough

Relevant files

Bug fix

reprompter.py (aimon/reprompting_api/reprompter.py) +5/-1
Invert toxicity confidence calculation for accurate display

  • Added documentation note explaining inverted toxicity scoring
  • Changed confidence calculation to (1.0 - score) * 100 to properly reflect toxicity levels

utils.py (aimon/reprompting_api/utils.py) +13/-10
Fix toxicity threshold logic and comparison operators

  • Updated TOXICITY_THRESHOLD from 0.25 to 0.4
  • Changed comparison operators from > to < for threshold checks
  • Added documentation clarifying lower scores indicate higher toxicity
  • Updated residual error score calculation comments

qodo-merge-pro bot commented Oct 13, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance: 🟢
No security concerns identified. No security vulnerabilities were detected by AI analysis; human verification is advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance checks.

Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-merge-pro bot commented Oct 13, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: High-level | Impact: Medium

Centralize toxicity score normalization logic

To improve maintainability, centralize the toxicity score inversion logic.
Create a single normalization function that converts the raw score into a
consistent internal representation, preventing the need for widespread changes
if the scoring model changes again.

Examples:

aimon/reprompting_api/utils.py [91]
 if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD
aimon/reprompting_api/reprompter.py [99]
 confidence = (1.0 - failed_instruction.get("score", 0.0)) * 100

Solution Walkthrough:

Before:

```python
# reprompter.py
def get_toxicity_reprompt(result):
    ...
    for failed_instruction in failed_instructions:
        # Logic is inverted here for display confidence
        confidence = (1.0 - failed_instruction.get("score", 0.0)) * 100
    ...

# utils.py
TOXICITY_THRESHOLD = 0.4

def _count_toxicity_failures(result):
    # Logic is inverted here for failure check
    return sum(
        1 for inst in ...
        if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD
    )
```

After:

```python
# In a central utility location
def get_normalized_toxicity_score(raw_score):
    """Returns a normalized score where higher is always more toxic."""
    # Current model: lower score = more toxic. We invert it.
    return 1.0 - raw_score

# reprompter.py
def get_toxicity_reprompt(result):
    ...
    for failed_instruction in failed_instructions:
        # Use the normalized score directly for confidence
        normalized_score = get_normalized_toxicity_score(failed_instruction.get("score", 0.0))
        confidence = normalized_score * 100
    ...

# utils.py
NORMALIZED_TOXICITY_THRESHOLD = 0.6  # e.g., 1.0 - 0.4

def _count_toxicity_failures(result):
    # Logic is now consistent (higher > threshold)
    return sum(
        1 for inst in ...
        if get_normalized_toxicity_score(inst.get("follow_probability", 0.0)) > NORMALIZED_TOXICITY_THRESHOLD
    )
```
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies that the logic for handling the inverted toxicity score is scattered, and proposes a valid architectural improvement to centralize it, which would significantly enhance code maintainability and robustness.
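
As a quick sanity check of this suggestion (reusing the hypothetical get_normalized_toxicity_score helper from the walkthrough above), the normalized comparison agrees with the raw one away from the threshold boundary:

```python
def get_normalized_toxicity_score(raw_score: float) -> float:
    # Higher normalized score = more toxic (as in the suggestion above).
    return 1.0 - raw_score

TOXICITY_THRESHOLD = 0.4
NORMALIZED_TOXICITY_THRESHOLD = 1.0 - TOXICITY_THRESHOLD  # 0.6

for raw in (0.1, 0.39, 0.41, 0.9):
    raw_fails = raw < TOXICITY_THRESHOLD
    normalized_fails = get_normalized_toxicity_score(raw) > NORMALIZED_TOXICITY_THRESHOLD
    print(raw, raw_fails, normalized_fails)  # both checks agree for each raw score
```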

Category: Possible issue | Impact: Medium

Prevent bugs from missing toxicity scores

In get_failed_toxicity_instructions, fetch follow_probability once with a safe
default of 1.0 to prevent false positives and improve code clarity.

aimon/reprompting_api/utils.py [108-134]

```diff
 def get_failed_toxicity_instructions(result) -> List[dict]:
     """
     Extract failed toxicity instructions below the threshold.
     Lower scores indicate higher toxicity.

     Args:
         result: AIMon detection result containing a `toxicity` section.

     Returns:
         List[dict]: A list of dictionaries, each describing a failed
         toxicity instruction with:
             - type (str): "toxicity_failure"
             - source (str): "toxicity"
             - instruction (str): The instruction text.
             - score (float): The follow probability.
             - explanation (str): The explanation for the failure.
     """
     failed = []
     for inst in result.detect_response.toxicity.get("instructions_list", []):
-        if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD:
+        follow_prob = inst.get("follow_probability", 1.0)
+        if follow_prob < TOXICITY_THRESHOLD:
             failed.append({
                 "type": "toxicity_failure",
                 "source": "toxicity",
                 "instruction": inst.get("instruction", ""),
-                "score": inst.get("follow_probability", 0.0),
+                "score": follow_prob,
                 "explanation": inst.get("explanation", "")
             })
     return failed
```
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a potential bug with missing probabilities and also improves code quality by removing a redundant call to inst.get(), making the code more robust and efficient.

Category: Possible issue | Impact: Medium

Avoid false positives on missing scores

In _count_toxicity_failures, change the default for a missing follow_probability
from 0.0 to 1.0 to avoid incorrectly flagging it as a toxic failure.

aimon/reprompting_api/utils.py [77-92]

```diff
 def _count_toxicity_failures(result) -> int:
     """
     Count the number of toxicity instructions whose follow probability
     is below the threshold. Lower scores indicate higher toxicity.

     Args:
         result: AIMon detection result containing a `toxicity` section.

     Returns:
         int: Number of failed toxicity instructions.
     """
     return sum(
         1 for inst in result.detect_response.toxicity.get("instructions_list", [])
-        if inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD
+        if inst.get("follow_probability", 1.0) < TOXICITY_THRESHOLD
     )
```
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a potential bug where a missing follow_probability is treated as a toxicity failure and proposes a valid fix to prevent these false positives.
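
To see why the default matters, a small repro sketch (the dict shape follows the PR's instructions_list entries; the instruction text is made up):

```python
TOXICITY_THRESHOLD = 0.4

inst = {"instruction": "Avoid toxic language"}  # detector returned no follow_probability

# Old default: a missing score becomes 0.0, which reads as "maximally toxic".
print(inst.get("follow_probability", 0.0) < TOXICITY_THRESHOLD)  # True (false positive)

# Suggested default: a missing score becomes 1.0, i.e. "no evidence of toxicity".
print(inst.get("follow_probability", 1.0) < TOXICITY_THRESHOLD)  # False
```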

pjoshi30 merged commit b76340c into main Oct 13, 2025
1 check passed

3 participants