Code to Review Customer Ticket #16
base: master
Conversation
Walkthrough

Adds logging and error handling to SOPSRequest.post, including an early return on JSON serialization errors and clearer retry logging. Updates get_settings to include only JSON-serializable spider settings, logging excluded keys and introducing a helper for serializability checks.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant SOPSRequest
    participant Logger
    participant HTTP as HTTP Client
    Caller->>SOPSRequest: post(url, payload, retries)
    rect rgba(230,245,255,0.6)
        note over SOPSRequest,Logger: New: Early JSON serialization validation/logging
        SOPSRequest->>SOPSRequest: json.dumps(payload)
        alt JSON not serializable
            SOPSRequest-->>Logger: error("JSON serialization failed")
            SOPSRequest-->>Caller: return None
        else JSON ok
            loop attempts
                SOPSRequest->>HTTP: POST
                alt Connection error
                    SOPSRequest-->>Logger: warn("Connection error on attempt i")
                else Other unexpected error
                    SOPSRequest-->>Logger: error("Unexpected error on attempt i")
                else Success
                    HTTP-->>SOPSRequest: Response
                    SOPSRequest-->>Caller: return Response
                end
            end
        end
    end
```

```mermaid
sequenceDiagram
    autonumber
    participant Spider
    participant Model as get_settings
    participant Logger
    Spider->>Model: get_settings(spider)
    rect rgba(240,255,240,0.6)
        note over Model,Logger: New: Filter settings by JSON serializability
        Model->>Model: Iterate spider.settings
        alt value JSON-serializable
            Model->>Model: include in spider_settings
        else not serializable
            Model->>Model: collect key in exclusion list
        end
        opt any exclusions
            Model-->>Logger: warn("Excluded non-serializable settings: ...")
        end
        Model-->>Spider: spider_settings (filtered)
    end
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
scrapeops_scrapy/core/api.py (1)

201-203: Prefer a module-level logger over per-call setup.

Initialize `logger = logging.getLogger(__name__)` once at module scope and reuse it to avoid repeated lookups and ease testing. Example (outside this range):

```python
# At top of module
import logging

logger = logging.getLogger(__name__)
```

Then remove these lines from post():

```diff
-        import logging
-        logger = logging.getLogger(__name__)
```

scrapeops_scrapy/core/model.py (2)

219-221: Move logger to module scope.

Define `logger = logging.getLogger(__name__)` at the top and reuse here to avoid per-call setup. Example (outside this range):

```python
# top of module
import logging

logger = logging.getLogger(__name__)
```

Then remove these lines from get_settings():

```diff
-        import logging
-        logger = logging.getLogger(__name__)
```

248-255: Minor: prefer try/except/else for clarity.

Return True in an `else:` block to make the happy path explicit.

```diff
-        try:
-            json.dumps(value)
-            return True
-        except (TypeError, ValueError):
-            return False
+        try:
+            json.dumps(value)
+        except (TypeError, ValueError):
+            return False
+        else:
+            return True
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
scrapeops_scrapy/core/api.py (2 hunks)
scrapeops_scrapy/core/model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scrapeops_scrapy/core/api.py (2)
scrapeops_scrapy/normalizer/proxies.py (3)
ProxyNormalizer (11-197)
unknown_proxy_scheme (92-95)
get_proxy_scheme (85-89)

scrapeops_scrapy/exceptions.py (1)
ScrapeOpsAPIResponseError(13-16)
🪛 Ruff (0.13.3)
scrapeops_scrapy/core/model.py
253-253: Consider moving this statement to an else block
(TRY300)
scrapeops_scrapy/core/api.py
220-226: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
236-236: Do not catch blind exception: Exception
(BLE001)
237-237: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (2)
scrapeops_scrapy/core/model.py (1)
227-247: Good safeguard: filter to JSON-serializable settings and warn once.

This prevents payload failures and guides users to the exclusion list. LGTM.
scrapeops_scrapy/core/api.py (1)
200-210: SOPSRequest.post never receives non-None files.

All callsites either omit the files parameter or explicitly pass files=None, so the json+files combination cannot occur.
```python
import logging
logger = logging.getLogger(__name__)
```
Differentiate serialization vs response JSON decode errors; narrow exceptions and include tracebacks.
Catching (TypeError, ValueError) here also swallows response.json() decode failures, causing premature no-retry. Handle json.JSONDecodeError separately (retry), keep early-return only for payload serialization errors, add Timeout/RequestException, and use logger.exception for tracebacks.
Apply:
```diff
@@
-        import logging
-        logger = logging.getLogger(__name__)
+        import logging
+        import json
+        logger = logging.getLogger(__name__)
@@
-            except (TypeError, ValueError) as json_error:
-                # These are JSON serialization errors - don't retry, log and return immediately
-                logger.error(
-                    "ScrapeOps: Unable to send monitoring data due to non-serializable settings. "
-                    "Some of your Scrapy settings contain functions or other objects that cannot be "
-                    "converted to JSON format. To resolve this, add the problematic setting names "
-                    "to SCRAPEOPS_SETTINGS_EXCLUSION_LIST in your settings.py file. "
-                    f"Error details: {json_error}"
-                )
-                error = json_error
-                return None, str(error)  # Don't retry on serialization errors
+            except json.JSONDecodeError as parse_error:
+                # Invalid JSON in response from server → retry
+                logger.warning(f"ScrapeOps: Invalid JSON response (attempt {attempt+1}): {parse_error}")
+                error = parse_error
+                continue
+            except (TypeError, ValueError) as serialization_error:
+                # Payload JSON serialization error → don't retry
+                logger.exception(
+                    "ScrapeOps: Unable to send monitoring data due to non-serializable settings. "
+                    "Add the problematic setting names to SCRAPEOPS_SETTINGS_EXCLUSION_LIST in your settings.py. "
+                    f"Error details: {serialization_error}"
+                )
+                return None, str(serialization_error)
+            except requests.exceptions.Timeout as e:
+                logger.warning(f"ScrapeOps: Timeout (attempt {attempt+1}): {e}")
+                error = e
+                continue
             except requests.exceptions.ConnectionError as e:
                 logger.warning(f"ScrapeOps: Connection error (attempt {attempt+1}): {e}")
                 error = e
                 continue
             except ScrapeOpsAPIResponseError as e:
                 error = e
                 continue
-            except Exception as e:
-                logger.error(f"ScrapeOps: Unexpected error (attempt {attempt+1}): {e}")
-                error = e
-                continue
+            except requests.exceptions.RequestException as e:
+                logger.exception(f"ScrapeOps: Request error (attempt {attempt+1}): {e}")
+                error = e
+                continue
```

Also consider converting the error log at Line 237 to logger.exception for tracebacks (addressed above). Based on static analysis hints.
Also applies to: 218-239
🤖 Prompt for AI Agents
In scrapeops_scrapy/core/api.py around lines 201-203 and 218-239, narrow broad exception handling so payload serialization errors (TypeError, ValueError) remain an early-return, but handle response.json() failures (json.JSONDecodeError) separately and treat them as retryable (do not early-return); also add explicit catches for requests.Timeout and requests.RequestException where appropriate and treat them as retryable. Replace simple logger.error calls (notably around line 237) with logger.exception to include tracebacks, and ensure json (for JSONDecodeError) and requests exceptions are imported/used accordingly. Ensure each except block only handles the intended error class and logs context+traceback via logger.exception where debugging info is needed.
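The distinction the comment relies on can be demonstrated with the standard library alone: encoding an unserializable payload raises TypeError, while decoding an invalid response body raises json.JSONDecodeError, which subclasses ValueError — so a single `except (TypeError, ValueError)` clause conflates the two failure modes and suppresses retries for transient bad responses.

```python
import json

# Encoding a payload containing a function fails with TypeError —
# this is the non-retryable "bad settings" case:
try:
    json.dumps({"callback": print})
except TypeError as e:
    print("serialization error:", e)

# Decoding an invalid response body fails with json.JSONDecodeError —
# a transient server-side problem that should be retried:
try:
    json.loads("not json")
except json.JSONDecodeError as e:
    print("decode error:", e)

# JSONDecodeError subclasses ValueError, which is why a combined
# except (TypeError, ValueError) clause swallows both cases:
print(issubclass(json.JSONDecodeError, ValueError))  # → True
```

Because of that subclass relationship, ordering matters: the `except json.JSONDecodeError` clause must appear before `except (TypeError, ValueError)`, exactly as the suggested diff arranges them.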
Summary by CodeRabbit