testing some workflow optimizations #1
base: main
Conversation
Walkthrough

Introduces URL caching with TTL persistence and HTTP retry/redirect handling, validates numeric inputs, improves PR comment robustness with graceful error handling, optimizes file reading, and refactors the entrypoint script for maintainability.
Sequence Diagrams

```mermaid
sequenceDiagram
    actor User
    participant slinky as Slinky Check
    participant cache as URL Cache
    participant http as HTTP Fetcher
    participant github as GitHub API
    User->>slinky: Start check with cache enabled
    slinky->>cache: Load() from disk (if path set)
    cache-->>slinky: Cache populated (non-expired entries)
    loop For each URL
        slinky->>cache: Get(url)
        alt Cache hit & not expired
            cache-->>slinky: Return cached result
        else Cache miss or expired
            slinky->>http: fetchWithMethod(url)
            http->>http: Attempt request with retry/backoff
            alt Retryable error
                http->>http: Exponential backoff, then retry
            else Success or final error
                http-->>slinky: Return status + error
            end
            slinky->>cache: Set(url, ok, status, errMsg)
        end
    end
    slinky->>cache: Save() to disk (if path set)
    cache-->>slinky: Cache persisted
    alt INPUT_COMMENT_PR enabled & report exists
        slinky->>github: upsertPRComments(chunks)
        github-->>slinky: Post/update comment (with error recovery)
    end
    slinky-->>User: Return results
```

```mermaid
sequenceDiagram
    participant slinky as cmd/check.go
    participant github as GitHub API
    alt Report exists and COMMENT_PR enabled
        slinky->>slinky: Read report markdown (warn on error, continue)
        slinky->>slinky: Split into chunks
        slinky->>github: List existing PR comments
        alt Request/parse succeeds
            github-->>slinky: Comment list
            slinky->>slinky: Find previous slinky comment
            alt Found
                slinky->>github: Delete old comment
            end
        else Request/parse fails
            slinky->>slinky: Log debug, continue (non-fatal)
        end
        loop For each chunk
            slinky->>github: Create/post comment
            alt Success
                slinky->>slinky: Log successful post
            else Error
                slinky->>slinky: Log error, continue (non-fatal)
            end
        end
    end
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Enhance the description for the 'comment_pr' input in action.yml to specify that it defaults to true when GITHUB_TOKEN is present. In check.go, update the logic to explicitly disable PR commenting only when the input is set to "false", improving clarity on the commenting behavior during PR checks.
…TOKEN usage

This commit modifies action.yml to remove the direct reference to GITHUB_TOKEN and updates the README.md to specify that it should be accessed via secrets.GITHUB_TOKEN. Additionally, the example workflow is updated to include the GITHUB_TOKEN in the environment variables, ensuring clarity on its necessity for PR comment functionality.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/web/checker.go (1)
58-64: Deadlock risk when context is cancelled

If `ctx.Done()` fires inside the worker loop, the worker returns without ever sending on `done`, but the main goroutine still waits for `concurrency` `done` signals. That can deadlock `CheckURLs` under cancellation.

You can fix this by guaranteeing one `done` send per worker via `defer`:

```diff
 worker := func() {
+	defer func() { done <- struct{}{} }()
 	for j := range jobs {
 		select {
 		case <-ctx.Done():
 			return
 		default:
 		}
 		...
 	}
-	done <- struct{}{}
 }
```

This preserves the current behavior while making cancellation safe.
Also applies to: 135-135
🧹 Nitpick comments (7)
internal/fsurls/fsurls.go (2)
249-256: Consider adding debug logging for read errors.

The file reading optimization using `io.LimitReader` and `io.ReadAll` is sound and provides defense-in-depth even though file size is pre-checked. However, silently skipping files on read errors could make troubleshooting difficult.

Consider adding debug logging when files are skipped due to read errors:

```diff
 contentBytes, readErr := io.ReadAll(limitedReader)
 if readErr != nil {
 	// Non-critical error: skip file and continue
+	if isDebugEnv() {
+		fmt.Printf("::debug:: Skipping file %s due to read error: %v\n", path, readErr)
+	}
 	return nil
 }
```
422-429: Same optimization, consider debug logging here too.

This segment mirrors the file reading optimization in `CollectURLsWithIgnoreConfig`. The consistent implementation is good, but the same debug logging suggestion applies.

Consider adding debug logging for read errors:

```diff
 contentBytes, readErr := io.ReadAll(limitedReader)
 if readErr != nil {
 	// Non-critical error: skip file and continue
+	if isDebugEnv() {
+		fmt.Printf("::debug:: Skipping file %s due to read error: %v\n", path, readErr)
+	}
 	return nil
 }
```

README.md (1)
19-29: Docs around PR comments and GITHUB_TOKEN are almost perfectly aligned

The workflow example + note clearly explain:

- `pull-requests: write` is only needed when `comment-pr` is enabled.
- `GITHUB_TOKEN` is only required for PR comment functionality.

One small polish opportunity: in the Inputs list (`comment-pr` bullet at Line 42), you currently say "Default: `true`" without mentioning the "when GITHUB_TOKEN is present" nuance that you added to `action.yml`. Mirroring that phrasing here would make the behavior 100% consistent across docs.

Also applies to: 31-31
entrypoint.sh (1)
4-14: Entrypoint refactor is solid; consider two robustness tweaks

Overall, this is a nice cleanup: PR head SHA handling is safer, ARGS construction is centralized, and outputs/summary handling are clearer.
Two things to consider tightening:
1. `set -e` vs. EXIT_CODE handling

With `set -e` in effect, any non-zero exit from `slinky $ARGS` will terminate the script immediately, meaning `EXIT_CODE=$?`, the output writes, and the summary append won't run in those cases. If your intention is to always write JSON/MD outputs and step summaries even when `--fail-on-failures=true` causes a non-zero exit, it would be safer to explicitly guard the call, e.g.:

```sh
set +e
slinky $ARGS
EXIT_CODE=$?
set -e
```

or:

```sh
slinky $ARGS || EXIT_CODE=$?
```

(and initialize `EXIT_CODE=0` before the call).
Building
ARGS="check ..."and later doingslinky $ARGSrelies on word-splitting, which is fine for your current numeric/path inputs but will misbehave if any input or target ever legitimately contains spaces. A more robust pattern is to build an argument vector:ARGS="check" # or, more robustly: # set -- check # [ -n "${INPUT_CONCURRENCY:-}" ] && set -- "$@" --concurrency "$INPUT_CONCURRENCY" # ... # slinky "$@"That keeps quoting correct while still allowing globs like
**/*.mdto be passed as intended.Neither point is a blocker, but addressing them would make the entrypoint more future‑proof.
Please double-check how GitHub runs this container (shell implementation on the runner image) to confirm `set -e` behavior and argument splitting match your expectations.

Also applies to: 16-40, 43-47, 51-52, 60-60
internal/web/checker.go (1)
65-81: Verify URLCache concurrency safety and minor cache-path nits

The cache-first path is a nice optimization and the `CacheHit` flag in `Result` is helpful. A couple of points to double-check:

- Multiple workers call `cfg.Cache.Get` and `cfg.Cache.Set` concurrently; please ensure `URLCache`'s implementation (in `internal/web/cache.go`) protects its internal `entries` map with appropriate locking or otherwise guarantees concurrent safety. If it's just a plain map, this will race under load.
- For reconstructing cached errors, `fmt.Errorf("%s", cached.ErrMsg)` could be simplified to `errors.New(cached.ErrMsg)` if you don't need formatting/stack context, but that's purely cosmetic.

Everything else in this cache integration looks consistent with the rest of the checker.
Also applies to: 84-104, 118-121
cmd/check.go (2)
176-199: Cache wiring is reasonable; TTL semantics worth documenting

The URL cache initialization via `SLINKY_CACHE_PATH` and optional `SLINKY_CACHE_TTL_HOURS` looks good, with best-effort `Load`/`Save` and debug-only logging on failure.

One minor suggestion: since `SLINKY_CACHE_TTL_HOURS` is treated as hours and values ≤ 0 fall back to 24h inside `NewURLCache`, it might be worth documenting that:

- `0` or negative TTL doesn't disable caching; it reverts to the default.
- Fractional hours are truncated when converted to `int`.

Functionally this is fine, but explicit docs will avoid surprises for users tuning cache behavior.
Also applies to: 201-205
499-587: upsertPRComments robustness: good overall, but close DELETE responses

The refactor here nicely hardens the PR comment flow: clear errors on list failure, non-fatal handling of JSON parse issues, and defensive behavior when marshalling/posting chunks.
One small improvement:
`http.DefaultClient.Do(dReq)` for DELETE ignores both the error and the response body, which can leak idle connections:

```diff
-_, _ = http.DefaultClient.Do(dReq) // Non-critical: ignore delete errors
+if res, err := http.DefaultClient.Do(dReq); err == nil && res != nil {
+	res.Body.Close()
+}
+// Non-critical: we still ignore delete errors
```

This keeps the non-critical semantics while ensuring connections are properly returned to the pool.
Also applies to: 537-544
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- .github/workflows/example-slinky.yml (1 hunks)
- Dockerfile (1 hunks)
- README.md (1 hunks)
- action.yml (1 hunks)
- cmd/check.go (4 hunks)
- entrypoint.sh (1 hunks)
- internal/fsurls/fsurls.go (2 hunks)
- internal/web/cache.go (1 hunks)
- internal/web/checker.go (6 hunks)
- internal/web/http.go (1 hunks)
- internal/web/types.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
internal/web/checker.go (1)
internal/web/types.go (2)
`Result` (5-16), `Stats` (18-21)
internal/web/types.go (1)
internal/web/cache.go (1)
`URLCache` (20-24)
cmd/check.go (2)
internal/web/cache.go (2)
`URLCache` (20-24), `NewURLCache` (27-37)

internal/web/types.go (1)

`Config` (23-30)
🔇 Additional comments (8)
.github/workflows/example-slinky.yml (1)
20-21: GITHUB_TOKEN wiring for the step looks correct

Providing `GITHUB_TOKEN` via `env` here matches the permissions block (`pull-requests: write`) and is what the action expects for PR comments. No changes needed.

action.yml (1)
35-35: Improved comment_pr description is clear and consistent

The updated description explicitly tying the default to `GITHUB_TOKEN` presence is helpful and avoids ambiguity about when PR comments are actually enabled.

Please confirm `cmd/check.go`'s defaulting logic for `INPUT_COMMENT_PR` still matches this description (default-on only when a token is available).

Dockerfile (1)
10-16: Runtime image and ENTRYPOINT wiring look correct

Adding `jq` and `ca-certificates` matches the needs of `entrypoint.sh` and HTTPS requests, and the ENTRYPOINT to `/entrypoint.sh` is consistent with the new startup flow. No changes needed.

If you haven't already, it's worth building and running the image locally (`docker run` with a minimal `GITHUB_EVENT_PATH`) to confirm jq is reachable and the entrypoint runs as expected on Alpine's `/bin/sh`.

internal/web/types.go (1)
23-30: Config extension with Cache pointer looks appropriate

Adding `Cache *URLCache` to `Config` is a clean way to thread caching through the web layer. Just make sure callers consistently nil-check `cfg.Cache` before calling its methods so that configurations without caching remain supported.

Please scan `internal/web/checker.go` and `cmd/check.go` to confirm all uses of `cfg.Cache` are guarded against `nil`.

internal/web/http.go (1)
6-12: HTTP retry/redirect implementation looks solid for link-checking use

The new `fetchWithMethod` + `followRedirects` flow is well-structured:

- Exponential backoff with bounded retries (`maxRetries`) and context-aware cancellation is correctly implemented.
- Redirect handling closes bodies along the way, caps depth via `maxRedirects`, and copes with missing `Location` headers and simple redirect loops.
- Status classification via `isOKStatus` (200–299, 401, 403) and `isRetryableStatus` (408, 429, 5xx) matches the documented behavior and is appropriate for a link checker.
- Error handling for DNS, connection refused, and timeouts uses clear `simpleError` messages while still exposing status codes.
I don’t see correctness issues here; this should significantly improve robustness versus transient HTTP failures and redirects.
Once wired into your checker, it’d be good to run against a sample repo containing redirects, 401/403 endpoints, and flaky hosts to confirm the observed behavior (retries, final statuses, and marking links OK/broken) matches your expectations.
Also applies to: 15-16, 18-106, 108-161, 163-182
internal/web/checker.go (1)
3-11: HTTP client tuning looks solid

Dynamic `MaxIdleConns` with a minimum and aligning per-host limits with configured concurrency is a sensible way to improve throughput across many domains without over-opening connections. No issues from my side here.

Also applies to: 18-23, 29-34
cmd/check.go (2)
161-171: Numeric flag clamping looks correct

Clamping `maxConcurrency` to [1, 100] and `timeoutSeconds` to [1, 300] gives sane bounds and prevents obviously bad inputs from the CLI without surprising behavior. No issues here.
319-327: PR comment gating and error handling look robust

The new `INPUT_COMMENT_PR` toggle (default-on, only disabled by `"false"`) and the guard `ghOK && commentPR && finalMDPath != ""` make the PR commenting behavior predictable and easy to opt out of.

Gracefully handling markdown read failures and treating `upsertPRComments` errors as warnings (without failing the run) is also a good balance between observability and resilience.

Also applies to: 328-345
…rkflow

This commit updates the input names in action.yml to use hyphenated formats for better consistency and clarity. Additionally, the example workflow is modified to demonstrate the new input names and to showcase the functionality of disabling blocking failures and enabling PR comments.
…handling in CheckURLs function

This commit introduces a mutex to the URLCache struct to ensure thread-safe access to cache entries during load, get, set, and clear operations. Additionally, it refines the job handling logic in the CheckURLs function by using atomic counters for processed and pending jobs, improving concurrency management.
…robustness

This commit modifies the argument construction in entrypoint.sh to use 'set --' for better handling of optional flags and targets. This change ensures that arguments are properly quoted, accommodating spaces and enhancing overall script reliability. Additionally, debug output is updated to display the constructed command more clearly.
Slinky Test Report

Last Run: 2025-11-14 15:00:48 CST (Duration: 10.008s)
Failures by URL
Summary by CodeRabbit

- New Features
- Improvements
- Documentation