Skip to content

Conversation

filipchristiansen
Copy link
Contributor

@filipchristiansen filipchristiansen commented Jul 6, 2025

✨ Refactor: consistent cloning & pattern-handling

Closes #196
Closes #344
Prepares for #342 (digest prefix/suffix) and #343 (caching)

Why

  • Cloning behaved differently depending on whether they supplied a branch, tag or commit.
    Caching and reproducibility suffered.
  • Pattern handling lived in query_parser and duplicated ignore/include logic.
    Hard to unit-test and reuse.
  • Windows CI occasionally failed when temp clones contained read-only
    packed-object files.
    Shallow-clone cleanup broke with WinError 5.

What’s new

  • Unified commit resolution
    utils.git_utils.resolve_commit() guarantees we always fetch the exact SHA
    (HEAD / branch / tag) before checkout.
    Deterministic → enables caching (feat: Implement caching on a per-commit basis #343).
  • New pattern helpers
    utils.pattern_utils.process_patterns() centralises include/exclude parsing
    (moved out of query_parser). Adds thorough tests.
  • Windows-safe repo cleanup
    _handle_remove_readonly on-error callback makes read-only Git objects
    writable and retries shutil.rmtree(), preventing WinError 5 in CI.
  • Public API cleanup (BREAKING)
    • parse_query() removed in favour of
      • parse_remote_repo() (URLs/slugs)
      • parse_local_dir_path() (local paths)
    • clone_repo, ingest_query, parse_query are no longer re-exported from
      gitingest.__init__.
  • Clone rewrite
    • Single entry point that always does: shallow clone → sparse-checkout
      (optional) → fetch commit → checkout → submodule update (optional).
    • _checkout_partial_clone renamed & moved → git_utils.checkout_partial_clone.
  • Server updates
    • query_processor uses new pattern utilities and passes a typed enum.
    • IngestRequest.validate_input_text() now removes any .git suffix.
  • Delete _is_safe_symlink
  • Remove pattern validation (delete _is_valid_pattern, InvalidPatternError, and test_parse_patterns_invalid_characters)
  • Tests added/updated

    clone, pattern, parser, summary

File changes

gitingest

File Changes
__init__.py Stop exposing internal helpers (clone_repo, ingest_query, parse_query)
clone.py Rewrite clone_repo to call resolve_commit, sparse checkout, and uniform fetch/checkout steps; move helper & rename
entrypoint.py Add _handle_remove_readonly callback for Windows temp-dir cleanup; ingest_async now uses parse_remote_repo / parse_local_dir_path
output_formatter.py Add missing Tag prefix in _create_summary_prefix
query_parser.py Remove parse_query; move pattern helpers to pattern_utils.py
utils/exceptions.py Delete InvalidPatternError
utils/git_utils.py New resolve_commit helper
utils/os_utils.py Rename ensure_directoryensure_directory_exists_or_create
utils/pattern_utils.py Consolidated helpers + process_patterns
utils/query_parser_utils.py Delete _is_valid_pattern
path_utils.py (removed) Delete obsolete helper _is_safe_symlink

server

File Changes
models.py Strip .git suffix in validate_input_text
query_processor.py Replace parse_query with parse_remote_repo, integrate PatternType, use process_patterns
routers_utils.py Coerce pattern_type into PatternType enum

tests

File Changes
conftest.py Adjust fixtures
query_parser/test_git_host_agnostic.py Switch to parse_remote_repo
query_parser/test_query_parser.py Switch to parse_remote_repo; add parse_local_dir_path coverage
test_clone.py Update expectations and fixtures
test_pattern_utils.py Add tests for _parse_patterns and process_patterns; remove test_parse_patterns_invalid_characters (pattern validation no longer enforced)
test_summary.py Verify that gitingest.ingest() emits correct summaries
@filipchristiansen filipchristiansen changed the title resolve commit feat: resolve commit Jul 9, 2025
@filipchristiansen filipchristiansen force-pushed the resolve-commit branch 3 times, most recently from 7ab4af2 to 831a36d Compare July 9, 2025 20:32
@filipchristiansen filipchristiansen requested a review from Copilot July 9, 2025 20:33
Copilot

This comment was marked as resolved.

@filipchristiansen filipchristiansen force-pushed the resolve-commit branch 6 times, most recently from cf1aa6f to d1224a6 Compare July 9, 2025 23:45
@filipchristiansen filipchristiansen marked this pull request as ready for review July 9, 2025 23:47
Copilot

This comment was marked as resolved.

@filipchristiansen filipchristiansen force-pushed the resolve-commit branch 2 times, most recently from 979c88a to 3031e7c Compare July 10, 2025 00:07
@filipchristiansen filipchristiansen changed the title feat: resolve commit refactor: consistent cloning & pattern-handling Jul 10, 2025
@filipchristiansen filipchristiansen requested a review from ix-56h July 12, 2025 19:39
@ix-56h

This comment was marked as resolved.

@filipchristiansen

This comment was marked as resolved.

Copy link
Contributor

@ix-56h ix-56h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This comment was marked as off-topic.

@NicolasIRAGNE

This comment was marked as duplicate.

@filipchristiansen
Copy link
Contributor Author

filipchristiansen commented Jul 14, 2025

TODO:

  • delete _is_valid_pattern, InvalidPatternError, _is_safe_symlink
  • call resolve_commit in parse_remote_repo instead of in clone_repo
Copilot

This comment was marked as outdated.

Copy link
Contributor

@NicolasIRAGNE NicolasIRAGNE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks Filip!

@filipchristiansen filipchristiansen removed the request for review from ix-56h July 23, 2025 14:34
@filipchristiansen filipchristiansen merged commit a99089a into main Jul 23, 2025
25 checks passed
@filipchristiansen filipchristiansen deleted the resolve-commit branch July 23, 2025 14:35
Copy link

⚙️ Preview environment was undeployed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

3 participants