
Investigate options to improve license scanning efficiency #5383

@mthalman

Description


License scanning takes a long time: roughly 19 hours of total compute per run. We should investigate ways to reduce this.

One idea is to stop rescanning files that have already been scanned. As long as the scancode version hasn't changed since a file was last scanned, there's no point in scanning that same file again on every run. Some ways to avoid doing that:

  • Each pipeline run publishes, as an artifact, the git commit URL of every file it scanned (the URL includes the commit SHA, which uniquely identifies that version of the file). The next pipeline run retrieves the previous run's artifacts and compares those commit URLs against what the branch currently has. The scan runs only on the files that have changed.
  • Each pipeline run does a git diff between the commit scanned by the last pipeline run and the current commit to determine which files changed. The scan runs only on that diff.
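A minimal sketch of the second approach, assuming the commit SHA and scancode version used by the last pipeline run are recorded somewhere the next run can read (how they are stored is left open here). It uses `git diff --name-only -z --diff-filter=d`, where the lowercase `d` excludes deleted files since there is nothing left to scan in them. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_name_only(output: str) -> list[str]:
    """Parse `git diff --name-only -z` output into a sorted list of paths."""
    return sorted(p for p in output.split("\0") if p)


def changed_since(repo: str, last_scanned_sha: str, head: str = "HEAD") -> list[str]:
    """Files added, modified, or renamed between the last scanned commit and `head`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", "-z", "--diff-filter=d",
         last_scanned_sha, head],
        check=True, capture_output=True, text=True,
    )
    return parse_name_only(result.stdout)


def plan_scan(all_files: list[str], changed: Optional[list[str]],
              last_version: Optional[str], current_version: str) -> list[str]:
    """Decide what to scan: the diff if it is usable, otherwise everything.

    `changed` is None when no previous run is known (e.g. the first run).
    """
    if changed is None or last_version != current_version:
        return sorted(all_files)
    return changed
```

For the diff to work, the checkout must contain the last scanned commit in its history, so a shallow clone may need to fetch deeper. A renamed file shows up under its new path and is therefore rescanned.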

Metadata


Assignees: No one assigned
Type: No type
Project status: Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
