- Notifications
You must be signed in to change notification settings - Fork 1.9k
JS: Incremental API graph #20733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
JS: Incremental API graph #20733
Conversation
| | ||
| pragma[nomagic] | ||
| private predicate shouldBacktrackIntoOverlay(DataFlow::SourceNode nd) { | ||
| exists(DataFlow::Node overlayNode | |
Check warning
Code scanning / CodeQL
Omittable 'exists' variable Warning
in this argument
| predicate inScope(DataFlow::Node node) { any() } | ||
| } | ||
| | ||
| private module Full = Stage<FullInput>; |
Check warning
Code scanning / CodeQL
Dead code Warning
| | ||
| private module Full = Stage<FullInput>; | ||
| | ||
| query predicate missingDefNode(DataFlow::Node node) { |
Check warning
Code scanning / CodeQL
Dead code Warning
| not exists(MkDef(node)) | ||
| } | ||
| | ||
| query predicate missingUseNode(DataFlow::Node node) { |
Check warning
Code scanning / CodeQL
Dead code Warning
| not exists(MkUse(node)) | ||
| } | ||
| | ||
| query predicate lostEdge(Node pred, Label::ApiLabel lbl, Node succ) { |
Check warning
Code scanning / CodeQL
Dead code Warning
| not Cached::edge(pred, lbl, succ) | ||
| } | ||
| | ||
| query predicate counts(int numEdges, int numOverlayEdges, float ratio) { |
Check warning
Code scanning / CodeQL
Dead code Warning
2e5779a to c8abbdc Compare c8abbdc to e1a113c Compare 3cb97e6 to fbc775f Compare bde7253 to 69857e3 Compare Simply wraps everything in 'cached private module Stage {}' and adds 'import Stage'. The diff is large because of indentation changes. This is needed for localizing ApiLabel later
Some abstract classes defines fields without binding them, leaving it up to the subclasses to bind them. When combined with overlay[local?], the charpred for such an abstract class can become local, while the subclasses are global. The means the charpred needs to be materialized, even though it doesn't bind the fields, leading to a cartesian product.
This was somehow lost in a rebase
We want the type itself to be local but nearly all its member predicates are global.
Previously this was implied by MkClassInstance but that's no longer the case.
69857e3 to 5fbf877 Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements incremental API graph analysis for JavaScript through a two-stage approach. In the base database, stage 1 performs the full analysis while stage 2 does nothing. In overlay databases, stage 1 does nothing while stage 2 only analyzes changed files and their dependencies, using stage 1 results as roots. This optimization can reduce end-to-end analysis time by up to 40% in some cases.
- Introduces a staged architecture with configurable roots and scope for each stage
- Refactors API graph node construction to be more selective based on stage configuration
- Updates overlay annotations across multiple framework files to support incremental analysis
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
javascript/ql/lib/semmle/javascript/ApiGraphs.qll | Core implementation of two-stage incremental API graph analysis with new stage-based architecture, tracking predicates, and debug utilities |
javascript/ql/lib/semmle/javascript/frameworks/Templating.qll | Changes concat to strictconcat and updates overlay annotation from global to local? for IncludeFunctionAsEntryPoint |
javascript/ql/lib/semmle/javascript/frameworks/*.qll | Adds overlay[local?] annotations to various EntryPoint classes across multiple framework files |
javascript/ql/lib/semmle/javascript/ES2015Modules.qll | Adds pragma[nomagic] annotations and refactors reExportsAs to improve join ordering |
javascript/ql/lib/semmle/javascript/security/dataflow/*.qll | Adds overlay[global] annotations to abstract sink and flow label classes |
python/ql/lib/semmle/python/frameworks/data/internal/ApiGraphModels.qll | Adds overlay[local?] annotations to TypeModelUseEntry and TypeModelDefEntry classes |
ruby/ql/lib/codeql/ruby/frameworks/data/internal/ApiGraphModels.qll | Adds overlay[local?] annotations to TypeModelUseEntry and TypeModelDefEntry classes |
javascript/ql/test/library-tests/frameworks/Redux/test.expected | Updates expected test output to reflect shorter, more readable API graph node descriptions |
javascript/ql/test/library-tests/frameworks/Knex/test.expected | Updates expected test output with simplified node descriptions |
javascript/ql/test/library-tests/EndpointNaming/EndpointNaming.expected | Updates expected test output with more concise node descriptions |
javascript/ql/test/ApiGraphs/custom-entry-point/VerifyAssertions.ql | Adds overlay[local?] annotation to CustomEntryPoint class |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
tausbn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a difficult review, in part because of the couple of "whoops here's something I removed by accident previously" commits (which had me scratching my head quite a bit). You should feel free to just rebase those out of existence in the future, especially on PRs that are intended to be reviewed commit-by-commit.
That aside, I think this looks good! I have a few question here and there, but I can't argue with the results, and I didn't see anything that struck me as obviously out of place.
I do have some higher-level thoughts, like
- should we add comments to explain why something is marked as
global(when it might not be entirely obvious)? and - could we come up with better names than
Stage1andStage2for the two API graph stages? (Non-descriptive names like this are a bit of a pet peeve of mine.)
... but I don't see a need to reach consensus on this right now. We can always address these in a future PR.
| * This predicate should thus not be used to block the tracking of use/def nodes, but only block the creation of new labelled edges. | ||
| */ | ||
| bindingset[node] | ||
| predicate inScope(DataFlow::Node node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned that we're using "scope" in a non-standard (for program semantics) way here. I'm not going to insist we change it in this PR, but we should probably try to come up with a different word, lest we doom all future readers into thinking inScope has something to do with (lexical) scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a change where it's rephrased as inActiveFile. PTAL
| EntryPoint() { any() } | ||
| | ||
| /** Gets a data-flow node where a value enters the current codebase through this entry-point. */ | ||
| overlay[global] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this global? Are these shared across files somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making it local would require all the overriding predicates to become local as well, and there's at least one override that depends on type-name resolution, which is global. There might be ways to change that, but since we don't really need this to be local (for the purpose of this PR), I'm happy to just leave it global.
| @@ -1,7 +1,7 @@ | |||
| testFailures | |||
| ambiguousPreferredPredecessor | |||
| | pack2/lib.js:1:1:3:1 | def moduleImport("pack2").getMember("exports").getMember("lib").getMember("LibClass").getInstance() | | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pity that we lose the "descriptive" names for API graph nodes, but I guess it's kind of necessary.
I'm sorry if it wasn't obvious, but it was not my intention for you to review the "wrap in a module" commit. 😬 I factored it out like this exactly so you wouldn't have to crawl through that. Rebasing conflicting changes from main across such a commit is a lot of work because many commits in the history are going to conflict, and I figured it wasn't much work for the reviewer because the indentation-only commit was not supposed to be reviewed. But thanks for powering through. 😅 |
Co-authored-by: Taus <tausbn@github.com>
I tried doing this early on and found out that
The only meaningful names I can think of are "base stage" and "overlay stage". On one hand, this makes sense because each stage is a no-op in one of the incremental versions (stage 1 is a no-op in the overlay version, and stage 2 is a no-op in the base version). On the other hand, it makes the words "base" and "overlay" more ambiguous as it's never quite clear if we're talking about the base/overlay stage or the base/overlay version of a predicate. |
tausbn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
Makes API graphs incremental.
The high-level overview is that we run the underlying data flow analysis in two global stages, where each stage is configured with a set of "roots" and a set of nodes that are "in scope". They're configured in a way where the first stage only does real work in the base, and the second stage only does any real work in the overlay.
Based on the tracking results from the first stage, we identify which nodes could flow into the overlay part, and use those as roots in the second stage.
Commit-by-commit review strongly recommended.
See backilnked performance evaluations. The performance gains are very swingy; in many cases it doesn't make much of a difference, but then in some cases e2e time drops by 40%.