Skip to content

Conversation

@asgerf
Copy link
Contributor

@asgerf asgerf commented Oct 31, 2025

Makes API graphs incremental.

The high-level overview is that we run the underlying data flow analysis in two global stages, where each stage is configured with a set of "roots" and a set of nodes that are "in scope". They're configured in a way where the first stage only does real work in the base, and the second stage only does any real work in the overlay.

Based on the tracking results from the first stage, we identify which nodes could flow into the overlay part, and use those as roots in the second stage.

Commit-by-commit review strongly recommended.

See backilnked performance evaluations. The performance gains are very swingy; in many cases it doesn't make much of a difference, but then in some cases e2e time drops by 40%.

@github-actions github-actions bot added the JS label Oct 31, 2025

pragma[nomagic]
private predicate shouldBacktrackIntoOverlay(DataFlow::SourceNode nd) {
exists(DataFlow::Node overlayNode |

Check warning

Code scanning / CodeQL

Omittable 'exists' variable Warning

This exists variable can be omitted by using a don't-care expression
in this argument
.
predicate inScope(DataFlow::Node node) { any() }
}

private module Full = Stage<FullInput>;

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.

private module Full = Stage<FullInput>;

query predicate missingDefNode(DataFlow::Node node) {

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.
not exists(MkDef(node))
}

query predicate missingUseNode(DataFlow::Node node) {

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.
not exists(MkUse(node))
}

query predicate lostEdge(Node pred, Label::ApiLabel lbl, Node succ) {

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.
not Cached::edge(pred, lbl, succ)
}

query predicate counts(int numEdges, int numOverlayEdges, float ratio) {

Check warning

Code scanning / CodeQL

Dead code Warning

This code is never used, and it's not publicly exported.
@asgerf asgerf force-pushed the js/incremental-api-graphs branch from 2e5779a to c8abbdc Compare November 3, 2025 10:52
@asgerf asgerf force-pushed the js/incremental-api-graphs branch from c8abbdc to e1a113c Compare November 20, 2025 10:43
@asgerf asgerf force-pushed the js/incremental-api-graphs branch 2 times, most recently from 3cb97e6 to fbc775f Compare November 24, 2025 11:05
@asgerf asgerf force-pushed the js/incremental-api-graphs branch 4 times, most recently from bde7253 to 69857e3 Compare November 28, 2025 09:26
asgerf added 11 commits December 1, 2025 10:12
Some abstract classes defines fields without binding them, leaving it up to the subclasses to bind them. When combined with overlay[local?], the charpred for such an abstract class can become local, while the subclasses are global. The means the charpred needs to be materialized, even though it doesn't bind the fields, leading to a cartesian product.
This was somehow lost in a rebase
We want the type itself to be local but nearly all its member predicates are global.
Previously this was implied by MkClassInstance but that's no longer the case.
@asgerf asgerf force-pushed the js/incremental-api-graphs branch from 69857e3 to 5fbf877 Compare December 1, 2025 09:17
@asgerf asgerf added the no-change-note-required This PR does not need a change note label Dec 9, 2025
@asgerf asgerf marked this pull request as ready for review December 9, 2025 07:55
@asgerf asgerf requested review from a team as code owners December 9, 2025 07:55
Copilot AI review requested due to automatic review settings December 9, 2025 07:55
@asgerf asgerf requested a review from a team as a code owner December 9, 2025 07:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements incremental API graph analysis for JavaScript through a two-stage approach. In the base database, stage 1 performs the full analysis while stage 2 does nothing. In overlay databases, stage 1 does nothing while stage 2 only analyzes changed files and their dependencies, using stage 1 results as roots. This optimization can reduce end-to-end analysis time by up to 40% in some cases.

  • Introduces a staged architecture with configurable roots and scope for each stage
  • Refactors API graph node construction to be more selective based on stage configuration
  • Updates overlay annotations across multiple framework files to support incremental analysis

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
javascript/ql/lib/semmle/javascript/ApiGraphs.qll Core implementation of two-stage incremental API graph analysis with new stage-based architecture, tracking predicates, and debug utilities
javascript/ql/lib/semmle/javascript/frameworks/Templating.qll Changes concat to strictconcat and updates overlay annotation from global to local? for IncludeFunctionAsEntryPoint
javascript/ql/lib/semmle/javascript/frameworks/*.qll Adds overlay[local?] annotations to various EntryPoint classes across multiple framework files
javascript/ql/lib/semmle/javascript/ES2015Modules.qll Adds pragma[nomagic] annotations and refactors reExportsAs to improve join ordering
javascript/ql/lib/semmle/javascript/security/dataflow/*.qll Adds overlay[global] annotations to abstract sink and flow label classes
python/ql/lib/semmle/python/frameworks/data/internal/ApiGraphModels.qll Adds overlay[local?] annotations to TypeModelUseEntry and TypeModelDefEntry classes
ruby/ql/lib/codeql/ruby/frameworks/data/internal/ApiGraphModels.qll Adds overlay[local?] annotations to TypeModelUseEntry and TypeModelDefEntry classes
javascript/ql/test/library-tests/frameworks/Redux/test.expected Updates expected test output to reflect shorter, more readable API graph node descriptions
javascript/ql/test/library-tests/frameworks/Knex/test.expected Updates expected test output with simplified node descriptions
javascript/ql/test/library-tests/EndpointNaming/EndpointNaming.expected Updates expected test output with more concise node descriptions
javascript/ql/test/ApiGraphs/custom-entry-point/VerifyAssertions.ql Adds overlay[local?] annotation to CustomEntryPoint class

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

asgerf and others added 2 commits December 9, 2025 09:02
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
tausbn
tausbn previously approved these changes Dec 16, 2025
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a difficult review, in part because of the couple of "whoops here's something I removed by accident previously" commits (which had me scratching my head quite a bit). You should feel free to just rebase those out of existence in the future, especially on PRs that are intended to be reviewed commit-by-commit.

That aside, I think this looks good! I have a few question here and there, but I can't argue with the results, and I didn't see anything that struck me as obviously out of place.

I do have some higher-level thoughts, like

  • should we add comments to explain why something is marked as global (when it might not be entirely obvious)? and
  • could we come up with better names than Stage1 and Stage2 for the two API graph stages? (Non-descriptive names like this are a bit of a pet peeve of mine.)

... but I don't see a need to reach consensus on this right now. We can always address these in a future PR.

* This predicate should thus not be used to block the tracking of use/def nodes, but only block the creation of new labelled edges.
*/
bindingset[node]
predicate inScope(DataFlow::Node node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that we're using "scope" in a non-standard (for program semantics) way here. I'm not going to insist we change it in this PR, but we should probably try to come up with a different word, lest we doom all future readers into thinking inScope has something to do with (lexical) scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a change where it's rephrased as inActiveFile. PTAL

EntryPoint() { any() }

/** Gets a data-flow node where a value enters the current codebase through this entry-point. */
overlay[global]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this global? Are these shared across files somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it local would require all the overriding predicates to become local as well, and there's at least one override that depends on type-name resolution, which is global. There might be ways to change that, but since we don't really need this to be local (for the purpose of this PR), I'm happy to just leave it global.

@@ -1,7 +1,7 @@
testFailures
ambiguousPreferredPredecessor
| pack2/lib.js:1:1:3:1 | def moduleImport("pack2").getMember("exports").getMember("lib").getMember("LibClass").getInstance() |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pity that we lose the "descriptive" names for API graph nodes, but I guess it's kind of necessary.

@asgerf
Copy link
Contributor Author

asgerf commented Dec 17, 2025

This was a difficult review, in part because of the couple of "whoops here's something I removed by accident previously" commits (which had me scratching my head quite a bit). You should feel free to just rebase those out of existence in the future, especially on PRs that are intended to be reviewed commit-by-commit.

I'm sorry if it wasn't obvious, but it was not my intention for you to review the "wrap in a module" commit. 😬 I factored it out like this exactly so you wouldn't have to crawl through that. Rebasing conflicting changes from main across such a commit is a lot of work because many commits in the history are going to conflict, and I figured it wasn't much work for the reviewer because the indentation-only commit was not supposed to be reviewed. But thanks for powering through. 😅

Co-authored-by: Taus <tausbn@github.com>
@asgerf
Copy link
Contributor Author

asgerf commented Dec 17, 2025

should we add comments to explain why something is marked as global (when it might not be entirely obvious)?

I tried doing this early on and found out that

  • We end up having a lot of such comments
  • They quietly become out-dated, probably because they tend talk about things happening "far away" from the comment
  • There is no nice place to put the comment. Auto-formatting does not permit comments on the same line as overlay[global]

could we come up with better names than Stage1 and Stage2 for the two API graph stages? (Non-descriptive names like this are a bit of a pet peeve of mine.)

The only meaningful names I can think of are "base stage" and "overlay stage". On one hand, this makes sense because each stage is a no-op in one of the incremental versions (stage 1 is a no-op in the overlay version, and stage 2 is a no-op in the base version). On the other hand, it makes the words "base" and "overlay" more ambiguous as it's never quite clear if we're talking about the base/overlay stage or the base/overlay version of a predicate.

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

JS no-change-note-required This PR does not need a change note Python Ruby

2 participants