- Notifications
You must be signed in to change notification settings - Fork 1.9k
Ruby: expand DataFlow API #11024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ruby: expand DataFlow API #11024
Conversation
This is the inverse of getALocalSource()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
99c8eba to be1d931 Compare 04a5c91 to c2e33a2 Compare | * end | ||
| * ``` | ||
| */ | ||
| ModuleNode getADescendentModule() { MkAncestorLookup(result) = this.getATargetScope() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand why we've mixed them together, but I'm slightly concerned about the conflation of classes and modules here. You can't subclass a module and you can't include, extend or prepend a class. Users may find it confusing that they need to use getADescendentModule() to find subclasses.
Additionally, if our module resolution gets confused at any point and conflates a class A with some unrelated module M::A in a different namespace M, we may find that we can't find that fetching subclasses of A also yields modules that include M::A.
Is there an easy way to split the class and module logic into separate predicates, i.e. something like this?
ModuleNode getADescendentModule; ClassNode getASubclass;Overall though, I don't think this is a huge deal if it has to stay as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further thought, it might be enough to add
ClassNode getASubclass() { result = this.getADescendentModule() }but I'm also happy to leave this as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have Module#getASubClass() from before this PR, which returns an immediate subclass (not transitive). For that reason I'm trying to avoid adding things called getSubClass/getSubclass that don't behave the same way.
| On the naming of |
I preferred a short name given how ubiquitous the corresponding |
| New evaluation shows we gain about 1,600 new call edges due to handling of |
Ruby: Make sure to always generate SSA definitions for namespace self-variables
hvitved left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see what DCA says.
| Latest evaluation still looks good |
Adds some classes to do basic stuff with
DataFlownodes without having to map down to CFG and AST nodes, and a few other tidbits I found myself missing.To ensure some degree of battle-hardening I (mostly) ported three library models to use the new nodes where it makes sense. The affected models are
Rails.qll,Railties.qll, andActionController.qll.Migrating from AST nodes to DataFlow nodes revealed a couple of pre-existing issues, some of which have now been fixed or fixed enough to avoid a regression:
Some other issues which were too hard to fix prior to opening this PR:
self, but of course that's not always the case. There is currently no way to tell the analysis what the value ofselfis, so we have to work around it for now. This means using things likegetParent+()and treating the moduleselfas if it's an instance of the module.getConst() and what about API graphs?
One of the changes that may look odd is the introduction of
DataFlow::getConst()andConstRef. It provides convenient access to constants that may resolve to a given qualified name. For example, the incantationgives us all descendents of
ActionController::Base.This was probably the biggest use case for API graphs in Ruby, as you can get to (almost) the same thing with
API::getTopLevelMember("ActionController").getMember("Base"). So why addgetConst()?getConst()doesn't rely on the call graph, which means it can be used for things likegetACallSimple()and other pre-call graph customizations (we have quite a few of those in JS and I expect we'll find the need in Ruby as well).getConst()handles the rules of constant lookup more precisely than API graphs, in particular when related toinclude, nested modules, and modules with more than one declaration. I think API graphs could be extended to make this work, but it's not really a natural fit for its use/def tracking, it would need to be a bunch of new predicates specific to constant lookup, which might as well exist earlier in the pipeline.I replaced some uses of API graphs with
getConst, not because it matters in practice, I just had to ensuregetConst()is being used enough to test it in the wild.I hope the name
getConst()makes more sense in the context of Ruby. I think the name in API graphs,getMember, was used mostly because that's what JS called it.Evaluation
Evaluation shows
flowsToThe new sources and sinks are mainly due to resolving methods upwards in the hierarchy in
ActionController. The combinationgetADescendentModule().getAnInstanceMethod()takes "sideways" method resolution into account without having to really think about it, so I like that.