I recently had a use case at work where I wanted to check that file paths given in a Python script actually existed. These paths were in various GitHub repositories, so all I had to do was pull out the paths and check if they exist on GitHub.
There were a few catches though.
First, I couldn’t simply grab every string out of each Python script: they needed to be strings specified by a specific function parameter, and they needed to match a regex (e.g., start with ‘abc’).
Second, the script paths lacked the GitHub repository root name. This name was part of the function name, so I needed access to the function that the path was specified within, and then had to parse the function name to get the repository name.
The obvious solution, I thought, was the ast library.
ast library
I started by using ast. The ast.NodeVisitor class seemed like it would do the trick.
An example script (“my_script.py”):
def hello(path, stuff=None):
    return path

if __name__ == "__main__":
    print(hello(path="hello/world.py", stuff="hello mars"))

# Parse my_script.py and collect the strings that look like paths
import ast

class CollectStrings(ast.NodeVisitor):
    def visit_Module(self, node):
        self.out = set()
        self.generic_visit(node)
        return [w for w in self.out if w.startswith("hello") and w.endswith(".py")]

    # visit_Str / node.s were deprecated in Python 3.8 and removed in 3.12;
    # string literals are now ast.Constant nodes
    def visit_Constant(self, node):
        if isinstance(node.value, str):
            self.out.add(node.value)

file = "my_script.py"
with open(file, "r") as f:
    body = ast.parse(f.read())

coll = CollectStrings()
coll.visit(body)
## ['hello/world.py']
That worked great for fetching paths, but only because all the paths I was looking for started with the same text and had the same file extension.
HOWEVER - I also needed the function name that the path argument was called from. I tried to make this work with ast.NodeVisitor but couldn’t get it to work.
I eventually got frustrated enough and figured there must be some libraries built on top of ast that make it easier to work with ASTs in Python.
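For what it's worth, one way this might be done with ast alone is to visit ast.Call nodes instead of string nodes, since a Call node carries both the called function's name and its keyword arguments. The class name CollectPathCalls, its prefix argument and the inline SOURCE script below are made up for illustration; treat this as a sketch under those assumptions, not as what I ended up using.

```python
import ast

# Hypothetical script contents, inlined so the sketch is self-contained
SOURCE = '''
def hello(path, stuff=None):
    return path

def goodbye(path, stuff=None):
    return path

def world():
    path_str = hello(path="src/world.py", stuff="hello mars")
    other_path_str = goodbye(path="src/world.py", stuff="hello saturn")
    return path_str, other_path_str
'''


class CollectPathCalls(ast.NodeVisitor):
    """Record (function name, path) for calls that pass a string `path=` keyword."""

    def __init__(self, prefix="src/"):
        self.prefix = prefix  # assumed filter; swap in a regex if needed
        self.found = []

    def visit_Call(self, node):
        # Only plain-name calls such as hello(...) are handled here;
        # attribute calls like module.hello(...) are skipped in this sketch
        if isinstance(node.func, ast.Name):
            for kw in node.keywords:
                if (
                    kw.arg == "path"
                    and isinstance(kw.value, ast.Constant)
                    and isinstance(kw.value.value, str)
                    and kw.value.value.startswith(self.prefix)
                ):
                    self.found.append((node.func.id, kw.value.value))
        self.generic_visit(node)  # keep walking so nested calls are seen too


collector = CollectPathCalls()
collector.visit(ast.parse(SOURCE))
paths = [f"{name}/{path}" for name, path in collector.found]
print(paths)
```

Because the keyword name is checked directly, the "hello mars" and "hello saturn" strings passed to stuff never enter the result, which avoids the prefix and extension filtering used above.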
redbaron
Enter redbaron. I found this library pretty quickly upon searching for a library building on top of ast.
Another example script (“their_script.py”):
def hello(path, stuff=None):
    return path

def goodbye(path, stuff=None):
    return path

def world():
    path_str = hello(path="src/world.py", stuff="hello mars")
    other_path_str = goodbye(path="src/world.py", stuff="hello saturn")

    return path_str, other_path_str

if __name__ == "__main__":
    print(world())

# Parse their_script.py with redbaron
import re
from redbaron import RedBaron

file = "their_script.py"
with open(file, "r") as src:
    red = RedBaron(src.read())

Even just the resulting object you get from parsing something is useful:

red
## 0   def hello(path, stuff=None):
##         return path
##
##
##
## 1   def goodbye(path, stuff=None):
##         return path
##
##
##
## 2   def world():
##         path_str = hello(path="src/world.py", stuff="hello mars")
##         other_path_str = goodbye(path="src/world.py", stuff="hello saturn")
##
##         return path_str, other_path_str
##
##
##
## 3   if __name__ == "__main__":
##         print(world())
##
And with .help() you get a very detailed map of the structure of the thing you’re trying to navigate (only printing first 20 lines):
red.help()
## 0 -----------------------------------------------------
## DefNode()
##   # identifiers: def, def_, defnode, funcdef, funcdef_
##   # default test value: name
##   async=False
##   name='hello'
##   return_annotation ->
##     None
##   decorators ->
##   arguments ->
##     * DefArgumentNode()
##         # identifiers: def_argument, def_argument_, defargument, defargumentnode
##         target ->
##           NameNode() ...
##         annotation ->
##           None
##         value ->
##           None
##     * DefArgumentNode()
##         # identifiers: def_argument, def_argument_, defargument, defargumentnode ...
Looking at the result from red.help(), I can then use .find_all() to find certain nodes in the AST.
nodes = red.find_all("AtomtrailersNode")
nodes = list(filter(lambda w: "hello" in w.dumps(), nodes))
nodes
## [hello(path="src/world.py", stuff="hello mars"), goodbye(path="src/world.py", stuff="hello saturn")]
Then I can write some okay code to extract the function name, and ugly code to get the string supplied to the path parameter. Then f-string those together to get the path I’m after.
paths = []
for node in nodes:
    fxn_name = node.name.value
    command = re.search("src/.*\\.py", node.dumps()).group()
    paths.append(f"{fxn_name}/{command}")

for path in paths:
    print(path)
## hello/src/world.py
## goodbye/src/world.py
I’m not super proud of this, but it gets the job done for my use case, and when you’re not writing open source for others, you don’t need to worry about other use cases :)
I’ll definitely try to learn how to properly extract stuff using redbaron, but it got me to an answer much faster than the ast library did.
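To close the loop on the original goal of checking that each path exists on GitHub, a sketch might look like the following. It is only an assumption-laden illustration: the helper names raw_url and exists_on_github, the "my-org" owner and the "main" branch are placeholders, and it only works as written for public repositories (private ones would need an authenticated request).

```python
import urllib.error
import urllib.request


def raw_url(owner, repo, branch, path):
    """Build a raw.githubusercontent.com URL for a file in a public repository."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def exists_on_github(url, timeout=10):
    """Return True if a HEAD request for the URL succeeds, False on HTTP 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other failures (rate limits, auth) should not be read as "missing"


# "hello/src/world.py" is one of the paths built above: the first segment is
# the repository name (taken from the function name), the rest is the file path.
repo, _, file_path = "hello/src/world.py".partition("/")
url = raw_url("my-org", repo, "main", file_path)  # placeholder owner and branch
print(url)
# exists_on_github(url) would then perform the actual network check
```

Re-raising anything other than a 404 keeps a rate limit or a permissions error from being mistaken for a missing file.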