Conversation

HippocampusGirl (Contributor)

Summary

A few months back, I submitted pull request #3184 to improve the performance of connect when creating large workflows. Specifically, I had discovered that using the inputs or outputs properties of a workflow can create a performance bottleneck when it contains many child nodes or nested workflows.

Recently, I noticed that the same bottleneck can cause a delay between calling workflow.run() and the start of actual execution, that is, the point at which nodes and interfaces begin running.

Running cProfile suggests that the delay occurs in _create_flat_graph. Note that the profile does not cover the full workflow execution; it was stopped as soon as the first node started to run.
[Screenshot: cProfile call graph showing the time spent in _create_flat_graph]

As far as I can tell, before execution starts, nested workflows are merged into one overall workflow using _create_flat_graph. To resolve the final connections between nodes in this merged workflow, _create_flat_graph calls _get_parameter_node for each input from or output to a nested workflow, and then modifies the connection information accordingly.

for u, _, d in list(self._graph.in_edges(nbunch=node, data=True)):
    logger.debug("in: connections-> %s", str(d["connect"]))
    for cd in deepcopy(d["connect"]):
        logger.debug("in: %s", str(cd))
        dstnode = node._get_parameter_node(cd[1], subtype="in")

As a result, for each connection to or from a nested workflow, _get_parameter_node constructs the entire inputs or outputs data structure of that workflow, and then uses it to resolve the correct connection information. As with #3184, rebuilding this entire data structure for every connection can hurt performance.
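To make the cost concrete, here is a hypothetical sketch (not nipype's actual code) of the pattern described above: resolving each of n connections by first materializing a structure that itself takes O(n) to build yields O(n^2) total work, versus roughly O(n) when the structure is built once.

```python
def build_full_inputs(node_names):
    # Stand-in for constructing the entire workflow.inputs structure.
    return {name: object() for name in node_names}

def resolve_slow(node_names, connections):
    resolved = []
    for conn in connections:
        inputs = build_full_inputs(node_names)  # rebuilt for every connection
        if conn in inputs:
            resolved.append(conn)
    return resolved

def resolve_fast(node_names, connections):
    available = set(node_names)  # built once; each membership test is O(1)
    return [conn for conn in connections if conn in available]
```

Both functions resolve the same connections; only the amount of repeated construction work differs.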

List of changes proposed in this PR

Instead of generating the full inputs or outputs data structure, I propose that the _get_parameter_node function should traverse the individual workflow graphs until it finds the target node (or not).
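
A minimal sketch of that idea (hypothetical and simplified, not nipype's Workflow internals): walk the dotted name one segment at a time, descending into nested workflows, and stop as soon as the target is found or a segment is missing.

```python
# Simplified model of nested workflows: each workflow holds a dict of
# children, where a child is either a leaf Node or another Workflow.

class Node:
    def __init__(self, name):
        self.name = name

class Workflow:
    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, child):
        self.children[child.name] = child
        return child

    def find_node(self, dotted_name):
        """Traverse nested workflows segment by segment instead of
        constructing the full inputs/outputs structure up front."""
        current = self
        for segment in dotted_name.split("."):
            if not isinstance(current, Workflow):
                return None  # tried to descend below a leaf node
            current = current.children.get(segment)
            if current is None:
                return None  # target not found
        return current
```

The traversal only touches the workflows along the requested path, which is what avoids the repeated full construction.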

I have created a quick implementation that leads to a significant speedup. This implementation is a slightly modified copy of the code from #3184.
[Screenshot: cProfile output after the change, showing the speedup]

I hope that this code will be useful for the nipype community.

Acknowledgment

  • (Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.
  • Traverse nested workflows in a loop
  • Avoid constructing the entire workflow.inputs or workflow.outputs data structure
@codecov

codecov bot commented Oct 18, 2020

Codecov Report

Merging #3260 into master will increase coverage by 0.39%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #3260      +/-   ##
==========================================
+ Coverage   64.23%   64.62%    +0.39%
==========================================
  Files         300      302        +2
  Lines       39884    39824       -60
  Branches     5276     5279        +3
==========================================
+ Hits        25618    25735      +117
+ Misses      13210    12995      -215
- Partials     1056     1094       +38
Flag Coverage Δ
#unittests 64.62% <100.00%> (+0.39%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
nipype/pipeline/engine/workflows.py 64.77% <100.00%> (-4.45%) ⬇️
nipype/testing/utils.py 70.90% <0.00%> (-18.19%) ⬇️
nipype/info.py 80.00% <0.00%> (-7.70%) ⬇️
nipype/scripts/cli.py 42.16% <0.00%> (-5.09%) ⬇️
nipype/interfaces/fsl/base.py 76.40% <0.00%> (-4.45%) ⬇️
nipype/interfaces/afni/base.py 65.54% <0.00%> (-3.99%) ⬇️
nipype/interfaces/diffusion_toolkit/base.py 46.15% <0.00%> (-3.85%) ⬇️
nipype/interfaces/workbench/base.py 54.16% <0.00%> (-3.53%) ⬇️
nipype/interfaces/ants/base.py 60.00% <0.00%> (-3.27%) ⬇️
... and 42 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9217c2...d5a88de.

@effigies (Member)

Thanks for this, that looks like a great improvement!

I've read your solution and my only suggestions would be aesthetic ones, but I realized that we probably already have a function to fetch a particular node, and found get_node():

def get_node(self, name):
    """Return an internal node by name"""
    nodenames = name.split(".")
    nodename = nodenames[0]
    outnode = [
        node for node in self._graph.nodes() if str(node).endswith("." + nodename)
    ]
    if outnode:
        outnode = outnode[0]
        if nodenames[1:] and issubclass(outnode.__class__, Workflow):
            outnode = outnode.get_node(".".join(nodenames[1:]))
    else:
        outnode = None
    return outnode

I think we might be able to replace all calls to wf._get_parameter_node(parameter) with wf.get_node(parameter.rsplit(".", 1)[0]), and probably not lose much efficiency over your solution. Would you mind giving that a shot?
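
For reference, `rsplit(".", 1)` splits only at the last dot, so the suggested call strips the trailing parameter name and keeps the dotted node path (the path below is a made-up example):

```python
# Hypothetical dotted parameter path, of the form passed to _get_parameter_node.
parameter = "inner_wf.smooth.fwhm"
node_path = parameter.rsplit(".", 1)[0]  # split once, from the right
print(node_path)  # prints "inner_wf.smooth"
```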

@HippocampusGirl (Contributor, Author)

That's a really good idea :-)

@effigies (Member)

Test failures appear to be #3261.

How does the profiling look? Do we need to clean up get_node() at all? The outnode = [...] comprehension seems like it could be wasteful on large graphs, but if there's no discernible difference with your solution, then I think we should just get this in.
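
One possible cleanup along those lines (a sketch, not the merged code): replace the list comprehension, which always scans every node and collects all matches, with next() over a generator so the scan stops at the first match.

```python
# Sketch: short-circuit on the first node whose string form ends with
# ".<nodename>", instead of building a list of every match and taking [0].

def first_matching(node_names, nodename):
    suffix = "." + nodename
    return next((n for n in node_names if n.endswith(suffix)), None)
```

Worst case this is still a linear scan, so if profiling shows no discernible difference, leaving the comprehension as-is is also reasonable.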

@effigies (Member) left a comment

Using fMRIPrep with the latest commit, this gets a ~100x speedup for _generate_flatgraph().

LGTM, though let me know if there's anything else you want to include before merge.

@satra (Member)

satra commented Oct 21, 2020

thank you @HippocampusGirl - much appreciated.

@effigies effigies merged commit 07af08f into nipy:master Oct 22, 2020
@HippocampusGirl (Contributor, Author)

Thank you for benchmarking that, @effigies! No, I don't have anything to add.
