Conversation

HippocampusGirl (Contributor)

Summary

A few months back, I submitted pull request #3184 to improve the performance of connect when creating large workflows. Specifically, I had discovered that using the inputs or outputs properties of a workflow can create a performance bottleneck when it contains many child nodes or nested workflows.

Recently, I noticed that the same bottleneck can cause a delay between calling workflow.run() and the start of actual execution, that is, the point at which nodes and interfaces begin running.

Running cProfile suggests that the delay occurs in _create_flat_graph. Note that the profile does not cover the full workflow execution; it was stopped as soon as the first node started to run.
[Screenshot: cProfile call graph showing the time spent in _create_flat_graph]

As far as I can tell, before execution starts, nested workflows are merged into one overall workflow using _create_flat_graph. To resolve the final connections between nodes in this merged workflow, _create_flat_graph calls _get_parameter_node for each input from or output to a nested workflow, and then modifies the connection information accordingly.

for u, _, d in list(self._graph.in_edges(nbunch=node, data=True)):
    logger.debug("in: connections-> %s", str(d["connect"]))
    for cd in deepcopy(d["connect"]):
        logger.debug("in: %s", str(cd))
        dstnode = node._get_parameter_node(cd[1], subtype="in")

As a result, for each connection to or from a nested workflow, _get_parameter_node constructs the entire inputs or outputs data structure of that workflow, and then uses it to resolve the correct connection information. As with #3184, rebuilding this entire data structure for every connection can hurt performance.
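To make the cost concrete, here is a hypothetical sketch (not nipype's actual code) of the pattern described above: resolving each of n connections by first materializing a structure that itself takes O(n) to build yields O(n^2) total work, versus roughly O(n) when the structure is built once.

```python
def build_full_inputs(node_names):
    # Stand-in for constructing the entire workflow.inputs structure.
    return {name: object() for name in node_names}

def resolve_slow(node_names, connections):
    resolved = []
    for conn in connections:
        inputs = build_full_inputs(node_names)  # rebuilt for every connection
        if conn in inputs:
            resolved.append(conn)
    return resolved

def resolve_fast(node_names, connections):
    available = set(node_names)  # built once; each membership test is O(1)
    return [conn for conn in connections if conn in available]
```

Both functions resolve the same connections; only the amount of repeated construction work differs.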

List of changes proposed in this PR

Instead of generating the full inputs or outputs data structure, I propose that the _get_parameter_node function should traverse the individual workflow graphs until it finds the target node (or not).
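
A minimal sketch of that idea (hypothetical and simplified, not nipype's Workflow internals): walk the dotted name one segment at a time, descending into nested workflows, and stop as soon as the target is found or a segment is missing.

```python
# Simplified model of nested workflows: each workflow holds a dict of
# children, where a child is either a leaf Node or another Workflow.

class Node:
    def __init__(self, name):
        self.name = name

class Workflow:
    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, child):
        self.children[child.name] = child
        return child

    def find_node(self, dotted_name):
        """Traverse nested workflows segment by segment instead of
        constructing the full inputs/outputs structure up front."""
        current = self
        for segment in dotted_name.split("."):
            if not isinstance(current, Workflow):
                return None  # tried to descend below a leaf node
            current = current.children.get(segment)
            if current is None:
                return None  # target not found
        return current
```

The traversal only touches the workflows along the requested path, which is what avoids the repeated full construction.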

I have created a quick implementation that leads to a significant speedup. This implementation is a slightly modified copy of the code from #3184.
[Screenshot: cProfile output after the change, showing the speedup]

I hope that this code will be useful for the nipype community.

Acknowledgment

  • (Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.
  • Traverse nested workflows in a loop
  • Avoid constructing the entire workflow.inputs or workflow.outputs data structure
@codecov

codecov bot commented Oct 18, 2020

Codecov Report

Merging #3260 into master will increase coverage by 0.39%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #3260      +/-   ##
==========================================
+ Coverage   64.23%   64.62%    +0.39%
==========================================
  Files         300      302        +2
  Lines       39884    39824       -60
  Branches     5276     5279        +3
==========================================
+ Hits        25618    25735      +117
+ Misses      13210    12995      -215
- Partials     1056     1094       +38
Flag Coverage Δ
#unittests 64.62% <100.00%> (+0.39%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
nipype/pipeline/engine/workflows.py 64.77% <100.00%> (-4.45%) ⬇️
nipype/testing/utils.py 70.90% <0.00%> (-18.19%) ⬇️
nipype/info.py 80.00% <0.00%> (-7.70%) ⬇️
nipype/scripts/cli.py 42.16% <0.00%> (-5.09%) ⬇️
nipype/interfaces/fsl/base.py 76.40% <0.00%> (-4.45%) ⬇️
nipype/interfaces/afni/base.py 65.54% <0.00%> (-3.99%) ⬇️
nipype/interfaces/diffusion_toolkit/base.py 46.15% <0.00%> (-3.85%) ⬇️
nipype/interfaces/workbench/base.py 54.16% <0.00%> (-3.53%) ⬇️
nipype/interfaces/ants/base.py 60.00% <0.00%> (-3.27%) ⬇️
... and 42 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9217c2...d5a88de.

@effigies (Member)

Thanks for this, that looks like a great improvement!

I've read your solution and my only suggestions would be aesthetic ones, but I realized that we probably already have a function to fetch a particular node, and found get_node():

def get_node(self, name):
    """Return an internal node by name"""
    nodenames = name.split(".")
    nodename = nodenames[0]
    outnode = [
        node for node in self._graph.nodes() if str(node).endswith("." + nodename)
    ]
    if outnode:
        outnode = outnode[0]
        if nodenames[1:] and issubclass(outnode.__class__, Workflow):
            outnode = outnode.get_node(".".join(nodenames[1:]))
    else:
        outnode = None
    return outnode

I think we might be able to replace all calls to wf._get_parameter_node(parameter) with wf.get_node(parameter.rsplit(".", 1)[0]), and probably not lose much efficiency over your solution. Would you mind giving that a shot?
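
For reference, `rsplit(".", 1)` splits only at the last dot, so the suggested call strips the trailing parameter name and keeps the dotted node path (the path below is a made-up example):

```python
# Hypothetical dotted parameter path, of the form passed to _get_parameter_node.
parameter = "inner_wf.smooth.fwhm"
node_path = parameter.rsplit(".", 1)[0]  # split once, from the right
print(node_path)  # prints "inner_wf.smooth"
```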

@HippocampusGirl (Contributor, Author)

That's a really good idea :-)

@effigies (Member)

Test failures appear to be #3261.

How does the profiling look? Do we need to clean up get_node() at all? The outnode = [...] comprehension seems like it could be wasteful on large graphs, but if there's no discernible difference with your solution, then I think we should just get this in.
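
One possible cleanup along those lines (a sketch, not the merged code): replace the list comprehension, which always scans every node and collects all matches, with next() over a generator so the scan stops at the first match.

```python
# Sketch: short-circuit on the first node whose string form ends with
# ".<nodename>", instead of building a list of every match and taking [0].

def first_matching(node_names, nodename):
    suffix = "." + nodename
    return next((n for n in node_names if n.endswith(suffix)), None)
```

Worst case this is still a linear scan, so if profiling shows no discernible difference, leaving the comprehension as-is is also reasonable.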

@effigies (Member) left a comment

Using fMRIPrep with the latest commit, this gets a ~100x speedup for _generate_flatgraph().

LGTM, though let me know if there's anything else you want to include before merge.

@satra (Member)

satra commented Oct 21, 2020

thank you @HippocampusGirl - much appreciated.

@effigies effigies merged commit 07af08f into nipy:master Oct 22, 2020
@HippocampusGirl (Contributor, Author)

Thank you for benchmarking that, @effigies! No, I don't have anything to add.
