- Notifications
You must be signed in to change notification settings - Fork 3
Add the option to plot the task graph #359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This simplifies the debugging of complex task dependencies: ```python from pympipool import Executor def calc_function(parameter_a, parameter_b): return parameter_a + parameter_b with Executor(max_cores=2, backend="local", plot_dependency_graph=True) as exe: future_1 = exe.submit( calc_function, 1, parameter_b=2, resource_dict={"cores": 1}, ) future_2 = exe.submit( calc_function, 1, parameter_b=future_1, resource_dict={"cores": 1}, ) print(future_2.result()) ``` for more information, see https://pre-commit.ci
WalkthroughThis update introduces the ability to generate and visualize task dependency graphs in the Changes
Sequence Diagram(s)sequenceDiagram participant User participant ExecutorWithDependencies participant Plot User->>ExecutorWithDependencies: Submit task with plot_dependency_graph=True ExecutorWithDependencies-->>ExecutorWithDependencies: Process task ExecutorWithDependencies->>Plot: Generate nodes and edges Plot-->>ExecutorWithDependencies: Return graph structure ExecutorWithDependencies->>Plot: Draw dependency graph Plot-->>ExecutorWithDependencies: Return graph visualization ExecutorWithDependencies-->>User: Task completed with graph plotted Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range and nitpick comments (3)
pympipool/shared/inputcheck.py (1)
Line range hint
49-49: Optimize dictionary key check.
Use a more Pythonic way to check for the existence of a key in a dictionary.- if "resource_dict" in inspect.signature(function).parameters.keys(): + if "resource_dict" in inspect.signature(function).parameters:pympipool/__init__.py (2)
Line range hint
99-99: Avoid using mutable default arguments.
Using mutable default arguments like lists can lead to unexpected behavior due to how Python handles default argument values.- command_line_argument_lst: list[str] = [], + command_line_argument_lst: Optional[list[str]] = None, + if command_line_argument_lst is None: + command_line_argument_lst = []
Line range hint
121-121: Avoid using mutable default arguments.
Similar to the previous comment, avoid using mutable defaults for function parameters to prevent potential bugs.- command_line_argument_lst: list[str] = [], + command_line_argument_lst: Optional[list[str]] = None, + if command_line_argument_lst is None: + command_line_argument_lst = []
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (9)
- .ci_support/environment-mpich.yml (1 hunks)
- .ci_support/environment-old.yml (1 hunks)
- .ci_support/environment-openmpi.yml (1 hunks)
- .ci_support/environment-win.yml (1 hunks)
- pympipool/init.py (4 hunks)
- pympipool/interactive/dependencies.py (2 hunks)
- pympipool/shared/inputcheck.py (1 hunks)
- pympipool/shared/plot.py (1 hunks)
- pyproject.toml (1 hunks)
Files skipped from review due to trivial changes (3)
- .ci_support/environment-mpich.yml
- .ci_support/environment-openmpi.yml
- pyproject.toml
Additional context used
Ruff
pympipool/interactive/dependencies.py
45-45: Do not use mutable data structures for argument defaults (B006)
Replace with
None; initialize within functionpympipool/shared/inputcheck.py
49-49: Use
key in dictinstead ofkey in dict.keys()(SIM118)Remove
.keys()pympipool/__init__.py
99-99: Do not use mutable data structures for argument defaults (B006)
Replace with
None; initialize within function
121-121: Do not use mutable data structures for argument defaults (B006)
Replace with
None; initialize within function
Additional comments not posted (15)
.ci_support/environment-win.yml (3)
13-13: Addedmatplotlib=3.8.4to support the new plotting feature.
14-14: Addednetworkx=3.3to manage the graph structures needed for plotting.
15-15: Addedipython=8.25.0for interactive plotting capabilities..ci_support/environment-old.yml (4)
12-12: Removedh5io=0.2.1to clean up unused dependencies.
13-13: Addedmatplotlib=3.5.3for older environments, supporting the plotting feature.
14-14: Addednetworkx=2.8.8to handle graph data structures in older environments.
15-15: Addedipython=7.33.0for interactive capabilities in older environments.pympipool/shared/plot.py (3)
9-44: The functiongenerate_nodes_and_edgescorrectly creates nodes and edges for the graph, handling both direct values and futures.
47-58: Functiongenerate_task_hashuses cloudpickle for serialization, ensuring that task dictionaries are uniquely identifiable.
61-69: Thedrawfunction effectively visualizes the task graph using NetworkX and displays it as an SVG. Good use of libraries to achieve the plotting functionality.pympipool/interactive/dependencies.py (3)
8-15: Graceful handling of import errors for the plotting functionality, ensuring the program can still function without this feature.
Line range hint
18-44: Updated constructor to handle theplot_dependency_graphparameter. Properly raises an ImportError if the plotting libraries are not available.
68-77: In the__exit__method, the task graph is drawn if enabled. This is a clean integration of the plotting functionality into the lifecycle of the executor.pympipool/__init__.py (2)
Line range hint
125-165: Review new functionality for plotting task dependencies.
The addition of theplot_dependency_graphparameter to theExecutorclass is a significant enhancement. It allows users to visualize task dependencies, which can be very helpful for debugging complex workflows. Ensure that the documentation and examples are updated to reflect how users can utilize this new feature effectively.Also applies to: 185-185
188-188: Validate plot_dependency_graph usage.
It's good to see validation for the newplot_dependency_graphparameter using_check_plot_dependency_graph. This ensures that the parameter is used correctly and adheres to expected constraints.
| def check_plot_dependency_graph(plot_dependency_graph: bool): | ||
| if plot_dependency_graph: | ||
| raise ValueError( | ||
| "The plot_dependency_graph parameter is only used when disable_dependencies=False." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure correct usage of plot_dependency_graph parameter.
The exception message for plot_dependency_graph appears to be incorrect. It suggests that the parameter is only used when disable_dependencies=False, but this parameter should be usable regardless of the disable_dependencies setting, as it's meant for debugging and visualizing task dependencies.
- "The plot_dependency_graph parameter is only used when disable_dependencies=False." + "The plot_dependency_graph parameter should be set to True to enable task dependency visualization."Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def check_plot_dependency_graph(plot_dependency_graph: bool): | |
| if plot_dependency_graph: | |
| raise ValueError( | |
| "The plot_dependency_graph parameter is only used when disable_dependencies=False." | |
| ) | |
| def check_plot_dependency_graph(plot_dependency_graph: bool): | |
| if plot_dependency_graph: | |
| raise ValueError( | |
| "The plot_dependency_graph parameter should be set to True to enable task dependency visualization." | |
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- .github/workflows/unittest-mpich.yml (1 hunks)
- .github/workflows/unittest-openmpi.yml (1 hunks)
Files skipped from review due to trivial changes (2)
- .github/workflows/unittest-mpich.yml
- .github/workflows/unittest-openmpi.yml
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range and nitpick comments (1)
pympipool/__init__.py (1)
Line range hint
107-193: Added a new parameterplot_dependency_graphto theExecutorclass to control the new graph plotting functionality. This is effectively integrated into the existing architecture of the class. However, mutable default arguments in the__new__method should be replaced withNoneto avoid potential bugs.- command_line_argument_lst: list[str] = [] + command_line_argument_lst: list[str] = None + if command_line_argument_lst is None: + command_line_argument_lst = []
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (9)
- .ci_support/environment-mpich.yml (1 hunks)
- .ci_support/environment-old.yml (1 hunks)
- .ci_support/environment-openmpi.yml (1 hunks)
- .ci_support/environment-win.yml (1 hunks)
- pympipool/init.py (6 hunks)
- pympipool/interactive/dependencies.py (2 hunks)
- pympipool/shared/plot.py (1 hunks)
- pyproject.toml (1 hunks)
- tests/test_dependencies_executor.py (2 hunks)
Files skipped from review as they are similar to previous changes (5)
- .ci_support/environment-mpich.yml
- .ci_support/environment-old.yml
- .ci_support/environment-openmpi.yml
- .ci_support/environment-win.yml
- pyproject.toml
Additional context used
Ruff
pympipool/interactive/dependencies.py
35-35: Do not use mutable data structures for argument defaults (B006)
Replace with
None; initialize within functiontests/test_dependencies_executor.py
15-15:
pygraphvizimported but unused; consider usingimportlib.util.find_specto test for availability (F401)pympipool/__init__.py
103-103: Do not use mutable data structures for argument defaults (B006)
Replace with
None; initialize within function
126-126: Do not use mutable data structures for argument defaults (B006)
Replace with
None; initialize within function
Additional comments not posted (8)
pympipool/interactive/dependencies.py (3)
6-6: Added imports for generating and drawing task graphs.
Line range hint
14-33: Added support for plotting dependency graphs in the constructor. This includes initializing several dictionaries and a flag to determine whether to generate graphs.
58-67: Handles the graph drawing in the__exit__method if the graph generation flag is set. This is a clean and effective use of Python's context management.pympipool/shared/plot.py (3)
7-44: Thegenerate_nodes_and_edgesfunction constructs the nodes and edges for the graph based on the tasks' dependencies. This is well-implemented with checks to differentiate betweenFutureobjects and other arguments.
47-58: Thegenerate_task_hashfunction creates a unique hash for each task, considering whether arguments areFutureobjects or not. This function is crucial for identifying unique tasks in the dependency graph and is implemented correctly.
61-73: Thedrawfunction uses NetworkX and Matplotlib to visualize the task graph. This function is well encapsulated and makes use of IPython's display tools to render the graph directly in Jupyter notebooks or other compatible environments.tests/test_dependencies_executor.py (1)
11-11: Imported necessary functions for testing the graph generation functionality.pympipool/__init__.py (1)
7-10: Added checks for new parameters related to plotting the dependency graph.
| def submit(self, fn: callable, *args, resource_dict: dict = {}, **kwargs): | ||
| if not self._generate_dependency_graph: | ||
| f = super().submit(fn, *args, resource_dict=resource_dict, **kwargs) | ||
| else: | ||
| f = Future() | ||
| f.set_result(None) | ||
| task_dict = { | ||
| "fn": fn, | ||
| "args": args, | ||
| "kwargs": kwargs, | ||
| "future": f, | ||
| "resource_dict": resource_dict, | ||
| } | ||
| task_hash = generate_task_hash( | ||
| task_dict=task_dict, | ||
| future_hash_inverse_dict={ | ||
| v: k for k, v in self._future_hash_dict.items() | ||
| }, | ||
| ) | ||
| self._future_hash_dict[task_hash] = f | ||
| self._task_hash_dict[task_hash] = task_dict | ||
| return f | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The submit method has been adapted to handle task graph generation when enabled. However, using a mutable default argument (resource_dict) is a common Python pitfall, which could lead to unexpected behavior if the dictionary is modified. This should be corrected as indicated by the static analysis tool.
- def submit(self, fn: callable, *args, resource_dict: dict = {}, **kwargs): + def submit(self, fn: callable, *args, resource_dict: dict = None, **kwargs): + if resource_dict is None: + resource_dict = {}Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def submit(self, fn: callable, *args, resource_dict: dict = {}, **kwargs): | |
| if not self._generate_dependency_graph: | |
| f = super().submit(fn, *args, resource_dict=resource_dict, **kwargs) | |
| else: | |
| f = Future() | |
| f.set_result(None) | |
| task_dict = { | |
| "fn": fn, | |
| "args": args, | |
| "kwargs": kwargs, | |
| "future": f, | |
| "resource_dict": resource_dict, | |
| } | |
| task_hash = generate_task_hash( | |
| task_dict=task_dict, | |
| future_hash_inverse_dict={ | |
| v: k for k, v in self._future_hash_dict.items() | |
| }, | |
| ) | |
| self._future_hash_dict[task_hash] = f | |
| self._task_hash_dict[task_hash] = task_dict | |
| return f | |
| def submit(self, fn: callable, *args, resource_dict: dict = None, **kwargs): | |
| if resource_dict is None: | |
| resource_dict = {} | |
| if not self._generate_dependency_graph: | |
| f = super().submit(fn, *args, resource_dict=resource_dict, **kwargs) | |
| else: | |
| f = Future() | |
| f.set_result(None) | |
| task_dict = { | |
| "fn": fn, | |
| "args": args, | |
| "kwargs": kwargs, | |
| "future": f, | |
| "resource_dict": resource_dict, | |
| } | |
| task_hash = generate_task_hash( | |
| task_dict=task_dict, | |
| future_hash_inverse_dict={ | |
| v: k for k, v in self._future_hash_dict.items() | |
| }, | |
| ) | |
| self._future_hash_dict[task_hash] = f | |
| self._task_hash_dict[task_hash] = task_dict | |
| return f |
Tools
Ruff
35-35: Do not use mutable data structures for argument defaults (B006)
Replace with
None; initialize within function
| @unittest.skipIf( | ||
| skip_graphviz_test, | ||
| "graphviz is not installed, so the plot_dependency_graph test is skipped.", | ||
| ) | ||
| def test_executor_dependency_plot(self): | ||
| with Executor( | ||
| max_cores=1, | ||
| backend="local", | ||
| hostname_localhost=True, | ||
| plot_dependency_graph=True, | ||
| ) as exe: | ||
| cloudpickle_register(ind=1) | ||
| future_1 = exe.submit(add_function, 1, parameter_2=2) | ||
| future_2 = exe.submit(add_function, 1, parameter_2=future_1) | ||
| self.assertTrue(future_1.done()) | ||
| self.assertTrue(future_2.done()) | ||
| self.assertEqual(len(exe._future_hash_dict), 2) | ||
| self.assertEqual(len(exe._task_hash_dict), 2) | ||
| nodes, edges = generate_nodes_and_edges( | ||
| task_hash_dict=exe._task_hash_dict, | ||
| future_hash_inverse_dict={ | ||
| v: k for k, v in exe._future_hash_dict.items() | ||
| }, | ||
| ) | ||
| self.assertEqual(len(nodes), 5) | ||
| self.assertEqual(len(edges), 4) | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new test method test_executor_dependency_plot that checks the functionality of the dependency graph plotting. The test is well-structured and appropriately skips execution if graphviz is not installed. However, the import of pygraphviz should be adjusted to use importlib.util.find_spec for checking availability instead of catching an ImportError.
- import pygraphviz + from importlib import util + skip_graphviz_test = util.find_spec("pygraphviz") is NoneCommittable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @unittest.skipIf( | |
| skip_graphviz_test, | |
| "graphviz is not installed, so the plot_dependency_graph test is skipped.", | |
| ) | |
| def test_executor_dependency_plot(self): | |
| with Executor( | |
| max_cores=1, | |
| backend="local", | |
| hostname_localhost=True, | |
| plot_dependency_graph=True, | |
| ) as exe: | |
| cloudpickle_register(ind=1) | |
| future_1 = exe.submit(add_function, 1, parameter_2=2) | |
| future_2 = exe.submit(add_function, 1, parameter_2=future_1) | |
| self.assertTrue(future_1.done()) | |
| self.assertTrue(future_2.done()) | |
| self.assertEqual(len(exe._future_hash_dict), 2) | |
| self.assertEqual(len(exe._task_hash_dict), 2) | |
| nodes, edges = generate_nodes_and_edges( | |
| task_hash_dict=exe._task_hash_dict, | |
| future_hash_inverse_dict={ | |
| v: k for k, v in exe._future_hash_dict.items() | |
| }, | |
| ) | |
| self.assertEqual(len(nodes), 5) | |
| self.assertEqual(len(edges), 4) | |
| @unittest.skipIf( | |
| skip_graphviz_test, | |
| "graphviz is not installed, so the plot_dependency_graph test is skipped.", | |
| ) | |
| def test_executor_dependency_plot(self): | |
| with Executor( | |
| max_cores=1, | |
| backend="local", | |
| hostname_localhost=True, | |
| plot_dependency_graph=True, | |
| ) as exe: | |
| cloudpickle_register(ind=1) | |
| future_1 = exe.submit(add_function, 1, parameter_2=2) | |
| future_2 = exe.submit(add_function, 1, parameter_2=future_1) | |
| self.assertTrue(future_1.done()) | |
| self.assertTrue(future_2.done()) | |
| self.assertEqual(len(exe._future_hash_dict), 2) | |
| self.assertEqual(len(exe._task_hash_dict), 2) | |
| nodes, edges = generate_nodes_and_edges( | |
| task_hash_dict=exe._task_hash_dict, | |
| future_hash_inverse_dict={ | |
| v: k for k, v in exe._future_hash_dict.items() | |
| }, | |
| ) | |
| self.assertEqual(len(nodes), 5) | |
| self.assertEqual(len(edges), 4) | |
| + from importlib import util | |
| + skip_graphviz_test = util.find_spec("pygraphviz") is None |
This simplifies the debugging of complex task dependencies:
Summary by CodeRabbit
New Features
Dependency Updates
matplotlib,networkx, andipythonacross various environment configurations.h5iofrom older environment configurations.Testing Enhancements
CI Configuration Changes