⚡️ Speed up function _grouped_plot_by_column by 5% #152
📄 5% (0.05x) speedup for `_grouped_plot_by_column` in `pandas/plotting/_matplotlib/boxplot.py`
⏱️ Runtime: 663 milliseconds → 630 milliseconds (best of 5 runs)
📝 Explanation and details
The optimized code achieves a 5% speedup through several key improvements that reduce function call overhead and generator costs:
Key Optimizations:
- Module-level import: Moved `import matplotlib.pyplot as plt` to the module level instead of importing it inside `create_subplots()` on every call. This eliminates the repeated import overhead visible in the profiler (74,950 ns per call).
- Direct array conversion: Replaced `np.fromiter(flatten_axes(ax), dtype=object)` with `np.asarray(ax, dtype=object).reshape(-1)` in two places within `create_subplots()`. This avoids generator overhead when dealing with array-like inputs, which is more efficient for numpy arrays and ABCIndex objects.
- Generator pre-computation: In `_grouped_plot_by_column()`, stored the result of `flatten_axes(axes)` in a local variable `fa` before the loop instead of calling it inline within `zip()`. Because `zip()` evaluates its arguments only once, the inline call did not recreate the generator on each iteration, so any gain from this change is likely negligible.
- Loop optimization: Changed the visibility loop from `for ax in axarr[naxes:]` to `for idx in range(naxes, nplots)` with `axarr[idx].set_visible(False)`, which avoids array slicing overhead.
- Improved flatten_axes: Added an explicit `dtype=object` parameter to `np.asarray()` in `flatten_axes()` to ensure consistent behavior and potentially reduce conversion overhead.

Performance Benefits:
These micro-optimizations compound to deliver measurable performance gains, particularly in data visualization workflows that create multiple subplots repeatedly.
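The array-conversion and loop changes described above can be illustrated with a small, self-contained sketch. It does not use the pandas or matplotlib code from this PR: `FakeAxes` and `flatten_items` are hypothetical stand-ins introduced only for illustration, and the grid size and `naxes` value are arbitrary assumptions. The sketch checks that generator-based flattening and `np.asarray(..., dtype=object).reshape(-1)` yield the same objects in the same order, then hides the unused trailing entries by index.

```python
import numpy as np


class FakeAxes:
    """Hypothetical stand-in for a matplotlib Axes; it only tracks visibility."""

    def __init__(self, name):
        self.name = name
        self.visible = True

    def set_visible(self, flag):
        self.visible = flag


def flatten_items(axes):
    """Hypothetical generator-based flattening (not the pandas flatten_axes)."""
    if isinstance(axes, np.ndarray):
        yield from axes.reshape(-1)
    elif isinstance(axes, (list, tuple)):
        yield from axes
    else:
        yield axes


# Build a 2x3 object array, similar in shape to a subplot grid.
grid = np.empty((2, 3), dtype=object)
for i in range(2):
    for j in range(3):
        grid[i, j] = FakeAxes(f"ax{i}{j}")

# Generator path versus direct conversion to a flat object array.
via_generator = list(flatten_items(grid))
via_asarray = np.asarray(grid, dtype=object).reshape(-1)

# Both paths should hand back the very same objects, in the same order.
same_objects = all(a is b for a, b in zip(via_generator, via_asarray))

# Hide the unused trailing axes by index instead of slicing the array.
naxes, nplots = 4, via_asarray.size  # assumed: 4 of the 6 slots are used
for idx in range(naxes, nplots):
    via_asarray[idx].set_visible(False)

visible_names = [ax.name for ax in via_asarray if ax.visible]
```

Whether either path is actually faster depends on the input type and size, so a claim like the 5% figure above is best confirmed with `timeit` or a profiler on the real call path.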
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-_grouped_plot_by_column-mhdna7u9` and push.