# Refactor HDF5 interface #775
## Conversation
### Walkthrough

This change migrates cache file handling from `executorlib.standalone.cache` and the task scheduler's HDF5 helpers into the consolidated `executorlib.standalone.hdf` module. Import paths across the task scheduler, the public API in `executorlib/__init__.py`, and the test suite are updated accordingly.
### Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant executorlib.__init__
    participant executorlib.standalone.hdf
    User->>executorlib.__init__: get_cache_data(cache_directory)
    executorlib.__init__->>executorlib.standalone.hdf: get_cache_data(cache_directory)
    executorlib.standalone.hdf->>executorlib.standalone.hdf: get_cache_files(cache_directory)
    executorlib.standalone.hdf->>executorlib.standalone.hdf: _get_content_of_file(file_name) for each file
    executorlib.standalone.hdf-->>executorlib.__init__: list of cache data dicts
    executorlib.__init__-->>User: list of cache data dicts
```

**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~15 minutes
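For orientation, the call chain in the diagram corresponds to a user-facing invocation along these lines (the cache path is a placeholder):

```python
from executorlib import get_cache_data

# Each entry is the content of one HDF5 cache file plus its "filename" key.
for entry in get_cache_data(cache_directory="cache"):
    print(entry["filename"])
```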
Actionable comments posted: 0
🧹 Nitpick comments (2)
`executorlib/standalone/hdf.py` (2)
**124-137**: Consider error handling and Python version compatibility.

The implementation is clean and well-structured. However, consider these points:

- The dictionary union operator (`|`) requires Python 3.9+. Ensure this aligns with your minimum supported version.
- Consider adding error handling for file reading failures to prevent the entire operation from failing due to a single corrupted file.
```diff
 def get_cache_data(cache_directory: str) -> list[dict]:
     """
     Collect all HDF5 files in the cache directory

     Args:
         cache_directory (str): The directory to store cache files.

     Returns:
         list[dict]: List of dictionaries each representing one of the HDF5 files in the cache directory.
     """
-    return [
-        _get_content_of_file(file_name=file_name) | {"filename": file_name}
-        for file_name in get_cache_files(cache_directory=cache_directory)
-    ]
+    result = []
+    for file_name in get_cache_files(cache_directory=cache_directory):
+        try:
+            content = _get_content_of_file(file_name=file_name)
+            content["filename"] = file_name
+            result.append(content)
+        except Exception as e:
+            # Log error and continue with other files
+            print(f"Warning: Failed to read {file_name}: {e}")
+    return result
```
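As context for the Python-version point: if the project still needs to support Python 3.8, dictionary unpacking performs the same merge without the `|` operator. A quick illustration with invented values:

```python
content = {"output": 42}

merged_new = content | {"filename": "task_o.h5"}   # requires Python 3.9+
merged_old = {**content, "filename": "task_o.h5"}  # works on Python 3.5+

assert merged_new == merged_old
```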
**157-172**: Good implementation, consider adding error handling.

The function correctly extracts HDF5 content using the standardized `group_dict` mapping and only includes existing datasets. However, consider adding error handling for file I/O and deserialization failures to improve robustness.

```diff
 def _get_content_of_file(file_name: str) -> dict:
     """
     Get content of an HDF5 file

     Args:
         file_name (str): file name

     Returns:
         dict: Content of HDF5 file
     """
-    with h5py.File(file_name, "r") as hdf:
-        return {
-            key: cloudpickle.loads(np.void(hdf["/" + key]))
-            for key in group_dict.values()
-            if key in hdf
-        }
+    try:
+        with h5py.File(file_name, "r") as hdf:
+            result = {}
+            for key in group_dict.values():
+                if key in hdf:
+                    try:
+                        result[key] = cloudpickle.loads(np.void(hdf["/" + key]))
+                    except Exception as e:
+                        print(f"Warning: Failed to deserialize {key} from {file_name}: {e}")
+            return result
+    except Exception as e:
+        print(f"Warning: Failed to open {file_name}: {e}")
+        return {}
```
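One design note on both suggested diffs: library code usually reports such failures through the `logging` module (or `warnings`) rather than `print`, so callers can filter or silence the messages. A sketch of that variation, reusing `_get_content_of_file` from the review; the wrapper name is invented:

```python
import logging

logger = logging.getLogger(__name__)


def read_cache_file(file_name: str) -> dict:
    """Read one cache file; a corrupted file yields an empty dict instead of an exception."""
    try:
        return _get_content_of_file(file_name=file_name)
    except (OSError, KeyError) as error:
        # logger.warning respects the caller's logging configuration,
        # unlike print(), which always writes to stdout.
        logger.warning("Failed to read %s: %s", file_name, error)
        return {}
```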
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- `executorlib/__init__.py` (3 hunks)
- `executorlib/standalone/cache.py` (0 hunks)
- `executorlib/standalone/hdf.py` (2 hunks)
- `executorlib/task_scheduler/file/backend.py` (1 hunks)
- `executorlib/task_scheduler/file/queue_spawner.py` (1 hunks)
- `executorlib/task_scheduler/file/shared.py` (1 hunks)
- `executorlib/task_scheduler/file/subprocess_spawner.py` (1 hunks)
- `executorlib/task_scheduler/interactive/shared.py` (1 hunks)
- `tests/test_cache_backend_execute.py` (1 hunks)
- `tests/test_fluxclusterexecutor.py` (1 hunks)
- `tests/test_slurmclusterexecutor.py` (1 hunks)
- `tests/test_standalone_hdf.py` (1 hunks)
💤 Files with no reviewable changes (1)
- `executorlib/standalone/cache.py`
🧰 Additional context used
🧬 Code Graph Analysis (10)

`tests/test_slurmclusterexecutor.py` (1)
- `executorlib/standalone/hdf.py` (1): `dump` (20-37)

`executorlib/task_scheduler/file/queue_spawner.py` (1)
- `executorlib/standalone/hdf.py` (2): `dump` (20-37), `get_queue_id` (107-121)

`executorlib/task_scheduler/file/backend.py` (1)
- `executorlib/standalone/hdf.py` (2): `dump` (20-37), `load` (40-68)

`executorlib/task_scheduler/file/subprocess_spawner.py` (1)
- `executorlib/standalone/hdf.py` (1): `dump` (20-37)

`executorlib/task_scheduler/interactive/shared.py` (1)
- `executorlib/standalone/hdf.py` (3): `dump` (20-37), `get_cache_files` (140-154), `get_output` (71-87)

`executorlib/__init__.py` (2)
- `executorlib/standalone/hdf.py` (1): `get_cache_data` (124-137)
- `executorlib/task_scheduler/file/queue_spawner.py` (1): `terminate_tasks_in_cache` (93-117)

`tests/test_cache_backend_execute.py` (1)
- `executorlib/standalone/hdf.py` (2): `dump` (20-37), `get_runtime` (90-104)

`executorlib/task_scheduler/file/shared.py` (1)
- `executorlib/standalone/hdf.py` (2): `get_cache_files` (140-154), `get_output` (71-87)

`executorlib/standalone/hdf.py` (1)
- `executorlib/__init__.py` (1): `get_cache_data` (30-42)

`tests/test_fluxclusterexecutor.py` (1)
- `executorlib/standalone/hdf.py` (1): `dump` (20-37)
🔇 Additional comments (17)
`executorlib/task_scheduler/file/backend.py` (1)

**6-6**: LGTM! Clean import path update.

The import change from `executorlib.task_scheduler.file.hdf` to `executorlib.standalone.hdf` aligns with the HDF5 interface refactoring objectives. The function signatures and usage remain consistent throughout the file.

`executorlib/task_scheduler/file/queue_spawner.py` (1)

**6-6**: LGTM! Import path update is consistent.

The import change correctly updates the path for the `dump` and `get_queue_id` functions to align with the HDF5 interface consolidation. Function usage throughout the file remains unchanged and consistent.

`executorlib/task_scheduler/file/subprocess_spawner.py` (1)

**6-6**: LGTM! Simple import path update.

The import change for the `dump` function from `executorlib.task_scheduler.file.hdf` to `executorlib.standalone.hdf` is correct and consistent with the refactoring objectives.

`tests/test_standalone_hdf.py` (1)

**7-13**: LGTM! Test imports updated correctly.

The import paths have been correctly updated to use the consolidated `executorlib.standalone.hdf` module. This ensures the test suite validates the refactored HDF5 functionality in its new location.

`tests/test_slurmclusterexecutor.py` (1)

**17-17**: LGTM! Test import updated for consistency.

The import path update for the `dump` function ensures this test file remains compatible with the HDF5 interface refactoring.

`tests/test_cache_backend_execute.py` (1)

**10-10**: LGTM: Import path updated correctly.

The import change from `executorlib.task_scheduler.file.hdf` to `executorlib.standalone.hdf` aligns with the HDF5 interface refactoring. The functions `dump` and `get_runtime` are available in the new module with compatible signatures.

`tests/test_fluxclusterexecutor.py` (1)

**13-13**: LGTM: Import path updated correctly.

The import change from `executorlib.task_scheduler.file.hdf` to `executorlib.standalone.hdf` is consistent with the HDF5 interface refactoring. The `dump` function is available in the new module.

`executorlib/task_scheduler/file/shared.py` (1)

**8-8**: LGTM: Import consolidation aligns with refactoring objectives.

The consolidation of the `get_cache_files` and `get_output` imports into `executorlib.standalone.hdf` successfully centralizes HDF5-related functionality. Both functions are available with compatible signatures in the new module.

`executorlib/task_scheduler/interactive/shared.py` (1)

**132-132**: LGTM: Import update maintains consistency.

The import change consolidates HDF5 functions into `executorlib.standalone.hdf`, maintaining consistency with the overall refactoring. All three functions (`dump`, `get_cache_files`, `get_output`) are available with compatible signatures.

`executorlib/__init__.py` (6)
**15-15**: LGTM: Added necessary import for type hints.

The `Optional` import is correctly added to support the type hints in the new wrapper functions.

**17-17**: LGTM: Version import updated to absolute path.

The import change from a relative to an absolute import (`executorlib._version`) improves clarity and consistency.

**30-42**: LGTM: Clean wrapper function implementation.

The `get_cache_data` wrapper function provides a clean public API that delegates to the underlying implementation in `executorlib.standalone.hdf`. The function signature and documentation are consistent with the wrapped function.
**45-64**: LGTM: Well-designed wrapper function.

The `terminate_tasks_in_cache` wrapper function properly delegates to the underlying implementation while providing a clean public interface. The function signature matches the wrapped function and includes appropriate type hints.
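Both wrappers follow the same delegation pattern; a minimal sketch for `get_cache_data`, with the signature taken from the review diff (not the file's verbatim contents):

```python
# Public API in executorlib/__init__.py delegating to the standalone module.
from executorlib.standalone.hdf import get_cache_data as _get_cache_data


def get_cache_data(cache_directory: str) -> list[dict]:
    """
    Collect all HDF5 files in the cache directory.

    Args:
        cache_directory (str): The directory to store cache files.

    Returns:
        list[dict]: One dictionary per HDF5 file in the cache directory.
    """
    return _get_cache_data(cache_directory=cache_directory)
```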
**69-69**: LGTM: Updated `__all__` list correctly.

The addition of `terminate_tasks_in_cache` to the `__all__` list correctly exposes the new public function.

**78-78**: LGTM: Version reference updated consistently.

The version reference is updated to match the absolute import path used above.
`executorlib/standalone/hdf.py` (2)
**8-17**: LGTM! Good consolidation of mapping logic.

The inline `group_dict` centralizes the key-to-HDF5-dataset-name mapping, improving maintainability and consistency across the module.
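The review does not show the mapping itself; as a purely hypothetical illustration, such a key-to-dataset-name table could look like the following (keys and dataset names invented for the example):

```python
# Hypothetical stand-in for group_dict in executorlib.standalone.hdf;
# the real keys and HDF5 dataset names may differ.
group_dict = {
    "fn": "function",
    "args": "input_args",
    "kwargs": "input_kwargs",
    "output": "output",
}

# Consumers such as _get_content_of_file iterate over the dataset names:
dataset_names = list(group_dict.values())
```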
**140-154**: LGTM! Efficient file discovery implementation.

The recursive directory traversal using `os.walk` and filtering by the `_o.h5` suffix is appropriate for finding HDF5 output files in the cache directory.
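For illustration, a traversal matching that description might look like the sketch below; the `_o.h5` suffix comes from the review, while the function name and structure are assumptions rather than the module's actual code:

```python
import os


def find_output_files(cache_directory: str) -> list[str]:
    """Recursively collect HDF5 output files below the cache directory."""
    return [
        os.path.join(root, name)
        for root, _, files in os.walk(cache_directory)
        for name in files
        if name.endswith("_o.h5")  # suffix used for output files per the review
    ]
```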
## Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #775      +/-   ##
==========================================
+ Coverage   97.53%   97.65%   +0.12%
==========================================
  Files          33       32       -1
  Lines        1460     1451       -9
==========================================
- Hits         1424     1417       -7
+ Misses         36       34       -2
```

☔ View full report in Codecov by Sentry.
## Summary by CodeRabbit

- **New Features**: `get_cache_data` and `terminate_tasks_in_cache` are now exposed as public functions from the top-level `executorlib` package.
- **Refactor**: HDF5 cache file handling is consolidated into `executorlib.standalone.hdf`, with import paths updated across the package and the test suite.
- **Chores**: The superseded `executorlib/standalone/cache.py` module is removed.