Contributor

Copilot AI commented Aug 30, 2025

This PR adds pytest support for the 01_store/store_bench.py benchmark by following the established pattern from test_load_bench.py.

Changes Made

Added bench_store() function

  • Extracted the core benchmarking logic from run_experiment() into a reusable bench_store() function
  • Maintains the same signature pattern as bench_load() for consistency:
    def bench_store(shmem, source_rank, destination_rank, buffer, BLOCK_SIZE, dtype, verbose=False, validate=False, num_experiments=1, num_warmup=0)

Refactored run_experiment()

  • Updated to use the new bench_store() function internally
  • Maintains full backward compatibility with existing CLI usage
  • Clean separation between argument parsing and benchmarking logic
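The split between argument parsing and benchmarking can be sketched as below. This is a hypothetical, self-contained illustration, not the code in store_bench.py: `bench_store_sketch` and `run_experiment_sketch` are stand-ins, the CLI flag names are invented, and the "store" is a plain Python loop rather than an iris/Triton kernel.

```python
import argparse
import time


def bench_store_sketch(buffer, block_size, verbose=False, validate=False, num_experiments=1, num_warmup=0):
    """Hypothetical benchmark core: explicit arguments in, timings out, no CLI parsing."""

    def store_once():
        # Stand-in for the real store kernel: write the buffer one block at a time.
        for start in range(0, len(buffer), block_size):
            end = min(start + block_size, len(buffer))
            buffer[start:end] = [1] * (end - start)

    for _ in range(num_warmup):
        store_once()

    times_ms = []
    for _ in range(num_experiments):
        start_time = time.perf_counter()
        store_once()
        times_ms.append((time.perf_counter() - start_time) * 1e3)

    # Optional correctness check, mirroring the validate flag in the PR's signature.
    if validate and any(value != 1 for value in buffer):
        raise AssertionError("store validation failed")
    if verbose:
        print(f"mean time over {num_experiments} run(s): {sum(times_ms) / len(times_ms):.3f} ms")
    return times_ms


def run_experiment_sketch(argv=None):
    """Hypothetical CLI wrapper: parse arguments, build inputs, then delegate."""
    parser = argparse.ArgumentParser(description="store benchmark sketch")
    parser.add_argument("--buffer_size", type=int, default=4096)
    parser.add_argument("--block_size", type=int, default=512)
    parser.add_argument("--num_experiments", type=int, default=3)
    parser.add_argument("--num_warmup", type=int, default=1)
    parser.add_argument("--validate", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)

    buffer = [0] * args.buffer_size
    return bench_store_sketch(
        buffer,
        args.block_size,
        verbose=args.verbose,
        validate=args.validate,
        num_experiments=args.num_experiments,
        num_warmup=args.num_warmup,
    )
```

With this shape, a pytest case can call the benchmark core directly with explicit arguments, while the command line keeps going through the wrapper.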

Created test_store_bench.py

  • Follows the exact pattern established by test_load_bench.py
  • Parametrized tests covering different data types (int8, float16, bfloat16, float32)
  • Tests one buffer/heap size pair (1 << 32 and 1 << 33) with block sizes 512 and 1024
  • Properly imports the benchmark module and calls bench_store()
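For a sense of scale, the single parametrized size pair works out as below. This sketch assumes `buffer_size` and `heap_size` are byte counts and that the buffer is processed in blocks of `block_size` elements; neither unit is stated in the PR, and the dtype sizes are hard-coded instead of being read from `torch`.

```python
BUFFER_SIZE = 1 << 32  # 4 GiB, assuming the value is in bytes
HEAP_SIZE = 1 << 33  # 8 GiB, assuming the value is in bytes

# Element sizes in bytes for the four parametrized dtypes.
ITEMSIZE = {"int8": 1, "float16": 2, "bfloat16": 2, "float32": 4}


def problem_shape(dtype_name, block_size, buffer_size=BUFFER_SIZE):
    """Return (number of elements, number of blocks) under the assumptions above."""
    n_elements = buffer_size // ITEMSIZE[dtype_name]
    n_blocks = -(-n_elements // block_size)  # ceiling division
    return n_elements, n_blocks


if __name__ == "__main__":
    print(f"buffer: {BUFFER_SIZE / 2**30:.0f} GiB, heap: {HEAP_SIZE / 2**30:.0f} GiB")
    for name in ITEMSIZE:
        for block_size in (512, 1024):
            n_elements, n_blocks = problem_shape(name, block_size)
            print(f"{name:>8} block={block_size:>4}: {n_elements:,} elements, {n_blocks:,} blocks")
```

Under these assumptions, the int8 case with 512-element blocks means about 8.4 million blocks per run.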

Fixed barrier synchronization issue

  • Removed the explicit warmup call and barrier that were causing deadlocks in the test environment
  • The iris.do_bench function handles warmup and barriers internally
  • Now matches the synchronization pattern used in bench_load function
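The idea that the benchmarking helper already owns warmup and synchronization can be illustrated with a small thread-based simulation. Everything here is hypothetical: `do_bench_sketch` is not `iris.do_bench` (whose signature and internals are not shown in this PR), Python threads stand in for ranks, and `threading.Barrier` stands in for the iris barrier.

```python
import threading
import time


def do_bench_sketch(fn, barrier, n_warmup=2, n_repeat=5):
    """Hypothetical helper that runs warmup and barriers itself; returns ms per iteration."""
    for _ in range(n_warmup):
        fn()
    barrier.wait()  # every rank finishes warmup before timing starts
    start_time = time.perf_counter()
    for _ in range(n_repeat):
        fn()
    barrier.wait()  # every rank finishes the timed region
    return (time.perf_counter() - start_time) * 1e3 / n_repeat


def run_ranks(world_size=2):
    barrier = threading.Barrier(world_size, timeout=5)
    results = [None] * world_size

    def rank_main(rank):
        data = [0] * 1024

        def store():
            data[:] = [rank] * len(data)

        # The caller adds no warmup or barrier of its own; the helper handles both,
        # so every rank calls barrier.wait() the same number of times.
        results[rank] = do_bench_sketch(store, barrier)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Keeping all collective calls inside one helper makes it harder for a caller to add a barrier that only some ranks reach.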

Testing Structure

The test follows the established pattern:

```python
import pytest
import torch


@pytest.mark.parametrize("dtype", [torch.int8, torch.float16, torch.bfloat16, torch.float32])
@pytest.mark.parametrize("buffer_size, heap_size", [((1 << 32), (1 << 33))])
@pytest.mark.parametrize("block_size", [512, 1024])
def test_store_bench(dtype, buffer_size, heap_size, block_size):
    # Test implementation
    ...
```
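The three stacked `parametrize` decorators produce the Cartesian product of their values: 4 dtypes × 1 size pair × 2 block sizes = 8 test cases. The sketch below enumerates them and builds IDs shaped like the one in the failure log later in this thread (`512-4294967296-8589934592-dtype0`): the decorator nearest the function contributes the first ID component, and pytest falls back to `<argname><index>` for values such as `torch.dtype` objects that it does not render directly. It is an illustration using dtype names as strings, not pytest's own ID logic.

```python
from itertools import product

DTYPES = ["int8", "float16", "bfloat16", "float32"]  # stand-ins for the torch dtypes
SIZES = [((1 << 32), (1 << 33))]  # (buffer_size, heap_size)
BLOCK_SIZES = [512, 1024]


def case_ids():
    """Build IDs shaped like pytest's: innermost decorator first, dtype as 'dtype<index>'."""
    ids = []
    for block_size, (buffer_size, heap_size), dtype_index in product(BLOCK_SIZES, SIZES, range(len(DTYPES))):
        ids.append(f"{block_size}-{buffer_size}-{heap_size}-dtype{dtype_index}")
    return ids
```

So `dtype0` in the failing ID corresponds to `torch.int8`, the first entry in the dtype list.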

This implementation provides a clean, testable interface while making minimal changes to the existing codebase and maintaining full backward compatibility.

Fixes #56.



Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement pytest for 01_store/store_bench.py" to "Implement pytest for 01_store/store_bench.py" Aug 30, 2025
Copilot finished work on behalf of mawad-amd August 30, 2025 22:07
Copilot AI requested a review from mawad-amd August 30, 2025 22:07
@mawad-amd
Collaborator

@copilot

```
tests/examples/test_store_bench.py::test_store_bench[512-4294967296-8589934592-dtype0] Fatal Python error: Aborted

Thread 0x00007ffb8433b640 (most recent call first):
  File "/opt/conda/envs/py_3.10/lib/python3.10/socket.py", line 293 in accept
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pytest_rerunfailures.py", line 433 in run_server
  File "/opt/conda/envs/py_3.10/lib/python3.10/threading.py", line 953 in run
  File "/opt/conda/envs/py_3.10/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/opt/conda/envs/py_3.10/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007ffb865bf740 (most recent call first):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 896 in synchronize
  File "/home/runner/_work/iris/iris/iris/iris.py", line 351 in barrier
  File "/home/runner/_work/iris/iris/examples/01_store/store_bench.py", line 130 in bench_store
  File "/home/runner/_work/iris/iris/tests/examples/test_store_bench.py", line 57 in test_store_bench
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/python.py", line 1799 in runtest
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 323 in _main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/config/__init__.py", line 166 in main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/_pytest/config/__init__.py", line 189 in console_main
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pytest/__main__.py", line 5 in <module>
  File "/opt/conda/envs/py_3.10/lib/python3.10/runpy.py", line 86 in _run_code
  File "/opt/conda/envs/py_3.10/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg.lapack_lite, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, mpi4py.MPI, hip_utils, __triton_launcher (total: 24)
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 802853 on node iris-githubrunner-mi3008x-5866b8ddb9-c5fh8 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
Error: Process completed with exit code 134.
```
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Contributor Author

Copilot AI commented Aug 30, 2025

@copilot

tests/examples/test_store_bench.py::test_store_bench[512-4294967296-8589934592-dtype0] Fatal Python error: Aborted 

...

Fixed the barrier deadlock. The cause was an explicit warmup call and barrier placed before iris.do_bench. Because iris.do_bench already handles warmup and barriers internally, those explicit calls were redundant and led to the deadlock. I removed the explicit warmup so bench_store now matches the pattern used in bench_load. (e753e36)
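One general way an extra barrier hangs a multi-rank run is when ranks call a collective barrier a different number of times. The thread does not spell out which call mismatched, so the simulation below is only an illustration under assumptions: threads stand in for ranks, `threading.Barrier` stands in for the iris barrier, and a timeout turns the hang into a `BrokenBarrierError` so the script terminates.

```python
import threading


def simulate(extra_barrier_on_rank0, world_size=2, timeout=0.5):
    """Return the set of ranks whose barrier wait failed (it waited until the timeout)."""
    barrier = threading.Barrier(world_size)
    failed = set()
    lock = threading.Lock()

    def rank_main(rank):
        try:
            if extra_barrier_on_rank0 and rank == 0:
                barrier.wait(timeout=timeout)  # extra barrier that only rank 0 reaches
            barrier.wait(timeout=timeout)  # barrier that every rank reaches
        except threading.BrokenBarrierError:
            with lock:
                failed.add(rank)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failed
```

With the extra call, rank 0's first wait pairs with rank 1's only wait, so rank 0's second wait has no partner. Without a timeout, that rank would block indefinitely, which is the kind of hang a mismatched barrier produces.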

Copilot finished work on behalf of mawad-amd August 30, 2025 22:39