Describe the bug
When consecutive host tasks are submitted to an in-order queue without an explicit wait() after each submission, the average execution time per host task grows sharply as the number of submissions increases.
To reproduce
Reproducing code
```cpp
// test.cpp
#include <sycl/sycl.hpp>

#include <chrono>
#include <iostream>
#include <string>
#include <thread>

int main(int argc, char *argv[]) {
  sycl::queue queue(sycl::property::queue::in_order{});
  std::cout << "Using device: "
            << queue.get_device().get_info<sycl::info::device::name>() << "\n";

  // Number of consecutive host-task submissions, taken from the first argument.
  int repeat = 10000;
  if (argc > 1) {
    repeat = std::stoi(std::string(argv[1]));
  }

  int data = 0;
  std::cout << "Submitting " << repeat << " host tasks...\n";
  auto start_time = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < repeat; i++) {
    std::this_thread::sleep_for(std::chrono::microseconds(500));
    auto e = queue.submit([&](sycl::handler &cgh) {
      cgh.host_task([&]() {
        // Simulate some work on the host
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        data++;
      });
    });
#ifdef WAIT
    e.wait();  // Block until this host task has finished.
#endif
  }
  queue.wait();
  auto end_time = std::chrono::high_resolution_clock::now();
  std::cout << "Total execution time: "
            << std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time).count()
            << " ms\n";

  if (data != repeat) {
    std::cerr << "Error: data mismatch! Expected " << repeat << ", got " << data << "\n";
    return 1;
  }
  return 0;
}
```

Compile
Compile the code with and without an explicit wait() after each submission.
```sh
clang++ -fsycl test.cpp -o nowait.out
clang++ -fsycl test.cpp -DWAIT -o wait.out
```

Run
Pass the number of consecutive submissions (repeat) as the first argument.
```sh
./nowait.out 3000
./wait.out 3000
```

Results for different values of repeat
Total time in ms
| repeat | 10 | 100 | 1000 | 3000 | 10000 |
|---|---|---|---|---|---|
| wait.out | 16 | 162 | 1617 | 4853 | 16184 |
| nowait.out | 11 | 106 | 1396 | 12996 | 519977 |
Average time per submission in ms
| repeat | 10 | 100 | 1000 | 3000 | 10000 |
|---|---|---|---|---|---|
| wait.out | 1.6 | 1.62 | 1.617 | 1.618 | 1.6184 |
| nowait.out | 1.1 | 1.06 | 1.396 | 4.332 | 51.9977 |
Expected behavior
Even without an explicit wait() after each submission to an in-order queue, the average execution time of each host task should stay around 1 ms. The roughly 50x slowdown at repeat == 10000 (about 52 ms per submission on average) is unexpected.
Environment
- OS: Linux
- Target device and vendor: host
- DPC++ version: 7987a43
- Dependencies version: Not relevant
Additional context
No response