Hi! I have made an object detection pipeline which is not utilizing 100% of the GPU at all times. I have also wrapped the entire app in the Data Flow Tracker to measure latency. Here are the results:
Data Flow Tracking Results:
Total paths: 2
Path 1: Polyp Detection App.replayer,Polyp Detection App.detection_preprocessor,Polyp Detection App.tensor_transposer,Polyp Detection App.detection_inference,Polyp Detection App.detection_postprocessor,Polyp Detection App.detection_visualizer
Number of messages: 2074
Min Latency Message No: 56
Min end-to-end Latency (ms): 34.215
Avg end-to-end Latency (ms): 53.2704
Max Latency Message No: 614
Max end-to-end Latency (ms): 135.284
Path 2: Polyp Detection App.replayer,Polyp Detection App.detection_visualizer
Number of messages: 2075
Min Latency Message No: 57
Min end-to-end Latency (ms): 34.215
Avg end-to-end Latency (ms): 53.2678
Max Latency Message No: 615
Max end-to-end Latency (ms): 135.284
I want to increase the FPS of my application. What steps should I take to solve this?
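For context, I am wrapping the app roughly like this (a minimal sketch of the Python Tracker API; MyPolypDetectionApp stands in for my actual application class):

```python
from holoscan.core import Tracker

app = MyPolypDetectionApp()    # placeholder for my actual Application subclass
with Tracker(app) as tracker:  # enables data flow tracking for all operators
    app.run()
    tracker.print()            # prints the per-path latency results shown above
```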
Hi Dan,
Glad you are using the Holoscan SDK and analyzing latency. Your worst-case latency of 135 ms means you can run at around 7 FPS. The GPU will not be at 100% since it is not used in all of the operators. I am not sure which GPU you are using, but the Nano and AGX have integrated GPUs, so performance is affected by whatever else is running on your system.
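(That figure comes from the worst case: 1000 ms / 135 ms ≈ 7.4 frames per second. The ~53 ms average latency would correspond to roughly 19 FPS.)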
It seems the data flow tracking is including the replayer and Holoviz. These operators take the most time, since they read from disk and render to the screen.
How will your final application work? If you are capturing from a camera, I suggest you run the benchmark with that.
You can also disable the Holoviz window: add headless: true under the holoviz section of your config to run it headless.
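If you construct the operator in Python rather than through the YAML file, the equivalent is the headless parameter of HolovizOp (a quick sketch inside compose(); other arguments omitted):

```python
from holoscan.operators import HolovizOp

# Sketch: create the visualizer without opening a window, so rendering/display
# overhead is left out of the benchmark. Other arguments are omitted.
visualizer = HolovizOp(
    self,
    name="detection_visualizer",
    headless=True,  # same effect as `headless: true` under holoviz in the YAML config
)
```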
I would also recommend you do:
Hope that helps
Hi! Actually, I am evaluating the overall advantage of moving over to the Holoscan SDK from the DeepStream SDK for my task. Right now the DeepStream SDK gives me over 40 FPS with the same model on the same task.
My final application pipeline is something like this: Camera (Webcam or V4L2 source) —> Inference with my Model —> Display the video feed with Bounding Box overlay.
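In Python I imagine the final pipeline would look roughly like this (just a sketch based on the SDK's V4L2, format converter, inference, and Holoviz operators; the parameters, YAML keys, and device path are placeholders, not my actual code):

```python
from holoscan.core import Application
from holoscan.operators import (
    FormatConverterOp,
    HolovizOp,
    InferenceOp,
    V4L2VideoCaptureOp,
)
from holoscan.resources import UnboundedAllocator


class PolypDetectionApp(Application):
    def compose(self):
        pool = UnboundedAllocator(self, name="pool")

        # Camera source (webcam / V4L2 device) instead of the video replayer.
        camera = V4L2VideoCaptureOp(
            self, name="camera", allocator=pool, device="/dev/video0"
        )
        # Convert camera frames into the tensor layout the model expects.
        preprocessor = FormatConverterOp(
            self,
            name="detection_preprocessor",
            pool=pool,
            **self.kwargs("detection_preprocessor"),
        )
        # Run the detection model (backend and model paths configured in the YAML).
        inference = InferenceOp(
            self,
            name="detection_inference",
            allocator=pool,
            **self.kwargs("detection_inference"),
        )
        # Display the live video with the bounding box overlay.
        visualizer = HolovizOp(
            self, name="detection_visualizer", **self.kwargs("holoviz")
        )

        self.add_flow(camera, preprocessor, {("signal", "source_video")})
        self.add_flow(preprocessor, inference, {("tensor", "receivers")})
        # In my current app a detection_postprocessor sits between inference and
        # the visualizer to turn raw model output into bounding-box tensors;
        # it is omitted here to keep the sketch short.
        self.add_flow(inference, visualizer, {("transmitter", "receivers")})
        self.add_flow(camera, visualizer, {("signal", "receivers")})
```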
Right now the pipeline is very barebones and looks like the attached image. I am working with an NVIDIA Jetson Orin Nano.
Is there a way to parallelize the work or run the application entirely on the GPU?
Also, I have noticed that converting my sample video (.mp4) to GXF entities results in a huge file (9 GB for a mere 50 MB .mp4). I am using this procedure: holoscan-sdk/scripts at main · nvidia-holoscan/holoscan-sdk · GitHub
Is there a faster way to do this? What happens if I use a camera as the input source for my app?
Appreciate the help!