Hi! I have made an object detection pipeline which is not utilizing 100% of the GPU at all times. I have also wrapped the entire app in the Data Flow Tracker to measure latency. Here are the results:
Data Flow Tracking Results:
Total paths: 2
Path 1: Polyp Detection App.replayer,Polyp Detection App.detection_preprocessor,Polyp Detection App.tensor_transposer,Polyp Detection App.detection_inference,Polyp Detection App.detection_postprocessor,Polyp Detection App.detection_visualizer
Number of messages: 2074
Min Latency Message No: 56
Min end-to-end Latency (ms): 34.215
Avg end-to-end Latency (ms): 53.2704
Max Latency Message No: 614
Max end-to-end Latency (ms): 135.284
Path 2: Polyp Detection App.replayer,Polyp Detection App.detection_visualizer
Number of messages: 2075
Min Latency Message No: 57
Min end-to-end Latency (ms): 34.215
Avg end-to-end Latency (ms): 53.2678
Max Latency Message No: 615
Max end-to-end Latency (ms): 135.284
I want to increase the FPS of my application. What steps should I take to solve this?
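For context, I am wrapping the app roughly like this (a minimal sketch of the Python Tracker API; MyPolypDetectionApp stands in for my actual application class):

```python
from holoscan.core import Tracker

app = MyPolypDetectionApp()    # placeholder for my actual Application subclass
with Tracker(app) as tracker:  # enables data flow tracking for all operators
    app.run()
    tracker.print()            # prints the per-path latency results shown above
```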
Hi Dan,
Glad you are using the Holoscan SDK and analyzing latency. Your worst-case latency of 135 ms means you can run at around 7 FPS. The GPU will not be at 100% since it is not used in all of the operators. I am not sure which GPU you are using, but the Nano and AGX have integrated GPUs, so performance is affected by whatever else is running on your system.
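(That figure comes from the worst case: 1000 ms / 135 ms ≈ 7.4 frames per second. The ~53 ms average latency would correspond to roughly 19 FPS.)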
It seems the data flow tracking is including the replayer and Holoviz. These operators take the most time, since they read from disk and render to the screen.
How will your final application work? If you are capturing from a camera, I suggest you run the benchmark with that.
You can also disable the Holoviz window: add headless: true under the holoviz section of your config to run it headless.
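If you construct the operator in Python rather than through the YAML file, the equivalent is the headless parameter of HolovizOp (a quick sketch inside compose(); other arguments omitted):

```python
from holoscan.operators import HolovizOp

# Sketch: create the visualizer without opening a window, so rendering/display
# overhead is left out of the benchmark. Other arguments are omitted.
visualizer = HolovizOp(
    self,
    name="detection_visualizer",
    headless=True,  # same effect as `headless: true` under holoviz in the YAML config
)
```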
I would also recommend you do:
Hope that helps
Hi! Actually, I am evaluating the overall advantage of moving over to the Holoscan SDK from the DeepStream SDK for my task. Right now the DeepStream SDK gives me over 40 FPS with the same model on the same task.
My final application pipeline is something like this: Camera (Webcam or V4L2 source) —> Inference with my Model —> Display the video feed with Bounding Box overlay.
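In Python I imagine the final pipeline would look roughly like this (just a sketch based on the SDK's V4L2, format converter, inference, and Holoviz operators; the parameters, YAML keys, and device path are placeholders, not my actual code):

```python
from holoscan.core import Application
from holoscan.operators import (
    FormatConverterOp,
    HolovizOp,
    InferenceOp,
    V4L2VideoCaptureOp,
)
from holoscan.resources import UnboundedAllocator


class PolypDetectionApp(Application):
    def compose(self):
        pool = UnboundedAllocator(self, name="pool")

        # Camera source (webcam / V4L2 device) instead of the video replayer.
        camera = V4L2VideoCaptureOp(
            self, name="camera", allocator=pool, device="/dev/video0"
        )
        # Convert camera frames into the tensor layout the model expects.
        preprocessor = FormatConverterOp(
            self,
            name="detection_preprocessor",
            pool=pool,
            **self.kwargs("detection_preprocessor"),
        )
        # Run the detection model (backend and model paths configured in the YAML).
        inference = InferenceOp(
            self,
            name="detection_inference",
            allocator=pool,
            **self.kwargs("detection_inference"),
        )
        # Display the live video with the bounding box overlay.
        visualizer = HolovizOp(
            self, name="detection_visualizer", **self.kwargs("holoviz")
        )

        self.add_flow(camera, preprocessor, {("signal", "source_video")})
        self.add_flow(preprocessor, inference, {("tensor", "receivers")})
        # In my current app a detection_postprocessor sits between inference and
        # the visualizer to turn raw model output into bounding-box tensors;
        # it is omitted here to keep the sketch short.
        self.add_flow(inference, visualizer, {("transmitter", "receivers")})
        self.add_flow(camera, visualizer, {("signal", "receivers")})
```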
Right now the pipeline is very barebones and looks like the attached image. I am working with an NVIDIA Jetson Orin Nano.
Is there a way to parallelize the work or run the application entirely on the GPU?
Also, I have noticed that converting my sample video (.mp4) to GXF entities results in a huge file (9 GB for a mere 50 MB .mp4). I am using this procedure: holoscan-sdk/scripts at main · nvidia-holoscan/holoscan-sdk · GitHub
Is there a faster way to do this? What happens if I use a camera as the input source for my app?
Appreciate the help!