- During normal operation, running “nvpmodel -q --v” caused the system to freeze.
- There is no display output when a monitor is connected over HDMI.
- The network is still up and SSH login is possible; commands such as “ls” and “cat” execute normally.
- Executing the “lspci” command hangs and produces no output.
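A quick way to confirm the lspci hang from the SSH session without tying up the shell (a minimal sketch, assuming the coreutils timeout utility is available):
# Run lspci with a time limit so a hung PCIe config-space read does not block the shell
sudo timeout 10 lspci -vvv; echo "lspci exit status: $?"
# An exit status of 124 means lspci was killed by the timeout, i.e. it never returned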
Time of occurrence and the abnormal log entries:
2025-09-08T15:23:42.442905+08:00 localhost kernel: r8152 2-3.1.4:1.0: Direct firmware load for rtl_nic/rtl8153b-2.fw failed with error -2
2025-09-08T15:59:15.901965+08:00 localhost kernel: r8152 2-3.1.4:1.0: Direct firmware load for rtl_nic/rtl8153b-2.fw failed with error -2
kern.log (5.5 MB)
Hi,
This is a known issue in JP7.0 (r38.2).
A fix will be included in an upcoming release in the next few weeks.
We appreciate your patience in the meantime.
Thanks
Thank you for your reply.
Could you tell us the specific root cause of this issue? Is it related to PCIe?
How can we follow up on and fix this issue?
Hi, David
I see that R38.2.1 has already been released, but it will take us some time to merge the code. Could you give us a separate patch that fixes the desktop freeze issue?
Hi liutee,
Sorry, but we don’t have a separate patch for this issue.
Please use the R38.2.1 release instead.
Thank you for your understanding.
Thanks,
David
Reproduced the desktop freeze issue on the DevKit with the R38.2.1 release.
The test process is as follows:
10-10 20:50:00 Start stress test
10-11 10:20:00 End stress test; system and desktop display verified normal
10-11 10:30:00 - 20:45:00 No operation performed
10-11 20:45:00 Desktop freeze observed
Checking the logs shows the following abnormal entries:
2025-10-11T17:25:55.661249+08:00 localhost kernel: INFO: task nvidia-modeset/:1701 blocked for more than 120 seconds.
2025-10-11T17:25:55.688912+08:00 localhost kernel: INFO: task kworker/u28:0:363362 blocked for more than 120 seconds.
2025-10-11T17:25:55.709145+08:00 localhost kernel: INFO: task kworker/2:0:414595 blocked for more than 120 seconds.
2025-10-11T17:25:55.723242+08:00 localhost kernel: INFO: task kworker/u28:1:414787 blocked for more than 120 seconds.
2025-10-11T17:25:55.751098+08:00 localhost kernel: INFO: task nvpmodel:416304 blocked for more than 120 seconds.
Detailed logs are attached: kern.log
kern.log (1.8 MB)
Without any load, leaving the system idle overnight also reproduces the issue:
2025-10-15T01:47:34.574085+08:00 localhost kernel: INFO: task nvidia-modeset/:1718 blocked for more than 120 seconds.
2025-10-15T01:47:34.580808+08:00 localhost kernel: INFO: task nv_queue:1735 blocked for more than 120 seconds.
2025-10-15T01:47:34.601736+08:00 localhost kernel: INFO: task kworker/u28:1:116948 blocked for more than 120 seconds.
2025-10-15T01:47:34.622316+08:00 localhost kernel: INFO: task kworker/u28:2:123804 blocked for more than 120 seconds.
2025-10-15T01:47:34.643317+08:00 localhost kernel: INFO: task kworker/11:0:128572 blocked for more than 120 seconds.
2025-10-15T01:47:34.663548+08:00 localhost kernel: INFO: task nvpmodel:132778 blocked for more than 120 seconds.
lspci is stuck on reading /sys/bus/pci/devices/0000:01:00.0/config
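One way to see where lspci blocks, for anyone trying to confirm the same symptom (a sketch assuming strace and procps are installed; not necessarily how it was diagnosed above):
# Trace lspci's file accesses to see which read blocks
sudo strace -f -e trace=openat,read lspci
# If the trace stops right after opening /sys/bus/pci/devices/0000:01:00.0/config,
# the hang is in the PCIe config-space read for that device.
# From another shell, the kernel stack of the stuck process can also be inspected
sudo cat /proc/$(pgrep -n lspci)/stack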
Do you have any peripherals connected? We have been running our Thor devkit for many weeks but have not reproduced this behavior.
1. A USB hub for the mouse and keyboard
2. HDMI to a monitor
Could you help test whether the HDMI connection is the key to this issue? Remove it and see if you can still reproduce it.
OK. We will retest without HDMI.
Hi @wangming9
Once you have confirmed whether the issue still reproduces with HDMI disconnected, please try the commands below on the reproducible setup and see if you can still reproduce the issue. Thanks.
echo performance | sudo tee /sys/class/devfreq/gpu*/governor
# confirm new governor is applied
grep "" /sys/class/devfreq/gpu*/governor
Hi, Wayne
The HDMI connection is not the key to this issue. On our own Thor device (not the devkit), we can reproduce the issue without HDMI connected. Later we will also try to reproduce it on the devkit without HDMI connected.
hi,
2025-10-21T00:39:18.775061+08:00 localhost kernel: INFO: task kworker/2:1:128 blocked for more than 120 seconds.
2025-10-21T00:39:18.775126+08:00 localhost kernel: Tainted: G W O 6.8.12-tegra #1
2025-10-21T00:39:18.775134+08:00 localhost kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
2025-10-21T00:39:18.775137+08:00 localhost kernel: task:kworker/2:1 state:D stack:0 pid:128 tgid:128 ppid:2 flags:0x00000008
2025-10-21T00:39:18.775139+08:00 localhost kernel: Workqueue: pm pm_runtime_work
2025-10-21T00:39:18.775143+08:00 localhost kernel: Call trace:
2025-10-21T00:39:18.775145+08:00 localhost kernel: __switch_to+0xe0/0x110
2025-10-21T00:39:18.775148+08:00 localhost kernel: __schedule+0x368/0xc14
2025-10-21T00:39:18.775151+08:00 localhost kernel: schedule+0x34/0xd8
2025-10-21T00:39:18.775153+08:00 localhost kernel: schedule_preempt_disabled+0x24/0x48
2025-10-21T00:39:18.775156+08:00 localhost kernel: __mutex_lock.constprop.0+0x2dc/0x580
2025-10-21T00:39:18.775158+08:00 localhost kernel: __mutex_lock_slowpath+0x14/0x28
2025-10-21T00:39:18.775161+08:00 localhost kernel: mutex_lock+0x50/0x64
2025-10-21T00:39:18.775164+08:00 localhost kernel: devfreq_monitor_suspend+0x20/0xa8
2025-10-21T00:39:18.775167+08:00 localhost kernel: 0xffffc456ce5e0a9c
2025-10-21T00:39:18.775170+08:00 localhost kernel: devfreq_suspend_device+0x50/0x104
2025-10-21T00:39:18.775172+08:00 localhost kernel: nv_set_gpu_pg_mask+0x42c/0x2ca8 [nvidia]
2025-10-21T00:39:18.775175+08:00 localhost kernel: nvidia_isr_kthread_bh+0x754/0x808 [nvidia]
2025-10-21T00:39:18.775178+08:00 localhost kernel: pci_pm_runtime_suspend+0x54/0x1c0
2025-10-21T00:39:18.775180+08:00 localhost kernel: genpd_runtime_suspend+0xa8/0x25c
2025-10-21T00:39:18.775183+08:00 localhost kernel: __rpm_callback+0x48/0x1d8
2025-10-21T00:39:18.775186+08:00 localhost kernel: rpm_callback+0x74/0x80
2025-10-21T00:39:18.775189+08:00 localhost kernel: rpm_suspend+0x114/0x66c
2025-10-21T00:39:18.775191+08:00 localhost kernel: pm_runtime_work+0xdc/0xe0
2025-10-21T00:39:18.775194+08:00 localhost kernel: process_one_work+0x170/0x424
2025-10-21T00:39:18.775197+08:00 localhost kernel: worker_thread+0x328/0x440
2025-10-21T00:39:18.775199+08:00 localhost kernel: kthread+0x110/0x124
2025-10-21T00:39:18.775202+08:00 localhost kernel: ret_from_fork+0x10/0x20
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
suspending
The GPU is now stuck in the “suspending” state, which appears to be what causes the desktop freeze, and the devfreq framework is what drives the GPU into that suspend path.
We will try to disable devfreq scaling via “echo performance | sudo tee /sys/class/devfreq/gpu*/governor”.
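For reference, a minimal sketch of that workaround plus one extra diagnostic step, assuming the GPU is the PCI device 0000:01:00.0 shown above; the power/control step is only an assumption for narrowing down the problem, not a confirmed fix:
# Force the performance governor on every GPU devfreq node (disables DVFS scaling)
echo performance | sudo tee /sys/class/devfreq/gpu*/governor
# Verify that the governor change took effect
grep "" /sys/class/devfreq/gpu*/governor
# Diagnostic only: keep the GPU out of runtime suspend entirely, so the
# pci_pm_runtime_suspend path seen in the call trace is never entered
echo on | sudo tee /sys/bus/pci/devices/0000:01:00.0/power/control
# The runtime PM state should then read "active" instead of "suspending"
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status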
Hi,
If you post status updates, please also include your environment, e.g. whether it is the NV devkit or your own board.
Basically, we would prefer that all issues be reproduced on the NV devkit for now.
Hi, Wayne
All of the logs above were reproduced on the devkit running R38.2.1. However, since we only have a limited number of devkits, we are also trying to reproduce the issue on our own board, and we have noticed something there: our board has two Thor chips running two systems. The base kernels of both systems are aligned with the devkit, yet one system has the kernel thread devfreq_wq while the other does not. What determines whether the devfreq_wq kernel thread exists?
The GPU governors on the two systems are exactly the same, so why does one system have this kernel thread while the other does not? What we do see is that the system which has this kernel thread, and which has not disabled DVFS, reproduces the desktop freeze with noticeably higher probability.
On the devkit, we are currently running the stress test with DVFS disabled and no monitor connected, and so far the issue has not reproduced.
Please share the dmesg from the machine that does not have that thread.
We just double-checked, and the behavior on our own two systems is actually the same: the kernel thread is not always present. It keeps starting and stopping during the first 5 minutes after boot, and after 5 minutes it no longer starts. Our systems run R38.2.0.
On the R38.2.1 devkit, however, this kernel thread keeps starting and stopping continuously, and it still does so after 5 minutes. So the devkit currently reproduces the desktop freeze more easily than our own board does (a simple way to observe this is sketched below).
Please help confirm: when does the devfreq_wq kernel thread stop being started, and why do R38.2.0 and R38.2.1 behave differently?
Below is the dmesg from our board (i.e. R38.2.0).
r38.2.0_dmesg.txt (131.7 KB)
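A minimal way to watch whether the devfreq_wq kthread appears and disappears over time (a sketch using standard procps tools; it is not necessarily the method actually used in this thread):
# Log, once per second, whether a devfreq_wq kernel thread currently exists
while true; do
    printf '%s ' "$(date +%T)"
    ps -eo pid,comm | grep -w devfreq_wq || echo "no devfreq_wq thread"
    sleep 1
done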
What exactly do you mean by it “keeps starting”? How are you observing this?
For now, please only report what you see on the latest release.