All of lore.kernel.org
 help / color / mirror / Atom feed
* Radeon driver hang with 4.9.0-rt
@ 2017-02-23 18:15 Wargreen
  2017-02-24  3:38 ` Pankaj Gupta
  0 siblings, 1 reply; 18+ messages in thread
From: Wargreen @ 2017-02-23 18:15 UTC (permalink / raw)
  To: linux-rt-users

Hello linux-rt list,

After swith to a radeon HD5850, i have a full graphic hang when an
application require lot of GPU. Kill this application (via ssh) resume
the normal operation.
I run Debian SID, it's happen with both linux-image-4.9.0.1-rt-amd64 and
linux-image-4.9.0.2-rt-amd64.

Extract of my syslog (full version at
http://kalaj.darktech.org/~wargreen/logs/radeon/syslog)  :

Feb 19 23:17:19 LaChoze kernel: [  605.169580] INFO: task Xorg:2014 blocked for more than 120 seconds.
Feb 19 23:17:19 LaChoze kernel: [  605.169588]       Not tainted 4.9.0-1-rt-amd64 #1
Feb 19 23:17:19 LaChoze kernel: [  605.169590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 19 23:17:19 LaChoze kernel: [  605.169593] Xorg            D    0  2014   2012 0x00000004
Feb 19 23:17:19 LaChoze kernel: [  605.169598]  ffff9235afe24800 0000000000000000 ffff9235b6c69200 ffff9235b12e9f80
Feb 19 23:17:19 LaChoze kernel: [  605.169602]  ffff9235b4e4ee40 ffffa16b023a7be8 ffffffff85a16cbf 00ff9235b12e9f80
Feb 19 23:17:19 LaChoze kernel: [  605.169606]  ffff9235b6c69200 ffff9235b12e9f80 ffff9235b12e9f80 0000000000000000
Feb 19 23:17:19 LaChoze kernel: [  605.169611] Call Trace:
Feb 19 23:17:19 LaChoze kernel: [  605.169620]  [<ffffffff85a16cbf>] ? __schedule+0x26f/0x5b0
Feb 19 23:17:19 LaChoze kernel: [  605.169623]  [<ffffffff85a17043>] ? schedule+0x43/0xc0
Feb 19 23:17:19 LaChoze kernel: [  605.169625]  [<ffffffff85a18aa3>] ? __rt_mutex_slowlock+0x83/0x150
Feb 19 23:17:19 LaChoze kernel: [  605.169628]  [<ffffffff85a18fcc>] ? rt_mutex_slowlock+0x11c/0x2a0
Feb 19 23:17:19 LaChoze kernel: [  605.169633]  [<ffffffff854cd9d4>] ? rt_down_read+0x24/0x30
Feb 19 23:17:19 LaChoze kernel: [  605.169663]  [<ffffffffc0b2eaf7>] ? radeon_gem_create_ioctl+0x37/0x100 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169680]  [<ffffffffc09f9b9a>] ? drm_ioctl+0x1ea/0x470 [drm]
Feb 19 23:17:19 LaChoze kernel: [  605.169703]  [<ffffffffc0b2eac0>] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169706]  [<ffffffff8547dc82>] ? unpin_current_cpu+0x12/0x70
Feb 19 23:17:19 LaChoze kernel: [  605.169723]  [<ffffffffc0afd049>] ? radeon_drm_ioctl+0x49/0x80 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169726]  [<ffffffff856257bf>] ? do_vfs_ioctl+0x9f/0x610
Feb 19 23:17:19 LaChoze kernel: [  605.169730]  [<ffffffff858f2e4d>] ? __sys_recvmsg+0x7d/0x90
Feb 19 23:17:19 LaChoze kernel: [  605.169733]  [<ffffffff85625da4>] ? SyS_ioctl+0x74/0x80
Feb 19 23:17:19 LaChoze kernel: [  605.169736]  [<ffffffff85a1b27b>] ? system_call_fast_compare_end+0xc/0x9b
Feb 19 23:17:19 LaChoze kernel: [  605.169770] INFO: task E.d3d9.OpenGL:4263 blocked for more than 120 seconds.
Feb 19 23:17:19 LaChoze kernel: [  605.169773]       Not tainted 4.9.0-1-rt-amd64 #1
Feb 19 23:17:19 LaChoze kernel: [  605.169774] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 19 23:17:19 LaChoze kernel: [  605.169776] E.d3d9.OpenGL   D    0  4263   4243 0x20020000
Feb 19 23:17:19 LaChoze kernel: [  605.169779]  ffff9235b0839b00 0000000000000000 ffff9235b6ce9200 ffff92341d6a4ec0
Feb 19 23:17:19 LaChoze kernel: [  605.169783]  ffff9235b4e4af40 ffffa16b020a3b40 ffffffff85a16cbf 0000000000000000
Feb 19 23:17:19 LaChoze kernel: [  605.169787]  ffff9235b6ce9200 ffff92341d6a4ec0 ffff92341d6a4ec0 0000000000000000
Feb 19 23:17:19 LaChoze kernel: [  605.169791] Call Trace:
Feb 19 23:17:19 LaChoze kernel: [  605.169795]  [<ffffffff85a16cbf>] ? __schedule+0x26f/0x5b0
Feb 19 23:17:19 LaChoze kernel: [  605.169798]  [<ffffffff85a17043>] ? schedule+0x43/0xc0
Feb 19 23:17:19 LaChoze kernel: [  605.169800]  [<ffffffff85a18aa3>] ? __rt_mutex_slowlock+0x83/0x150
Feb 19 23:17:19 LaChoze kernel: [  605.169802]  [<ffffffff85a18fcc>] ? rt_mutex_slowlock+0x11c/0x2a0
Feb 19 23:17:19 LaChoze kernel: [  605.169805]  [<ffffffff854cd9d4>] ? rt_down_read+0x24/0x30
Feb 19 23:17:19 LaChoze kernel: [  605.169827]  [<ffffffffc0b1becd>] ? radeon_bo_create+0x15d/0x280 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169849]  [<ffffffffc0b2e6ce>] ? radeon_gem_object_create+0xbe/0x1a0 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169870]  [<ffffffffc0b2eb29>] ? radeon_gem_create_ioctl+0x69/0x100 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169884]  [<ffffffffc09f9b9a>] ? drm_ioctl+0x1ea/0x470 [drm]
Feb 19 23:17:19 LaChoze kernel: [  605.169905]  [<ffffffffc0b2eac0>] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169908]  [<ffffffff8547dc82>] ? unpin_current_cpu+0x12/0x70
Feb 19 23:17:19 LaChoze kernel: [  605.169924]  [<ffffffffc0afd049>] ? radeon_drm_ioctl+0x49/0x80 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169950]  [<ffffffffc0be89c0>] ? radeon_kms_compat_ioctl+0x10/0x20 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169953]  [<ffffffff8566e4ec>] ? compat_SyS_ioctl+0xbc/0x1140
Feb 19 23:17:19 LaChoze kernel: [  605.169956]  [<ffffffff85a1704d>] ? schedule+0x4d/0xc0
Feb 19 23:17:19 LaChoze kernel: [  605.169959]  [<ffffffff85403e1d>] ? do_fast_syscall_32+0x8d/0x170
Feb 19 23:17:19 LaChoze kernel: [  605.169962]  [<ffffffff85a1caac>] ? entry_SYSENTER_compat+0x4c/0x5b
[...]
Feb 19 23:19:20 LaChoze kernel: [  726.004439] INFO: task gnome-shell:1825 blocked for more than 120 seconds.
Feb 19 23:19:20 LaChoze kernel: [  726.004447]       Not tainted 4.9.0-1-rt-amd64 #1
Feb 19 23:19:20 LaChoze kernel: [  726.004449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 19 23:19:20 LaChoze kernel: [  726.004452] gnome-shell     D    0  1825   1817 0x00000000
Feb 19 23:19:20 LaChoze kernel: [  726.004457]  ffff9235b3591200 0000000000000000 ffff9235b6d69200 ffff923596402f40
Feb 19 23:19:20 LaChoze kernel: [  726.004461]  ffff9235b4e49f80 ffffa16b01f67be8 ffffffff85a16cbf 0000000000000000
Feb 19 23:19:20 LaChoze kernel: [  726.004465]  ffff9235b6d69200 0000000000000001 ffff923596402f40 0000000000000000
Feb 19 23:19:20 LaChoze kernel: [  726.004469] Call Trace:
[.....]

Is there a workaround ?
I'm here for more informations or tests.

Thanks for the RT !


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-02-23 18:15 Radeon driver hang with 4.9.0-rt Wargreen
@ 2017-02-24  3:38 ` Pankaj Gupta
  2017-02-24 22:09   ` Wargreen
  0 siblings, 1 reply; 18+ messages in thread
From: Pankaj Gupta @ 2017-02-24  3:38 UTC (permalink / raw)
  To: Wargreen; +Cc: linux-rt-users

Hello,

> 
> Hello linux-rt list,
> 
> After swith to a radeon HD5850, i have a full graphic hang when an
> application require lot of GPU. Kill this application (via ssh) resume
> the normal operation.
> I run Debian SID, it's happen with both linux-image-4.9.0.1-rt-amd64 and
> linux-image-4.9.0.2-rt-amd64.
> 
> Extract of my syslog (full version at
> http://kalaj.darktech.org/~wargreen/logs/radeon/syslog)  :
> 
> Feb 19 23:17:19 LaChoze kernel: [  605.169580] INFO: task Xorg:2014 blocked
> for more than 120 seconds.
> Feb 19 23:17:19 LaChoze kernel: [  605.169588]       Not tainted
> 4.9.0-1-rt-amd64 #1
> Feb 19 23:17:19 LaChoze kernel: [  605.169590] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 19 23:17:19 LaChoze kernel: [  605.169593] Xorg            D    0  2014
> 2012 0x00000004
> Feb 19 23:17:19 LaChoze kernel: [  605.169598]  ffff9235afe24800
> 0000000000000000 ffff9235b6c69200 ffff9235b12e9f80
> Feb 19 23:17:19 LaChoze kernel: [  605.169602]  ffff9235b4e4ee40
> ffffa16b023a7be8 ffffffff85a16cbf 00ff9235b12e9f80
> Feb 19 23:17:19 LaChoze kernel: [  605.169606]  ffff9235b6c69200
> ffff9235b12e9f80 ffff9235b12e9f80 0000000000000000
> Feb 19 23:17:19 LaChoze kernel: [  605.169611] Call Trace:
> Feb 19 23:17:19 LaChoze kernel: [  605.169620]  [<ffffffff85a16cbf>] ?
> __schedule+0x26f/0x5b0
> Feb 19 23:17:19 LaChoze kernel: [  605.169623]  [<ffffffff85a17043>] ?
> schedule+0x43/0xc0
> Feb 19 23:17:19 LaChoze kernel: [  605.169625]  [<ffffffff85a18aa3>] ?
> __rt_mutex_slowlock+0x83/0x150
> Feb 19 23:17:19 LaChoze kernel: [  605.169628]  [<ffffffff85a18fcc>] ?
> rt_mutex_slowlock+0x11c/0x2a0
> Feb 19 23:17:19 LaChoze kernel: [  605.169633]  [<ffffffff854cd9d4>] ?
> rt_down_read+0x24/0x30
> Feb 19 23:17:19 LaChoze kernel: [  605.169663]  [<ffffffffc0b2eaf7>] ?
> radeon_gem_create_ioctl+0x37/0x100 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169680]  [<ffffffffc09f9b9a>] ?
> drm_ioctl+0x1ea/0x470 [drm]
> Feb 19 23:17:19 LaChoze kernel: [  605.169703]  [<ffffffffc0b2eac0>] ?
> radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169706]  [<ffffffff8547dc82>] ?
> unpin_current_cpu+0x12/0x70
> Feb 19 23:17:19 LaChoze kernel: [  605.169723]  [<ffffffffc0afd049>] ?
> radeon_drm_ioctl+0x49/0x80 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169726]  [<ffffffff856257bf>] ?
> do_vfs_ioctl+0x9f/0x610
> Feb 19 23:17:19 LaChoze kernel: [  605.169730]  [<ffffffff858f2e4d>] ?
> __sys_recvmsg+0x7d/0x90
> Feb 19 23:17:19 LaChoze kernel: [  605.169733]  [<ffffffff85625da4>] ?
> SyS_ioctl+0x74/0x80
> Feb 19 23:17:19 LaChoze kernel: [  605.169736]  [<ffffffff85a1b27b>] ?
> system_call_fast_compare_end+0xc/0x9b
> Feb 19 23:17:19 LaChoze kernel: [  605.169770] INFO: task E.d3d9.OpenGL:4263
> blocked for more than 120 seconds.
> Feb 19 23:17:19 LaChoze kernel: [  605.169773]       Not tainted
> 4.9.0-1-rt-amd64 #1
> Feb 19 23:17:19 LaChoze kernel: [  605.169774] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 19 23:17:19 LaChoze kernel: [  605.169776] E.d3d9.OpenGL   D    0  4263
> 4243 0x20020000
> Feb 19 23:17:19 LaChoze kernel: [  605.169779]  ffff9235b0839b00
> 0000000000000000 ffff9235b6ce9200 ffff92341d6a4ec0
> Feb 19 23:17:19 LaChoze kernel: [  605.169783]  ffff9235b4e4af40
> ffffa16b020a3b40 ffffffff85a16cbf 0000000000000000
> Feb 19 23:17:19 LaChoze kernel: [  605.169787]  ffff9235b6ce9200
> ffff92341d6a4ec0 ffff92341d6a4ec0 0000000000000000
> Feb 19 23:17:19 LaChoze kernel: [  605.169791] Call Trace:
> Feb 19 23:17:19 LaChoze kernel: [  605.169795]  [<ffffffff85a16cbf>] ?
> __schedule+0x26f/0x5b0
> Feb 19 23:17:19 LaChoze kernel: [  605.169798]  [<ffffffff85a17043>] ?
> schedule+0x43/0xc0
> Feb 19 23:17:19 LaChoze kernel: [  605.169800]  [<ffffffff85a18aa3>] ?
> __rt_mutex_slowlock+0x83/0x150
> Feb 19 23:17:19 LaChoze kernel: [  605.169802]  [<ffffffff85a18fcc>] ?
> rt_mutex_slowlock+0x11c/0x2a0
> Feb 19 23:17:19 LaChoze kernel: [  605.169805]  [<ffffffff854cd9d4>] ?
> rt_down_read+0x24/0x30
> Feb 19 23:17:19 LaChoze kernel: [  605.169827]  [<ffffffffc0b1becd>] ?
> radeon_bo_create+0x15d/0x280 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169849]  [<ffffffffc0b2e6ce>] ?
> radeon_gem_object_create+0xbe/0x1a0 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169870]  [<ffffffffc0b2eb29>] ?
> radeon_gem_create_ioctl+0x69/0x100 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169884]  [<ffffffffc09f9b9a>] ?
> drm_ioctl+0x1ea/0x470 [drm]
> Feb 19 23:17:19 LaChoze kernel: [  605.169905]  [<ffffffffc0b2eac0>] ?
> radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169908]  [<ffffffff8547dc82>] ?
> unpin_current_cpu+0x12/0x70
> Feb 19 23:17:19 LaChoze kernel: [  605.169924]  [<ffffffffc0afd049>] ?
> radeon_drm_ioctl+0x49/0x80 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169950]  [<ffffffffc0be89c0>] ?
> radeon_kms_compat_ioctl+0x10/0x20 [radeon]
> Feb 19 23:17:19 LaChoze kernel: [  605.169953]  [<ffffffff8566e4ec>] ?
> compat_SyS_ioctl+0xbc/0x1140
> Feb 19 23:17:19 LaChoze kernel: [  605.169956]  [<ffffffff85a1704d>] ?
> schedule+0x4d/0xc0
> Feb 19 23:17:19 LaChoze kernel: [  605.169959]  [<ffffffff85403e1d>] ?
> do_fast_syscall_32+0x8d/0x170
> Feb 19 23:17:19 LaChoze kernel: [  605.169962]  [<ffffffff85a1caac>] ?
> entry_SYSENTER_compat+0x4c/0x5b
> [...]
> Feb 19 23:19:20 LaChoze kernel: [  726.004439] INFO: task gnome-shell:1825
> blocked for more than 120 seconds.
> Feb 19 23:19:20 LaChoze kernel: [  726.004447]       Not tainted
> 4.9.0-1-rt-amd64 #1
> Feb 19 23:19:20 LaChoze kernel: [  726.004449] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 19 23:19:20 LaChoze kernel: [  726.004452] gnome-shell     D    0  1825
> 1817 0x00000000
> Feb 19 23:19:20 LaChoze kernel: [  726.004457]  ffff9235b3591200
> 0000000000000000 ffff9235b6d69200 ffff923596402f40
> Feb 19 23:19:20 LaChoze kernel: [  726.004461]  ffff9235b4e49f80
> ffffa16b01f67be8 ffffffff85a16cbf 0000000000000000
> Feb 19 23:19:20 LaChoze kernel: [  726.004465]  ffff9235b6d69200
> 0000000000000001 ffff923596402f40 0000000000000000
> Feb 19 23:19:20 LaChoze kernel: [  726.004469] Call Trace:
> [.....]
> 
> Is there a workaround ?
> I'm here for more informations or tests.

It looks like some other task holding the mutex resulting
these tasks to sleep and hung the system.

One step to debug this is to find out which task is holding
the mutex and for how long? Taking crash of hung system
and see whats going on is one way but a harsh one.

Another step is start with version where it was working before?
and try to bisect or narrow down.

Others might have better ideas.

> 
> Thanks for the RT !
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-02-24  3:38 ` Pankaj Gupta
@ 2017-02-24 22:09   ` Wargreen
  2017-02-26 11:55     ` Daniel Vidal
  0 siblings, 1 reply; 18+ messages in thread
From: Wargreen @ 2017-02-24 22:09 UTC (permalink / raw)
  To: Pankaj Gupta; +Cc: linux-rt-users

Thank you for your reply.
> It looks like some other task holding the mutex resulting
> these tasks to sleep and hung the system.
>
> One step to debug this is to find out which task is holding
> the mutex and for how long? Taking crash of hung system
> and see whats going on is one way but a harsh one
Because my bad english and my light technical background, i don't
understand how can i find this task.
Can you guide me ?

> Another step is start with version where it was working before?
> and try to bisect or narrow down.
From memory, it's happen with 4.0.4, 4.1.3 and 4.9.0, on two computers
(both with intel CPU). So i don't know if it was worked one day.

> Others might have better ideas.
I take all ideas !
It's my workstation (as well as my all-day computer) for real time audio
processing so i'm open to alls test who can fix this problem.

Is it the best place, or i should also post on the radeon mailing ?

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-02-24 22:09   ` Wargreen
@ 2017-02-26 11:55     ` Daniel Vidal
  2017-03-02 21:10       ` Julia Cartwright
  2017-03-17 18:15       ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Vidal @ 2017-02-26 11:55 UTC (permalink / raw)
  To: Wargreen; +Cc: Pankaj Gupta, linux-rt-users

Hi

The radeon driver hangs from 3.16 kernel.

RT patch for radeon driver only disable ( preempt_disable_rt() ) and
reenable ( preempt_enable_rt() ) the whole RT in file radeon_display.c

Obviously this is not enough.

I think the radeon developers should make some modification in the
driver but I do not know which is the right place to indicate it

Here my backtrace.

(sorry for my english)


Feb 25 22:24:34 musix64-v3 kernel: [31679.556374] chromium        D
0000000000000002     0  1487   1466 0x00000000
Feb 25 22:24:34 musix64-v3 kernel: [31679.556379]  ffff88008c0e7a20
ffff88004ff55780 ffff8800cfb1e400 00000000000000fd
Feb 25 22:24:34 musix64-v3 kernel: [31679.556382]  0000000000000001
ffff88008c0e8000 0000000000000000 0000000000000002
Feb 25 22:24:34 musix64-v3 kernel: [31679.556384]  ffff88008c0e7ac0
0000000000000002 ffff8800cfb1e400 ffffffff814dc07e
Feb 25 22:24:34 musix64-v3 kernel: [31679.556387] Call Trace:
Feb 25 22:24:34 musix64-v3 kernel: [31679.556393]
[<ffffffff814dc07e>] ? schedule+0x3e/0xc0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556396]
[<ffffffff814dd825>] ? __rt_mutex_slowlock+0x95/0x160
Feb 25 22:24:34 musix64-v3 kernel: [31679.556399]
[<ffffffff814ddde3>] ? rt_mutex_slowlock+0xf3/0x260
Feb 25 22:24:34 musix64-v3 kernel: [31679.556401]
[<ffffffff810584dd>] ? unpin_current_cpu+0xd/0x60
Feb 25 22:24:34 musix64-v3 kernel: [31679.556405]
[<ffffffff8109b01f>] ? rt__down_read+0x1f/0x30
Feb 25 22:24:34 musix64-v3 kernel: [31679.556426]
[<ffffffffa035be37>] ? radeon_cs_ioctl+0x27/0x780 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556429]
[<ffffffff81003427>] ? __switch_to+0x27/0x580
Feb 25 22:24:34 musix64-v3 kernel: [31679.556431]
[<ffffffff814def2e>] ? _raw_spin_unlock_irq+0xe/0x50
Feb 25 22:24:34 musix64-v3 kernel: [31679.556434]
[<ffffffff8107cd98>] ? finish_task_switch+0x88/0x220
Feb 25 22:24:34 musix64-v3 kernel: [31679.556436]
[<ffffffff814db91d>] ? __schedule+0x2bd/0x9e0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556438]
[<ffffffff814dc088>] ? schedule+0x48/0xc0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556440]
[<ffffffff810ccca7>] ? futex_wait_queue_me+0xc7/0x130
Feb 25 22:24:34 musix64-v3 kernel: [31679.556442]
[<ffffffff810ce080>] ? futex_wait+0x210/0x230
Feb 25 22:24:34 musix64-v3 kernel: [31679.556445]
[<ffffffff810cca71>] ? get_futex_key+0x201/0x230
Feb 25 22:24:34 musix64-v3 kernel: [31679.556447]
[<ffffffff8107f574>] ? check_preempt_curr+0x74/0x80
Feb 25 22:24:34 musix64-v3 kernel: [31679.556454]
[<ffffffffa02cf2fe>] ? drm_ioctl+0x10e/0x460 [drm]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556467]
[<ffffffffa035be10>] ? radeon_cs_parser_init+0x480/0x480 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556471]
[<ffffffff810584dd>] ? unpin_current_cpu+0xd/0x60
Feb 25 22:24:34 musix64-v3 kernel: [31679.556481]
[<ffffffffa0328041>] ? radeon_drm_ioctl+0x41/0x70 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556484]
[<ffffffff811a8d4f>] ? do_vfs_ioctl+0x2bf/0x4a0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556487]
[<ffffffff811b24cb>] ? __fget+0x6b/0xb0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556489]
[<ffffffff811a8fa1>] ? SyS_ioctl+0x71/0x80
Feb 25 22:24:34 musix64-v3 kernel: [31679.556491]
[<ffffffff814df597>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
Feb 25 22:24:34 musix64-v3 kernel: [31679.556501] INFO: task
kworker/u12:0:7008 blocked for more than 120 seconds.
Feb 25 22:24:34 musix64-v3 kernel: [31679.556503]       Not tainted
4.4.47-rt59 #1
Feb 25 22:24:34 musix64-v3 kernel: [31679.556504] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 25 22:24:34 musix64-v3 kernel: [31679.556505] kworker/u12:0   D
ffff88011b6e5980     0  7008      2 0x00000000
Feb 25 22:24:34 musix64-v3 kernel: [31679.556520] Workqueue:
radeon-crtc radeon_flip_work_func [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556522]  ffff8800a92dbc38
ffff8800a5f72580 ffff88004ff55780 0000000000000000
Feb 25 22:24:34 musix64-v3 kernel: [31679.556524]  ffff88000011cfc8
ffff8800a92dc000 ffff88000011c000 ffff88000011d6d8
Feb 25 22:24:34 musix64-v3 kernel: [31679.556527]  ffff8800a92dbd18
0000000000000008 ffff88004ff55780 ffffffff814dc07e
Feb 25 22:24:34 musix64-v3 kernel: [31679.556529] Call Trace:
Feb 25 22:24:34 musix64-v3 kernel: [31679.556531]
[<ffffffff814dc07e>] ? schedule+0x3e/0xc0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556534]
[<ffffffff814de417>] ? schedule_timeout+0x177/0x320
Feb 25 22:24:34 musix64-v3 kernel: [31679.556545]
[<ffffffffa0342925>] ? radeon_fence_activity+0xf5/0x160 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556547]
[<ffffffff81058364>] ? pin_current_cpu+0x84/0x1f0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556558]
[<ffffffffa0342f69>] ? radeon_fence_process+0x9/0x30 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556570]
[<ffffffffa03432a8>] ?
radeon_fence_wait_seq_timeout.constprop.8+0x2a8/0x380 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556572]
[<ffffffff81095220>] ? wait_woken+0x80/0x80
Feb 25 22:24:34 musix64-v3 kernel: [31679.556584]
[<ffffffffa0343624>] ? radeon_fence_wait+0x74/0xb0 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556596]
[<ffffffffa0353b55>] ? radeon_flip_work_func+0x255/0x310 [radeon]
Feb 25 22:24:34 musix64-v3 kernel: [31679.556598]
[<ffffffff8107cd98>] ? finish_task_switch+0x88/0x220
Feb 25 22:24:34 musix64-v3 kernel: [31679.556601]
[<ffffffff8107027f>] ? process_one_work+0x14f/0x480
Feb 25 22:24:34 musix64-v3 kernel: [31679.556603]
[<ffffffff81070610>] ? worker_thread+0x60/0x4b0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556605]
[<ffffffff810705b0>] ? process_one_work+0x480/0x480
Feb 25 22:24:34 musix64-v3 kernel: [31679.556608]
[<ffffffff810759ca>] ? kthread+0xca/0xe0
Feb 25 22:24:34 musix64-v3 kernel: [31679.556610]
[<ffffffff81075900>] ? kthread_worker_fn+0x190/0x190
Feb 25 22:24:34 musix64-v3 kernel: [31679.556613]
[<ffffffff814df8df>] ? ret_from_fork+0x3f/0x70
Feb 25 22:24:34 musix64-v3 kernel: [31679.556615]
[<ffffffff81075900>] ? kthread_worker_fn+0x190/0x190

2017-02-24 23:09 GMT+01:00 Wargreen <wargreen@lebib.org>:
> Thank you for your reply.
>> It looks like some other task holding the mutex resulting
>> these tasks to sleep and hung the system.
>>
>> One step to debug this is to find out which task is holding
>> the mutex and for how long? Taking crash of hung system
>> and see whats going on is one way but a harsh one
> Because my bad english and my light technical background, i don't
> understand how can i find this task.
> Can you guide me ?
>
>> Another step is start with version where it was working before?
>> and try to bisect or narrow down.
> From memory, it's happen with 4.0.4, 4.1.3 and 4.9.0, on two computers
> (both with intel CPU). So i don't know if it was worked one day.
>
>> Others might have better ideas.
> I take all ideas !
> It's my workstation (as well as my all-day computer) for real time audio
> processing so i'm open to alls test who can fix this problem.
>
> Is it the best place, or i should also post on the radeon mailing ?
>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-02-26 11:55     ` Daniel Vidal
@ 2017-03-02 21:10       ` Julia Cartwright
  2017-03-06 16:14         ` Wargreen
  2017-03-17 18:15       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 18+ messages in thread
From: Julia Cartwright @ 2017-03-02 21:10 UTC (permalink / raw)
  To: Daniel Vidal; +Cc: Wargreen, Pankaj Gupta, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On Sun, Feb 26, 2017 at 12:55:01PM +0100, Daniel Vidal wrote:
> Hi
> 
> The radeon driver hangs from 3.16 kernel.
> 
> RT patch for radeon driver only disable ( preempt_disable_rt() ) and
> reenable ( preempt_enable_rt() ) the whole RT in file radeon_display.c
> 
> Obviously this is not enough.
> 
> I think the radeon developers should make some modification in the
> driver but I do not know which is the right place to indicate it
> 
> Here my backtrace.

Two more debugging ideas:

  1. Boot with lockdep enabled, see if that gives you a meaningful
     splat.

  2. Use sysrq-trigger 'd' to dump held locks to console when stuck in
     this state, in order to track down the codepath causing problems.

   Julia

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-02 21:10       ` Julia Cartwright
@ 2017-03-06 16:14         ` Wargreen
  2017-03-06 17:33           ` Julia Cartwright
  0 siblings, 1 reply; 18+ messages in thread
From: Wargreen @ 2017-03-06 16:14 UTC (permalink / raw)
  To: Julia Cartwright, Daniel Vidal; +Cc: Pankaj Gupta, linux-rt-users


[-- Attachment #1.1: Type: text/plain, Size: 7937 bytes --]

Hi,
As you suggest, i've build the kernel 4.9 rt10 from the debian sources,
with lockdep.

Here the trace when system hang :

Mar  6 17:06:46 LaChoze kernel: [  339.062919] sysrq: SysRq : Show Locks
Held
Mar  6 17:06:46 LaChoze kernel: [  339.062929]
Mar  6 17:06:46 LaChoze kernel: [  339.062929] Showing all locks held in
the system:
Mar  6 17:06:46 LaChoze kernel: [  339.062947] 5 locks held by
irq/1-i8042/126:
Mar  6 17:06:46 LaChoze kernel: [  339.062948]  #0: 
(&serio->lock){+.+...}, at: [<ffffffff8794dab8>] serio_interrupt+0x28/0x80
Mar  6 17:06:46 LaChoze kernel: [  339.062960]  #1: 
(&dev->event_lock){+.+...}, at: [<ffffffff87953cba>] input_event+0x3a/0x60
Mar  6 17:06:46 LaChoze kernel: [  339.062967]  #2: 
(rcu_read_lock){......}, at: [<ffffffff87952ee5>]
input_pass_values.part.5+0x5/0x270
Mar  6 17:06:46 LaChoze kernel: [  339.062975]  #3: 
(rcu_read_lock){......}, at: [<ffffffff878c1eb5>] __handle_sysrq+0x5/0x220
Mar  6 17:06:46 LaChoze kernel: [  339.062982]  #4: 
(tasklist_lock){+.+...}, at: [<ffffffff874eae1d>]
debug_show_all_locks+0x3d/0x1a0
Mar  6 17:06:46 LaChoze kernel: [  339.063009] 1 lock held by in:imklog/822:
Mar  6 17:06:46 LaChoze kernel: [  339.063011]  #0: 
(&f->f_pos_lock){+.+.+.}, at: [<ffffffff8768883a>] __fdget_pos+0x4a/0x50
Mar  6 17:06:46 LaChoze kernel: [  339.063040] 2 locks held by Xorg/2365:
Mar  6 17:06:46 LaChoze kernel: [  339.063041]  #0: 
(&rdev->exclusive_lock){++++.+}, at: [<ffffffff874f3710>]
rt_down_read+0x10/0x20
Mar  6 17:06:46 LaChoze kernel: [  339.063049]  #1: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>]
rt_down_read+0x10/0x20
Mar  6 17:06:46 LaChoze kernel: [  339.063100] 2 locks held by G.Main/3575:
Mar  6 17:06:46 LaChoze kernel: [  339.063101]  #0: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>]
rt_down_read+0x10/0x20
Mar  6 17:06:46 LaChoze kernel: [  339.063108]  #1: 
(reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>]
ttm_bo_vm_fault+0x60/0x530 [ttm]
Mar  6 17:06:46 LaChoze kernel: [  339.063124]
Mar  6 17:06:46 LaChoze kernel: [  339.063125]
=============================================
Mar  6 17:06:46 LaChoze kernel: [  339.063125]
Mar  6 17:07:11 LaChoze kernel: [  363.523950] INFO: task Xorg:2365
blocked for more than 120 seconds.
Mar  6 17:07:11 LaChoze kernel: [  363.523959]       Not tainted
4.9.13-rt10 #1
Mar  6 17:07:11 LaChoze kernel: [  363.523961] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  6 17:07:11 LaChoze kernel: [  363.523964] Xorg            D    0 
2365   2363 0x00000004
Mar  6 17:07:11 LaChoze kernel: [  363.523970]  0000000000000000
ffff96333582b458 ffff9633027bd580 ffff96333582b440
Mar  6 17:07:11 LaChoze kernel: [  363.523975]  ffff96332f2b1c80
ffffba8646bffa50 ffffffff87af622f 00ffba8646bffb08
Mar  6 17:07:11 LaChoze kernel: [  363.523980]  ffff9633027bdbc0
ffff96333582b458 ffff96332841b800 ffff9633027bd580
Mar  6 17:07:11 LaChoze kernel: [  363.523984] Call Trace:
Mar  6 17:07:11 LaChoze kernel: [  363.523984] Call Trace:
Mar  6 17:07:11 LaChoze kernel: [  363.523992]  [<ffffffff87af622f>] ?
__schedule+0x32f/0x800
Mar  6 17:07:11 LaChoze kernel: [  363.523996]  [<ffffffff87af674c>]
schedule+0x4c/0xe0
Mar  6 17:07:11 LaChoze kernel: [  363.523998]  [<ffffffff87af8146>]
__rt_mutex_slowlock+0x56/0x1c0
Mar  6 17:07:11 LaChoze kernel: [  363.524001]  [<ffffffff87af8834>]
rt_mutex_slowlock+0x114/0x360
Mar  6 17:07:11 LaChoze kernel: [  363.524005]  [<ffffffff87665615>] ?
get_empty_filp+0x175/0x250
Mar  6 17:07:11 LaChoze kernel: [  363.524007]  [<ffffffff87af8a9a>]
rt_mutex_lock+0x1a/0x20
Mar  6 17:07:11 LaChoze kernel: [  363.524011]  [<ffffffff874f36e3>]
__rt_down_read+0x53/0x70
Mar  6 17:07:11 LaChoze kernel: [  363.524014]  [<ffffffff874f3710>] ?
rt_down_read+0x10/0x20
Mar  6 17:07:11 LaChoze kernel: [  363.524016]  [<ffffffff874f3710>]
rt_down_read+0x10/0x20
Mar  6 17:07:11 LaChoze kernel: [  363.524043]  [<ffffffffc0ae7d97>]
radeon_bo_create+0x157/0x360 [radeon]
Mar  6 17:07:11 LaChoze kernel: [  363.524064]  [<ffffffffc0afb21d>]
radeon_gem_object_create+0xbd/0x1a0 [radeon]
Mar  6 17:07:11 LaChoze kernel: [  363.524084]  [<ffffffffc0afb6ae>]
radeon_gem_create_ioctl+0x6e/0x100 [radeon]
Mar  6 17:07:11 LaChoze kernel: [  363.524101]  [<ffffffffc081ef6a>]
drm_ioctl+0x37a/0x4d0 [drm]
Mar  6 17:07:11 LaChoze kernel: [  363.524119]  [<ffffffffc0afb640>] ?
radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
Mar  6 17:07:11 LaChoze kernel: [  363.524122]  [<ffffffff877de437>] ?
debug_smp_processor_id+0x17/0x20
Mar  6 17:07:11 LaChoze kernel: [  363.524125]  [<ffffffff8748d8f6>] ?
unpin_current_cpu+0x16/0x70
Mar  6 17:07:11 LaChoze kernel: [  363.524128]  [<ffffffff874bce47>] ?
migrate_enable+0x87/0x150
Mar  6 17:07:11 LaChoze kernel: [  363.524131]  [<ffffffff87afb18f>] ?
rt_spin_unlock+0x2f/0x40
Mar  6 17:07:11 LaChoze kernel: [  363.524145]  [<ffffffffc0ac804f>]
radeon_drm_ioctl+0x4f/0x90 [radeon]
Mar  6 17:07:11 LaChoze kernel: [  363.524148]  [<ffffffff87679fd3>]
do_vfs_ioctl+0xa3/0x6f0
Mar  6 17:07:11 LaChoze kernel: [  363.524152]  [<ffffffff8768789c>] ?
__fget+0x11c/0x210
Mar  6 17:07:11 LaChoze kernel: [  363.524154]  [<ffffffff87687785>] ?
__fget+0x5/0x210
Mar  6 17:07:11 LaChoze kernel: [  363.524157]  [<ffffffff8767a699>]
SyS_ioctl+0x79/0x90
Mar  6 17:07:11 LaChoze kernel: [  363.524159]  [<ffffffff87afb845>]
system_call_fast_compare_end+0xc/0xaf
Mar  6 17:07:11 LaChoze kernel: [  363.524162]
Mar  6 17:07:11 LaChoze kernel: [  363.524162] Showing all locks held in
the system:
Mar  6 17:07:11 LaChoze kernel: [  363.524170] 2 locks held by
khungtaskd/90:
Mar  6 17:07:11 LaChoze kernel: [  363.524171]  #0: 
(rcu_read_lock){......}, at: [<ffffffff8756ef6e>] watchdog+0xae/0x640
Mar  6 17:07:11 LaChoze kernel: [  363.524180]  #1: 
(tasklist_lock){+.+...}, at: [<ffffffff874eae1d>]
debug_show_all_locks+0x3d/0x1a0
Mar  6 17:07:11 LaChoze kernel: [  363.524193] 1 lock held by in:imklog/822:
Mar  6 17:07:11 LaChoze kernel: [  363.524194]  #0: 
(&f->f_pos_lock){+.+.+.}, at: [<ffffffff8768883a>] __fdget_pos+0x4a/0x50
Mar  6 17:07:11 LaChoze kernel: [  363.524208] 2 locks held by Xorg/2365:
Mar  6 17:07:11 LaChoze kernel: [  363.524209]  #0: 
(&rdev->exclusive_lock){++++.+}, at: [<ffffffff874f3710>]
rt_down_read+0x10/0x20
Mar  6 17:07:11 LaChoze kernel: [  363.524215]  #1: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>]
rt_down_read+0x10/0x20
Mar  6 17:07:11 LaChoze kernel: [  363.524263] 2 locks held by G.Main/3575:
Mar  6 17:07:11 LaChoze kernel: [  363.524264]  #0: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>]
rt_down_read+0x10/0x20
Mar  6 17:07:11 LaChoze kernel: [  363.524270]  #1: 
(reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>]
ttm_bo_vm_fault+0x60/0x530 [ttm]
Mar  6 17:07:11 LaChoze kernel: [  363.524282]
Mar  6 17:07:11 LaChoze kernel: [  363.524283]
=============================================
Mar  6 17:07:11 LaChoze kernel: [  363.524283]


I hope this can help !



On 02/03/2017 22:10, Julia Cartwright wrote:
> On Sun, Feb 26, 2017 at 12:55:01PM +0100, Daniel Vidal wrote:
>> Hi
>>
>> The radeon driver hangs from 3.16 kernel.
>>
>> RT patch for radeon driver only disable ( preempt_disable_rt() ) and
>> reenable ( preempt_enable_rt() ) the whole RT in file radeon_display.c
>>
>> Obviously this is not enough.
>>
>> I think the radeon developers should make some modification in the
>> driver but I do not know which is the right place to indicate it
>>
>> Here my backtrace.
> Two more debugging ideas:
>
>   1. Boot with lockdep enabled, see if that gives you a meaningful
>      splat.
>
>   2. Use sysrq-trigger 'd' to dump held locks to console when stuck in
>      this state, in order to track down the codepath causing problems.
>
>    Julia




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-06 16:14         ` Wargreen
@ 2017-03-06 17:33           ` Julia Cartwright
  2017-03-06 17:59             ` Wargreen
  0 siblings, 1 reply; 18+ messages in thread
From: Julia Cartwright @ 2017-03-06 17:33 UTC (permalink / raw)
  To: Wargreen; +Cc: Daniel Vidal, Pankaj Gupta, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 4641 bytes --]

On Mon, Mar 06, 2017 at 05:14:43PM +0100, Wargreen wrote:
> Hi,
> As you suggest, i've build the kernel 4.9 rt10 from the debian sources,
> with lockdep.
>
> Here the trace when system hang :
>
> [  339.062919] sysrq: SysRq : Show Locks Held
> [  339.062929]
> [  339.062929] Showing all locks held in the system:
> [  339.062947] 5 locks held by irq/1-i8042/126:
> [  339.062948]  #0: (&serio->lock){+.+...}, at: [<ffffffff8794dab8>] serio_interrupt+0x28/0x80
> [  339.062960]  #1: (&dev->event_lock){+.+...}, at: [<ffffffff87953cba>] input_event+0x3a/0x60
> [  339.062967]  #2: (rcu_read_lock){......}, at: [<ffffffff87952ee5>] input_pass_values.part.5+0x5/0x270
> [  339.062975]  #3: (rcu_read_lock){......}, at: [<ffffffff878c1eb5>] __handle_sysrq+0x5/0x220
> [  339.062982]  #4: (tasklist_lock){+.+...}, at: [<ffffffff874eae1d>] debug_show_all_locks+0x3d/0x1a0
> [  339.063009] 1 lock held by in:imklog/822:
> [  339.063011]  #0: (&f->f_pos_lock){+.+.+.}, at: [<ffffffff8768883a>] __fdget_pos+0x4a/0x50

The above two threads are irrelevant.

> [  339.063040] 2 locks held by Xorg/2365:
> [  339.063041]  #0: (&rdev->exclusive_lock){++++.+}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  339.063049]  #1: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  339.063100] 2 locks held by G.Main/3575:
> [  339.063101]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  339.063108]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]

These two are suspicious, though.  Note that what it claimed is that
&rdev->pm.mclk_lock is acquired by both threads.  However:

> [  339.063124]
> [  339.063125] =============================================
> [  339.063125]
> [  363.523950] INFO: task Xorg:2365 blocked for more than 120 seconds.
> [  363.523959]       Not tainted 4.9.13-rt10 #1
> [  363.523961] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  363.523964] Xorg            D    0 2365   2363 0x00000004
> [  363.523970]  0000000000000000 ffff96333582b458 ffff9633027bd580 ffff96333582b440
> [  363.523975]  ffff96332f2b1c80 ffffba8646bffa50 ffffffff87af622f 00ffba8646bffb08
> [  363.523980]  ffff9633027bdbc0 ffff96333582b458 ffff96332841b800 ffff9633027bd580
> [  363.523984] Call Trace:
> [  363.523984] Call Trace:
> [  363.523992]  [<ffffffff87af622f>] ?  __schedule+0x32f/0x800
> [  363.523996]  [<ffffffff87af674c>] schedule+0x4c/0xe0
> [  363.523998]  [<ffffffff87af8146>] __rt_mutex_slowlock+0x56/0x1c0
> [  363.524001]  [<ffffffff87af8834>] rt_mutex_slowlock+0x114/0x360
> [  363.524005]  [<ffffffff87665615>] ?  get_empty_filp+0x175/0x250
> [  363.524007]  [<ffffffff87af8a9a>] rt_mutex_lock+0x1a/0x20
> [  363.524011]  [<ffffffff874f36e3>] __rt_down_read+0x53/0x70
> [  363.524014]  [<ffffffff874f3710>] ?  rt_down_read+0x10/0x20
> [  363.524016]  [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  363.524043]  [<ffffffffc0ae7d97>] radeon_bo_create+0x157/0x360 [radeon]
> [  363.524064]  [<ffffffffc0afb21d>] radeon_gem_object_create+0xbd/0x1a0 [radeon]
> [  363.524084]  [<ffffffffc0afb6ae>] radeon_gem_create_ioctl+0x6e/0x100 [radeon]
> [  363.524101]  [<ffffffffc081ef6a>] drm_ioctl+0x37a/0x4d0 [drm]
> [  363.524119]  [<ffffffffc0afb640>] ?  radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> [  363.524122]  [<ffffffff877de437>] ?  debug_smp_processor_id+0x17/0x20
> [  363.524125]  [<ffffffff8748d8f6>] ?  unpin_current_cpu+0x16/0x70
> [  363.524128]  [<ffffffff874bce47>] ?  migrate_enable+0x87/0x150
> [  363.524131]  [<ffffffff87afb18f>] ?  rt_spin_unlock+0x2f/0x40
> [  363.524145]  [<ffffffffc0ac804f>] radeon_drm_ioctl+0x4f/0x90 [radeon]
> [  363.524148]  [<ffffffff87679fd3>] do_vfs_ioctl+0xa3/0x6f0
> [  363.524152]  [<ffffffff8768789c>] ?  __fget+0x11c/0x210
> [  363.524154]  [<ffffffff87687785>] ?  __fget+0x5/0x210
> [  363.524157]  [<ffffffff8767a699>] SyS_ioctl+0x79/0x90
> [  363.524159]  [<ffffffff87afb845>] system_call_fast_compare_end+0xc/0xaf

It appears that Xorg hasn't actually acquired the rwsem's underlying
rt_mutex, but is in the process of acquiring it, but is suspended
waiting on G.Main/3575 to release it.

> [  363.524263] 2 locks held by G.Main/3575:
> [  363.524264]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
> [  363.524270]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]

So, the question is: what is this G.Main/3575 thread, and what is it
doing?  Do you find it spinning?

Could be something fishy with w/w mutexes as well.

   Julia

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-06 17:33           ` Julia Cartwright
@ 2017-03-06 17:59             ` Wargreen
  2017-03-06 19:07               ` Julia Cartwright
  0 siblings, 1 reply; 18+ messages in thread
From: Wargreen @ 2017-03-06 17:59 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: Daniel Vidal, Pankaj Gupta, linux-rt-users


[-- Attachment #1.1: Type: text/plain, Size: 7033 bytes --]

Here a new try, with a process name more explicit, but without X crash
this time.
So it's the graphic's ressources demanding process.

Mar  6 18:44:47 LaChoze kernel: [  497.053439] sysrq: SysRq : Show Locks
Held
Mar  6 18:44:47 LaChoze kernel: [  497.053450]
Mar  6 18:44:47 LaChoze kernel: [  497.053450] Showing all locks held in
the system:
Mar  6 18:44:47 LaChoze kernel: [  497.053468] 5 locks held by
irq/1-i8042/126:
Mar  6 18:44:47 LaChoze kernel: [  497.053470]  #0: 
(&serio->lock){+.+...}, at: [<ffffffffa954dab8>] serio_interrupt+0x28/0x80
Mar  6 18:44:47 LaChoze kernel: [  497.053481]  #1: 
(&dev->event_lock){+.+...}, at: [<ffffffffa9553cba>] input_event+0x3a/0x60
Mar  6 18:44:47 LaChoze kernel: [  497.053489]  #2: 
(rcu_read_lock){......}, at: [<ffffffffa9552ee5>]
input_pass_values.part.5+0x5/0x270
Mar  6 18:44:47 LaChoze kernel: [  497.053496]  #3: 
(rcu_read_lock){......}, at: [<ffffffffa94c1eb5>] __handle_sysrq+0x5/0x220
Mar  6 18:44:47 LaChoze kernel: [  497.053504]  #4: 
(tasklist_lock){+.+...}, at: [<ffffffffa90eae1d>]
debug_show_all_locks+0x3d/0x1a0
Mar  6 18:44:47 LaChoze kernel: [  497.053532] 1 lock held by in:imklog/855:
Mar  6 18:44:47 LaChoze kernel: [  497.053533]  #0: 
(&f->f_pos_lock){+.+.+.}, at: [<ffffffffa928883a>] __fdget_pos+0x4a/0x50
Mar  6 18:44:47 LaChoze kernel: [  497.053562] 2 locks held by Xorg/2154:
Mar  6 18:44:47 LaChoze kernel: [  497.053563]  #0: 
(&rdev->exclusive_lock){++++.+}, at: [<ffffffffa90f3710>]
rt_down_read+0x10/0x20
Mar  6 18:44:47 LaChoze kernel: [  497.053571]  #1: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>]
rt_down_read+0x10/0x20
Mar  6 18:44:47 LaChoze kernel: [  497.053589] 2 locks held by
gnome-shell/2344:
Mar  6 18:44:47 LaChoze kernel: [  497.053590]  #0: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>]
rt_down_read+0x10/0x20
Mar  6 18:44:47 LaChoze kernel: [  497.053597]  #1: 
(reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc0af47b0>]
ttm_bo_vm_fault+0x60/0x530 [ttm]
Mar  6 18:44:47 LaChoze kernel: [  497.053637]
Mar  6 18:44:47 LaChoze kernel: [  497.053638]
=============================================



On 06/03/2017 18:33, Julia Cartwright wrote:
> On Mon, Mar 06, 2017 at 05:14:43PM +0100, Wargreen wrote:
>> Hi,
>> As you suggest, i've build the kernel 4.9 rt10 from the debian sources,
>> with lockdep.
>>
>> Here the trace when system hang :
>>
>> [  339.062919] sysrq: SysRq : Show Locks Held
>> [  339.062929]
>> [  339.062929] Showing all locks held in the system:
>> [  339.062947] 5 locks held by irq/1-i8042/126:
>> [  339.062948]  #0: (&serio->lock){+.+...}, at: [<ffffffff8794dab8>] serio_interrupt+0x28/0x80
>> [  339.062960]  #1: (&dev->event_lock){+.+...}, at: [<ffffffff87953cba>] input_event+0x3a/0x60
>> [  339.062967]  #2: (rcu_read_lock){......}, at: [<ffffffff87952ee5>] input_pass_values.part.5+0x5/0x270
>> [  339.062975]  #3: (rcu_read_lock){......}, at: [<ffffffff878c1eb5>] __handle_sysrq+0x5/0x220
>> [  339.062982]  #4: (tasklist_lock){+.+...}, at: [<ffffffff874eae1d>] debug_show_all_locks+0x3d/0x1a0
>> [  339.063009] 1 lock held by in:imklog/822:
>> [  339.063011]  #0: (&f->f_pos_lock){+.+.+.}, at: [<ffffffff8768883a>] __fdget_pos+0x4a/0x50
> The above two threads are irrelevant.
>
>> [  339.063040] 2 locks held by Xorg/2365:
>> [  339.063041]  #0: (&rdev->exclusive_lock){++++.+}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
>> [  339.063049]  #1: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
>> [  339.063100] 2 locks held by G.Main/3575:
>> [  339.063101]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
>> [  339.063108]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]
> These two are suspicious, though.  Note that what it claimed is that
> &rdev->pm.mclk_lock is acquired by both threads.  However:
>
>> [  339.063124]
>> [  339.063125] =============================================
>> [  339.063125]
>> [  363.523950] INFO: task Xorg:2365 blocked for more than 120 seconds.
>> [  363.523959]       Not tainted 4.9.13-rt10 #1
>> [  363.523961] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  363.523964] Xorg            D    0 2365   2363 0x00000004
>> [  363.523970]  0000000000000000 ffff96333582b458 ffff9633027bd580 ffff96333582b440
>> [  363.523975]  ffff96332f2b1c80 ffffba8646bffa50 ffffffff87af622f 00ffba8646bffb08
>> [  363.523980]  ffff9633027bdbc0 ffff96333582b458 ffff96332841b800 ffff9633027bd580
>> [  363.523984] Call Trace:
>> [  363.523984] Call Trace:
>> [  363.523992]  [<ffffffff87af622f>] ?  __schedule+0x32f/0x800
>> [  363.523996]  [<ffffffff87af674c>] schedule+0x4c/0xe0
>> [  363.523998]  [<ffffffff87af8146>] __rt_mutex_slowlock+0x56/0x1c0
>> [  363.524001]  [<ffffffff87af8834>] rt_mutex_slowlock+0x114/0x360
>> [  363.524005]  [<ffffffff87665615>] ?  get_empty_filp+0x175/0x250
>> [  363.524007]  [<ffffffff87af8a9a>] rt_mutex_lock+0x1a/0x20
>> [  363.524011]  [<ffffffff874f36e3>] __rt_down_read+0x53/0x70
>> [  363.524014]  [<ffffffff874f3710>] ?  rt_down_read+0x10/0x20
>> [  363.524016]  [<ffffffff874f3710>] rt_down_read+0x10/0x20
>> [  363.524043]  [<ffffffffc0ae7d97>] radeon_bo_create+0x157/0x360 [radeon]
>> [  363.524064]  [<ffffffffc0afb21d>] radeon_gem_object_create+0xbd/0x1a0 [radeon]
>> [  363.524084]  [<ffffffffc0afb6ae>] radeon_gem_create_ioctl+0x6e/0x100 [radeon]
>> [  363.524101]  [<ffffffffc081ef6a>] drm_ioctl+0x37a/0x4d0 [drm]
>> [  363.524119]  [<ffffffffc0afb640>] ?  radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
>> [  363.524122]  [<ffffffff877de437>] ?  debug_smp_processor_id+0x17/0x20
>> [  363.524125]  [<ffffffff8748d8f6>] ?  unpin_current_cpu+0x16/0x70
>> [  363.524128]  [<ffffffff874bce47>] ?  migrate_enable+0x87/0x150
>> [  363.524131]  [<ffffffff87afb18f>] ?  rt_spin_unlock+0x2f/0x40
>> [  363.524145]  [<ffffffffc0ac804f>] radeon_drm_ioctl+0x4f/0x90 [radeon]
>> [  363.524148]  [<ffffffff87679fd3>] do_vfs_ioctl+0xa3/0x6f0
>> [  363.524152]  [<ffffffff8768789c>] ?  __fget+0x11c/0x210
>> [  363.524154]  [<ffffffff87687785>] ?  __fget+0x5/0x210
>> [  363.524157]  [<ffffffff8767a699>] SyS_ioctl+0x79/0x90
>> [  363.524159]  [<ffffffff87afb845>] system_call_fast_compare_end+0xc/0xaf
> It appears that Xorg hasn't actually acquired the rwsem's underlying
> rt_mutex, but is in the process of acquiring it, but is suspended
> waiting on G.Main/3575 to release it.
>
>> [  363.524263] 2 locks held by G.Main/3575:
>> [  363.524264]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffff874f3710>] rt_down_read+0x10/0x20
>> [  363.524270]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc087b7b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]
> So, the question is: what is this G.Main/3575 thread, and what is it
> doing?  Do you find it spinning?
>
> Could be something fishy with w/w mutexes as well.
>
>    Julia




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-06 17:59             ` Wargreen
@ 2017-03-06 19:07               ` Julia Cartwright
  2017-03-07 18:41                 ` Wargreen
  0 siblings, 1 reply; 18+ messages in thread
From: Julia Cartwright @ 2017-03-06 19:07 UTC (permalink / raw)
  To: Wargreen; +Cc: Daniel Vidal, Pankaj Gupta, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 1165 bytes --]

On Mon, Mar 06, 2017 at 06:59:26PM +0100, Wargreen wrote:
> Here a new try, with a process name more explicit, but without X crash
> this time.
> So it's the graphic's ressources demanding process.
>
> [  497.053562] 2 locks held by Xorg/2154:
> [  497.053563]  #0: (&rdev->exclusive_lock){++++.+}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
> [  497.053571]  #1: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
> [  497.053589] 2 locks held by gnome-shell/2344:
> [  497.053590]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
> [  497.053597]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc0af47b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]

Unfortunately, this doesn't get us any more data than we already had.
For some reason, a thread (G.Main in the prior trace, gnome-shell here)
in this state fails to make progress.

We need to better understand what this thread is doing.  Is it
Runnable/spinning?  Is it itself waiting on a lock?  Dumping the stack
from it in this state might be revealing.  The contents of
/proc/<pid>/stack might be of interest.

   Julia

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-06 19:07               ` Julia Cartwright
@ 2017-03-07 18:41                 ` Wargreen
  2017-03-10 17:51                   ` Wargreen
  0 siblings, 1 reply; 18+ messages in thread
From: Wargreen @ 2017-03-07 18:41 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: Daniel Vidal, Pankaj Gupta, linux-rt-users


[-- Attachment #1.1: Type: text/plain, Size: 4126 bytes --]

Sorry, i'm more a user than a debugger...

So, the /proc/[PID]/stack output, with a gnome_shell:
[<ffffffffc0b37e9f>] radeon_fence_default_wait+0xbf/0x170 [radeon]
[<ffffffffae34227d>] fence_wait_timeout+0x9d/0x350
[<ffffffffc0a2dba5>] ttm_bo_vm_fault+0x455/0x530 [ttm]
[<ffffffffc0b39e27>] radeon_ttm_fault+0x47/0x70 [radeon]
[<ffffffffae00f9d1>] __do_fault+0x81/0x170
[<ffffffffae01377b>] handle_mm_fault+0x57b/0x1450
[<ffffffffade70459>] __do_page_fault+0x289/0x5d0
[<ffffffffade707c2>] do_page_fault+0x22/0x30
[<ffffffffae4fccf8>] page_fault+0x28/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

And the corresponding locks :
Mar  7 19:15:49 LaChoze kernel: [  239.624948] NOHZ:
local_softirq_pending 80
Mar  7 19:15:57 LaChoze kernel: [  247.718336] sysrq: SysRq : Show Locks
Held
Mar  7 19:15:57 LaChoze kernel: [  247.718342]
Mar  7 19:15:57 LaChoze kernel: [  247.718342] Showing all locks held in
the system:
Mar  7 19:15:57 LaChoze kernel: [  247.718371] 5 locks held by
irq/1-i8042/126:
Mar  7 19:15:57 LaChoze kernel: [  247.718372]  #0: 
(&serio->lock){+.+...}, at: [<ffffffffae34dab8>] serio_interrupt+0x28/0x80
Mar  7 19:15:57 LaChoze kernel: [  247.718378]  #1: 
(&dev->event_lock){+.+...}, at: [<ffffffffae353cba>] input_event+0x3a/0x60
Mar  7 19:15:57 LaChoze kernel: [  247.718382]  #2: 
(rcu_read_lock){......}, at: [<ffffffffae352ee5>]
input_pass_values.part.5+0x5/0x270
Mar  7 19:15:57 LaChoze kernel: [  247.718385]  #3: 
(rcu_read_lock){......}, at: [<ffffffffae2c1eb5>] __handle_sysrq+0x5/0x220
Mar  7 19:15:57 LaChoze kernel: [  247.718389]  #4: 
(tasklist_lock){+.+...}, at: [<ffffffffadeeae1d>]
debug_show_all_locks+0x3d/0x1a0
Mar  7 19:15:57 LaChoze kernel: [  247.718412] 1 lock held by in:imklog/895:
Mar  7 19:15:57 LaChoze kernel: [  247.718412]  #0: 
(&f->f_pos_lock){+.+.+.}, at: [<ffffffffae08883a>] __fdget_pos+0x4a/0x50
Mar  7 19:15:57 LaChoze kernel: [  247.718435] 2 locks held by Xorg/2215:
Mar  7 19:15:57 LaChoze kernel: [  247.718435]  #0: 
(&rdev->exclusive_lock){++++.+}, at: [<ffffffffadef3710>]
rt_down_read+0x10/0x20
Mar  7 19:15:57 LaChoze kernel: [  247.718439]  #1: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffffadef3710>]
rt_down_read+0x10/0x20
Mar  7 19:15:57 LaChoze kernel: [  247.718452] 2 locks held by
gnome-shell/2412:
Mar  7 19:15:57 LaChoze kernel: [  247.718453]  #0: 
(&rdev->pm.mclk_lock){++++++}, at: [<ffffffffadef3710>]
rt_down_read+0x10/0x20
Mar  7 19:15:57 LaChoze kernel: [  247.718456]  #1: 
(reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc0a2d7b0>]
ttm_bo_vm_fault+0x60/0x530 [ttm]
Mar  7 19:15:57 LaChoze kernel: [  247.718487]
Mar  7 19:15:57 LaChoze kernel: [  247.718487]
=============================================

If needeed, all theres logs are here :
http://kalaj.darktech.org/~wargreen/logs/radeon/
I'm here for the next step.



On 06/03/2017 20:07, Julia Cartwright wrote:
> On Mon, Mar 06, 2017 at 06:59:26PM +0100, Wargreen wrote:
>> Here a new try, with a process name more explicit, but without X crash
>> this time.
>> So it's the graphic's ressources demanding process.
>>
>> [  497.053562] 2 locks held by Xorg/2154:
>> [  497.053563]  #0: (&rdev->exclusive_lock){++++.+}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
>> [  497.053571]  #1: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
>> [  497.053589] 2 locks held by gnome-shell/2344:
>> [  497.053590]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
>> [  497.053597]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc0af47b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]
> Unfortunately, this doesn't get us any more data than we already had.
> For some reason, a thread (G.Main in the prior trace, gnome-shell here)
> in this state fails to make progress.
>
> We need to better understand what this thread is doing.  Is it
> Runnable/spinning?  Is it itself waiting on a lock?  Dumping the stack
> from it in this state might be revealing.  The contents of
> /proc/<pid>/stack might be of interest.
>
>    Julia




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-07 18:41                 ` Wargreen
@ 2017-03-10 17:51                   ` Wargreen
  2017-03-15 19:20                     ` Julia Cartwright
  2017-03-17 18:12                     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 18+ messages in thread
From: Wargreen @ 2017-03-10 17:51 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: Daniel Vidal, Pankaj Gupta, linux-rt-users


[-- Attachment #1.1: Type: text/plain, Size: 4396 bytes --]

Do you think i should send an issue to DRM/Radeon ?

Thanks for your help !


On 07/03/2017 19:41, Wargreen wrote:
> Sorry, i'm more a user than a debugger...
>
> So, the /proc/[PID]/stack output, with a gnome_shell:
> [<ffffffffc0b37e9f>] radeon_fence_default_wait+0xbf/0x170 [radeon]
> [<ffffffffae34227d>] fence_wait_timeout+0x9d/0x350
> [<ffffffffc0a2dba5>] ttm_bo_vm_fault+0x455/0x530 [ttm]
> [<ffffffffc0b39e27>] radeon_ttm_fault+0x47/0x70 [radeon]
> [<ffffffffae00f9d1>] __do_fault+0x81/0x170
> [<ffffffffae01377b>] handle_mm_fault+0x57b/0x1450
> [<ffffffffade70459>] __do_page_fault+0x289/0x5d0
> [<ffffffffade707c2>] do_page_fault+0x22/0x30
> [<ffffffffae4fccf8>] page_fault+0x28/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> And the corresponding locks :
> Mar  7 19:15:49 LaChoze kernel: [  239.624948] NOHZ:
> local_softirq_pending 80
> Mar  7 19:15:57 LaChoze kernel: [  247.718336] sysrq: SysRq : Show Locks
> Held
> Mar  7 19:15:57 LaChoze kernel: [  247.718342]
> Mar  7 19:15:57 LaChoze kernel: [  247.718342] Showing all locks held in
> the system:
> Mar  7 19:15:57 LaChoze kernel: [  247.718371] 5 locks held by
> irq/1-i8042/126:
> Mar  7 19:15:57 LaChoze kernel: [  247.718372]  #0: 
> (&serio->lock){+.+...}, at: [<ffffffffae34dab8>] serio_interrupt+0x28/0x80
> Mar  7 19:15:57 LaChoze kernel: [  247.718378]  #1: 
> (&dev->event_lock){+.+...}, at: [<ffffffffae353cba>] input_event+0x3a/0x60
> Mar  7 19:15:57 LaChoze kernel: [  247.718382]  #2: 
> (rcu_read_lock){......}, at: [<ffffffffae352ee5>]
> input_pass_values.part.5+0x5/0x270
> Mar  7 19:15:57 LaChoze kernel: [  247.718385]  #3: 
> (rcu_read_lock){......}, at: [<ffffffffae2c1eb5>] __handle_sysrq+0x5/0x220
> Mar  7 19:15:57 LaChoze kernel: [  247.718389]  #4: 
> (tasklist_lock){+.+...}, at: [<ffffffffadeeae1d>]
> debug_show_all_locks+0x3d/0x1a0
> Mar  7 19:15:57 LaChoze kernel: [  247.718412] 1 lock held by in:imklog/895:
> Mar  7 19:15:57 LaChoze kernel: [  247.718412]  #0: 
> (&f->f_pos_lock){+.+.+.}, at: [<ffffffffae08883a>] __fdget_pos+0x4a/0x50
> Mar  7 19:15:57 LaChoze kernel: [  247.718435] 2 locks held by Xorg/2215:
> Mar  7 19:15:57 LaChoze kernel: [  247.718435]  #0: 
> (&rdev->exclusive_lock){++++.+}, at: [<ffffffffadef3710>]
> rt_down_read+0x10/0x20
> Mar  7 19:15:57 LaChoze kernel: [  247.718439]  #1: 
> (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffadef3710>]
> rt_down_read+0x10/0x20
> Mar  7 19:15:57 LaChoze kernel: [  247.718452] 2 locks held by
> gnome-shell/2412:
> Mar  7 19:15:57 LaChoze kernel: [  247.718453]  #0: 
> (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffadef3710>]
> rt_down_read+0x10/0x20
> Mar  7 19:15:57 LaChoze kernel: [  247.718456]  #1: 
> (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc0a2d7b0>]
> ttm_bo_vm_fault+0x60/0x530 [ttm]
> Mar  7 19:15:57 LaChoze kernel: [  247.718487]
> Mar  7 19:15:57 LaChoze kernel: [  247.718487]
> =============================================
>
> If needeed, all theres logs are here :
> http://kalaj.darktech.org/~wargreen/logs/radeon/
> I'm here for the next step.
>
>
>
> On 06/03/2017 20:07, Julia Cartwright wrote:
>> On Mon, Mar 06, 2017 at 06:59:26PM +0100, Wargreen wrote:
>>> Here a new try, with a process name more explicit, but without X crash
>>> this time.
>>> So it's the graphic's ressources demanding process.
>>>
>>> [  497.053562] 2 locks held by Xorg/2154:
>>> [  497.053563]  #0: (&rdev->exclusive_lock){++++.+}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
>>> [  497.053571]  #1: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
>>> [  497.053589] 2 locks held by gnome-shell/2344:
>>> [  497.053590]  #0: (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa90f3710>] rt_down_read+0x10/0x20
>>> [  497.053597]  #1: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc0af47b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]
>> Unfortunately, this doesn't get us any more data than we already had.
>> For some reason, a thread (G.Main in the prior trace, gnome-shell here)
>> in this state fails to make progress.
>>
>> We need to better understand what this thread is doing.  Is it
>> Runnable/spinning?  Is it itself waiting on a lock?  Dumping the stack
>> from it in this state might be revealing.  The contents of
>> /proc/<pid>/stack might be of interest.
>>
>>    Julia
>
>



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-10 17:51                   ` Wargreen
@ 2017-03-15 19:20                     ` Julia Cartwright
  2017-03-17 18:12                     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 18+ messages in thread
From: Julia Cartwright @ 2017-03-15 19:20 UTC (permalink / raw)
  To: Wargreen; +Cc: Daniel Vidal, Pankaj Gupta, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 3637 bytes --]

On Fri, Mar 10, 2017 at 06:51:27PM +0100, Wargreen wrote:
> Do you think i should send an issue to DRM/Radeon ?

More eyes certainly wouldn't hurt.

> On 07/03/2017 19:41, Wargreen wrote:
> > Sorry, i'm more a user than a debugger...
> >
> > So, the /proc/[PID]/stack output, with a gnome_shell:
> > [<ffffffffc0b37e9f>] radeon_fence_default_wait+0xbf/0x170 [radeon]
> > [<ffffffffae34227d>] fence_wait_timeout+0x9d/0x350
> > [<ffffffffc0a2dba5>] ttm_bo_vm_fault+0x455/0x530 [ttm]
> > [<ffffffffc0b39e27>] radeon_ttm_fault+0x47/0x70 [radeon]
> > [<ffffffffae00f9d1>] __do_fault+0x81/0x170
> > [<ffffffffae01377b>] handle_mm_fault+0x57b/0x1450
> > [<ffffffffade70459>] __do_page_fault+0x289/0x5d0
> > [<ffffffffade707c2>] do_page_fault+0x22/0x30
> > [<ffffffffae4fccf8>] page_fault+0x28/0x30
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > And the corresponding locks :
> > Mar  7 19:15:49 LaChoze kernel: [  239.624948] NOHZ:
> > local_softirq_pending 80
> > Mar  7 19:15:57 LaChoze kernel: [  247.718336] sysrq: SysRq : Show Locks
> > Held
> > Mar  7 19:15:57 LaChoze kernel: [  247.718342]
> > Mar  7 19:15:57 LaChoze kernel: [  247.718342] Showing all locks held in
> > the system:
> > Mar  7 19:15:57 LaChoze kernel: [  247.718371] 5 locks held by
> > irq/1-i8042/126:
> > Mar  7 19:15:57 LaChoze kernel: [  247.718372]  #0: 
> > (&serio->lock){+.+...}, at: [<ffffffffae34dab8>] serio_interrupt+0x28/0x80
> > Mar  7 19:15:57 LaChoze kernel: [  247.718378]  #1: 
> > (&dev->event_lock){+.+...}, at: [<ffffffffae353cba>] input_event+0x3a/0x60
> > Mar  7 19:15:57 LaChoze kernel: [  247.718382]  #2: 
> > (rcu_read_lock){......}, at: [<ffffffffae352ee5>]
> > input_pass_values.part.5+0x5/0x270
> > Mar  7 19:15:57 LaChoze kernel: [  247.718385]  #3: 
> > (rcu_read_lock){......}, at: [<ffffffffae2c1eb5>] __handle_sysrq+0x5/0x220
> > Mar  7 19:15:57 LaChoze kernel: [  247.718389]  #4: 
> > (tasklist_lock){+.+...}, at: [<ffffffffadeeae1d>]
> > debug_show_all_locks+0x3d/0x1a0
> > Mar  7 19:15:57 LaChoze kernel: [  247.718412] 1 lock held by in:imklog/895:
> > Mar  7 19:15:57 LaChoze kernel: [  247.718412]  #0: 
> > (&f->f_pos_lock){+.+.+.}, at: [<ffffffffae08883a>] __fdget_pos+0x4a/0x50
> > Mar  7 19:15:57 LaChoze kernel: [  247.718435] 2 locks held by Xorg/2215:
> > Mar  7 19:15:57 LaChoze kernel: [  247.718435]  #0: 
> > (&rdev->exclusive_lock){++++.+}, at: [<ffffffffadef3710>]
> > rt_down_read+0x10/0x20
> > Mar  7 19:15:57 LaChoze kernel: [  247.718439]  #1: 
> > (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffadef3710>]
> > rt_down_read+0x10/0x20
> > Mar  7 19:15:57 LaChoze kernel: [  247.718452] 2 locks held by
> > gnome-shell/2412:
> > Mar  7 19:15:57 LaChoze kernel: [  247.718453]  #0: 
> > (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffadef3710>]
> > rt_down_read+0x10/0x20
> > Mar  7 19:15:57 LaChoze kernel: [  247.718456]  #1: 
> > (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc0a2d7b0>]
> > ttm_bo_vm_fault+0x60/0x530 [ttm]
> > Mar  7 19:15:57 LaChoze kernel: [  247.718487]
> > Mar  7 19:15:57 LaChoze kernel: [  247.718487]
> > =============================================

Hmm, there is nothing obviously wrong here, or at least nothing that
sticks out to me.  I see that there are a few fence tracepoints that the
kernel has made available, perhaps you can run an execution in the
working case with those tracepoints enabled (and probably sched/ irq/
events as well), and compare them to a test run where it fails.

Beyond that, you might have to get help from folks more familiar with
this particular driver :(.

   Julia

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-10 17:51                   ` Wargreen
  2017-03-15 19:20                     ` Julia Cartwright
@ 2017-03-17 18:12                     ` Sebastian Andrzej Siewior
  2017-03-20 13:48                       ` Wargreen
       [not found]                       ` <06ee1435-7fb6-3cf5-c305-96a020665c01@lebib.org>
  1 sibling, 2 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-03-17 18:12 UTC (permalink / raw)
  To: Wargreen; +Cc: Julia Cartwright, Daniel Vidal, Pankaj Gupta, linux-rt-users

On 2017-03-10 18:51:27 [+0100], Wargreen wrote:
> Do you think i should send an issue to DRM/Radeon ?

Wargreen, how long does it take to trigger this issue for you?

What is it, that you do to trigger this? What is _exactly_ the thing you
do?

Could you please additionally also do
	echo t > /proc/sysrq-trigger

once this happens? And please make sure your dmesg buffer big enough to
handle the output (and if not, please increase CONFIG_LOG_BUF_SHIFT).

It seems that the first logs you uploaded kalaj.darktech.org had to
frame pointers enabled but in the recent logs you included here in this
thread you had them enabled. Good. Please keep them enabled.

Please attach the logs to the email and do not paste them since they get
reformatted and it is hard to read them.

> Thanks for your help !

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-02-26 11:55     ` Daniel Vidal
  2017-03-02 21:10       ` Julia Cartwright
@ 2017-03-17 18:15       ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-03-17 18:15 UTC (permalink / raw)
  To: Daniel Vidal; +Cc: Wargreen, Pankaj Gupta, linux-rt-users

On 2017-02-26 12:55:01 [+0100], Daniel Vidal wrote:
> Hi
Hi Daniel,

> The radeon driver hangs from 3.16 kernel.
> 
> RT patch for radeon driver only disable ( preempt_disable_rt() ) and
> reenable ( preempt_enable_rt() ) the whole RT in file radeon_display.c

how long does it take for you to get this reproduced? Do you simple surf
with chrome?

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-17 18:12                     ` Sebastian Andrzej Siewior
@ 2017-03-20 13:48                       ` Wargreen
       [not found]                       ` <06ee1435-7fb6-3cf5-c305-96a020665c01@lebib.org>
  1 sibling, 0 replies; 18+ messages in thread
From: Wargreen @ 2017-03-20 13:48 UTC (permalink / raw)
  Cc: linux-rt-users


[-- Attachment #1.1: Type: text/plain, Size: 1762 bytes --]

It seem than my mail was not distributed by Majordomo. Maybe to big attachements ?

So here this mail without the files, but there remain disponible to : 
http://kalaj.darktech.org/~wargreen/logs/radeon/2448_syslog.log
http://kalaj.darktech.org/~wargreen/logs/radeon/2448_stack.log



Hello, thanks for your interest !

On 17/03/2017 19:12, Sebastian Andrzej Siewior wrote:

> Wargreen, how long does it take to trigger this issue for you?
> What is it, that you do to trigger this? What is _exactly_ the thing you
> do?

The common point each time it's an action with the GUI, for exemple
trigger the gnome's hot corner or launch a 3D game. This time just
log-in into my gnome session, and it hang when the fade of the desktop
should happen. (sorry for my english)

> Could you please additionally also do
> 	echo t > /proc/sysrq-trigger
>
> once this happens? And please make sure your dmesg buffer big enough to
> handle the output (and if not, please increase CONFIG_LOG_BUF_SHIFT).

Attached, the log with sysrq "d", "t" and the /proc/PID/stack output.

> It seems that the first logs you uploaded kalaj.darktech.org had to
> frame pointers enabled but in the recent logs you included here in this
> thread you had them enabled. Good. Please keep them enabled.

If needed for debuging, i can re-build the kernel with others options.

> Please attach the logs to the email and do not paste them since they get
> reformatted and it is hard to read them.
>
>> Thanks for your help !

Again !

> Sebastian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
       [not found]                       ` <06ee1435-7fb6-3cf5-c305-96a020665c01@lebib.org>
@ 2017-03-20 21:18                         ` Sebastian Andrzej Siewior
  2017-03-21 19:38                           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-03-20 21:18 UTC (permalink / raw)
  To: Wargreen; +Cc: Julia Cartwright, Daniel Vidal, Pankaj Gupta, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 3565 bytes --]

On 2017-03-19 15:29:17 [+0100], Wargreen wrote:
> Hello, thanks for your interest !
Hi,

> On 17/03/2017 19:12, Sebastian Andrzej Siewior wrote:
> > Wargreen, how long does it take to trigger this issue for you?
> > What is it, that you do to trigger this? What is _exactly_ the thing you
> > do?
> The common point each time it's an action with the GUI, for exemple
> trigger the gnome's hot corner or launch a 3D game. This time just
> log-in into my gnome session, and it hang when the fade of the desktop
> should happen. (sorry for my english)

So I have one body where it triggers from time to time and it triggers
less likely with debug. You on the other hand have something to trigger
it, good. Maybe I should try the 3D Game thing because gnome is working
fine here…

> > Could you please additionally also do
> > 	echo t > /proc/sysrq-trigger
> >
> > once this happens? And please make sure your dmesg buffer big enough to
> > handle the output (and if not, please increase CONFIG_LOG_BUF_SHIFT).
> Attached, the log with sysrq "d", "t" and the /proc/PID/stack output.

there d but I kind of miss t.

> > It seems that the first logs you uploaded kalaj.darktech.org had to
> > frame pointers enabled but in the recent logs you included here in this
> > thread you had them enabled. Good. Please keep them enabled.
> If needed for debuging, i can re-build the kernel with others options.

|1 lock held by gnome-shell/1993:
| #0:  (&rdev->exclusive_lock){++++.+}, at: [<ffffffffa6ef3710>] rt_down_read+0x10/0x20
|2 locks held by Xorg/2230:
| #0:  (&rdev->exclusive_lock){++++.+}, at: [<ffffffffa6ef3710>] rt_down_read+0x10/0x20
| #1:  (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa6ef3710>] rt_down_read+0x10/0x20
|2 locks held by gnome-shell/2448:
| #0:  (&rdev->pm.mclk_lock){++++++}, at: [<ffffffffa6ef3710>] rt_down_read+0x10/0x20
| #1:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffc08b47b0>] ttm_bo_vm_fault+0x60/0x530 [ttm]

1993 wants exclusive_lock and blocks on it.
2230 has exclusive_lock and blocks on mclk_lock.
2448 has mclk_lock nad reservation_ww_class_mutex. I *think* that one
waits for an IRQ from the GPU which is not coming. My guess is, that it
sits somewhere in ttm_bo_vm_fault_idle() and waits for wakeup.
If this triggers so easy, could you please check if the interrupt
counter for radeon in /proc/interrupts increases after the accident?

Could you please recompile the kernel with CONFIG_ENABLE_DEFAULT_TRACERS
enable and the patch attached? The patch should fix one trace event for
RT, add the flags argument to the trace and disable the trace once the
first "hung" is sent by the kernel.
After boot please enable the following trace events:

  echo 1 > /sys/kernel/debug/tracing/events/fence/enable
  echo 1 > /sys/kernel/debug/tracing/events/radeon/enable
  echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable 
  echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable 
  
  echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb 
  echo 60 > /proc/sys/kernel/hung_task_timeout_secs

The last two are for tuning. I decreased the hung-task-time out from 120
to 60 secs and increased the tracing buffer to 16MiB. I hope that is
enough. One this triggers, please send the dmesg output with
sysrq-trigger t (the syslog file is just fine) and the output of
/sys/kernel/debug/tracing/trace. That fine will be huge so please
compress it first and it might be best to leave it for download and send
it to the list.

Sebastian

[-- Attachment #2: radeon_debug.patch --]
[-- Type: text/x-diff, Size: 3477 bytes --]

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 1b0dcad916b0..1e0623a93dce 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -395,6 +395,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence);
 void radeon_fence_unref(struct radeon_fence **fence);
 unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring);
+unsigned radeon_fence_count_emitted_trace(struct radeon_device *rdev, int ring);
 bool radeon_fence_need_sync(struct radeon_fence *fence, int ring);
 void radeon_fence_note_sync(struct radeon_fence *fence, int ring);
 static inline struct radeon_fence *radeon_fence_later(struct radeon_fence *a,
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 7ef075acde9c..4da354eb6ed6 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -745,6 +745,23 @@ unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring)
 	return (unsigned)emitted;
 }
 
+unsigned radeon_fence_count_emitted_trace(struct radeon_device *rdev, int ring)
+{
+	uint64_t emitted;
+
+	/* We are not protected by ring lock when reading the last sequence
+	 * but it's ok to report slightly wrong fence count here.
+	 */
+	emitted = rdev->fence_drv[ring].sync_seq[ring]
+		- atomic64_read(&rdev->fence_drv[ring].last_seq);
+	/* to avoid 32bits warp around */
+	if (emitted > 0x10000000) {
+		emitted = 0x10000000;
+	}
+	return (unsigned)emitted;
+}
+
+
 /**
  * radeon_fence_need_sync - do we need a semaphore
  *
diff --git a/drivers/gpu/drm/radeon/radeon_trace.h b/drivers/gpu/drm/radeon/radeon_trace.h
index fdce4062901f..1a063ef3b309 100644
--- a/drivers/gpu/drm/radeon/radeon_trace.h
+++ b/drivers/gpu/drm/radeon/radeon_trace.h
@@ -38,7 +38,7 @@ TRACE_EVENT(radeon_cs,
 	    TP_fast_assign(
 			   __entry->ring = p->ring;
 			   __entry->dw = p->chunk_ib->length_dw;
-			   __entry->fences = radeon_fence_count_emitted(
+			   __entry->fences = radeon_fence_count_emitted_trace(
 				p->rdev, p->ring);
 			   ),
 	    TP_printk("ring=%u, dw=%u, fences=%u",
diff --git a/include/trace/events/fence.h b/include/trace/events/fence.h
index d6dfa05ba322..ed1be87c9834 100644
--- a/include/trace/events/fence.h
+++ b/include/trace/events/fence.h
@@ -59,6 +59,7 @@ DECLARE_EVENT_CLASS(fence,
 		__string(timeline, fence->ops->get_timeline_name(fence))
 		__field(unsigned int, context)
 		__field(unsigned int, seqno)
+		__field(unsigned long, flags)
 	),
 
 	TP_fast_assign(
@@ -66,11 +67,12 @@ DECLARE_EVENT_CLASS(fence,
 		__assign_str(timeline, fence->ops->get_timeline_name(fence))
 		__entry->context = fence->context;
 		__entry->seqno = fence->seqno;
+		__entry->flags = fence->flags;
 	),
 
-	TP_printk("driver=%s timeline=%s context=%u seqno=%u",
+	TP_printk("driver=%s timeline=%s context=%u seqno=%u fl=%lx",
 		  __get_str(driver), __get_str(timeline), __entry->context,
-		  __entry->seqno)
+		  __entry->seqno, __entry->flags)
 );
 
 DEFINE_EVENT(fence, fence_emit,
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 2b59c82cc3e1..6798bf80852a 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -97,6 +97,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	}
 
 	trace_sched_process_hang(t);
+	tracing_off();
 
 	if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)
 		return;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Radeon driver hang with 4.9.0-rt
  2017-03-20 21:18                         ` Sebastian Andrzej Siewior
@ 2017-03-21 19:38                           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-03-21 19:38 UTC (permalink / raw)
  To: Wargreen; +Cc: Julia Cartwright, Daniel Vidal, Pankaj Gupta, linux-rt-users

On 2017-03-20 22:18:16 [+0100], To Wargreen wrote:
> 1993 wants exclusive_lock and blocks on it.
> 2230 has exclusive_lock and blocks on mclk_lock.
> 2448 has mclk_lock nad reservation_ww_class_mutex. I *think* that one
> waits for an IRQ from the GPU which is not coming. My guess is, that it
> sits somewhere in ttm_bo_vm_fault_idle() and waits for wakeup.
> If this triggers so easy, could you please check if the interrupt
> counter for radeon in /proc/interrupts increases after the accident?

The other theory is that the code can't continue because we can't have
multiple reader of exclusive_lock. I will try to verify that but this
will take some time…

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Radeon driver hang with 4.9.0-rt
@ 2017-02-23 18:03 Wargreen
  0 siblings, 0 replies; 18+ messages in thread
From: Wargreen @ 2017-02-23 18:03 UTC (permalink / raw)
  To: linux-rt-users


[-- Attachment #1.1: Type: text/plain, Size: 5372 bytes --]

Hello linux-rt list,

After swith to a radeon HD5850, i have a full graphic hang when an
application require lot of GPU. Kill this application (via ssh) resume
the normal operation.
I run Debian SID, it's happen with both linux-image-4.9.0.1-rt-amd64 and
linux-image-4.9.0.2-rt-amd64.

Extract of my syslog (full version at
http://kalaj.darktech.org/~wargreen/logs/radeon/syslog)  :

Feb 19 23:17:19 LaChoze kernel: [  605.169580] INFO: task Xorg:2014
blocked for more than 120 seconds.
Feb 19 23:17:19 LaChoze kernel: [  605.169588]       Not tainted
4.9.0-1-rt-amd64 #1
Feb 19 23:17:19 LaChoze kernel: [  605.169590] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 19 23:17:19 LaChoze kernel: [  605.169593] Xorg            D    0 
2014   2012 0x00000004
Feb 19 23:17:19 LaChoze kernel: [  605.169598]  ffff9235afe24800
0000000000000000 ffff9235b6c69200 ffff9235b12e9f80
Feb 19 23:17:19 LaChoze kernel: [  605.169602]  ffff9235b4e4ee40
ffffa16b023a7be8 ffffffff85a16cbf 00ff9235b12e9f80
Feb 19 23:17:19 LaChoze kernel: [  605.169606]  ffff9235b6c69200
ffff9235b12e9f80 ffff9235b12e9f80 0000000000000000
Feb 19 23:17:19 LaChoze kernel: [  605.169611] Call Trace:
Feb 19 23:17:19 LaChoze kernel: [  605.169620]  [<ffffffff85a16cbf>] ?
__schedule+0x26f/0x5b0
Feb 19 23:17:19 LaChoze kernel: [  605.169623]  [<ffffffff85a17043>] ?
schedule+0x43/0xc0
Feb 19 23:17:19 LaChoze kernel: [  605.169625]  [<ffffffff85a18aa3>] ?
__rt_mutex_slowlock+0x83/0x150
Feb 19 23:17:19 LaChoze kernel: [  605.169628]  [<ffffffff85a18fcc>] ?
rt_mutex_slowlock+0x11c/0x2a0
Feb 19 23:17:19 LaChoze kernel: [  605.169633]  [<ffffffff854cd9d4>] ?
rt_down_read+0x24/0x30
Feb 19 23:17:19 LaChoze kernel: [  605.169663]  [<ffffffffc0b2eaf7>] ?
radeon_gem_create_ioctl+0x37/0x100 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169680]  [<ffffffffc09f9b9a>] ?
drm_ioctl+0x1ea/0x470 [drm]
Feb 19 23:17:19 LaChoze kernel: [  605.169703]  [<ffffffffc0b2eac0>] ?
radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169706]  [<ffffffff8547dc82>] ?
unpin_current_cpu+0x12/0x70
Feb 19 23:17:19 LaChoze kernel: [  605.169723]  [<ffffffffc0afd049>] ?
radeon_drm_ioctl+0x49/0x80 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169726]  [<ffffffff856257bf>] ?
do_vfs_ioctl+0x9f/0x610
Feb 19 23:17:19 LaChoze kernel: [  605.169730]  [<ffffffff858f2e4d>] ?
__sys_recvmsg+0x7d/0x90
Feb 19 23:17:19 LaChoze kernel: [  605.169733]  [<ffffffff85625da4>] ?
SyS_ioctl+0x74/0x80
Feb 19 23:17:19 LaChoze kernel: [  605.169736]  [<ffffffff85a1b27b>] ?
system_call_fast_compare_end+0xc/0x9b
Feb 19 23:17:19 LaChoze kernel: [  605.169770] INFO: task
E.d3d9.OpenGL:4263 blocked for more than 120 seconds.
Feb 19 23:17:19 LaChoze kernel: [  605.169773]       Not tainted
4.9.0-1-rt-amd64 #1
Feb 19 23:17:19 LaChoze kernel: [  605.169774] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 19 23:17:19 LaChoze kernel: [  605.169776] E.d3d9.OpenGL   D    0 
4263   4243 0x20020000
Feb 19 23:17:19 LaChoze kernel: [  605.169779]  ffff9235b0839b00
0000000000000000 ffff9235b6ce9200 ffff92341d6a4ec0
Feb 19 23:17:19 LaChoze kernel: [  605.169783]  ffff9235b4e4af40
ffffa16b020a3b40 ffffffff85a16cbf 0000000000000000
Feb 19 23:17:19 LaChoze kernel: [  605.169787]  ffff9235b6ce9200
ffff92341d6a4ec0 ffff92341d6a4ec0 0000000000000000
Feb 19 23:17:19 LaChoze kernel: [  605.169791] Call Trace:
Feb 19 23:17:19 LaChoze kernel: [  605.169795]  [<ffffffff85a16cbf>] ?
__schedule+0x26f/0x5b0
Feb 19 23:17:19 LaChoze kernel: [  605.169798]  [<ffffffff85a17043>] ?
schedule+0x43/0xc0
Feb 19 23:17:19 LaChoze kernel: [  605.169800]  [<ffffffff85a18aa3>] ?
__rt_mutex_slowlock+0x83/0x150
Feb 19 23:17:19 LaChoze kernel: [  605.169802]  [<ffffffff85a18fcc>] ?
rt_mutex_slowlock+0x11c/0x2a0
Feb 19 23:17:19 LaChoze kernel: [  605.169805]  [<ffffffff854cd9d4>] ?
rt_down_read+0x24/0x30
Feb 19 23:17:19 LaChoze kernel: [  605.169827]  [<ffffffffc0b1becd>] ?
radeon_bo_create+0x15d/0x280 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169849]  [<ffffffffc0b2e6ce>] ?
radeon_gem_object_create+0xbe/0x1a0 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169870]  [<ffffffffc0b2eb29>] ?
radeon_gem_create_ioctl+0x69/0x100 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169884]  [<ffffffffc09f9b9a>] ?
drm_ioctl+0x1ea/0x470 [drm]
Feb 19 23:17:19 LaChoze kernel: [  605.169905]  [<ffffffffc0b2eac0>] ?
radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169908]  [<ffffffff8547dc82>] ?
unpin_current_cpu+0x12/0x70
Feb 19 23:17:19 LaChoze kernel: [  605.169924]  [<ffffffffc0afd049>] ?
radeon_drm_ioctl+0x49/0x80 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169950]  [<ffffffffc0be89c0>] ?
radeon_kms_compat_ioctl+0x10/0x20 [radeon]
Feb 19 23:17:19 LaChoze kernel: [  605.169953]  [<ffffffff8566e4ec>] ?
compat_SyS_ioctl+0xbc/0x1140
Feb 19 23:17:19 LaChoze kernel: [  605.169956]  [<ffffffff85a1704d>] ?
schedule+0x4d/0xc0
Feb 19 23:17:19 LaChoze kernel: [  605.169959]  [<ffffffff85403e1d>] ?
do_fast_syscall_32+0x8d/0x170
Feb 19 23:17:19 LaChoze kernel: [  605.169962]  [<ffffffff85a1caac>] ?
entry_SYSENTER_compat+0x4c/0x5b
[...]

Is there a workaround ?
I'm here for more informations or tests.

Thanks for the RT !



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-03-21 20:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-23 18:15 Radeon driver hang with 4.9.0-rt Wargreen
2017-02-24  3:38 ` Pankaj Gupta
2017-02-24 22:09   ` Wargreen
2017-02-26 11:55     ` Daniel Vidal
2017-03-02 21:10       ` Julia Cartwright
2017-03-06 16:14         ` Wargreen
2017-03-06 17:33           ` Julia Cartwright
2017-03-06 17:59             ` Wargreen
2017-03-06 19:07               ` Julia Cartwright
2017-03-07 18:41                 ` Wargreen
2017-03-10 17:51                   ` Wargreen
2017-03-15 19:20                     ` Julia Cartwright
2017-03-17 18:12                     ` Sebastian Andrzej Siewior
2017-03-20 13:48                       ` Wargreen
     [not found]                       ` <06ee1435-7fb6-3cf5-c305-96a020665c01@lebib.org>
2017-03-20 21:18                         ` Sebastian Andrzej Siewior
2017-03-21 19:38                           ` Sebastian Andrzej Siewior
2017-03-17 18:15       ` Sebastian Andrzej Siewior
  -- strict thread matches above, loose matches on Subject: below --
2017-02-23 18:03 Wargreen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.