* Re: Re:[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
       [not found] <2699.222.92.8.142.1318833267.squirrel@mail.lemote.com>
@ 2011-10-18  8:35 ` Chen Jie
  2011-10-20 16:31   ` Michel Dänzer
  0 siblings, 1 reply; 5+ messages in thread
From: Chen Jie @ 2011-10-18  8:35 UTC (permalink / raw)
  To: chenhc; +Cc: Michel Dänzer, dri-devel



Hi,

On 2011-10-17 at 2:34 PM, <chenhc@lemote.com> wrote:

> If I start X but switch to the console, then do suspend & resume, "GPU
> reset" hardly ever happens. But there is a new problem: the IRQ of the radeon
> card is disabled. Maybe "GPU reset" has something to do with "IRQ
> disabled"?
>
> I have tried "irqpoll"; it doesn't fix this problem.
>
> [  571.914062] irq 6: nobody cared (try booting with the "irqpoll" option)
> [  571.914062] Call Trace:
> [  571.914062] [<ffffffff806f3248>] dump_stack+0x8/0x34
> [  571.914062] [<ffffffff8027e1e4>] __report_bad_irq.clone.6+0x44/0x15c
> [  571.914062] [<ffffffff8027e584>] note_interrupt+0x204/0x2a0
> [  571.914062] [<ffffffff8027c7cc>] handle_irq_event_percpu+0x19c/0x1f8
> [  571.914062] [<ffffffff8027c890>] handle_irq_event+0x68/0xa8
> [  571.914062] [<ffffffff8027f038>] handle_level_irq+0xd8/0x13c
> [  571.914062] [<ffffffff8027bec8>] generic_handle_irq+0x48/0x58
> [  571.914062] [<ffffffff80204574>] do_IRQ+0x18/0x24
> [  571.914062] [<ffffffff8020152c>] mach_irq_dispatch+0xf0/0x194
> [  571.914062] [<ffffffff80202a40>] ret_from_irq+0x0/0x4
> [  571.914062]
> [  571.914062] handlers:
> [  571.914062] [<ffffffff8053bba8>] radeon_driver_irq_handler_kms
>
> P.S.: use the latest kernel from git, and irq6 is not shared by other
> devices.
>
Does fence_wait depend on the GPU's interrupt? If so, can I say that the "GPU
lockup" is caused by the unexpected disabling of the GPU's IRQ?


> > Hi Alex, Michel
> >
> > 2011/10/5 Alex Deucher <alexdeucher@gmail.com>
> >
> >> 2011/10/5 Michel Dänzer <michel@daenzer.net>:
> >> > On Thu, 2011-09-29 at 17:17 +0800, Chen Jie wrote:
> >> >>
> >> >> We got occasionally "GPU lockup" after resuming from suspend (on mipsel
> >> >> platform with a mips64 compatible CPU and rs780e, the kernel is
> >> >> 3.1.0-rc8 64bit).  Related kernel message:
> >> >
> >> > [...]
> >> >
> >> >> [  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than
> >> >> 10019msec
> >> >> [  177.089843] ------------[ cut here ]------------
> >> >> [  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267
> >> >> radeon_fence_wait+0x25c/0x33c()
> >> >> [  177.105468] GPU lockup (waiting for 0x000013C3 last fence id
> >> >> 0x000013AD)
> >> >> [  177.113281] Modules linked in: psmouse serio_raw
> >> >> [  177.117187] Call Trace:
> >> >> [  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
> >> >> [  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
> >> >> [  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
> >> >> [  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
> >> >> [  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
> >> >> [  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl
> >> >> +0x80/0x114
> >> >> [  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
> >> >> [  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
> >> >> [  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
> >> >> [  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
> >> >> [  177.179687] ---[ end trace 92f63d998efe4c6d ]---
> >> >> [  177.187500] radeon 0000:01:05.0: GPU softreset
> >> >> [  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
> >> >> [  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
> >> >> [  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
> >> >> [  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
> >> >
> >> > [...]
> >> >
> >> >> What may cause a "GPU lockup"?
> >> >
> >> > Lots of things... The most common cause is an incorrect command stream
> >> > sent to the GPU by userspace or the kernel.
> >> >
> >> >> Why reset didn't work?
> >> >
> >> > Might be related to 'Wait for MC idle timedout !', but I don't know
> >> > offhand what could be up with that.
> >> >
> >> >
> >> >> BTW,  one question:
> >> >> I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes
> >> >> need_dma32 was set.
> >> >> Is it correct? (drivers/char/agp is not available on mips, could that
> >> >> be the reason?)
> >> >
> >> > Not sure, Alex?
> >>
> >> You don't need AGP for newer IGP cards (rs4xx+).  It gets set by default if
> >> the card is not AGP or PCIE.  That should be changed as only the
> >> legacy r1xx PCI GART block has that limitation.  I'll send a patch out
> >> shortly.
> >>
> > Got it, thanks for the reply.
> >
>


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: Re:[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
  2011-10-18  8:35 ` Re:[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend Chen Jie
@ 2011-10-20 16:31   ` Michel Dänzer
  0 siblings, 0 replies; 5+ messages in thread
From: Michel Dänzer @ 2011-10-20 16:31 UTC (permalink / raw)
  To: Chen Jie; +Cc: chenhc, dri-devel

On Tue, 2011-10-18 at 16:35 +0800, Chen Jie wrote:
> 
> On 2011-10-17 at 2:34 PM, <chenhc@lemote.com> wrote:
>         If I start X but switch to the console, then do suspend &
>         resume, "GPU
>         reset" hardly happen. but there is a new problem that the IRQ
>         of radeon
>         card is disabled. Maybe "GPU reset" has something to do with
>         "IRQ
>         disabled"?
>         
>         I have tried "irqpoll", it doesn't fix this problem.
>         
>         [  571.914062] irq 6: nobody cared (try booting with the
>         "irqpoll" option)
>         [  571.914062] Call Trace:
>         [  571.914062] [<ffffffff806f3248>] dump_stack+0x8/0x34
>         [  571.914062] [<ffffffff8027e1e4>] __report_bad_irq.clone.6
>         +0x44/0x15c
>         [  571.914062] [<ffffffff8027e584>] note_interrupt+0x204/0x2a0
>         [  571.914062] [<ffffffff8027c7cc>] handle_irq_event_percpu
>         +0x19c/0x1f8
>         [  571.914062] [<ffffffff8027c890>] handle_irq_event+0x68/0xa8
>         [  571.914062] [<ffffffff8027f038>] handle_level_irq
>         +0xd8/0x13c
>         [  571.914062] [<ffffffff8027bec8>] generic_handle_irq
>         +0x48/0x58
>         [  571.914062] [<ffffffff80204574>] do_IRQ+0x18/0x24
>         [  571.914062] [<ffffffff8020152c>] mach_irq_dispatch
>         +0xf0/0x194
>         [  571.914062] [<ffffffff80202a40>] ret_from_irq+0x0/0x4
>         [  571.914062]
>         [  571.914062] handlers:
>         [  571.914062] [<ffffffff8053bba8>]
>         radeon_driver_irq_handler_kms
>         
>         P.S.: use the latest kernel from git, and irq6 is not shared
>         by other
>         devices.
>         
> Does fence_wait depend on the GPU's interrupt? If so, can I say that the
> "GPU lockup" is caused by the unexpected disabling of the GPU's IRQ?

No, if the GPU didn't actually lock up, the fences should still signal
eventually, as radeon_fence_signaled()->radeon_fence_poll_locked() is
called after the wait for the SW interrupt times out. 
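
The fallback described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the radeon driver code: the struct, the function names (fence_model, fence_poll, wait_for_irq, fence_wait) and the retry count are all hypothetical. It shows why a missing interrupt alone should not produce a lockup report: the sequence number is polled after every wait, whether the wait ended because of an interrupt or because of a timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a fence driver; NOT the real radeon structures. */
struct fence_model {
    uint32_t hw_seq;    /* sequence number the GPU last wrote back */
    uint32_t last_seq;  /* last sequence number the driver has observed */
    bool     irq_works; /* whether the fence interrupt is actually delivered */
};

/* Poll the value written back by the GPU; this needs no interrupt. */
static bool fence_poll(struct fence_model *f, uint32_t wanted)
{
    f->last_seq = f->hw_seq;
    /* Signed difference keeps the comparison correct across wrap-around. */
    return (int32_t)(f->last_seq - wanted) >= 0;
}

/* Stand-in for sleeping until the interrupt or a timeout.
 * Returns true only if the interrupt woke us up. */
static bool wait_for_irq(const struct fence_model *f)
{
    return f->irq_works;
}

enum wait_result { FENCE_SIGNALED, FENCE_LOCKUP };

/* Wait for a fence; every wake-up, including a timeout, ends in a poll. */
static enum wait_result fence_wait(struct fence_model *f, uint32_t wanted,
                                   int max_timeouts)
{
    for (int i = 0; i < max_timeouts; i++) {
        (void)wait_for_irq(f);      /* woken by IRQ, or timed out */
        if (fence_poll(f, wanted))  /* poll in either case */
            return FENCE_SIGNALED;
    }
    return FENCE_LOCKUP;            /* GPU made no progress at all */
}
```

In this model, a GPU that keeps advancing hw_seq is reported as signaled even with irq_works set to false, while a GPU whose hw_seq is stuck is reported as locked up even with working interrupts.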


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer

* Re: Re:[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
  2011-10-05  9:41 ` Michel Dänzer
@ 2011-10-05 13:54   ` Alex Deucher
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Deucher @ 2011-10-05 13:54 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel, Chen Jie

2011/10/5 Michel Dänzer <michel@daenzer.net>:
> On Thu, 2011-09-29 at 17:17 +0800, Chen Jie wrote:
>>
>> We got occasionally "GPU lockup" after resuming from suspend(on mipsel
>> platform with a mips64 compatible CPU and rs780e, the kernel is
>> 3.1.0-rc8 64bit).  Related kernel message:
>
> [...]
>
>> [  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than
>> 10019msec
>> [  177.089843] ------------[ cut here ]------------
>> [  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267
>> radeon_fence_wait+0x25c/0x33c()
>> [  177.105468] GPU lockup (waiting for 0x000013C3 last fence id
>> 0x000013AD)
>> [  177.113281] Modules linked in: psmouse serio_raw
>> [  177.117187] Call Trace:
>> [  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
>> [  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
>> [  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
>> [  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
>> [  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
>> [  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl
>> +0x80/0x114
>> [  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
>> [  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
>> [  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
>> [  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
>> [  177.179687] ---[ end trace 92f63d998efe4c6d ]---
>> [  177.187500] radeon 0000:01:05.0: GPU softreset
>> [  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
>> [  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
>> [  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
>> [  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
>
> [...]
>
>> What may cause a "GPU lockup"?
>
> Lots of things... The most common cause is an incorrect command stream
> sent to the GPU by userspace or the kernel.
>
>> Why reset didn't work?
>
> Might be related to 'Wait for MC idle timedout !', but I don't know
> offhand what could be up with that.
>
>
>> BTW,  one question:
>> I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes
>> need_dma32 was set.
>> Is it correct? (drivers/char/agp is not available on mips, could that
>> be the reason?)
>
> Not sure, Alex?

You don't need AGP for newer IGP cards (rs4xx+).  It gets set by default if
the card is not AGP or PCIE.  That should be changed as only the
legacy r1xx PCI GART block has that limitation.  I'll send a patch out
shortly.

Alex
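
One possible reading of the change Alex describes, written as a self-contained sketch. The flag bit values, the family enum and both function names are hypothetical and chosen only for illustration; they are not the driver's definitions, and the actual patch may look different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real RADEON_IS_* values are not used here. */
#define SKETCH_IS_AGP  (1u << 0)
#define SKETCH_IS_PCI  (1u << 1)
#define SKETCH_IS_PCIE (1u << 2)
#define SKETCH_IS_IGP  (1u << 3)

/* Hypothetical family ordering: legacy parts sort before rs4xx and later. */
enum sketch_family { FAMILY_R1XX, FAMILY_RS4XX, FAMILY_RS780 };

/* Behaviour reported in the thread: a card that is neither AGP nor PCIE is
 * flagged as PCI, and the PCI flag alone forces 32-bit DMA. */
static bool need_dma32_current(uint32_t flags)
{
    return (flags & SKETCH_IS_PCI) != 0;
}

/* Proposed behaviour: only the legacy PCI GART block keeps the limit. */
static bool need_dma32_proposed(uint32_t flags, enum sketch_family family)
{
    return (flags & SKETCH_IS_PCI) != 0 && family < FAMILY_RS4XX;
}
```

With flags of SKETCH_IS_PCI | SKETCH_IS_IGP, as reported for the rs780e, the first function returns true and the second returns false for FAMILY_RS780.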


* Re: Re:[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
  2011-09-29  9:17 Chen Jie
@ 2011-10-05  9:41 ` Michel Dänzer
  2011-10-05 13:54   ` Alex Deucher
  0 siblings, 1 reply; 5+ messages in thread
From: Michel Dänzer @ 2011-10-05  9:41 UTC (permalink / raw)
  To: Chen Jie; +Cc: dri-devel

On Thu, 2011-09-29 at 17:17 +0800, Chen Jie wrote:
> 
> We got occasionally "GPU lockup" after resuming from suspend(on mipsel
> platform with a mips64 compatible CPU and rs780e, the kernel is
> 3.1.0-rc8 64bit).  Related kernel message:

[...]

> [  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than
> 10019msec
> [  177.089843] ------------[ cut here ]------------
> [  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267
> radeon_fence_wait+0x25c/0x33c()
> [  177.105468] GPU lockup (waiting for 0x000013C3 last fence id
> 0x000013AD)
> [  177.113281] Modules linked in: psmouse serio_raw
> [  177.117187] Call Trace:
> [  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
> [  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
> [  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
> [  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
> [  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
> [  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl
> +0x80/0x114
> [  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
> [  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
> [  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
> [  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
> [  177.179687] ---[ end trace 92f63d998efe4c6d ]---
> [  177.187500] radeon 0000:01:05.0: GPU softreset
> [  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
> [  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
> [  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
> [  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !

[...]

> What may cause a "GPU lockup"?

Lots of things... The most common cause is an incorrect command stream
sent to the GPU by userspace or the kernel. 

> Why reset didn't work?

Might be related to 'Wait for MC idle timedout !', but I don't know
offhand what could be up with that. 


> BTW,  one question:
> I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes
> need_dma32 was set.
> Is it correct? (drivers/char/agp is not available on mips, could that
> be the reason?)

Not sure, Alex?


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer

* Re:[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.
@ 2011-09-29  9:17 Chen Jie
  2011-10-05  9:41 ` Michel Dänzer
  0 siblings, 1 reply; 5+ messages in thread
From: Chen Jie @ 2011-09-29  9:17 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Michel Dänzer, dri-devel



Hi,

Adding more information.

We occasionally get a "GPU lockup" after resuming from suspend (on a mipsel
platform with a mips64-compatible CPU and rs780e; the kernel is 3.1.0-rc8,
64-bit).  Related kernel messages:
/* return from STR */
[  156.152343] radeon 0000:01:05.0: WB enabled
[  156.187500] [drm] ring test succeeded in 0 usecs
[  156.187500] [drm] ib test succeeded in 0 usecs
[  156.398437] ata2: SATA link down (SStatus 0 SControl 300)
[  156.398437] ata3: SATA link down (SStatus 0 SControl 300)
[  156.398437] ata4: SATA link down (SStatus 0 SControl 300)
[  156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  156.597656] ata1.00: configured for UDMA/133
[  156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
[  157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
[  157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
[  157.683593] r8169 0000:02:00.0: eth0: link up
[  165.621093] PM: resume of devices complete after 9679.556 msecs
[  165.628906] Restarting tasks ... done.
[  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
[  177.089843] ------------[ cut here ]------------
[  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
[  177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
[  177.113281] Modules linked in: psmouse serio_raw
[  177.117187] Call Trace:
[  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
[  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
[  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
[  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
[  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
[  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
[  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
[  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
[  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
[  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
[  177.179687] ---[ end trace 92f63d998efe4c6d ]---
[  177.187500] radeon 0000:01:05.0: GPU softreset
[  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
[  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
[  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
[  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.367187] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[  177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[  177.414062] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[  177.417968] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[  177.425781] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2002B040
[  177.433593] radeon 0000:01:05.0: GPU reset succeed
[  177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.804687] radeon 0000:01:05.0: WB enabled
[  178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
[  178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
[  178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
[  178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
...
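
For readers unfamiliar with the "ring test failed (scratch(0x8504)=0xCAFEDEAD)" line in the log above, here is a minimal model of how such a test is commonly structured. It assumes that the CPU seeds a scratch register with a sentinel, queues a command asking the command processor (CP) to overwrite it, and then polls. The gpu_model struct, the cp_step helper and the EXPECTED value are assumptions made for this sketch and are not taken from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define SENTINEL 0xCAFEDEADu /* value reported in the log above */
#define EXPECTED 0xDEADBEEFu /* assumed value the CP is asked to write */

/* Hypothetical stand-in for the hardware state relevant to the test. */
struct gpu_model {
    uint32_t scratch;    /* the scratch register */
    bool     cp_running; /* whether the CP is fetching commands */
};

/* Model of the CP executing the queued "write scratch" command. */
static void cp_step(struct gpu_model *g)
{
    if (g->cp_running)
        g->scratch = EXPECTED;
}

/* Seed the register, let the CP run, and poll for the new value. */
static bool ring_test(struct gpu_model *g, int max_polls)
{
    g->scratch = SENTINEL;              /* written directly by the CPU */
    for (int i = 0; i < max_polls; i++) {
        cp_step(g);
        if (g->scratch == EXPECTED)
            return true;                /* CP executed the command */
    }
    return false;                       /* scratch still holds SENTINEL */
}
```

Under these assumptions, reading back the sentinel unchanged means the CP never executed the queued write, which is consistent with a CP that did not restart after the soft reset.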

What may cause a "GPU lockup"? Why didn't the reset work? Any ideas?

BTW, one question:
I got 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes
need_dma32 to be set.
Is it correct? (drivers/char/agp is not available on mips, could that be the
reason?)
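
As a small aid for reading the "GPU lockup (waiting for ... last fence id ...)" line in the log above, the helper below extracts the two sequence numbers and returns their difference. It assumes the first number is the fence being waited for and the second is the last fence seen as signaled; the function name fence_gap is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns how far the awaited fence is ahead of the last signaled one,
 * or -1 if the line does not have the expected form. */
static long fence_gap(const char *line)
{
    unsigned int waiting, last;

    if (sscanf(line, "GPU lockup (waiting for 0x%x last fence id 0x%x)",
               &waiting, &last) != 2)
        return -1;

    /* Signed 32-bit difference tolerates sequence-number wrap-around. */
    return (long)(int32_t)((uint32_t)waiting - (uint32_t)last);
}
```

For the line in this report, 0x000013C3 - 0x000013AD = 0x16, so the awaited fence is 22 sequence numbers ahead of the last one observed.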


On 2011-09-28 at 3:23 PM, <chenhc@lemote.com> wrote:

> Hi Alex,
>
> When we do STR (S3) with an RS780E radeon card on a MIPS platform, a "GPU
> reset" may happen after resume (the probability is about 5%). After that,
> X is unusable.
>
> We know there is a "ring test" at system resume time and at GPU reset time.
> Whether or not a GPU reset happens, the "ring test" at system resume time
> always succeeds. But the "ring test" at GPU reset time usually fails.
>
> We use the latest kernel (3.1.0-RC8 from git) and X.org is 7.6.
>
> Any ideas?
>
> Best regards,
> Huacai Chen
>
>

Regards,
- Chen Jie

