linux-kernel.vger.kernel.org archive mirror
* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
       [not found] <2b9aabdc-6f44-40cc-ad0b-4180101570c7@mail.android.htc.com>
@ 2012-10-25  5:22 ` Justin P. Mattock
  2012-10-25  8:16   ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Justin P. Mattock @ 2012-10-25  5:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel, Linux Kernel Mailing List

>
>
> On Tue, Oct 23, 2012 at 10:06:52AM -0700, Justin P. Mattock wrote:
>  > This is happening both with MAINLINE and NEXT.
>  >
>  > basically system is running fine, then under load system becomes
>  > really sluggish and unresponsive. I was able to get dmesg of the
>  > error..:
>  >
>  > [ 7745.007008] ath9k 0000:05:00.0 wlan0: disabling VHT as WMM/QoS is
>  > not supported by the AP
>  > [ 7745.007736] wlan0: associate with 68:7f:74:b8:05:82 (try 1/3)
>  > [ 7745.011456] wlan0: RX AssocResp from 68:7f:74:b8:05:82
>  > (capab=0x411 status=0 aid=5)
>  > [ 7745.011529] wlan0: associated
>  > [ 8120.812482] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
>  > elapsed... GPU hung
>  > [ 8120.812642] [drm] capturing error event; look for more
>  > information in /debug/dri/0/i915_error_state
>  > [ 8122.328682] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
>  > elapsed... GPU hung
>  > [ 8122.328845] [drm:i915_reset] *ERROR* GPU hanging too fast,
>  > declaring wedged!
>  > [ 8122.328850] [drm:i915_reset] *ERROR* Failed to reset chip.
>  >
>  > full log is here: http://fpaste.org/7xH8/
>  >
>  > as for good kernels from what I remember 3.6.0-rc1. I can try a
>  > bisect on this once I get the time. or if anybody has a patch I can
>  > test.
>
> Can you please rehang your machine, and then grab the i915_error_state
> from debugfs? That contains the gpu hang dump we need to diagnose things.
>
> And the bisect would obviously be awesome.
>
> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

took a bit to trigger, but finally fired off.

here is a link to the file..: intel_error_decode
http://www.filefactory.com/file/22bypyjhs4mx

The file was too large to send to the list. Let me know if you need more
info on this.
Also, if anybody has ideas on how to trigger this, that would be
appreciated, so the bisect can be more precise. Right now I don't even
think a bisect is worth it: not being able to trigger the crash could
send the bisect astray and point it to a wrong commit (which has
happened in the past). But then again, you never know.

Justin P. Mattock


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-25  5:22 ` [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung Justin P. Mattock
@ 2012-10-25  8:16   ` Daniel Vetter
  2012-10-25  8:47     ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2012-10-25  8:16 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: dri-devel, Linux Kernel Mailing List

On Thu, Oct 25, 2012 at 7:22 AM, Justin P. Mattock
<justinmattock@gmail.com> wrote:
>
> here is a link to the file..: intel_error_decode
> http://www.filefactory.com/file/22bypyjhs4mx

I haven't figured out how to access this thing. Can you please file a
bug report on bugs.freedesktop.org and attach it there?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-25  8:16   ` Daniel Vetter
@ 2012-10-25  8:47     ` Chris Wilson
  2012-10-26  4:43       ` Justin P. Mattock
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2012-10-25  8:47 UTC (permalink / raw)
  To: Daniel Vetter, Justin P. Mattock; +Cc: dri-devel, Linux Kernel Mailing List

On Thu, 25 Oct 2012 10:16:08 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Oct 25, 2012 at 7:22 AM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
> >
> > here is a link to the file..: intel_error_decode
> > http://www.filefactory.com/file/22bypyjhs4mx
> 
> I haven't figured out how to access this thing. Can you please file a
> bug report on bugs.freedesktop.org and attach it there?

No worries, it is another ILK hang similar to the ones reported earlier
- it just seems the ring stops advancing. Hopefully it is a missing w/a
from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-25  8:47     ` Chris Wilson
@ 2012-10-26  4:43       ` Justin P. Mattock
  2012-10-26  8:05         ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Justin P. Mattock @ 2012-10-26  4:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, dri-devel, Linux Kernel Mailing List

On 10/25/2012 01:47 AM, Chris Wilson wrote:
> On Thu, 25 Oct 2012 10:16:08 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Thu, Oct 25, 2012 at 7:22 AM, Justin P. Mattock
>> <justinmattock@gmail.com> wrote:
>>>
>>> here is a link to the file..: intel_error_decode
>>> http://www.filefactory.com/file/22bypyjhs4mx
>>
>> I haven't figured out how to access this thing. Can you please file a
>> bug report on bugs.freedesktop.org and attach it there?

Oops.. I filed it on the kernel bugzilla instead. Maybe we can just add Cc's there:
https://bugzilla.kernel.org/show_bug.cgi?id=49571

>
> No worries, it is another ILK hang similar to the ones reported earlier
> - it just seems the ring stops advancing. Hopefully it is a missing w/a
> from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile
> -Chris
>

Well, if this means building libdrm etc., then that's not a problem, just
more time consuming if anything. Is there perhaps an *.rpm that I can test?


Justin P. Mattock


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-26  4:43       ` Justin P. Mattock
@ 2012-10-26  8:05         ` Daniel Vetter
  2012-10-26 17:35           ` Justin P. Mattock
  2012-10-26 20:57           ` Justin P. Mattock
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Vetter @ 2012-10-26  8:05 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: Chris Wilson, dri-devel, Linux Kernel Mailing List

On Fri, Oct 26, 2012 at 6:43 AM, Justin P. Mattock
<justinmattock@gmail.com> wrote:
>>
>> No worries, it is another ILK hang similar to the ones reported earlier
>> - it just seems the ring stops advancing. Hopefully it is a missing w/a
>> from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile
>> -Chris
>>
>
> well if this means building libdrm etc.. then thats not a problem, more time
> consuming if anything. perhaps an *.rpm that I can test to see?

It's not libdrm, the above is just a kernel git tree with a bunch of
ironlake workarounds.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-26  8:05         ` Daniel Vetter
@ 2012-10-26 17:35           ` Justin P. Mattock
  2012-10-26 20:57           ` Justin P. Mattock
  1 sibling, 0 replies; 11+ messages in thread
From: Justin P. Mattock @ 2012-10-26 17:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Chris Wilson, dri-devel, Linux Kernel Mailing List

On 10/26/2012 01:05 AM, Daniel Vetter wrote:
> On Fri, Oct 26, 2012 at 6:43 AM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>>>
>>> No worries, it is another ILK hang similar to the ones reported earlier
>>> - it just seems the ring stops advancing. Hopefully it is a missing w/a
>>> from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile
>>> -Chris
>>>
>>
>> well if this means building libdrm etc.. then thats not a problem, more time
>> consuming if anything. perhaps an *.rpm that I can test to see?
>
> It's not libdrm, the above is just a kernel git tree with a bunch of
> ironlake workarounds.
> -Daniel
>


Hmm.. in that case maybe I should pull and run that kernel to see
if the crash occurs before bisecting (if at all).
Will do once I get time to download.

Justin P. Mattock


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-26  8:05         ` Daniel Vetter
  2012-10-26 17:35           ` Justin P. Mattock
@ 2012-10-26 20:57           ` Justin P. Mattock
  2012-10-27 13:56             ` Daniel Vetter
  1 sibling, 1 reply; 11+ messages in thread
From: Justin P. Mattock @ 2012-10-26 20:57 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Chris Wilson, dri-devel, Linux Kernel Mailing List

On 10/26/2012 01:05 AM, Daniel Vetter wrote:
> On Fri, Oct 26, 2012 at 6:43 AM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>>>
>>> No worries, it is another ILK hang similar to the ones reported earlier
>>> - it just seems the ring stops advancing. Hopefully it is a missing w/a
>>> from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile
>>> -Chris
>>>
>>
>> well if this means building libdrm etc.. then thats not a problem, more time
>> consuming if anything. perhaps an *.rpm that I can test to see?
>
> It's not libdrm, the above is just a kernel git tree with a bunch of
> ironlake workarounds.
> -Daniel
>


nice..

:~/drm> git clone git://people.freedesktop.org/~danvet/drm
Cloning into 'drm'...
remote: Counting objects: 2728390, done.
remote: Compressing objects: 100% (418606/418606), done.
remote: Total 2728390 (delta 2293727), reused 2717443 (delta 2282880)
Receiving objects: 100% (2728390/2728390), 637.95 MiB | 599 KiB/s, done.
Resolving deltas: 100% (2293727/2293727), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.


so now I have to go on a witch hunt for 600MB's in my system.

Justin P. Mattock


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-26 20:57           ` Justin P. Mattock
@ 2012-10-27 13:56             ` Daniel Vetter
  2012-10-27 19:11               ` Justin P. Mattock
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2012-10-27 13:56 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: Chris Wilson, dri-devel, Linux Kernel Mailing List

On Fri, Oct 26, 2012 at 10:57 PM, Justin P. Mattock
<justinmattock@gmail.com> wrote:
>
> :~/drm> git clone git://people.freedesktop.org/~danvet/drm
> Cloning into 'drm'...
> remote: Counting objects: 2728390, done.
> remote: Compressing objects: 100% (418606/418606), done.
> remote: Total 2728390 (delta 2293727), reused 2717443 (delta 2282880)
> Receiving objects: 100% (2728390/2728390), 637.95 MiB | 599 KiB/s, done.
> Resolving deltas: 100% (2293727/2293727), done.
> warning: remote HEAD refers to nonexistent ref, unable to checkout.
>
>
> so now I have to go on a witch hunt for 600MB's in my system.

$ git checkout origin/ilk-wa-pile

... and you have the right branch checked out. No need for pitchforks
and witch hunts ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
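
For reference, a minimal sketch of the checkout step wrapped in a shell function. The plain `git checkout origin/ilk-wa-pile` shown in the message leaves the work tree on a detached HEAD, which is fine for a test build; the variant below instead creates a local branch starting at the remote one. The helper name `checkout_remote_branch` is made up for illustration, and the sketch assumes the clone's remote is called `origin`.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): in an existing clone, create
# a local branch that starts at the remote branch of the same name.
# Assumes the remote is named "origin".
checkout_remote_branch() {
    repo_dir=$1
    branch=$2
    git -C "$repo_dir" checkout -q -b "$branch" "origin/$branch"
}

# Example, for the clone made in the thread (path is an assumption):
#   checkout_remote_branch ~/drm/drm ilk-wa-pile
```

The "remote HEAD refers to nonexistent ref" warning only means the clone had no default branch to check out; the downloaded objects are all there, so no second download is needed.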


* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-27 13:56             ` Daniel Vetter
@ 2012-10-27 19:11               ` Justin P. Mattock
  0 siblings, 0 replies; 11+ messages in thread
From: Justin P. Mattock @ 2012-10-27 19:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Chris Wilson, dri-devel, Linux Kernel Mailing List

On 10/27/2012 06:56 AM, Daniel Vetter wrote:
> On Fri, Oct 26, 2012 at 10:57 PM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>>
>> :~/drm> git clone git://people.freedesktop.org/~danvet/drm
>> Cloning into 'drm'...
>> remote: Counting objects: 2728390, done.
>> remote: Compressing objects: 100% (418606/418606), done.
>> remote: Total 2728390 (delta 2293727), reused 2717443 (delta 2282880)
>> Receiving objects: 100% (2728390/2728390), 637.95 MiB | 599 KiB/s, done.
>> Resolving deltas: 100% (2293727/2293727), done.
>> warning: remote HEAD refers to nonexistent ref, unable to checkout.
>>
>>
>> so now I have to go on a witch hunt for 600MB's in my system.
>
> $ git checkout origin/ilk-wa-pile

cool thanks..(not so good at git over here).
>
> ... and you have the right branch checked out. No need for pitchforks
> and witch hunts ;-)
> -Daniel
>


alright.. putting the pitchfork away for now.

Justin P. Mattock



* Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
  2012-10-23 17:06 Justin P. Mattock
@ 2012-10-23 17:40 ` Daniel Vetter
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2012-10-23 17:40 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: dri-devel, linux-kernel

On Tue, Oct 23, 2012 at 10:06:52AM -0700, Justin P. Mattock wrote:
> This is happening both with MAINLINE and NEXT.
> 
> basically system is running fine, then under load system becomes
> really sluggish and unresponsive. I was able to get dmesg of the
> error..:
> 
> [ 7745.007008] ath9k 0000:05:00.0 wlan0: disabling VHT as WMM/QoS is
> not supported by the AP
> [ 7745.007736] wlan0: associate with 68:7f:74:b8:05:82 (try 1/3)
> [ 7745.011456] wlan0: RX AssocResp from 68:7f:74:b8:05:82
> (capab=0x411 status=0 aid=5)
> [ 7745.011529] wlan0: associated
> [ 8120.812482] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
> elapsed... GPU hung
> [ 8120.812642] [drm] capturing error event; look for more
> information in /debug/dri/0/i915_error_state
> [ 8122.328682] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
> elapsed... GPU hung
> [ 8122.328845] [drm:i915_reset] *ERROR* GPU hanging too fast,
> declaring wedged!
> [ 8122.328850] [drm:i915_reset] *ERROR* Failed to reset chip.
> 
> full log is here: http://fpaste.org/7xH8/
> 
> as for good kernels from what I remember 3.6.0-rc1. I can try a
> bisect on this once I get the time. or if anybody has a patch I can
> test.

Can you please rehang your machine, and then grab the i915_error_state
from debugfs? That contains the gpu hang dump we need to diagnose things.

And the bisect would obviously be awesome.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
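
A rough sketch of how the error state could be collected after a hang. The dmesg output in this thread points at /debug/dri/0/i915_error_state; on many systems debugfs is mounted at /sys/kernel/debug instead, so the default search paths below are an assumption, as is the helper name `find_i915_error_state`.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the path of the first
# readable i915_error_state file under the given debugfs mount points.
# With no arguments it tries two common locations (an assumption).
find_i915_error_state() {
    [ "$#" -gt 0 ] || set -- /sys/kernel/debug /debug
    for root in "$@"; do
        for f in "$root"/dri/*/i915_error_state; do
            # An unmatched glob stays literal and fails the -r test.
            if [ -r "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
    done
    return 1
}

# Typical use as root, after the hang has been reported in dmesg:
#   mount -t debugfs none /sys/kernel/debug 2>/dev/null
#   cat "$(find_i915_error_state)" > /tmp/i915_error_state.txt
```

The dump can be large, so attaching the saved file to a bug report is usually easier than mailing it to the list.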


* [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
@ 2012-10-23 17:06 Justin P. Mattock
  2012-10-23 17:40 ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Justin P. Mattock @ 2012-10-23 17:06 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-kernel, airlied

This is happening both with MAINLINE and NEXT.

Basically, the system runs fine, then under load it becomes really
sluggish and unresponsive. I was able to get the dmesg output of the error:

[ 7745.007008] ath9k 0000:05:00.0 wlan0: disabling VHT as WMM/QoS is not supported by the AP
[ 7745.007736] wlan0: associate with 68:7f:74:b8:05:82 (try 1/3)
[ 7745.011456] wlan0: RX AssocResp from 68:7f:74:b8:05:82 (capab=0x411 status=0 aid=5)
[ 7745.011529] wlan0: associated
[ 8120.812482] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 8120.812642] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 8122.328682] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 8122.328845] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 8122.328850] [drm:i915_reset] *ERROR* Failed to reset chip.

full log is here: http://fpaste.org/7xH8/

As for good kernels, from what I remember 3.6.0-rc1 was fine. I can try a
bisect on this once I get the time, or test a patch if anybody has one.

Justin P. Mattock


