* On SNB: Hangcheck timer elapsed... GPU hung
@ 2011-02-14 13:18 Ted Phelps
       [not found] ` <m2d3mue1a2.fsf@firstfloor.org>
  2011-02-15  9:32 ` Jin, Gordon
  0 siblings, 2 replies; 11+ messages in thread
From: Ted Phelps @ 2011-02-14 13:18 UTC (permalink / raw)
  To: intel-gfx


Apologies if this is a known issue, but I haven't been able to convince
myself that someone is looking after it.  I've been seeing this issue
with Linux kernel 2.6.37, 2.6.38-rc4 and the most recent merge of Linus's
git tree and drm-intel-fixes.  I'm happy to provide more information,
apply patches, run tools, read code, as requested.

I have a Core i7 2600K CPU (yay me!) on a DH67CL motherboard, and I'm trying
to use the on-board graphics.  Typically this set-up works well for about
10-30 minutes.  Then I get an error of the form:

    [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

after which, my cursor no longer changes shape, anti-aliased fonts misbehave
(mostly they just aren't rendered) and 3-D applications no longer start up.
I haven't found any obvious trigger for this -- I don't need to be
interacting with the machine for it to happen -- but it happens pretty
reliably.

I've just now managed to catch this with the drm.debug parameter set to 7.
Things seem ok initially...

    ... various cmd/nr/dev/auth entries logged ...
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0xc010645b, nr=0x5b, dev 0xe200, auth=1
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
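
For reference, the cmd values in these entries can be unpacked by hand. Here is a minimal Python sketch, assuming the usual Linux _IOC bit layout on x86 (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31). The small name table is a hand-copied subset of i915 driver ioctl offsets and is an assumption for illustration; it does not come from this log.

```python
# Driver-private DRM ioctls are numbered starting at this base.
DRM_COMMAND_BASE = 0x40

# Assumed, illustrative subset of i915 ioctl offsets (relative to the base).
I915_IOCTL_NAMES = {
    0x1b: "GEM_CREATE",
    0x1d: "GEM_PWRITE",
    0x1f: "GEM_SET_DOMAIN",
}

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd):
    """Split a 32-bit ioctl request number into its _IOC fields."""
    nr = cmd & 0xFF
    ioc_type = (cmd >> 8) & 0xFF
    size = (cmd >> 16) & 0x3FFF
    direction = (cmd >> 30) & 0x3
    return {
        "nr": nr,
        "type": chr(ioc_type),
        "size": size,
        "dir": DIRECTIONS[direction],
        "name": I915_IOCTL_NAMES.get(nr - DRM_COMMAND_BASE, "unknown"),
    }


if __name__ == "__main__":
    for cmd in (0xC010645B, 0x400C645F):
        print(hex(cmd), decode_ioctl(cmd))
```

Under those assumptions, the nr=0x5f entry that repeats in this log would be the GEM set-domain call, with a 12-byte argument passed from userspace to the kernel.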

Then there's this worrying-looking message:

    Feb 14 22:47:50 orpheus kernel: [drm:intel_prepare_page_flip], preparing flip with no unpin work?
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], ret = fffffe00

The cmd/nr/dev/auth fields are then repeated for a couple of seconds:

    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], ret = fffffe00
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], ret = fffffe00
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
    Feb 14 22:47:50 orpheus kernel: [drm:drm_ioctl], ret = fffffe00
    ... repeat many, many times ...
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], ret = fffffe00
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1

And finally, the hangcheck timer expires:

    Feb 14 22:47:52 orpheus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], ret = fffffe00
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
    Feb 14 22:47:52 orpheus kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 300039 at 300025, next 300040)
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], ret = fffffff5
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
    Feb 14 22:47:52 orpheus kernel: [drm:i915_error_work_func], resetting chip
    Feb 14 22:47:52 orpheus kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
    Feb 14 22:47:52 orpheus kernel: [drm:i915_reset] *ERROR* Failed to reset chip.
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], ret = fffffffb
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], pid=2180, cmd=0x4020645d, nr=0x5d, dev 0xe200, auth=1
    Feb 14 22:47:52 orpheus kernel: [drm:drm_ioctl], ret = fffffffb

After that, the system limps along as described above.
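
The "ret = ..." values in the trace are negative errno codes printed as unsigned 32-bit hex. A small Python sketch for turning them back into names follows. The table is hard-coded with the Linux values needed here rather than read from the platform, and 512 (ERESTARTSYS) is a kernel-internal code that userspace normally never sees; the helper name is made up for this example.

```python
# Linux errno values needed for this trace (hard-coded so the result does not
# depend on the platform running the script).
LINUX_ERRNO = {
    5: "EIO",
    11: "EAGAIN",
    512: "ERESTARTSYS",  # kernel-internal: syscall should be restarted
}


def decode_ret(hex_text):
    """Interpret a drm_ioctl 'ret = xxxxxxxx' value as a signed 32-bit int."""
    value = int(hex_text, 16)
    if value >= 0x80000000:
        value -= 1 << 32  # two's-complement wrap to a negative number
    if value >= 0:
        return value, None
    return value, LINUX_ERRNO.get(-value, "unknown")


if __name__ == "__main__":
    for text in ("fffffe00", "fffffff5", "fffffffb"):
        print(text, decode_ret(text))
```

Read this way, the long run of fffffe00 is the ioctl being interrupted and restarted, fffffff5 matches the "returns -11" in the i915_do_wait_request message, and fffffffb is the I/O error returned once the chip has been declared wedged.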

I haven't delved into trying to understand what this means; I'm hoping that
the above trace rings bells for someone.  I can provide a more complete log
if someone reckons that'd be useful.

Thanks for your time!

-Ted


* Re: On SNB: Hangcheck timer elapsed... GPU hung
       [not found] ` <m2d3mue1a2.fsf@firstfloor.org>
@ 2011-02-15  1:46   ` Ted Phelps
  2011-02-16 13:13     ` Ivan Bulatovic
  0 siblings, 1 reply; 11+ messages in thread
From: Ted Phelps @ 2011-02-15  1:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: intel-gfx


Hi Andy,

Andi Kleen writes:
> Ted Phelps <phelps@gnusto.com> writes:
> 
> > Apologies if this is a known issue, but I haven't been able to convince
> > myself that someone is looking after it.  I've been seeing this issue
> > with Linux kernel 2.6.37, 2.6.38-rc4 and the most recent merge of Linus's
> > git tree and drm-intel-fixes.  I'm happy to provide more information,
> > apply patches, run tools, read code, as requested.
> 
> Do you use displayport dual-head? I had this problem with dual head.
> No such issue with only a single monitor.

I'm using only one head attached to the DVI connector.

Thanks,
-Ted


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-14 13:18 On SNB: Hangcheck timer elapsed... GPU hung Ted Phelps
       [not found] ` <m2d3mue1a2.fsf@firstfloor.org>
@ 2011-02-15  9:32 ` Jin, Gordon
  1 sibling, 0 replies; 11+ messages in thread
From: Jin, Gordon @ 2011-02-15  9:32 UTC (permalink / raw)
  To: Ted Phelps, intel-gfx

> -----Original Message-----
> From: intel-gfx-bounces+gordon.jin=intel.com@lists.freedesktop.org
> [mailto:intel-gfx-bounces+gordon.jin=intel.com@lists.freedesktop.org] On
> Behalf Of Ted Phelps
> Sent: Monday, February 14, 2011 9:19 PM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] On SNB: Hangcheck timer elapsed... GPU hung
> 
> 
> Apologies if this is a known issue, but I haven't been able to convince
> myself that someone is looking after it.  I've been seeing this issue
> with Linux kernel 2.6.37, 2.6.38-rc4 and the most recent merge of Linus's
> git tree and drm-intel-fixes.  I'm happy to provide more information,
> apply patches, run tools, read code, as requested.
> 
> I have a Core i7 2600K CPU (yay me!) on a DH67CL motherboard, and I'm
> trying
> to use the on-board graphics.  Typically this set-up works well for about
> 10-30 minutes.  Then I get an error of the form:

Thanks for the report (probably the first report with 2600K). Could you file a bug so we can make sure this will be tracked?

Gordon


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-15  1:46   ` Ted Phelps
@ 2011-02-16 13:13     ` Ivan Bulatovic
  2011-02-16 13:20       ` Andrew Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Ivan Bulatovic @ 2011-02-16 13:13 UTC (permalink / raw)
  To: Ted Phelps; +Cc: intel-gfx, Andi Kleen

On Tue, 2011-02-15 at 11:46 +1000, Ted Phelps wrote: 
> Hi Andy,
> 
> Andi Kleen writes:
> > Ted Phelps <phelps@gnusto.com> writes:
> > 
> > > Apologies if this is a known issue, but I haven't been able to convince
> > > myself that someone is looking after it.  I've been seeing this issue
> > > with Linux kernel 2.6.37, 2.6.38-rc4 and the most recent merge of Linus's
> > > git tree and drm-intel-fixes.  I'm happy to provide more information,
> > > apply patches, run tools, read code, as requested.
> > 
> > Do you use displayport dual-head? I had this problem with dual head.
> > No such issue with only a single monitor.
> 
> I'm using only one head attached to the DVI connector.
> 
> Thanks,
> -Ted
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

I also have this problem on 2.6.38-rc5 (and on earlier kernels, but I
hadn't seen any dmesg printk's about it until I added a few more
debugging options in .config).

[   52.053633] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   52.054122] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 5459 at 5443, next 5460)
[   52.054338] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f003 head 00000000 tail 00000000 start 00000000
[   58.195594] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   58.195606] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[   62.697289] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   62.697300] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[   67.198995] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   67.199005] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[   71.700722] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   71.700733] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[   76.202425] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   76.202436] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[   80.903635] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   80.903662] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 5477 at 5031, next 6205)
[   88.677879] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   88.678366] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 7544 at 7492, next 7545)
[   88.679558] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[   95.099161] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   95.099173] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[   96.599722] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   96.599736] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 9177 at 7538, next 9178)

i5 2400 here.

Also I have only one monitor connected via HDMI. 
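
To compare hangs across reports, it can help to pull the numbers out of the i915_do_wait_request lines. The sketch below is a rough Python helper, on the assumption that "awaiting" is the request seqno being waited for and "at" is the last seqno the ring reported as complete. The function name is invented for this example, and each log message must be on a single line (mail clients tend to wrap them).

```python
import re

# Matches the tail of an i915_do_wait_request error message.
WAIT_RE = re.compile(r"awaiting (\d+) at (\d+), next (\d+)")


def seqno_gap(line):
    """Return how many seqnos the ring is behind the awaited request,
    or None if the line is not a wait-request message."""
    match = WAIT_RE.search(line)
    if match is None:
        return None
    awaiting, current, _next_seqno = (int(g) for g in match.groups())
    return awaiting - current


if __name__ == "__main__":
    samples = [
        "i915_do_wait_request returns -11 (awaiting 5459 at 5443, next 5460)",
        "i915_do_wait_request returns -11 (awaiting 5477 at 5031, next 6205)",
        "Kicking stuck semaphore on blt ring",
    ]
    for line in samples:
        print(seqno_gap(line), "<-", line)
```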


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-16 13:13     ` Ivan Bulatovic
@ 2011-02-16 13:20       ` Andrew Lutomirski
  2011-02-16 13:33         ` Ivan Bulatovic
  2011-02-21  5:12         ` Andrew Lutomirski
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Lutomirski @ 2011-02-16 13:20 UTC (permalink / raw)
  To: Ivan Bulatovic; +Cc: intel-gfx, Andi Kleen

On Wed, Feb 16, 2011 at 8:13 AM, Ivan Bulatovic <combuster@gmx.com> wrote:
> On Tue, 2011-02-15 at 11:46 +1000, Ted Phelps wrote:
>> Hi Andy,
>>
>> Andi Kleen writes:
>> > Ted Phelps <phelps@gnusto.com> writes:
>> >
>> > > Apologies if this is a known issue, but I haven't been able to convince
>> > > myself that someone is looking after it.  I've been seeing this issue
>> > > with Linux kernel 2.6.37, 2.6.38-rc4 and the most recent merge of Linus's
>> > > git tree and drm-intel-fixes.  I'm happy to provide more information,
>> > > apply patches, run tools, read code, as requested.
>> >
>> > Do you use displayport dual-head? I had this problem with dual head.
>> > No such issue with only a single monitor.
>>
>> I'm using only one head attached to the DVI connector.

I'm curious what userspace you're all running.  I'm using
xf86-video-intel from git on Feb 4 with 2.6.37 (completely unpatched!)
and my i7-2600 is quite stable.  The only problem I have is that
compiz has hung twice since I got the machine.  In both cases, killall
-9 compiz rescued the system.

Fedora 14's xf86-video-intel didn't work so well (compiz dropped
enough frames that the effects weren't really visible).

I'm single head on DisplayPort.

(Even firefox 4 webgl works nicely.)

--Andy


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-16 13:20       ` Andrew Lutomirski
@ 2011-02-16 13:33         ` Ivan Bulatovic
  2011-02-16 13:45           ` Andrew Lutomirski
  2011-02-21  5:12         ` Andrew Lutomirski
  1 sibling, 1 reply; 11+ messages in thread
From: Ivan Bulatovic @ 2011-02-16 13:33 UTC (permalink / raw)
  To: Andrew Lutomirski; +Cc: intel-gfx, Andi Kleen

On Wed, 2011-02-16 at 08:20 -0500, Andrew Lutomirski wrote:
> I'm curious what userspace you're all running.  I'm using
> xf86-video-intel from git on Feb 4 with 2.6.37 (completely unpatched!)
> and my i7-2600 is quite stable.  The only problem I have is that
> compiz has hung twice since I got the machine.  In both cases, killall
> -9 compiz rescued the system.
> 
> Fedora 14's xf86-video-intel didn't work so well (compiz dropped
> enough frames that the effects weren't really visible).
> 
> I'm single head on DisplayPort.
> 
> (Even firefox 4 webgl works nicely.)
> 
> --Andy

xf86-video-intel 2.14
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Desktop GEM 20100330 DEVELOPMENT
OpenGL version string: 2.1 Mesa 7.10.1-devel
X.Org X Server 1.9.4
Release Date: 2011-02-04
Current Operating System: Linux silverstone 2.6.38-rc5-RC #1 SMP PREEMPT Wed Feb 16 13:32:11 CET 2011 x86_64

These are Arch Linux packages from the extra repository; I still have to
compile X and Mesa from git for testing purposes.


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-16 13:33         ` Ivan Bulatovic
@ 2011-02-16 13:45           ` Andrew Lutomirski
  2011-02-16 14:02             ` Ivan Bulatovic
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lutomirski @ 2011-02-16 13:45 UTC (permalink / raw)
  To: Ivan Bulatovic; +Cc: intel-gfx, Andi Kleen

On Wed, Feb 16, 2011 at 8:33 AM, Ivan Bulatovic <combuster@gmx.com> wrote:
> On Wed, 2011-02-16 at 08:20 -0500, Andrew Lutomirski wrote:
>> I'm curious what userspace you're all running.  I'm using
>> xf86-video-intel from git on Feb 4 with 2.6.37 (completely unpatched!)
>> and my i7-2600 is quite stable.  The only problem I have is that
>> compiz has hung twice since I got the machine.  In both cases, killall
>> -9 compiz rescued the system.
>>
>> Fedora 14's xf86-video-intel didn't work so well (compiz dropped
>> enough frames that the effects weren't really visible).
>>
>> I'm single head on DisplayPort.
>>
>> (Even firefox 4 webgl works nicely.)
>>
>> --Andy
>
> xf86-video-intel 2.14
> OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Desktop GEM
> 20100330 DEVELOPMENT
> OpenGL version string: 2.1 Mesa 7.10.1-devel
> X.Org X Server 1.9.4
> Release Date: 2011-02-04
> Current Operating System: Linux silverstone 2.6.38-rc5-RC #1 SMP PREEMPT
> Wed Feb 16 13:32:11 CET 2011 x86_64
>

You have a newer mesa than I do.  I'll leave debugging this to the
experts (and not update my mesa until then).

I'm running 7.10-devel.

--Andy


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-16 13:45           ` Andrew Lutomirski
@ 2011-02-16 14:02             ` Ivan Bulatovic
       [not found]               ` <19567.1297946336@orpheus.gnusto.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Ivan Bulatovic @ 2011-02-16 14:02 UTC (permalink / raw)
  To: Andrew Lutomirski; +Cc: intel-gfx, Andi Kleen

On Wed, 2011-02-16 at 08:45 -0500, Andrew Lutomirski wrote:
> On Wed, Feb 16, 2011 at 8:33 AM, Ivan Bulatovic <combuster@gmx.com> wrote:
> > On Wed, 2011-02-16 at 08:20 -0500, Andrew Lutomirski wrote:
> >> I'm curious what userspace you're all running.  I'm using
> >> xf86-video-intel from git on Feb 4 with 2.6.37 (completely unpatched!)
> >> and my i7-2600 is quite stable.  The only problem I have is that
> >> compiz has hung twice since I got the machine.  In both cases, killall
> >> -9 compiz rescued the system.
> >>
> >> Fedora 14's xf86-video-intel didn't work so well (compiz dropped
> >> enough frames that the effects weren't really visible).
> >>
> >> I'm single head on DisplayPort.
> >>
> >> (Even firefox 4 webgl works nicely.)
> >>
> >> --Andy
> >
> > xf86-video-intel 2.14
> > OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Desktop GEM
> > 20100330 DEVELOPMENT
> > OpenGL version string: 2.1 Mesa 7.10.1-devel
> > X.Org X Server 1.9.4
> > Release Date: 2011-02-04
> > Current Operating System: Linux silverstone 2.6.38-rc5-RC #1 SMP PREEMPT
> > Wed Feb 16 13:32:11 CET 2011 x86_64
> >
> 
> You have a newer mesa than I do.  I'll leave debugging this to the
> experts (and not update my mesa until then).
> 
> I'm running 7.10-devel.
> 
> --Andy

I've done a little digging, and maybe this is related?

https://patchwork.kernel.org/patch/296822/


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-16 13:20       ` Andrew Lutomirski
  2011-02-16 13:33         ` Ivan Bulatovic
@ 2011-02-21  5:12         ` Andrew Lutomirski
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Lutomirski @ 2011-02-21  5:12 UTC (permalink / raw)
  To: Ivan Bulatovic; +Cc: intel-gfx, Andi Kleen

On Wed, Feb 16, 2011 at 8:20 AM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Wed, Feb 16, 2011 at 8:13 AM, Ivan Bulatovic <combuster@gmx.com> wrote:
>> On Tue, 2011-02-15 at 11:46 +1000, Ted Phelps wrote:
>>> Hi Andy,
>>>
>>> Andi Kleen writes:
>>> > Ted Phelps <phelps@gnusto.com> writes:
>>> >
>>> > > Apologies if this is a known issue, but I haven't been able to convince
>>> > > myself that someone is looking after it.  I've been seeing this issue
>>> > > with Linux kernel 2.6.37, 2.6.38-rc4 and the most recent merge of Linus's
>>> > > git tree and drm-intel-fixes.  I'm happy to provide more information,
>>> > > apply patches, run tools, read code, as requested.
>>> >
>>> > Do you use displayport dual-head? I had this problem with dual head.
>>> > No such issue with only a single monitor.
>>>
>>> I'm using only one head attached to the DVI connector.
>
> I'm curious what userspace you're all running.  I'm using
> xf86-video-intel from git on Feb 4 with 2.6.37 (completely unpatched!)
> and my i7-2600 is quite stable.  The only problem I have is that
> compiz has hung twice since I got the machine.  In both cases, killall
> -9 compiz rescued the system.
>
> Fedora 14's xf86-video-intel didn't work so well (compiz dropped
> enough frames that the effects weren't really visible).
>
> I'm single head on DisplayPort.
>
> (Even firefox 4 webgl works nicely.)

I spoke too soon.  I ran 2.6.38-rc5 (actually 6f576d5 from Linus'
tree) for a few hours, and X hung.  I could still move the cursor, but
the cursor icon didn't change and everything was frozen.  I could
still switch VTs, though.

Xorg.0.log said:

[ 26219.458] (EE) intel(0): failed to set cursor: Input/output error
[ 26219.488] (EE) intel(0): failed to set cursor: Input/output error
[ 26219.518] (EE) intel(0): failed to set cursor: Input/output error
[ 26219.523] (WW) intel(0): intel_uxa_prepare_access: bo map failed: Input/output error
[ 26219.529] (EE) intel(0): failed to set cursor: Input/output error
[ 26219.587] (WW) intel(0): intel_uxa_prepare_access: bo map failed: Input/output error
[ 26219.633] (WW) intel(0): intel_uxa_prepare_access: bo map failed: Input/output error
[ 26219.633] (WW) intel(0): intel_uxa_prepare_access: bo map failed: Input/output error

and the kernel said:

[25918.930325] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[25918.931803] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 2652998 at 2652995, next 2653000)
[25919.218729] compiz[1949]: segfault at 0 ip 00007fd5957f5ea6 sp 00007fff9d0394d0 error 6 in i965_dri.so[7fd595770000+372000]
abrt[2932]: saved core dump of pid 1949 (/usr/bin/compiz) to /var/spool/abrt/ccpp-1298262996-1949.new/coredump (77180928 bytes)

lspci says:
00:02.0 VGA compatible controller [0300]: Intel Corporation Sandy Bridge Integrated Graphics Controller [8086:0102] (rev 09)
so I don't think the other patch in this thread will do anything.

2.6.37 has been stable and the userspace is identical.

gdb on the core file gives garbage (the faulting address looks legit
but does not correspond to any module).  Curiously, "info shared"
doesn't show libdrm or any of the intel_drv stuff.
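
When gdb cannot make sense of the core, the kernel's segfault line still gives an offset into the shared object: the ip minus the mapping base printed in brackets. The Python sketch below does that subtraction. The helper name is invented, and whether the resulting offset maps directly onto a symbol address (for example with addr2line -e on a matching i965_dri.so) depends on how the library is laid out, so treat it as a starting point only.

```python
import re

# Matches "ip <hex> ... in <object>[<base>+<size>]" in a kernel segfault line.
SEGFAULT_RE = re.compile(r"ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]")


def fault_offset(line):
    """Return (object name, offset of the faulting ip inside the mapping),
    or None if the line does not parse or the ip is outside the mapping."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group(1), 16)
    base = int(match.group(3), 16)
    size = int(match.group(4), 16)
    offset = ip - base
    if not 0 <= offset < size:
        return None
    return match.group(2), offset


if __name__ == "__main__":
    line = ("compiz[1949]: segfault at 0 ip 00007fd5957f5ea6 sp 00007fff9d0394d0 "
            "error 6 in i965_dri.so[7fd595770000+372000]")
    name, offset = fault_offset(line)
    print(name, hex(offset))
```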

--Andy


* Re: On SNB: Hangcheck timer elapsed... GPU hung
       [not found]               ` <19567.1297946336@orpheus.gnusto.com>
@ 2011-02-22 17:26                 ` Ivan Bulatovic
  2011-02-22 18:23                   ` Ivan Bulatovic
  0 siblings, 1 reply; 11+ messages in thread
From: Ivan Bulatovic @ 2011-02-22 17:26 UTC (permalink / raw)
  To: Ted Phelps; +Cc: Andi Kleen, intel-gfx

On Thu, 2011-02-17 at 22:38 +1000, Ted Phelps wrote:
> Ivan Bulatovic writes:
> > I've done a little digging and maybe this could be related ?
> > 
> > https://patchwork.kernel.org/patch/296822/
> 
> That looks promising, but I've tried explicitly disabling and enabling
> that patch (#define NEED_BLT_WORKAROUND(dev) 1/0) without any noticeable
> change in behavior.  But it is always the BLT ring that needs kicking.
> 
> Thanks,
> -Ted

Here is the i915_error_state

Time: 1298394907 s 500663 us
PCI ID: 0x0102
EIR: 0x00000000
PGTBL_ER: 0x00000000
ERROR: 0x00000000
Blitter command stream:
  ACTHD:    0x00000000
  IPEIR:    0x00000000
  IPEHR:    0x00000000
  INSTDONE: 0x00000000
  seqno:    0x0014494e
Video (BSD) command stream:
  ACTHD:    0x00000000
  IPEIR:    0x00000000
  IPEHR:    0x00000000
  INSTDONE: 0x00000000
  seqno:    0x00000000
Render command stream:
  ACTHD: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0x00000000
  INSTDONE1: 0x00000000
  INSTPS: 0x00000000
  INSTPM: 0x00000000
  seqno: 0x0014494c
  fence[0] = 104f03b00850001
  fence[1] = 356c03b0353d001
  fence[2] = 00000000
  fence[3] = fd970030fd88001
  fence[4] = 00000000
  fence[5] = fc1000f0f991001
  fence[6] = e2f200f0e073001
  fence[7] = 00000000
  fence[8] = 00000000
  fence[9] = 00000000
  fence[10] = 00000000
  fence[11] = 00000000
  fence[12] = 00000000
  fence[13] = 00000000
  fence[14] = 00000000
  fence[15] = 00000000
Active [52]:
  0fdf2000    16384 0048 0000 0014493d dirty purgeable render uncached
  0ff1d000    16384 0050 0000 0014493d dirty purgeable render uncached
  0fe06000    16384 0060 0000 0014493d dirty purgeable render uncached
  0fe26000     4096 0044 0000 0014493d dirty purgeable render uncached
  0fe27000     4096 0044 0000 0014493d dirty purgeable render uncached
  0fee6000     4096 0006 0000 0014493e dirty purgeable render uncached
  0f635000    16384 0048 0000 0014493f dirty purgeable blt uncached
  0fe28000     4096 0042 0000 0014493f dirty purgeable blt uncached
  0f5dc000    16384 0048 0000 00144941 dirty purgeable render uncached
  0ff21000    16384 0050 0000 00144941 dirty purgeable render uncached
  0fdf6000    16384 0060 0000 00144941 dirty purgeable render uncached
  0f5d8000    16384 0060 0000 00144941 dirty purgeable render uncached
  0fdde000     8192 0006 0000 00144942 X dirty purgeable render uncached
  0feca000    16384 0048 0000 00144943 dirty purgeable blt uncached
  0fed2000    16384 0048 0000 00144945 dirty purgeable render uncached
  0fee2000    16384 0050 0000 00144945 dirty purgeable render uncached
  0f5c8000    16384 0060 0000 00144945 dirty purgeable render uncached
  0fe4c000    16384 0048 0000 00144946 dirty purgeable blt uncached
  0fd64000    16384 0048 0000 00144948 dirty purgeable render uncached
  0f5ec000    16384 0050 0000 00144948 dirty purgeable render uncached
  0ffed000    16384 0060 0000 00144948 dirty purgeable render uncached
  0fec6000    16384 0060 0000 00144948 dirty purgeable render uncached
  02899000     4096 0011 0000 00144948 dirty render uncached
  0f64f000     4096 0044 0000 00144948 dirty purgeable render uncached
  0fe2a000     4096 0044 0000 00144948 dirty purgeable render uncached
  0fe29000     4096 0006 0000 00144949 dirty purgeable render uncached
  0ff4f000    16384 0048 0000 0014494a dirty purgeable blt uncached
  0f639000    16384 0048 0000 0014494c dirty purgeable render uncached
  0ff38000    16384 0050 0000 0014494c dirty purgeable render uncached
  0fe22000    16384 0060 0000 0014494c dirty purgeable render uncached
  0fe89000    16384 0048 0000 0014494d dirty purgeable blt uncached
  0fece000    16384 0048 0000 0014494f dirty purgeable render uncached
  0f78a000    16384 0050 0000 0014494f dirty purgeable render uncached
  02a98000     4096 0011 0000 0014494f render uncached
  02939000     4096 0011 0000 0014494f render uncached
  0288e000     4096 0011 0000 0014494f dirty render uncached
  0288f000    28672 0011 0000 0014494f render uncached
  02896000     4096 0011 0000 0014494f render uncached
  02897000     4096 0011 0000 0014494f render uncached
  02898000     4096 0005 0000 0014494f dirty render uncached
  0293a000     4096 0011 0000 0014494f dirty render uncached
  0f5c0000    16384 0060 0000 0014494f dirty purgeable render uncached
  0399d000     4096 0011 0000 0014494f dirty render uncached
  0fee7000    16384 0060 0000 0014494f dirty purgeable render uncached
  0399e000     4096 0011 0000 0014494f dirty render uncached
  0f5d4000    16384 0060 0000 0014494f dirty purgeable render uncached
  0359d000  4194304 0006 0000 0014494f X dirty render uncached
  0fda8000     4096 0006 0000 0014494f dirty render uncached
  06431000     4096 0044 0000 0014494f dirty render uncached
  0fe38000     4096 0044 0000 0014494f dirty render uncached
  0fe39000     4096 0006 0000 00144950 dirty purgeable render uncached
  12a16000  8388608 0002 0000 00144950 X dirty render uncached
Pinned [9]:
  00000000     4096 0001 0001 00000000 P snooped
  00001000   131072 0001 0001 00000000 P uncached
  00021000     4096 0001 0001 00000000 P snooped
  00022000   131072 0001 0001 00000000 P uncached
  00042000     4096 0001 0001 00000000 P snooped
  00043000   131072 0001 0001 00000000 P uncached
  00063000  8294400 0041 0000 00000000 P uncached
  0106c000    16384 0040 0040 00000000 P dirty uncached
  00850000  8388608 0002 0000 00000000 P X dirty uncached (fence: 0)
render ring --- gtt_offset = 0x0fece000
---------------------------------------
Pipe [0]:
  CONF: c0000000
  SRC: 077f0437
  HTOTAL: 0897077f
  HBLANK: 0897077f
  HSYNC: 080307d7
  VTOTAL: 04640437
  VBLANK: 04640437
  VSYNC: 0440043b
Plane [0]:
  CNTR: d8004400
  STRIDE: 00001e00
  SIZE: 00000000
  POS: 00000000
  ADDR: 00000000
  SURF: 00850000
  TILEOFF: 00000000
Cursor [0]:
  CNTR: 04000027
  POS: 00c50089
  BASE: 0106c000
Pipe [1]:
  CONF: 00000000
  SRC: 00000000
  HTOTAL: 00000000
  HBLANK: 00000000
  HSYNC: 00000000
  VTOTAL: 00000000
  VBLANK: 00000000
  VSYNC: 00000000
Plane [1]:
  CNTR: 00004000
  STRIDE: 00000000
  SIZE: 00000000
  POS: 00000000
  ADDR: 00000000
  SURF: 00000000
  TILEOFF: 00000000
Cursor [1]:
  CNTR: 00000000
  POS: 00000000
  BASE: 00000000
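
As a sanity check on the display state in this dump, the Pipe [0] timing registers can be decoded into a mode. The Python sketch below assumes the usual Intel layout where each register packs two "value minus one" fields into its low and high 16 bits (SRC: height low, width high; HTOTAL/VTOTAL: active low, total high; HSYNC/VSYNC: start low, end high). That layout and the function names are assumptions, not something stated in the dump itself.

```python
def split_minus_one(reg_hex):
    """Split a register into (low + 1, high + 1) 16-bit fields."""
    value = int(reg_hex, 16)
    low = (value & 0xFFFF) + 1
    high = ((value >> 16) & 0xFFFF) + 1
    return low, high


def decode_pipe(regs):
    """Decode pipe timing registers (hex strings) into a mode description."""
    height, width = split_minus_one(regs["SRC"])
    h_active, h_total = split_minus_one(regs["HTOTAL"])
    h_sync_start, h_sync_end = split_minus_one(regs["HSYNC"])
    v_active, v_total = split_minus_one(regs["VTOTAL"])
    v_sync_start, v_sync_end = split_minus_one(regs["VSYNC"])
    return {
        "source": (width, height),
        "horizontal": (h_active, h_sync_start, h_sync_end, h_total),
        "vertical": (v_active, v_sync_start, v_sync_end, v_total),
    }


if __name__ == "__main__":
    pipe0 = {
        "SRC": "077f0437",
        "HTOTAL": "0897077f",
        "HSYNC": "080307d7",
        "VTOTAL": "04640437",
        "VSYNC": "0440043b",
    }
    print(decode_pipe(pipe0))
```

With those assumptions, Pipe [0] decodes to a 1920x1080 mode with 2200x1125 total timings, and Pipe [1] is simply disabled.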

dmesg

[ 2505.904377] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2505.906540] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1329488 at 1329484, next 1329489)
[ 2505.907179] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f003 head 00000000 tail 00000000 start 00000000
[ 2512.066276] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2512.066288] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[ 2516.567981] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2516.567992] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[ 2521.069682] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2521.069693] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
[ 2525.834788] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2525.834855] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1329498 at 1326106, next 1329534)
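
The dmesg seqnos are decimal while i915_error_state prints them in hex, so the two are easy to miss as the same numbers. The small Python sketch below cross-references them; the dictionary just copies the per-ring seqno fields from the dump above, and the helper name is made up for this example.

```python
# Per-ring seqno values copied from the i915_error_state dump (hex).
RING_SEQNOS = {
    "blitter": 0x0014494E,
    "video": 0x00000000,
    "render": 0x0014494C,
}


def rings_at(seqno, ring_seqnos=RING_SEQNOS):
    """Return the rings whose recorded seqno equals the given value."""
    return sorted(name for name, value in ring_seqnos.items() if value == seqno)


if __name__ == "__main__":
    # First wait-request message in dmesg: "awaiting 1329488 at 1329484".
    awaiting, current = 1329488, 1329484
    print("at       ", hex(current), rings_at(current))
    print("awaiting ", hex(awaiting), rings_at(awaiting))
```

Here the "at" value 1329484 is 0x14494c, the render ring's seqno in the dump, and the awaited 1329488 is 0x144950, the highest seqno in the Active list. That suggests the wait was on the render ring, though it is only a reading of the numbers.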

I didn't want to send both logs as a compressed attachment, since it
weighs 125KB and I don't know what the policy on attachments is on this
mailing list (I've cut a bunch of 0001ff44 :  0b240001 lines from the
i915_error_state). If you need those, I can attach them.

This problem occurs with the latest 2.6.38-rc6 and with a week-old xorg
stack from git.


* Re: On SNB: Hangcheck timer elapsed... GPU hung
  2011-02-22 17:26                 ` Ivan Bulatovic
@ 2011-02-22 18:23                   ` Ivan Bulatovic
  0 siblings, 0 replies; 11+ messages in thread
From: Ivan Bulatovic @ 2011-02-22 18:23 UTC (permalink / raw)
  To: Ted Phelps; +Cc: intel-gfx, Andi Kleen

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

On Tue, 2011-02-22 at 18:26 +0100, Ivan Bulatovic wrote:

> I didn't want to send both logs as an compressed attachement as it
> weighs 125KB and I don't know what's the policy on attachements here on
> mailing list (I've cut down bunch of 0001ff44 :  0b240001 lines from the
> i915_error_state). If you need those I can attach them.
> 
> This problem occurs with the latest 2.6.38-rc6 and with xorg stack from
> git a week old.

Here is the intel_error_decode output (I took the liberty of sending a
tar.bz2, since the raw output is 1.5MB).

[-- Attachment #2: error_decode.log.tar.bz2 --]
[-- Type: application/x-bzip-compressed-tar, Size: 99062 bytes --]

