intel-gfx.lists.freedesktop.org archive mirror
* 2.6.38-rc8 regressions
@ 2011-03-14  9:08 Jan Niehusmann
  2011-03-14  9:53 ` Chris Wilson
  2011-04-10 12:25 ` Jan Niehusmann
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Niehusmann @ 2011-03-14  9:08 UTC (permalink / raw)
  To: intel-gfx

Hello,

just to provide some testing feedback. I didn't have time (and probably
not even the necessary skills) to further diagnose these issues. But
as I don't remember seeing these problems with 2.6.37, maybe the
observations are interesting to you:

With 2.6.38-rc8 I see the following graphics related regressions
(relative to 2.6.37) on a Thinkpad X61s with "Intel Corporation Mobile
GM965/GL960 Integrated Graphics Controller" (PCI ID 8086:2a02).
Userspace is debian/squeeze.

1) Every now and then, terminal windows (urxvt) do not properly update
their contents. After issuing a command like 'ls', which writes
several lines of text at once, some lines are completely missing. It's
not garbled glyphs, but full lines of text completely missing.

Interestingly, sometimes the correct contents of the full window become visible
for a split second. So they seem to be 'somewhere' accessible to the GPU, just
not shown on the screen.

When a single character in an affected line gets updated (e.g. by marking
it with the cursor - or even by an update in a different console window
next to the one affected) the correct contents of the full line become
and stay visible.

When this problem occurs, it does so reproducibly: About every third
command writing to the terminal shows the behaviour. In that situation,
just closing and reopening the lid solves the problem: Console windows
work as expected again, and AFAICT the problem doesn't reoccur until the
next reboot or suspend/resume cycle.

2) I just had a full hang of the GPU (black screen) after opening a web
page containing a video. kernel.log contains the following messages:

Mar 14 09:09:37 x61s kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 14 09:09:37 x61s kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 10172 at 10171, next 10173)
Mar 14 09:09:37 x61s kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
Mar 14 09:09:44 x61s kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 14 09:09:44 x61s kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
Mar 14 09:09:44 x61s kernel: [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
Mar 14 09:09:44 x61s kernel: [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
Mar 14 09:09:44 x61s kernel: [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
Mar 14 09:09:44 x61s kernel: [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
Mar 14 09:09:44 x61s kernel: [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[...]
Mar 14 09:09:45 x61s kernel: [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
Mar 14 09:09:45 x61s kernel: [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
Mar 14 09:09:45 x61s kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 14 09:09:45 x61s kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 10180 at 10171, next 10181)
Mar 14 09:09:45 x61s kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Mar 14 09:09:45 x61s kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

The same (probably) happened ten days ago with 2.6.38-rc6:

Mar  4 10:06:22 x61s kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar  4 10:06:22 x61s kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 50785 at 50784, next 50786)
Mar  4 10:06:22 x61s kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
Mar  4 10:06:28 x61s kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar  4 10:06:28 x61s kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
Mar  4 10:06:31 x61s kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar  4 10:06:31 x61s kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Mar  4 10:06:31 x61s kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

So it doesn't happen particularly often, but still too often to just ignore it.
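
For readers comparing the two log excerpts above, here is a minimal sketch that pulls the numbers out of the `i915_do_wait_request` lines. It assumes the fields read as "request being waited for", "last seqno the ring reported", and "next seqno to be handed out". The helper names are hypothetical and not part of any driver tooling. The return value -11 corresponds to -EAGAIN on x86 Linux.

```python
import re

# Matches the "returns R (awaiting A at C, next N)" part of the message,
# exactly as it appears in the kernel.log excerpts above.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) "
    r"\(awaiting (\d+) at (\d+), next (\d+)\)"
)


def parse_wait_request(line):
    """Return (ret, awaited, current, next) as ints, or None if no match."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    ret, awaited, current, nxt = (int(g) for g in m.groups())
    return ret, awaited, current, nxt


def pending_requests(line):
    """Requests issued past the last reported seqno (hypothetical metric)."""
    parsed = parse_wait_request(line)
    if parsed is None:
        return None
    _, _, current, nxt = parsed
    # Seqnos current+1 .. nxt-1 have been handed out but not completed.
    return nxt - 1 - current
```

Applied to the two Mar 14 lines, the "at" value stays at 10171 while "next" grows from 10173 to 10181, which under the assumption above would mean the ring made no progress between the two messages.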

Regards,
Jan

* Re: 2.6.38-rc8 regressions
  2011-03-14  9:08 2.6.38-rc8 regressions Jan Niehusmann
@ 2011-03-14  9:53 ` Chris Wilson
  2011-03-16 17:46   ` Jan Niehusmann
  2011-04-10 12:25 ` Jan Niehusmann
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2011-03-14  9:53 UTC (permalink / raw)
  To: Jan Niehusmann, intel-gfx

On Mon, 14 Mar 2011 10:08:12 +0100, Jan Niehusmann <jan@gondor.com> wrote:
> Hello,
> 
> just to provide some testing feedback. I didn't have time (and probably
> not even the necessary skills) to further diagnose these issues. But
> as I don't remember seeing these problems with 2.6.37, maybe the
> observations are interesting to you:
> 
> With 2.6.38-rc8 I see the following graphics related regressions
> (relative to 2.6.37) on a Thinkpad X61s with "Intel Corporation Mobile
> GM965/GL960 Integrated Graphics Controller" (PCI ID 8086:2a02).
> Userspace is debian/squeeze.
> 
> 1) Every now and then, terminal windows (urxvt) do not properly update
> their contents. After issuing a command like 'ls', which writes
> several lines of text at once, some lines are completely missing. It's
> not garbled glyphs, but full lines of text completely missing.
> 
> Interestingly, sometimes the correct contents of the full window become visible
> for a split second. So they seem to be 'somewhere' accessible to the GPU, just
> not shown on the screen.
> 
> When a single character in an affected line gets updated (e.g. by marking
> it with the cursor - or even by an update in a different console window
> next to the one affected) the correct contents of the full line become
> and stay visible.
> 
> When this problem occurs, it does so reproducibly: About every third
> command writing to the terminal shows the behaviour. In that situation,
> just closing and reopening the lid solves the problem: Console windows
> work as expected again, and AFAICT the problem doesn't reoccur until the
> next reboot or suspend/resume cycle.

There was a bug in the DDX where we were missing a flush (for precisely this
style of bug):

commit 4a186a612376bdd6f86c026e8b8b442108868a0a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 7 16:56:57 2010 +0000

    Always flush the batch before blocking for new X requests
    
    This should prevent any lag when waiting upon user input, for example
    whilst logging in with gdm.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

> 2) I just had a full hang of the GPU (black screen) after opening a web
> page containing a video. kernel.log contains the following messages:

The next kernel also includes the hint to look in
/sys/kernel/debug/dri/0/i915_error_state, or at least to provide that file
to us for debugging GPU hangs. Outside of initialisation, suspend and
resume, and modesetting, the cause of a GPU hang is usually an invalid
batch buffer submitted by userspace. The i915_error_state should capture
that erroneous batch buffer.
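
As a rough illustration of grabbing that file right after a hang, here is a minimal sketch. It assumes debugfs is mounted at /sys/kernel/debug and that the script runs with enough privileges to read it. The function name and the output file naming are made up for this example.

```python
import shutil
import time
from pathlib import Path

# Path quoted in the paragraph above. Needs debugfs mounted at
# /sys/kernel/debug and is typically readable only by root.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=ERROR_STATE, dest_dir="."):
    """Copy the error state to a timestamped file.

    Returns the destination path, or None if the source does not exist.
    """
    src = Path(src)
    if not src.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915_error_state-{stamp}.txt"
    # Stream the copy; the dump can be large when a batch buffer is captured.
    with src.open("rb") as fin, dest.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Saving a copy before rebooting matters because the captured state lives in kernel memory and does not survive a restart.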
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: 2.6.38-rc8 regressions
  2011-03-14  9:53 ` Chris Wilson
@ 2011-03-16 17:46   ` Jan Niehusmann
  2011-03-16 18:34     ` Chris Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Niehusmann @ 2011-03-16 17:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Hi Chris,

On Mon, Mar 14, 2011 at 09:53:36AM +0000, Chris Wilson wrote:
> On Mon, 14 Mar 2011 10:08:12 +0100, Jan Niehusmann <jan@gondor.com> wrote:

> > 1) Every now and then, terminal windows (urxvt) do not properly update
> > their contents. After issuing a command like 'ls', which writes
> > several lines of text at once, some lines are completely missing. It's
> > not garbled glyphs, but full lines of text completely missing.
[...]
> There was a bug in the DDX where we were missing a flush (for precisely this
> style of bug):
> 
> commit 4a186a612376bdd6f86c026e8b8b442108868a0a
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Dec 7 16:56:57 2010 +0000
> 
>     Always flush the batch before blocking for new X requests
>     
>     This should prevent any lag when waiting upon user input, for example
>     whilst logging in with gdm.
>     
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 

I cherry-picked this patch into the source code of
2.13.0, as provided by the debian package
xserver-xorg-video-intel version 2.13.0-5.

With that patch applied, I still observed the described behaviour.
Additionally, some java application had display update problems. (But
java generally has some problems because I'm using a non-reparenting
window manager, 'awesome', which java doesn't like).

What do you think, would it be worthwhile to try a more recent version
of xf86-video-intel?

Regards,
Jan

* Re: 2.6.38-rc8 regressions
  2011-03-16 17:46   ` Jan Niehusmann
@ 2011-03-16 18:34     ` Chris Wilson
  2011-03-16 21:19       ` Jan Niehusmann
  2011-03-20 15:48       ` Jan Niehusmann
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2011-03-16 18:34 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: intel-gfx

On Wed, 16 Mar 2011 18:46:55 +0100, Jan Niehusmann <jan@gondor.com> wrote:
> With that patch applied, I still observed the described behaviour.
> Additionally, some java application had display update problems. (But
> java generally has some problems because I'm using a non-reparenting
> window manager, 'awesome', which java doesn't like).
> 
> What do you think, would it be worthwhile to try a more recent version
> of xf86-video-intel?

Ok, that's more worrying. That bug certainly matched what you describe,
and I don't offhand know of another commit since 2.13 that is relevant. It
would be good to double-check first though.

Have you harvested any i915_error_states?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: 2.6.38-rc8 regressions
  2011-03-16 18:34     ` Chris Wilson
@ 2011-03-16 21:19       ` Jan Niehusmann
  2011-03-20 15:48       ` Jan Niehusmann
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Niehusmann @ 2011-03-16 21:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Mar 16, 2011 at 06:34:57PM +0000, Chris Wilson wrote:
> On Wed, 16 Mar 2011 18:46:55 +0100, Jan Niehusmann <jan@gondor.com> wrote:
> > With that patch applied, I still observed the described behaviour.
> > Additionally, some java application had display update problems. (But
> > java generally has some problems because I'm using a non-reparenting
> > window manager, 'awesome', which java doesn't like).
> > 
> > What do you think, would it be worthwhile to try a more recent version
> > of xf86-video-intel?
> 
> Ok, that's more worrying. That bug certainly matched what you describe,
> and I don't offhand know of another commit since 2.13 that is relevant. It
> would be good to double-check first though.

Same with 2.14.901 from debian/experimental.

Jan

* Re: 2.6.38-rc8 regressions
  2011-03-16 18:34     ` Chris Wilson
  2011-03-16 21:19       ` Jan Niehusmann
@ 2011-03-20 15:48       ` Jan Niehusmann
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Niehusmann @ 2011-03-20 15:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Mar 16, 2011 at 06:34:57PM +0000, Chris Wilson wrote:
> On Wed, 16 Mar 2011 18:46:55 +0100, Jan Niehusmann <jan@gondor.com> wrote:
> > With that patch applied, I still observed the described behaviour.
> > Additionally, some java application had display update problems. (But
> > java generally has some problems because I'm using a non-reparenting
> > window manager, 'awesome', which java doesn't like).
> > 
> > What do you think, would it be worthwhile to try a more recent version
> > of xf86-video-intel?
> 
> Ok, that's more worrying. That bug certainly matched what you describe,
> and I don't offhand know of another commit since 2.13 that is relevant. It
> would be good to double-check first though.

I bisected the display update issues to start between 0b0b053a and 
c64f7ba5. fe669bf8 doesn't work at all (black screen after X startup), 
and I didn't try 1b6064d7 yet:

bad:		c64f7ba agp/intel: Remove confusion of stolen entries not stolen memory
not tried:	1b6064d agp/intel: Remove the artificial cap on stolen size
crashes:	fe669bf drm/i915: Compute physical addresses from base of stolen memory
good:		0b0b053 drm/i915/panel: Restore saved value of BLC_PWM_CTL
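
To make the remaining search space explicit, here is a small sketch that works out which commits could still be the first bad one. It assumes the four commits above are listed in history order, newest first. The status labels and the function name are only for this illustration.

```python
# Commits as listed above, assumed to be in history order, newest first.
results = [
    ("c64f7ba", "bad"),
    ("1b6064d", "untested"),
    ("fe669bf", "crashes"),
    ("0b0b053", "good"),
]


def first_bad_candidates(results):
    """Return the commits that could have introduced the bug, oldest first."""
    oldest_first = list(reversed(results))
    # Newest commit known to be good.
    last_good = max(i for i, (_, s) in enumerate(oldest_first) if s == "good")
    # Oldest commit after it that is known to be bad.
    first_bad = min(
        i for i, (_, s) in enumerate(oldest_first)
        if s == "bad" and i > last_good
    )
    # Everything after the last good commit, up to and including the first bad one.
    return [commit for commit, _ in oldest_first[last_good + 1:first_bad + 1]]


print(first_bad_candidates(results))
```

Under that ordering assumption all three agp/intel and stolen-memory commits remain candidates, because the one in the middle crashes and so cannot be classified as good or bad for this bug.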

Another observation I made is that everything is fine while I am using a
dual-screen setup. As dual-screen disables frame buffer compression, I
tried i915.powersave=0, and indeed, with this parameter I was unable
to reproduce the issues.
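
For anyone repeating this test, one way to confirm that the option really reached the kernel is to look for it in /proc/cmdline. The following is a minimal sketch with hypothetical helper names. It splits on whitespace only and so ignores quoted values, and it says nothing about options set through modprobe configuration instead of the boot command line.

```python
def module_params(cmdline, module):
    """Collect `module.name=value` options from a kernel command line string."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


# Example with a made-up command line:
example = "root=/dev/sda1 ro i915.powersave=0 quiet"
print(module_params(example, "i915"))
```

On a live system, `module_params(read_cmdline(), "i915")` would give the same kind of dictionary for the running kernel.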

Is there anything else I could try? Unfortunately, because of
conflicting changes, it's not easily possible to just revert these
commits from 2.6.38.

Regards,
Jan

* Re: 2.6.38-rc8 regressions
  2011-03-14  9:08 2.6.38-rc8 regressions Jan Niehusmann
  2011-03-14  9:53 ` Chris Wilson
@ 2011-04-10 12:25 ` Jan Niehusmann
  2011-04-11 13:48   ` Jan Niehusmann
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Niehusmann @ 2011-04-10 12:25 UTC (permalink / raw)
  To: intel-gfx

On Mon, Mar 14, 2011 at 10:08:12AM +0100, Jan Niehusmann wrote:
> 1) Every now and then, terminal windows (urxvt) do not properly update
> their contents. After issuing a command like 'ls', which writes
> several lines of text at once, some lines are completely missing. It's
> not garbled glyphs, but full lines of text completely missing.

Just a quick update on the issue I reported a while ago: While I still
see this problem with 2.6.38.2, it seems like it's fixed with current
2.6.39 development tree at commit 40bd8ea1.

Out of curiosity, I reverted 36d527de ("Restore missing command flush
before interrupt on BLT ring") from that, and was able to reproduce the
bug again.

So it looks like this is fixed now.

Jan

* Re: 2.6.38-rc8 regressions
  2011-04-10 12:25 ` Jan Niehusmann
@ 2011-04-11 13:48   ` Jan Niehusmann
  2011-07-05 14:20     ` Moritz Heidkamp
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Niehusmann @ 2011-04-11 13:48 UTC (permalink / raw)
  To: intel-gfx

On Sun, Apr 10, 2011 at 02:25:36PM +0200, Jan Niehusmann wrote:
> Just a quick update on the issue I reported a while ago: While I still
> see this problem with 2.6.38.2, it seems like it's fixed with current
> 2.6.39 development tree at commit 40bd8ea1.
[...]
> So it looks like this is fixed now.

Sorry, I have to take this back: I just observed the screen corruption
with 2.6.39-rc2. So it may just be more difficult to trigger. (Or it's
just coincidence that I wasn't able to reproduce it at first.)

Jan

* Re: 2.6.38-rc8 regressions
  2011-04-11 13:48   ` Jan Niehusmann
@ 2011-07-05 14:20     ` Moritz Heidkamp
  0 siblings, 0 replies; 9+ messages in thread
From: Moritz Heidkamp @ 2011-07-05 14:20 UTC (permalink / raw)
  To: intel-gfx

Hello,

Jan Niehusmann <jan <at> gondor.com> writes:
> Sorry, I have to take this back: I just observed the screen corruption
> with 2.6.39-rc2. So it may just be more difficult to trigger. (Or it's
> just coincidence that I wasn't able to reproduce it at first.)

I am also still seeing the exact same problem you describe with
2.6.39 (Arch Linux) and the xorg intel driver built from git
master HEAD today (98f2e38), both with and without SNA. See this
thread on the Arch Linux Forums for more information:
https://bbs.archlinux.org/viewtopic.php?id=117202

Moritz
