* Possible i915 regression with 4.4-rc
@ 2015-12-03 20:00 Takashi Iwai
  2015-12-03 20:33 ` Ville Syrjälä
  0 siblings, 1 reply; 17+ messages in thread
From: Takashi Iwai @ 2015-12-03 20:00 UTC (permalink / raw)
  To: intel-gfx

Hi,

I've experienced a few graphics issues recently, and I tend to believe
that it has happened since 4.4-rc.  Namely, after some long time usage
on my HSW laptop (two or three days), the mouse cursor vanished
suddenly.  It kept pointing but just became invisible.  Also, after
some S3 cycles, some glyphs on a console or on Firefox became
invisible, too.  The windows and graphics were shown well, and X core
fonts were still shown properly, too.  Switching to VT1 and back
didn't change the situation.

There were no obvious errors or GPU hangs in either the kernel or X logs
when these happened.  Also, the issue went away once I restarted X,
even without a reboot.  (But after that the bug seems to re-trigger
sooner.)

I've had this a few times, and never before 4.4-rc1.  It's
intermittent and very hard to reproduce.  It is possibly triggered by
high memory usage or the like.

Also today I experienced a similar mouse pointer vanish on an IVY
desktop.  I cannot judge whether it has the same cause, though.

Since I'm using openSUSE Tumbleweed, which is a rolling release, I can't
blame only the kernel.  But my gut feeling is that it's probably a kernel
regression, something like a cache issue.  Alas, it's impossible to
bisect because the bug is so hard to reproduce.

Can anyone give some hints for debugging when this problem happens
again?


thanks,

Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: Possible i915 regression with 4.4-rc
  2015-12-03 20:00 Possible i915 regression with 4.4-rc Takashi Iwai
@ 2015-12-03 20:33 ` Ville Syrjälä
  2015-12-03 21:08   ` Takashi Iwai
  2015-12-04  8:49   ` Jani Nikula
  0 siblings, 2 replies; 17+ messages in thread
From: Ville Syrjälä @ 2015-12-03 20:33 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: intel-gfx

On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> Hi,
> 
> I've experienced a few graphics issues recently, and I tend to believe
> that it has happened since 4.4-rc.  Namely, after some long time usage
> on my HSW laptop (two or three days), the mouse cursor vanished
> suddenly.  It kept pointing but just became invisible.  Also, after
> some S3 cycles, some glyphs on a console or on Firefox became
> invisible, too.  The windows and graphics were shown well, and X core
> fonts were still shown properly, too.  Switching to VT1 and back
> didn't change the situation.

I think I have a fix for this *very* annoying problem. I've been cursing
on irc for weeks about it, until I finally got off my arse and debugged
it.

I pushed out my cursor branch:
git://github.com/vsyrjala/linux.git disappearing_cursor_fix

It has lots of other junk too, but it should be just these two that fix it:
59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")

Unfortunately I've managed to keep myself busy on other stuff, so didn't
send them out yet. Maybe tomorrow...

-- 
Ville Syrjälä
Intel OTC
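
For readers wondering how a cursor can vanish without any error being logged:
the thread does not spell out the root cause, but the second commit title
hints that a cursor base address of 0 was given a special meaning. The sketch
below is a toy model under that assumption, not the driver's code;
`toy_cursor` and both helper functions are hypothetical names invented for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy type; not one of the i915 driver's structures. */
struct toy_cursor {
    bool     has_image; /* userspace has set a cursor image */
    uint32_t base;      /* address the image buffer is currently bound at */
};

/*
 * Fragile check: address 0 doubles as "no cursor", so a perfectly valid
 * buffer that happens to be bound at address 0 is treated as hidden.
 */
static bool toy_cursor_visible_by_base(const struct toy_cursor *c)
{
    return c->base != 0;
}

/* Robust check: visibility is tracked separately from the address. */
static bool toy_cursor_visible_by_state(const struct toy_cursor *c)
{
    return c->has_image;
}
```

With a cursor whose buffer sits at address 0, the first check reports it as
hidden while the second still reports it as visible. If the assumption holds,
that would fit a bug that shows up only occasionally, whenever the buffer
happens to land at that address.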

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 20:33 ` Ville Syrjälä
@ 2015-12-03 21:08   ` Takashi Iwai
  2015-12-03 21:25     ` Ville Syrjälä
  2015-12-04  8:49   ` Jani Nikula
  1 sibling, 1 reply; 17+ messages in thread
From: Takashi Iwai @ 2015-12-03 21:08 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Thu, 03 Dec 2015 21:33:29 +0100,
Ville Syrjälä wrote:
> 
> On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > Hi,
> > 
> > I've experienced a few graphics issues recently, and I tend to believe
> > that it has happened since 4.4-rc.  Namely, after some long time usage
> > on my HSW laptop (two or three days), the mouse cursor vanished
> > suddenly.  It kept pointing but just became invisible.  Also, after
> > some S3 cycles, some glyphs on a console or on Firefox became
> > invisible, too.  The windows and graphics were shown well, and X core
> > fonts were still shown properly, too.  Switching to VT1 and back
> > didn't change the situation.
> 
> I think I have a fix for this *very* annoying problem. I've been cursing
> on irc for weeks about it, until I finally got off my arse and debugged
> it.
> 
> I pushed out my cursor branch:
> git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> 
> It has lots of other junk too, but it should be just these two that fix it:
> 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
>
> Unfortunately I've managed to keep myself busy on other stuff, so didn't
> send them out yet. Maybe tomorrow...

Great, I'll try them out now.  But these look like fixing only the
cursor issue.  Would they cover also the missing glyphs I experienced?


Thanks!

Takashi

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 21:08   ` Takashi Iwai
@ 2015-12-03 21:25     ` Ville Syrjälä
  2015-12-03 21:35       ` Chris Wilson
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ville Syrjälä @ 2015-12-03 21:25 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: intel-gfx

On Thu, Dec 03, 2015 at 10:08:05PM +0100, Takashi Iwai wrote:
> On Thu, 03 Dec 2015 21:33:29 +0100,
> Ville Syrjälä wrote:
> > 
> > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > > Hi,
> > > 
> > > I've experienced a few graphics issues recently, and I tend to believe
> > > that it has happened since 4.4-rc.  Namely, after some long time usage
> > > on my HSW laptop (two or three days), the mouse cursor vanished
> > > suddenly.  It kept pointing but just became invisible.  Also, after
> > > some S3 cycles, some glyphs on a console or on Firefox became
> > > invisible, too.  The windows and graphics were shown well, and X core
> > > fonts were still shown properly, too.  Switching to VT1 and back
> > > didn't change the situation.
> > 
> > I think I have a fix for this *very* annoying problem. I've been cursing
> > on irc for weeks about it, until I finally got off my arse and debugged
> > it.
> > 
> > I pushed out my cursor branch:
> > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> > 
> > It has lots of other junk too, but it should be just these two that fix it:
> > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> >
> > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > send them out yet. Maybe tomorrow...
> 
> Great, I'll try them out now.  But these look like fixing only the
> cursor issue.  Would they cover also the missing glyphs I experienced?

No. That's either userland, or some object/context/etc. getting corrupted
I think. I've had something like that occasionally too after some number of
suspend cycles, and usually fbcon is dead at that point too (just get a
black screen on VT switch).

I think we had some bug with not properly pinning the fbdev buffer which
could explain things getting corrupted. Chris had a fix I think, but I'm
not sure if that went anywhere. Chris?

-- 
Ville Syrjälä
Intel OTC

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 21:25     ` Ville Syrjälä
@ 2015-12-03 21:35       ` Chris Wilson
  2015-12-04  8:44         ` Jani Nikula
  2015-12-04 12:00         ` Dave Gordon
  2015-12-03 21:38       ` Lukas Wunner
  2015-12-08  7:03       ` Takashi Iwai
  2 siblings, 2 replies; 17+ messages in thread
From: Chris Wilson @ 2015-12-03 21:35 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Takashi Iwai, intel-gfx

On Thu, Dec 03, 2015 at 11:25:48PM +0200, Ville Syrjälä wrote:
> On Thu, Dec 03, 2015 at 10:08:05PM +0100, Takashi Iwai wrote:
> > On Thu, 03 Dec 2015 21:33:29 +0100,
> > Ville Syrjälä wrote:
> > > 
> > > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > > > Hi,
> > > > 
> > > > I've experienced a few graphics issues recently, and I tend to believe
> > > > that it has happened since 4.4-rc.  Namely, after some long time usage
> > > > on my HSW laptop (two or three days), the mouse cursor vanished
> > > > suddenly.  It kept pointing but just became invisible.  Also, after
> > > > some S3 cycles, some glyphs on a console or on Firefox became
> > > > invisible, too.  The windows and graphics were shown well, and X core
> > > > fonts were still shown properly, too.  Switching to VT1 and back
> > > > didn't change the situation.
> > > 
> > > I think I have a fix for this *very* annoying problem. I've been cursing
> > > on irc for weeks about it, until I finally got off my arse and debugged
> > > it.
> > > 
> > > I pushed out my cursor branch:
> > > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> > > 
> > > It has lots of other junk too, but it should be just these two that fix it:
> > > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> > >
> > > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > > send them out yet. Maybe tomorrow...
> > 
> > Great, I'll try them out now.  But these look like fixing only the
> > cursor issue.  Would they cover also the missing glyphs I experienced?
> 
> No. That's either userland, or some object/context/etc. getting corrupted
> I think. I've had something like that occasionally too after some number of
> suspend cycles, and usually fbcon is dead at that point too (just get a
> black screen on VT switch).
> 
> I think we had some bug with not properly pinning the fbdev buffer which
> could explain things getting corrupted. Chris had a fix I think, but I'm
> not sure if that went anywhere. Chris?

Jani keeps refusing it :). But it's not the issue with the missing
glyphs. The missing glyphs is the kernel dropping rendering, or that
rendering not being flushed out to memory across the suspend as it is just
texture corruption. The glyph cache only slowly changes, so corruption
tends to be visible for some time.  An alternative explanation would be
that GPU state is not restored upon resume that only (visibly) affects
glyph rendering (and portions thereof). Lost rendering is a simpler
explanation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 21:25     ` Ville Syrjälä
  2015-12-03 21:35       ` Chris Wilson
@ 2015-12-03 21:38       ` Lukas Wunner
  2015-12-08  7:03       ` Takashi Iwai
  2 siblings, 0 replies; 17+ messages in thread
From: Lukas Wunner @ 2015-12-03 21:38 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Takashi Iwai, intel-gfx

Hi,

On Thu, Dec 03, 2015 at 11:25:48PM +0200, Ville Syrjälä wrote:
> On Thu, Dec 03, 2015 at 10:08:05PM +0100, Takashi Iwai wrote:
> > On Thu, 03 Dec 2015 21:33:29 +0100,
> > Ville Syrjälä wrote:
> > > 
> > > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > > > Hi,
> > > > 
> > > > I've experienced a few graphics issues recently, and I tend to believe
> > > > that it has happened since 4.4-rc.  Namely, after some long time usage
> > > > on my HSW laptop (two or three days), the mouse cursor vanished
> > > > suddenly.  It kept pointing but just became invisible.  Also, after
> > > > some S3 cycles, some glyphs on a console or on Firefox became
> > > > invisible, too.  The windows and graphics were shown well, and X core
> > > > fonts were still shown properly, too.  Switching to VT1 and back
> > > > didn't change the situation.
> > > 
> > > I think I have a fix for this *very* annoying problem. I've been cursing
> > > on irc for weeks about it, until I finally got off my arse and debugged
> > > it.
> > > 
> > > I pushed out my cursor branch:
> > > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> > > 
> > > It has lots of other junk too, but it should be just these two that fix it:
> > > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> > >
> > > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > > send them out yet. Maybe tomorrow...
> > 
> > Great, I'll try them out now.  But these look like fixing only the
> > cursor issue.  Would they cover also the missing glyphs I experienced?
> 
> No. That's either userland, or some object/context/etc. getting corrupted
> I think. I've had something like that occasionally too after some number of
> suspend cycles, and usually fbcon is dead at that point too (just get a
> black screen on VT switch).
> 
> I think we had some bug with not properly pinning the fbdev buffer which
> could explain things getting corrupted. Chris had a fix I think, but I'm
> not sure if that went anywhere. Chris?

Last version was http://patchwork.freedesktop.org/patch/65552/
but unfortunately it had issues with error path handling
(see my comment on patchwork).

Kind regards,

Lukas

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 21:35       ` Chris Wilson
@ 2015-12-04  8:44         ` Jani Nikula
  2015-12-04 15:57           ` Daniel Vetter
  2015-12-04 12:00         ` Dave Gordon
  1 sibling, 1 reply; 17+ messages in thread
From: Jani Nikula @ 2015-12-04  8:44 UTC (permalink / raw)
  To: Chris Wilson, Ville Syrjälä; +Cc: Takashi Iwai, intel-gfx

On Thu, 03 Dec 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Dec 03, 2015 at 11:25:48PM +0200, Ville Syrjälä wrote:
>> I think we had some bug with not properly pinning the fbdev buffer which
>> could explain things getting corrupted. Chris had a fix I think, but I'm
>> not sure if that went anywhere. Chris?
>
> Jani keeps refusing it :)

Which one? Was I being a boring pedant, requiring it gets review or
compiles or something...? :p

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 20:33 ` Ville Syrjälä
  2015-12-03 21:08   ` Takashi Iwai
@ 2015-12-04  8:49   ` Jani Nikula
  2015-12-04  9:40     ` Ville Syrjälä
  1 sibling, 1 reply; 17+ messages in thread
From: Jani Nikula @ 2015-12-04  8:49 UTC (permalink / raw)
  To: Ville Syrjälä, Takashi Iwai; +Cc: intel-gfx

On Thu, 03 Dec 2015, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
>> Hi,
>> 
>> I've experienced a few graphics issues recently, and I tend to believe
>> that it has happened since 4.4-rc.  Namely, after some long time usage
>> on my HSW laptop (two or three days), the mouse cursor vanished
>> suddenly.  It kept pointing but just became invisible.  Also, after
>> some S3 cycles, some glyphs on a console or on Firefox became
>> invisible, too.  The windows and graphics were shown well, and X core
>> fonts were still shown properly, too.  Switching to VT1 and back
>> didn't change the situation.
>
> I think I have a fix for this *very* annoying problem. I've been cursing
> on irc for weeks about it, until I finally got off my arse and debugged
> it.
>
> I pushed out my cursor branch:
> git://github.com/vsyrjala/linux.git disappearing_cursor_fix
>
> It has lots of other junk too, but it should be just these two that fix it:
> 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
>
> Unfortunately I've managed to keep myself busy on other stuff, so didn't
> send them out yet. Maybe tomorrow...

So I've hit this too, albeit very rarely, on a Haswell running Debian
stable with the stock v3.16 kernel. Haven't seen it on any other
machine. It's really too rare to even debug or verify a fix. Is it
possible we just happened to make an old bug occur more frequently now?

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center

* Re: Possible i915 regression with 4.4-rc
  2015-12-04  8:49   ` Jani Nikula
@ 2015-12-04  9:40     ` Ville Syrjälä
  2015-12-04 16:02       ` Daniel Vetter
  0 siblings, 1 reply; 17+ messages in thread
From: Ville Syrjälä @ 2015-12-04  9:40 UTC (permalink / raw)
  To: Jani Nikula; +Cc: Takashi Iwai, intel-gfx

On Fri, Dec 04, 2015 at 10:49:48AM +0200, Jani Nikula wrote:
> On Thu, 03 Dec 2015, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> >> Hi,
> >> 
> >> I've experienced a few graphics issues recently, and I tend to believe
> >> that it has happened since 4.4-rc.  Namely, after some long time usage
> >> on my HSW laptop (two or three days), the mouse cursor vanished
> >> suddenly.  It kept pointing but just became invisible.  Also, after
> >> some S3 cycles, some glyphs on a console or on Firefox became
> >> invisible, too.  The windows and graphics were shown well, and X core
> >> fonts were still shown properly, too.  Switching to VT1 and back
> >> didn't change the situation.
> >
> > I think I have a fix for this *very* annoying problem. I've been cursing
> > on irc for weeks about it, until I finally got off my arse and debugged
> > it.
> >
> > I pushed out my cursor branch:
> > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> >
> > It has lots of other junk too, but it should be just these two that fix it:
> > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> >
> > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > send them out yet. Maybe tomorrow...
> 
> So I've hit this too, albeit very rarely, on a Haswell running Debian
> stable with the stock v3.16 kernel. Haven't seen it on any other
> machine. It's really too rare to even debug or verify a fix. Is it
> possible we just happened to make an old bug occur more frequently now?

The potential for it has definitely been there for a long time.

-- 
Ville Syrjälä
Intel OTC

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 21:35       ` Chris Wilson
  2015-12-04  8:44         ` Jani Nikula
@ 2015-12-04 12:00         ` Dave Gordon
  2015-12-04 12:06           ` Chris Wilson
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Gordon @ 2015-12-04 12:00 UTC (permalink / raw)
  To: Chris Wilson, Ville Syrjälä, Takashi Iwai, intel-gfx

On 03/12/15 21:35, Chris Wilson wrote:
> On Thu, Dec 03, 2015 at 11:25:48PM +0200, Ville Syrjälä wrote:
>> On Thu, Dec 03, 2015 at 10:08:05PM +0100, Takashi Iwai wrote:
>>> On Thu, 03 Dec 2015 21:33:29 +0100,
>>> Ville Syrjälä wrote:
>>>>
>>>> On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
>>>>> Hi,
>>>>>
>>>>> I've experienced a few graphics issues recently, and I tend to believe
>>>>> that it has happened since 4.4-rc.  Namely, after some long time usage
>>>>> on my HSW laptop (two or three days), the mouse cursor vanished
>>>>> suddenly.  It kept pointing but just became invisible.  Also, after
>>>>> some S3 cycles, some glyphs on a console or on Firefox became
>>>>> invisible, too.  The windows and graphics were shown well, and X core
>>>>> fonts were still shown properly, too.  Switching to VT1 and back
>>>>> didn't change the situation.
>>>>
>>>> I think I have a fix for this *very* annoying problem. I've been cursing
>>>> on irc for weeks about it, until I finally got off my arse and debugged
>>>> it.
>>>>
>>>> I pushed out my cursor branch:
>>>> git://github.com/vsyrjala/linux.git disappearing_cursor_fix
>>>>
>>>> It has lots of other junk too, but it should be just these two that fix it:
>>>> 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
>>>> 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
>>>>
>>>> Unfortunately I've managed to keep myself busy on other stuff, so didn't
>>>> send them out yet. Maybe tomorrow...
>>>
>>> Great, I'll try them out now.  But these look like fixing only the
>>> cursor issue.  Would they cover also the missing glyphs I experienced?
>>
>> No. That's either userland, or some object/context/etc. getting corrupted
>> I think. I've had something like that occasionally too after some number of
>> suspend cycles, and usually fbcon is dead at that point too (just get a
>> black screen on VT switch).
>>
>> I think we had some bug with not properly pinning the fbdev buffer which
>> could explain things getting corrupted. Chris had a fix I think, but I'm
>> not sure if that went anywhere. Chris?
>
> Jani keeps refusing it :). But it's not the issue with the missing
> glyphs. The missing glyphs is the kernel dropping rendering, or that
> rendering not being flushed out to memory across the suspend as it is just
> texture corruption. The glyph cache only slowly changes, so corruption
> tends to be visible for some time.  An alternative explanation would be
> that GPU state is not restored upon resume that only (visibly) affects
> glyph rendering (and portions thereof). Lost rendering is a simpler
> explanation.
> -Chris

Could also be down to certain objects getting their contents discarded 
when evicted (due to not being marked dirty), for which I posted a fix 
"Always mark GEM objects as dirty when written by the CPU" a few days ago?

.Dave.
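
The failure mode described here can be illustrated with a small userspace toy
model. It is only a sketch: `toy_page` and the `toy_*` functions are made-up
names, and the real GEM eviction path is considerably more involved. It shows
that a page written through a CPU mapping but never marked dirty loses its
contents once it is evicted and later restored.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy model only: none of these names exist in the kernel. */
struct toy_page {
    char resident[TOY_PAGE_SIZE]; /* contents while the page is in memory */
    char backing[TOY_PAGE_SIZE];  /* contents saved in backing storage */
    bool dirty;                   /* must be set by whoever writes the page */
};

static void toy_page_init(struct toy_page *p)
{
    memset(p, 0, sizeof(*p));
}

/* CPU write; mark_dirty models remembering to flag the page as dirty. */
static void toy_cpu_write(struct toy_page *p, const char *data, bool mark_dirty)
{
    snprintf(p->resident, TOY_PAGE_SIZE, "%s", data);
    if (mark_dirty)
        p->dirty = true;
}

/* Evict: only dirty pages are written back; clean ones are simply dropped. */
static void toy_evict(struct toy_page *p)
{
    if (p->dirty) {
        memcpy(p->backing, p->resident, TOY_PAGE_SIZE);
        p->dirty = false;
    }
    memset(p->resident, 0, TOY_PAGE_SIZE);
}

/* Bring the page back in from backing storage. */
static void toy_restore(struct toy_page *p)
{
    memcpy(p->resident, p->backing, TOY_PAGE_SIZE);
}
```

Writing "glyph" to two pages, one with `mark_dirty` set and one without, then
calling `toy_evict()` and `toy_restore()` on both leaves the dirty page intact
and the other one empty.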

* Re: Possible i915 regression with 4.4-rc
  2015-12-04 12:00         ` Dave Gordon
@ 2015-12-04 12:06           ` Chris Wilson
  2015-12-04 12:16             ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2015-12-04 12:06 UTC (permalink / raw)
  To: Dave Gordon; +Cc: Takashi Iwai, intel-gfx

On Fri, Dec 04, 2015 at 12:00:08PM +0000, Dave Gordon wrote:
> On 03/12/15 21:35, Chris Wilson wrote:
> >On Thu, Dec 03, 2015 at 11:25:48PM +0200, Ville Syrjälä wrote:
> >>On Thu, Dec 03, 2015 at 10:08:05PM +0100, Takashi Iwai wrote:
> >>>On Thu, 03 Dec 2015 21:33:29 +0100,
> >>>Ville Syrjälä wrote:
> >>>>
> >>>>On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> >>>>>Hi,
> >>>>>
> >>>>>I've experienced a few graphics issues recently, and I tend to believe
> >>>>>that it has happened since 4.4-rc.  Namely, after some long time usage
> >>>>>on my HSW laptop (two or three days), the mouse cursor vanished
> >>>>>suddenly.  It kept pointing but just became invisible.  Also, after
> >>>>>some S3 cycles, some glyphs on a console or on Firefox became
> >>>>>invisible, too.  The windows and graphics were shown well, and X core
> >>>>>fonts were still shown properly, too.  Switching to VT1 and back
> >>>>>didn't change the situation.
> >>>>
> >>>>I think I have a fix for this *very* annoying problem. I've been cursing
> >>>>on irc for weeks about it, until I finally got off my arse and debugged
> >>>>it.
> >>>>
> >>>>I pushed out my cursor branch:
> >>>>git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> >>>>
> >>>>It has lots of other junk too, but it should be just these two that fix it:
> >>>>59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> >>>>25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> >>>>
> >>>>Unfortunately I've managed to keep myself busy on other stuff, so didn't
> >>>>send them out yet. Maybe tomorrow...
> >>>
> >>>Great, I'll try them out now.  But these look like fixing only the
> >>>cursor issue.  Would they cover also the missing glyphs I experienced?
> >>
> >>No. That's either userland, or some object/context/etc. getting corrupted
> >>I think. I've had something like that occasionally too after some number of
> >>suspend cycles, and usually fbcon is dead at that point too (just get a
> >>black screen on VT switch).
> >>
> >>I think we had some bug with not properly pinning the fbdev buffer which
> >>could explain things getting corrupted. Chris had a fix I think, but I'm
> >>not sure if that went anywhere. Chris?
> >
> >Jani keeps refusing it :). But it's not the issue with the missing
> >glyphs. The missing glyphs is the kernel dropping rendering, or that
> >rendering not being flushed out to memory across the suspend as it is just
> >texture corruption. The glyph cache only slowly changes, so corruption
> >tends to be visible for some time.  An alternative explanation would be
> >that GPU state is not restored upon resume that only (visibly) affects
> >glyph rendering (and portions thereof). Lost rendering is a simpler
> >explanation.
> >-Chris
> 
> Could also be down to certain objects getting their contents
> discarded when evicted (due to not being marked dirty), for which I
> posted a fix "Always mark GEM objects as dirty when written by the
> CPU" a few days ago?

Grasping at straws?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: Possible i915 regression with 4.4-rc
  2015-12-04 12:06           ` Chris Wilson
@ 2015-12-04 12:16             ` Chris Wilson
  2015-12-04 12:45               ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2015-12-04 12:16 UTC (permalink / raw)
  To: Dave Gordon, Ville Syrjälä, Takashi Iwai, intel-gfx

On Fri, Dec 04, 2015 at 12:06:59PM +0000, Chris Wilson wrote:
> > Could also be down to certain objects getting their contents
> > discarded when evicted (due to not being marked dirty), for which I
> > posted a fix "Always mark GEM objects as dirty when written by the
> > CPU" a few days ago?
> 
> Grasping at straws?

On reflection, rather than the object->dirty patch, you want

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1f7e6b9df45d..033df035a066 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -346,6 +346,7 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 
 static void *kmap_page_dma(struct i915_page_dma *p)
 {
+       set_page_dirty(p->page);
        return kmap_atomic(p->page);
 }
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: Possible i915 regression with 4.4-rc
  2015-12-04 12:16             ` Chris Wilson
@ 2015-12-04 12:45               ` Chris Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2015-12-04 12:45 UTC (permalink / raw)
  To: Dave Gordon, Ville Syrjälä, Takashi Iwai, intel-gfx

On Fri, Dec 04, 2015 at 12:16:40PM +0000, Chris Wilson wrote:
> On Fri, Dec 04, 2015 at 12:06:59PM +0000, Chris Wilson wrote:
> > > Could also be down to certain objects getting their contents
> > > discarded when evicted (due to not being marked dirty), for which I
> > > posted a fix "Always mark GEM objects as dirty when written by the
> > > CPU" a few days ago?
> > 
> > Grasping at straws?
> 
> On reflection, rather than the object->dirty patch, you want
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 1f7e6b9df45d..033df035a066 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -346,6 +346,7 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>  
>  static void *kmap_page_dma(struct i915_page_dma *p)
>  {
> +       set_page_dirty(p->page);
>         return kmap_atomic(p->page);
>  }

Or not? These pages are not swappable and remain allocated, so I would
expect the hibernation process to also make a copy of them and restore
them. Besides we would get outright GPU hangs and massive memory
corruption if the PTE were absent.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: Possible i915 regression with 4.4-rc
  2015-12-04  8:44         ` Jani Nikula
@ 2015-12-04 15:57           ` Daniel Vetter
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2015-12-04 15:57 UTC (permalink / raw)
  To: Jani Nikula; +Cc: Takashi Iwai, intel-gfx

On Fri, Dec 04, 2015 at 10:44:17AM +0200, Jani Nikula wrote:
> On Thu, 03 Dec 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Thu, Dec 03, 2015 at 11:25:48PM +0200, Ville Syrjälä wrote:
> >> I think we had some bug with not properly pinning the fbdev buffer which
> >> could explain things getting corrupted. Chris had a fix I think, but I'm
> >> not sure if that went anywhere. Chris?
> >
> > Jani keeps refusing it :)
> 
> Which one? Was I being a boring pedant, requiring it gets review or
> compiles or something...? :p

It had review, but didn't really compile iirc. Chris, can you pls kick that
can a bit more?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: Possible i915 regression with 4.4-rc
  2015-12-04  9:40     ` Ville Syrjälä
@ 2015-12-04 16:02       ` Daniel Vetter
  2015-12-04 16:12         ` Takashi Iwai
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Vetter @ 2015-12-04 16:02 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Takashi Iwai, intel-gfx

On Fri, Dec 04, 2015 at 11:40:59AM +0200, Ville Syrjälä wrote:
> On Fri, Dec 04, 2015 at 10:49:48AM +0200, Jani Nikula wrote:
> > On Thu, 03 Dec 2015, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > >> Hi,
> > >> 
> > >> I've experienced a few graphics issues recently, and I tend to believe
> > >> that it has happened since 4.4-rc.  Namely, after some long time usage
> > >> on my HSW laptop (two or three days), the mouse cursor vanished
> > >> suddenly.  It kept pointing but just became invisible.  Also, after
> > >> some S3 cycles, some glyphs on a console or on Firefox became
> > >> invisible, too.  The windows and graphics were shown well, and X core
> > >> fonts were still shown properly, too.  Switching to VT1 and back
> > >> didn't change the situation.
> > >
> > > I think I have a fix for this *very* annoying problem. I've been cursing
> > > on irc for weeks about it, until I finally got off my arse and debugged
> > > it.
> > >
> > > I pushed out my cursor branch:
> > > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> > >
> > > It has lots of other junk too, but it should be just these two that fix it:
> > > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> > >
> > > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > > send them out yet. Maybe tomorrow...
> > 
> > So I've hit this too, albeit very rarely, on a Haswell running Debian
> > stable with the stock v3.16 kernel. Haven't seen it on any other
> > machine. It's really too rare to even debug or verify a fix. Is it
> > possible we just happened to make an old bug occur more frequently now?
> 
> The potential for it has definitely been there for a long time.

Oh dear, let's have fun and look at some awful history.

commit e568af1c626031925465a5caaab7cca1303d55c7
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Mar 26 20:08:20 2014 +0100

    drm/i915: Undo gtt scratch pte unmapping again

Which essentially reverted

commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Wed Oct 16 09:21:30 2013 -0700

    drm/i915: Disable GGTT PTEs on GEN6+ suspend
    
    Once the machine gets to a certain point in the suspend process, we
    expect the GPU to be idle. If it is not, we might corrupt memory.
    Empirically (with an early version of this patch) we have seen this is
    not the case. We cannot currently explain why the latent GPU writes
    occur.
    
    In the technical sense, this patch is a workaround in that we have an
    issue we can't explain, and the patch indirectly solves the issue.
    However, it's really better than a workaround because we understand why
    it works, and it really should be a safe thing to do in all cases.
    
    The noticeable effect other than the debug messages would be an increase
    in the suspend time. I have not measure how expensive it actually is.
    
    I think it would be good to spend further time to root cause why we're
    seeing these latent writes, but it shouldn't preclude preventing the
    fallout.
    
    NOTE: It should be safe (and makes some sense IMO) to also keep the
    VALID bit unset on resume when we clear_range(). I've opted not to do
    this as properly clearing those bits at some later point would be extra
    work.
    
    v2: Fix bugzilla link
    
    Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65496
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=59321
    Tested-by: Takashi Iwai <tiwai@suse.de>
    Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Tested-By: Todd Previte <tprevite@gmail.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

This was a regression in a regression right before I ragequit the entire
bug handling deal because no one cared any more and management was all
"why is this important".

Would be interesting if these issues magically disappear when changing that
back again. Doesn't mean that we're any closer to figuring out what's
corrupting what exactly here, but at least we'd have a reason to dig out
this old sob story of mine.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: Possible i915 regression with 4.4-rc
  2015-12-04 16:02       ` Daniel Vetter
@ 2015-12-04 16:12         ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2015-12-04 16:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, 04 Dec 2015 17:02:52 +0100,
Daniel Vetter wrote:
> 
> On Fri, Dec 04, 2015 at 11:40:59AM +0200, Ville Syrjälä wrote:
> > On Fri, Dec 04, 2015 at 10:49:48AM +0200, Jani Nikula wrote:
> > > On Thu, 03 Dec 2015, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > > > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > > >> Hi,
> > > >> 
> > > >> I've experienced a few graphics issues recently, and I tend to believe
> > > >> that it has happened since 4.4-rc.  Namely, after some long time usage
> > > >> on my HSW laptop (two or three days), the mouse cursor vanished
> > > >> suddenly.  It kept pointing but just became invisible.  Also, after
> > > >> some S3 cycles, some glyphs on a console or on Firefox became
> > > >> invisible, too.  The windows and graphics were shown well, and X core
> > > >> fonts were still shown properly, too.  Switching to VT1 and back
> > > >> didn't change the situation.
> > > >
> > > > > I think I have a fix for this *very* annoying problem. I've been cursing
> > > > on irc for weeks about it, until I finally got off my arse and debugged
> > > > it.
> > > >
> > > > > I pushed out my cursor branch:
> > > > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> > > >
> > > > > It has lots of other junk too, but it should be just these two that fix it:
> > > > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > > > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> > > >
> > > > > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > > > send them out yet. Maybe tomorrow...
> > > 
> > > So I've hit this too, albeit very rarely, on a Haswell running Debian
> > > stable with the stock v3.16 kernel. Haven't seen it on any other
> > > machine. It's really too rare to even debug or verify a fix. Is it
> > > possible we just happened to make an old bug occur more frequently now?
> > 
> > The potential for it has definitely been there for a long time.
> 
> Oh dear, let's have fun and look at some awful history.
> 
> commit e568af1c626031925465a5caaab7cca1303d55c7
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Wed Mar 26 20:08:20 2014 +0100
> 
>     drm/i915: Undo gtt scratch pte unmapping again
> 
> Which essentially reverted
> 
> commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
> Author: Ben Widawsky <benjamin.widawsky@intel.com>
> Date:   Wed Oct 16 09:21:30 2013 -0700
> 
>     drm/i915: Disable GGTT PTEs on GEN6+ suspend
>     
>     Once the machine gets to a certain point in the suspend process, we
>     expect the GPU to be idle. If it is not, we might corrupt memory.
>     Empirically (with an early version of this patch) we have seen this is
>     not the case. We cannot currently explain why the latent GPU writes
>     occur.
>     
>     In the technical sense, this patch is a workaround in that we have an
>     issue we can't explain, and the patch indirectly solves the issue.
>     However, it's really better than a workaround because we understand why
>     it works, and it really should be a safe thing to do in all cases.
>     
>     The noticeable effect other than the debug messages would be an increase
>     in the suspend time. I have not measure how expensive it actually is.
>     
>     I think it would be good to spend further time to root cause why we're
>     seeing these latent writes, but it shouldn't preclude preventing the
>     fallout.
>     
>     NOTE: It should be safe (and makes some sense IMO) to also keep the
>     VALID bit unset on resume when we clear_range(). I've opted not to do
>     this as properly clearing those bits at some later point would be extra
>     work.
>     
>     v2: Fix bugzilla link
>     
>     Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65496
>     Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=59321
>     Tested-by: Takashi Iwai <tiwai@suse.de>
>     Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
>     Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>     Tested-By: Todd Previte <tprevite@gmail.com>
>     Cc: stable@vger.kernel.org
>     Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> This was a regression in a regression right before I ragequit the entire
> bug handling deal because no one cared any more and management was all
> "why is this important".
> 
> Would be interesting if these issues magically disappear when changing that
> back again. Doesn't mean that we're any closer to figuring out what's
> corrupting what exactly here, but at least we'd have a reason to dig out
> this old sob story of mine.

Hm, but this revert was also fairly long ago, and I don't remember any
similar breakage until 4.4-rc.  Might be just (bad) luck, though.

(And no surprise, I was already in the party above!  Everyone must
 have smoked badly there.)


thanks,

Takashi

* Re: Possible i915 regression with 4.4-rc
  2015-12-03 21:25     ` Ville Syrjälä
  2015-12-03 21:35       ` Chris Wilson
  2015-12-03 21:38       ` Lukas Wunner
@ 2015-12-08  7:03       ` Takashi Iwai
  2 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2015-12-08  7:03 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Thu, 03 Dec 2015 22:25:48 +0100,
Ville Syrjälä wrote:
> 
> On Thu, Dec 03, 2015 at 10:08:05PM +0100, Takashi Iwai wrote:
> > On Thu, 03 Dec 2015 21:33:29 +0100,
> > Ville Syrjälä wrote:
> > > 
> > > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > > > Hi,
> > > > 
> > > > I've experienced a few graphics issues recently, and I tend to believe
> > > > that it has happened since 4.4-rc.  Namely, after some long time usage
> > > > on my HSW laptop (two or three days), the mouse cursor vanished
> > > > suddenly.  It kept pointing but just became invisible.  Also, after
> > > > some S3 cycles, some glyphs on a console or on Firefox became
> > > > invisible, too.  The windows and graphics were shown well, and X core
> > > > fonts were still shown properly, too.  Switching to VT1 and back
> > > > didn't change the situation.
> > > 
> > > > I think I have a fix for this *very* annoying problem. I've been cursing
> > > on irc for weeks about it, until I finally got off my arse and debugged
> > > it.
> > > 
> > > > I pushed out my cursor branch:
> > > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> > > 
> > > > It has lots of other junk too, but it should be just these two that fix it:
> > > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> > >
> > > > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > > send them out yet. Maybe tomorrow...
> > 
> > Great, I'll try them out now.  But these look like fixing only the
> > cursor issue.  Would they cover also the missing glyphs I experienced?
> 
> No. That's either userland, or some object/context/etc. getting corrupted
> I think. I've had something like that occasionally too after some number of
> suspend cycles, and usually fbcon is dead at that point too (just get a
> black screen on VT switch).

I hit this S3 problem again this morning, and indeed fbcon is dead,
too.  Logging in again cured X.

If any patch is available for testing, let me know.  It seems that the
4.4 series really does show this problem more often.


thanks,

Takashi

Thread overview: 17+ messages
2015-12-03 20:00 Possible i915 regression with 4.4-rc Takashi Iwai
2015-12-03 20:33 ` Ville Syrjälä
2015-12-03 21:08   ` Takashi Iwai
2015-12-03 21:25     ` Ville Syrjälä
2015-12-03 21:35       ` Chris Wilson
2015-12-04  8:44         ` Jani Nikula
2015-12-04 15:57           ` Daniel Vetter
2015-12-04 12:00         ` Dave Gordon
2015-12-04 12:06           ` Chris Wilson
2015-12-04 12:16             ` Chris Wilson
2015-12-04 12:45               ` Chris Wilson
2015-12-03 21:38       ` Lukas Wunner
2015-12-08  7:03       ` Takashi Iwai
2015-12-04  8:49   ` Jani Nikula
2015-12-04  9:40     ` Ville Syrjälä
2015-12-04 16:02       ` Daniel Vetter
2015-12-04 16:12         ` Takashi Iwai
