* [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
@ 2021-02-08 20:38 ` Hans de Goede
  0 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-08 20:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman, stable, Chris Wilson, intel-gfx

Hi All,

We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
reporters report that adding i915.mitigations=off to the cmdline fixes things, see:

https://bugzilla.redhat.com/show_bug.cgi?id=1925346

Which should be fully visible without a bugzilla account.
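
For anyone double-checking a reporter's setup, here is a minimal Python sketch
that confirms the workaround is really present on the booted kernel's command
line. It assumes the option is passed exactly as i915.mitigations=off, as in the
reports above; the function names are hypothetical and other values of the
option are not interpreted.

```python
def has_mitigations_off(cmdline: str) -> bool:
    """Return True if 'i915.mitigations=off' appears as its own token."""
    # Kernel command-line options are whitespace separated.
    return "i915.mitigations=off" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        return f.read().strip()


if __name__ == "__main__":
    # Only meaningful on a Linux system, where /proc/cmdline exists.
    print(has_mitigations_off(read_cmdline()))
```

Matching whole tokens rather than substrings avoids a false positive on
something like i915.mitigations=off,foo.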

I noticed that 5.10.13 had one more related i915 patch, so I've asked the reporters
to retest with 5.10.13. It is better, but things are not fixed there; it just
takes longer for the problems to show up.

Greg, I can prepare a Fedora test-kernel build for the reporters to test with
the following 3 commits reverted:

520d05a77b2866eb ("drm/i915/gt: Clear CACHE_MODE prior to clearing residuals")
ecca0c675bdecebd ("drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail")
48b8c6689efa7cd6 ("drm/i915/gt: Limit VFE threads based on GT")
(Note these are the 5.10.y hashes)

Reverting these 3 is not ideal, but it is probably the fastest way to get
this resolved for the 5.10.y series.

Greg, do you want me to have the reporters test a 5.10.y series kernel
with these 3 reverts?

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-08 20:38 ` [Intel-gfx] " Hans de Goede
@ 2021-02-08 23:27   ` Chris Wilson
  -1 siblings, 0 replies; 34+ messages in thread
From: Chris Wilson @ 2021-02-08 23:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Hans de Goede, intel-gfx, stable

Quoting Hans de Goede (2021-02-08 20:38:58)
> Hi All,
> 
> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:

I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
did not report any differences with and without mitigations. I have yet
to test other platforms. So I don't yet have an alternative. Though note
that v5.11 and v5.12 will behave similarly, so we need to urgently find
a fix for Linus's tree anyway.
-Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-08 23:27   ` Chris Wilson
@ 2021-02-09 11:46     ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-09 11:46 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi,

On 2/9/21 12:27 AM, Chris Wilson wrote:
> Quoting Hans de Goede (2021-02-08 20:38:58)
>> Hi All,
>>
>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> 
> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> did not report any differences with and without mitigations. I have yet
> to test other platforms. So I don't yet have an alternative.

Note the original / first reporter of:

https://bugzilla.redhat.com/show_bug.cgi?id=1925346

Is using hsw-gt2, so it seems that the problem is not just the enabling of
the mitigations on ivy-bridge / bay-trail but that there actually is
a regression on devices where the WA worked fine before...

If you have access to a hsw-gt2 device then testing there might help?

Also note that this reproduces more easily on 5.10.10, which does not have:

520d05a77b2866eb ("drm/i915/gt: Clear CACHE_MODE prior to clearing residuals")

Not sure if that helps though.

> Though note
> that v5.11 and v5.12 will behave similarly, so we need to urgently find
> a fix for Linus's tree anyway.

Ack.

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-09 11:46     ` Hans de Goede
@ 2021-02-09 11:55       ` Chris Wilson
  -1 siblings, 0 replies; 34+ messages in thread
From: Chris Wilson @ 2021-02-09 11:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Hans de Goede, intel-gfx, stable

Quoting Hans de Goede (2021-02-09 11:46:46)
> Hi,
> 
> On 2/9/21 12:27 AM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-08 20:38:58)
> >> Hi All,
> >>
> >> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> >> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> >> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> > 
> > I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> > did not report any differences with and without mitigations. I have yet
> > to test other platforms. So I don't yet have an alternative.
> 
> Note the original / first reporter of:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> 
> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> the mitigations on ivy-bridge / bay-trail but that there actually is
> a regression on devices where the WA worked fine before...
> 
> If you have access to a hsw-gt2 device then testing there might help?

The current one is headless, I'm trying to get a laptop with gt2 setup
again so that I can do more than test with piglit.

> Also note that this reproduces more easily on 5.10.10, which does not have:
> 
> 520d05a77b2866eb ("drm/i915/gt: Clear CACHE_MODE prior to clearing residuals")
> 
> Not sure if that helps though.

It gives a clue that it's still a problem with the pipe state. (Which is
believable as there can't be much else!)
-Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-08 23:27   ` Chris Wilson
@ 2021-02-09 16:43     ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-09 16:43 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi,

On 2/9/21 12:27 AM, Chris Wilson wrote:
> Quoting Hans de Goede (2021-02-08 20:38:58)
>> Hi All,
>>
>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> 
> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> did not report any differences with and without mitigations. I have yet
> to test other platforms. So I don't yet have an alternative. Though note
> that v5.11 and v5.12 will behave similarly, so we need to urgently find
> a fix for Linus's tree anyway.

Note I've gone ahead and prepared a test kernel for the Fedora bug reports
with the following 3 commits reverted from 5.10.y:

520d05a77b2866eb ("drm/i915/gt: Clear CACHE_MODE prior to clearing residuals")
ecca0c675bdecebd ("drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail")
48b8c6689efa7cd6 ("drm/i915/gt: Limit VFE threads based on GT")
(Note these are the 5.10.y hashes)

I know going this route is not ideal but it might be best for 5.10.y for now.

I will let you know if reverting these 3 actually helps once I hear back
from the reporters of the issue.

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-09 11:46     ` Hans de Goede
@ 2021-02-09 23:07       ` Chris Wilson
  -1 siblings, 0 replies; 34+ messages in thread
From: Chris Wilson @ 2021-02-09 23:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Hans de Goede, intel-gfx, stable

Quoting Hans de Goede (2021-02-09 11:46:46)
> Hi,
> 
> On 2/9/21 12:27 AM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-08 20:38:58)
> >> Hi All,
> >>
> >> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> >> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> >> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> > 
> > I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> > did not report any differences with and without mitigations. I have yet
> > to test other platforms. So I don't yet have an alternative.
> 
> Note the original / first reporter of:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> 
> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> the mitigations on ivy-bridge / bay-trail but that there actually is
> a regression on devices where the WA worked fine before...

There have been 3 crashes uploaded related to v5.10.9, and in all 3
cases the ACTHD has been in the first page. This strongly suggests that
the w/a is scribbling over address 0. And there's then a very good
chance that

commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 25 12:50:33 2021 +0000

    drm/i915/gt: Always try to reserve GGTT address 0x0
    
    commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.

in v5.10.14 is sufficient to hide the issue.
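
The check applied to those crash dumps can be sketched in Python as below. This
is only an illustration: it assumes a 4 KiB page, the helper name is
hypothetical, and the sample values are made up rather than taken from the
uploaded error states.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def acthd_in_first_page(acthd: int) -> bool:
    """Return True if the ACTHD value points into the first page (offset 0x0..0xfff)."""
    return 0 <= acthd < PAGE_SIZE


if __name__ == "__main__":
    # Hypothetical ACTHD values, NOT from the bug reports.
    for value in (0x00000018, 0x00000F00, 0x7FFFE004):
        print(f"{value:#010x}: first page = {acthd_in_first_page(value)}")
```

An ACTHD that low is suspicious because the command streamer should be
executing from a batch or ring placed elsewhere in the address space.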
-Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-09 23:07       ` Chris Wilson
@ 2021-02-10 10:37         ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-10 10:37 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi,

On 2/10/21 12:07 AM, Chris Wilson wrote:
> Quoting Hans de Goede (2021-02-09 11:46:46)
>> Hi,
>>
>> On 2/9/21 12:27 AM, Chris Wilson wrote:
>>> Quoting Hans de Goede (2021-02-08 20:38:58)
>>>> Hi All,
>>>>
>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
>>>
>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
>>> did not report any differences with and without mitigations. I have yet
>>> to test other platforms. So I don't yet have an alternative.
>>
>> Note the original / first reporter of:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
>>
>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
>> the mitigations on ivy-bridge / bay-trail but that there actually is
>> a regression on devices where the WA worked fine before...
> 
> There have been 3 crashes uploaded related to v5.10.9, and in all 3
> cases the ACTHD has been in the first page. This strongly suggests that
> the w/a is scribbling over address 0. And there's then a very good
> chance that
> 
> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Jan 25 12:50:33 2021 +0000
> 
>     drm/i915/gt: Always try to reserve GGTT address 0x0
>     
>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> 
> in v5.10.14 is sufficient to hide the issue.

That one actually is already in v5.10.13 and the various reporters of these
issues have already tested 5.10.13. They did mention that it took longer
to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:

"drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
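
When sorting bug reports by whether the tested kernel already carries a given
backport, a small helper along these lines may be useful. It is a rough sketch
with hypothetical function names; it only compares releases within one stable
series, and the release that first contains a fix still has to be looked up by
hand (5.10.13 below is the release I mention above for the GGTT address 0x0
reservation).

```python
from typing import Tuple


def parse_stable_version(version: str) -> Tuple[int, ...]:
    """Turn '5.10.13' or 'v5.10.13' into (5, 10, 13)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def contains_backport(tested: str, first_release_with_fix: str) -> bool:
    """Return True if 'tested' is at or after the release that first has the fix.

    Both versions must belong to the same stable series (e.g. 5.10.y).
    """
    t = parse_stable_version(tested)
    f = parse_stable_version(first_release_with_fix)
    if t[:2] != f[:2]:
        raise ValueError("versions are from different stable series")
    return t >= f


if __name__ == "__main__":
    for kernel in ("5.10.9", "5.10.10", "5.10.13"):
        print(kernel, contains_backport(kernel, "5.10.13"))
```

Comparing integer tuples rather than strings matters here, since as strings
"5.10.9" would sort after "5.10.13".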

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-10 10:37         ` Hans de Goede
@ 2021-02-10 12:48           ` Chris Wilson
  -1 siblings, 0 replies; 34+ messages in thread
From: Chris Wilson @ 2021-02-10 12:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Hans de Goede, intel-gfx, stable

Quoting Hans de Goede (2021-02-10 10:37:19)
> Hi,
> 
> On 2/10/21 12:07 AM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-09 11:46:46)
> >> Hi,
> >>
> >> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>> Hi All,
> >>>>
> >>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> >>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> >>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> >>>
> >>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>> did not report any differences with and without mitigations. I have yet
> >>> to test other platforms. So I don't yet have an alternative.
> >>
> >> Note the original / first reporter of:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>
> >> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> >> the mitigations on ivy-bridge / bay-trail but that there actually is
> >> a regression on devices where the WA worked fine before...
> > 
> > There have been 3 crashes uploaded related to v5.10.9, and in all 3
> > cases the ACTHD has been in the first page. This strongly suggests that
> > the w/a is scribbling over address 0. And there's then a very good
> > chance that
> > 
> > commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Mon Jan 25 12:50:33 2021 +0000
> > 
> >     drm/i915/gt: Always try to reserve GGTT address 0x0
> >     
> >     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> > 
> > in v5.10.14 is sufficient to hide the issue.
> 
> That one actually is already in v5.10.13 and the various reporters of these
> issues have already tested 5.10.13. They did mention that it took longer
> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> 
> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5

Started looking for scratch page overwrites, and found this little gem:
https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1

Looks promising wrt the cause of overwriting random addresses -- and
I hope that is the explanation for the glitches/hangs. I have a hsw gt2
with gnome shell, piglit is happy, but I suspect it is all due to
placement and so will only occur at random.
-Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-10 10:37         ` Hans de Goede
@ 2021-02-11  0:00           ` Chris Wilson
  -1 siblings, 0 replies; 34+ messages in thread
From: Chris Wilson @ 2021-02-11  0:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Hans de Goede, intel-gfx, stable

Quoting Hans de Goede (2021-02-10 10:37:19)
> Hi,
> 
> On 2/10/21 12:07 AM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-09 11:46:46)
> >> Hi,
> >>
> >> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>> Hi All,
> >>>>
> >>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> >>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> >>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> >>>
> >>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>> did not report any differences with and without mitigations. I have yet
> >>> to test other platforms. So I don't yet have an alternative.
> >>
> >> Note the original / first reporter of:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>
> >> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> >> the mitigations on ivy-bridge / bay-trail but that there actually is
> >> a regression on devices where the WA worked fine before...
> > 
> > There have been 3 crashes uploaded related to v5.10.9, and in all 3
> > cases the ACTHD has been in the first page. This strongly suggests that
> > the w/a is scribbling over address 0. And there's then a very good
> > chance that
> > 
> > commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Mon Jan 25 12:50:33 2021 +0000
> > 
> >     drm/i915/gt: Always try to reserve GGTT address 0x0
> >     
> >     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> > 
> > in v5.10.14 is sufficient to hide the issue.
> 
> That one actually is already in v5.10.13 and the various reporters of these
> issues have already tested 5.10.13. They did mention that it took longer
> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> 
> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5

There's also another pair of pipecontrols required for that:

d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")

which didn't get picked up for stable.

https://intel-gfx-ci.01.org/tree/linux-stable/index-alt.html?

shows that we are still missing a couple of fixes for the w/a, at least
compared to BAT on drm-tip.
-Chris

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-10 12:48           ` Chris Wilson
@ 2021-02-11 10:36             ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-11 10:36 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi,

On 2/10/21 1:48 PM, Chris Wilson wrote:
> Quoting Hans de Goede (2021-02-10 10:37:19)
>> Hi,
>>
>> On 2/10/21 12:07 AM, Chris Wilson wrote:
>>> Quoting Hans de Goede (2021-02-09 11:46:46)
>>>> Hi,
>>>>
>>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
>>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
>>>>>> Hi All,
>>>>>>
>>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
>>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
>>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
>>>>>
>>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
>>>>> did not report any differences with and without mitigations. I have yet
>>>>> to test other platforms. So I don't yet have an alternative.
>>>>
>>>> Note the original / first reporter of:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
>>>>
>>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
>>>> the mitigations on ivy-bridge / bay-trail but that there actually is
>>>> a regression on devices where the WA worked fine before...
>>>
>>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
>>> cases the ACTHD has been in the first page. This strongly suggests that
>>> the w/a is scribbling over address 0. And there's then a very good
>>> chance that
>>>
>>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>> Date:   Mon Jan 25 12:50:33 2021 +0000
>>>
>>>     drm/i915/gt: Always try to reserve GGTT address 0x0
>>>     
>>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
>>>
>>> in v5.10.14 is sufficient to hide the issue.
>>
>> That one actually is already in v5.10.13 and the various reporters of these
>> issues have already tested 5.10.13. They did mention that it took longer
>> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
>>
>> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
> 
> Started looking for scratch page overwrites, and found this little gem:
> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
> 
> Looks promising wrt the cause of overwriting random addresses -- and
> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
> with gnome shell, piglit is happy, but I suspect it is all due to
> placement and so will only occur at random.

If you can give me a list of commits to cherry-pick then I can prepare
a Fedora 5.10.y kernel with those added for the group of Fedora users
who are hitting this to test.

FWIW those users have reported back that the 3 reverts which I mentioned
earlier do indeed restore normal functionality (without glitches) for them.

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-11 10:36             ` Hans de Goede
@ 2021-02-11 10:49               ` Chris Wilson
  -1 siblings, 0 replies; 34+ messages in thread
From: Chris Wilson @ 2021-02-11 10:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Hans de Goede, intel-gfx, stable

Quoting Hans de Goede (2021-02-11 10:36:13)
> Hi,
> 
> On 2/10/21 1:48 PM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-10 10:37:19)
> >> Hi,
> >>
> >> On 2/10/21 12:07 AM, Chris Wilson wrote:
> >>> Quoting Hans de Goede (2021-02-09 11:46:46)
> >>>> Hi,
> >>>>
> >>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>>>> Hi All,
> >>>>>>
> >>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> >>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> >>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> >>>>>
> >>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>>>> did not report any differences with and without mitigations. I have yet
> >>>>> to test other platforms. So I don't yet have an alternative.
> >>>>
> >>>> Note the original / first reporter of:
> >>>>
> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>>>
> >>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> >>>> the mitigations on ivy-bridge / bay-trail but that there actually is
> >>>> a regression on devices where the WA worked fine before...
> >>>
> >>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
> >>> cases the ACTHD has been in the first page. This strongly suggests that
> >>> the w/a is scribbling over address 0. And there's then a very good
> >>> chance that
> >>>
> >>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> >>> Author: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Date:   Mon Jan 25 12:50:33 2021 +0000
> >>>
> >>>     drm/i915/gt: Always try to reserve GGTT address 0x0
> >>>     
> >>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> >>>
> >>> in v5.10.14 is sufficient to hide the issue.
> >>
> >> That one actually is already in v5.10.13 and the various reporters of these
> >> issues have already tested 5.10.13. They did mention that it took longer
> >> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> >>
> >> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
> > 
> > Started looking for scratch page overwrites, and found this little gem:
> > https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
> > 
> > Looks promising wrt the cause of overwriting random addresses -- and
> > I hope that is the explanation for the glitches/hangs. I have a hsw gt2
> > with gnome shell, piglit is happy, but I suspect it is all due to
> > placement and so will only occur at random.
> 
> If you can give me a list of commits to cherry-pick then I can prepare
> > a Fedora 5.10.y kernel with those added for the group of Fedora users
> who are hitting this to test.

e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")

are missing from v5.10.15
-Chris
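
A minimal sketch of how the commit list above could be turned into
cherry-pick commands for a test build. The helper names (parse_commits,
cherry_pick_commands) and the regular expression are illustrative
assumptions, not part of any kernel tooling; only the three hashes and
subjects come from this message. The commands are emitted in the order
listed here, which is not confirmed to be the upstream commit order.

```python
import re

# Commit list as posted in the thread: abbreviated hash, then ("subject").
COMMIT_LIST = '''\
e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")
'''

# Hypothetical pattern for lines of the form: hash ("subject")
LINE_RE = re.compile(r'^([0-9a-f]{7,40}) \("(.+)"\)$')


def parse_commits(text):
    """Return (hash, subject) pairs in the order they are listed."""
    commits = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            commits.append((match.group(1), match.group(2)))
    return commits


def cherry_pick_commands(commits):
    """Build one `git cherry-pick -x` command per commit.

    The -x option records the original commit hash in the new commit message.
    """
    return [f"git cherry-pick -x {sha}" for sha, _subject in commits]


if __name__ == "__main__":
    for command in cherry_pick_commands(parse_commits(COMMIT_LIST)):
        print(command)
```

The printed commands would be run inside a checkout of the stable branch
being tested; conflicts, if any, still have to be resolved by hand.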

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-11 10:49               ` Chris Wilson
@ 2021-02-11 12:26                 ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-11 12:26 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi,

On 2/11/21 11:49 AM, Chris Wilson wrote:
> Quoting Hans de Goede (2021-02-11 10:36:13)
>> Hi,
>>
>> On 2/10/21 1:48 PM, Chris Wilson wrote:
>>> Quoting Hans de Goede (2021-02-10 10:37:19)
>>>> Hi,
>>>>
>>>> On 2/10/21 12:07 AM, Chris Wilson wrote:
>>>>> Quoting Hans de Goede (2021-02-09 11:46:46)
>>>>>> Hi,
>>>>>>
>>>>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
>>>>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
>>>>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
>>>>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
>>>>>>>
>>>>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
>>>>>>> did not report any differences with and without mitigations. I have yet
>>>>>>> to test other platforms. So I don't yet have an alternative.
>>>>>>
>>>>>> Note the original / first reporter of:
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
>>>>>>
>>>>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
>>>>>> the mitigations on ivy-bridge / bay-trail but that there actually is
>>>>>> a regression on devices where the WA worked fine before...
>>>>>
>>>>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
>>>>> cases the ACTHD has been in the first page. This strongly suggests that
>>>>> the w/a is scribbling over address 0. And there's then a very good
>>>>> chance that
>>>>>
>>>>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
>>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Date:   Mon Jan 25 12:50:33 2021 +0000
>>>>>
>>>>>     drm/i915/gt: Always try to reserve GGTT address 0x0
>>>>>     
>>>>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
>>>>>
>>>>> in v5.10.14 is sufficient to hide the issue.
>>>>
>>>> That one actually is already in v5.10.13 and the various reporters of these
>>>> issues have already tested 5.10.13. They did mention that it took longer
>>>> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
>>>>
>>>> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
>>>
>>> Started looking for scratch page overwrites, and found this little gem:
>>> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
>>>
>>> Looks promising wrt the cause of overwriting random addresses -- and
>>> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
>>> with gnome shell, piglit is happy, but I suspect it is all due to
>>> placement and so will only occur at random.
>>
>> If you can give me a list of commits to cherry-pick then I can prepare
>> a Fedora 5.10.y kernel with those added for the group of Fedora users
>> who are hitting this to test.
> 
> e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
> d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
> 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")

Thanks, the test-kernel is building now. I will let you know when I have
heard back from the Fedora users (this will likely take 1-2 days).

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-11 12:26                 ` Hans de Goede
@ 2021-02-14 16:00                   ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-14 16:00 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi,

On 2/11/21 1:26 PM, Hans de Goede wrote:
> Hi,
> 
> On 2/11/21 11:49 AM, Chris Wilson wrote:
>> Quoting Hans de Goede (2021-02-11 10:36:13)
>>> Hi,
>>>
>>> On 2/10/21 1:48 PM, Chris Wilson wrote:
>>>> Quoting Hans de Goede (2021-02-10 10:37:19)
>>>>> Hi,
>>>>>
>>>>> On 2/10/21 12:07 AM, Chris Wilson wrote:
>>>>>> Quoting Hans de Goede (2021-02-09 11:46:46)
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
>>>>>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
>>>>>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
>>>>>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
>>>>>>>>
>>>>>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
>>>>>>>> did not report any differences with and without mitigations. I have yet
>>>>>>>> to test other platforms. So I don't yet have an alternative.
>>>>>>>
>>>>>>> Note the original / first reporter of:
>>>>>>>
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
>>>>>>>
>>>>>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
>>>>>>> the mitigations on ivy-bridge / bay-trail but that there actually is
>>>>>>> a regression on devices where the WA worked fine before...
>>>>>>
>>>>>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
>>>>>> cases the ACTHD has been in the first page. This strongly suggests that
>>>>>> the w/a is scribbling over address 0. And there's then a very good
>>>>>> chance that
>>>>>>
>>>>>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
>>>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>> Date:   Mon Jan 25 12:50:33 2021 +0000
>>>>>>
>>>>>>     drm/i915/gt: Always try to reserve GGTT address 0x0
>>>>>>     
>>>>>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
>>>>>>
>>>>>> in v5.10.14 is sufficient to hide the issue.
>>>>>
>>>>> That one actually is already in v5.10.13 and the various reporters of these
>>>>> issues have already tested 5.10.13. They did mention that it took longer
>>>>> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
>>>>>
>>>>> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
>>>>
>>>> Started looking for scratch page overwrites, and found this little gem:
>>>> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
>>>>
>>>> Looks promising wrt the cause of overwriting random addresses -- and
>>>> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
>>>> with gnome shell, piglit is happy, but I suspect it is all due to
>>>> placement and so will only occur at random.
>>>
>>> If you can give me a list of commits to cherry-pick then I can prepare
>>> a Fedora 5.10.y kernel with those added for the group of Fedora users
>>> who are hitting this to test.
>>
>> e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
>> d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
>> 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")
> 
> Thanks, the test-kernel is building now. I will let you know when I have
> heard back from the Fedora users (this will likely take 1-2 days).

I've heard back from 2 of the reporters who were seeing issues with 5.10.9+,
and I'm happy to report that 5.10.15 with the 3 commits mentioned above
cherry-picked on top fixes the graphics glitches for them.

So if we can get these 3 commits into 5.10.y and 5.11.y then this should be
resolved.

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-14 16:00                   ` Hans de Goede
@ 2021-02-15 14:26                     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-02-15 14:26 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Chris Wilson, intel-gfx, stable

On Sun, Feb 14, 2021 at 05:00:44PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 2/11/21 1:26 PM, Hans de Goede wrote:
> > Hi,
> > 
> > On 2/11/21 11:49 AM, Chris Wilson wrote:
> >> Quoting Hans de Goede (2021-02-11 10:36:13)
> >>> Hi,
> >>>
> >>> On 2/10/21 1:48 PM, Chris Wilson wrote:
> >>>> Quoting Hans de Goede (2021-02-10 10:37:19)
> >>>>> Hi,
> >>>>>
> >>>>> On 2/10/21 12:07 AM, Chris Wilson wrote:
> >>>>>> Quoting Hans de Goede (2021-02-09 11:46:46)
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>>>>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>>>>>>> Hi All,
> >>>>>>>>>
> >>>>>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> >>>>>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> >>>>>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> >>>>>>>>
> >>>>>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>>>>>>> did not report any differences with and without mitigations. I have yet
> >>>>>>>> to test other platforms. So I don't yet have an alternative.
> >>>>>>>
> >>>>>>> Note the original / first reporter of:
> >>>>>>>
> >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>>>>>>
> >>>>>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> >>>>>>> the mitigations on ivy-bridge / bay-trail but that there actually is
> >>>>>>> a regression on devices where the WA worked fine before...
> >>>>>>
> >>>>>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
> >>>>>> cases the ACTHD has been in the first page. This strongly suggests that
> >>>>>> the w/a is scribbling over address 0. And there's then a very good
> >>>>>> chance that
> >>>>>>
> >>>>>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> >>>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>> Date:   Mon Jan 25 12:50:33 2021 +0000
> >>>>>>
> >>>>>>     drm/i915/gt: Always try to reserve GGTT address 0x0
> >>>>>>     
> >>>>>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> >>>>>>
> >>>>>> in v5.10.14 is sufficient to hide the issue.
> >>>>>
> >>>>> That one actually is already in v5.10.13 and the various reporters of these
> >>>>> issues have already tested 5.10.13. They did mention that it took longer
> >>>>> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> >>>>>
> >>>>> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
> >>>>
> >>>> Started looking for scratch page overwrites, and found this little gem:
> >>>> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
> >>>>
> >>>> Looks promising wrt the cause of overwriting random addresses -- and
> >>>> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
> >>>> with gnome shell, piglit is happy, but I suspect it is all due to
> >>>> placement and so will only occur at random.
> >>>
> >>> If you can give me a list of commits to cherry-pick then I can prepare
> >>> a Fedora 5.10.y kernel with those added for the group of Fedora users
> >>> who are hitting this to test.
> >>
> >> e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
> >> d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
> >> 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")
> > 
> > Thanks, the test-kernel is building now. I will let you know when I have
> > heard back from the Fedora users (this will likely take 1-2 days).
> 
> I've heard back from 2 of the reporters who were seeing issues with 5.10.9+
> 
> And I'm happy to report 5.10.15 + the 3 commits mentioned above cherry-picked
> on top fixes the graphics glitches for them.
> 
> So if we can get these 3 commits into 5.10.y and 5.11.y then this should be
> resolved.

Great!

Hopefully these will show up in Linus's tree soon...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-14 16:00                   ` Hans de Goede
@ 2021-02-18 14:04                     ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-18 14:04 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi,

On 2/14/21 5:00 PM, Hans de Goede wrote:
> Hi,
> 
> On 2/11/21 1:26 PM, Hans de Goede wrote:
>> Hi,
>>
>> On 2/11/21 11:49 AM, Chris Wilson wrote:

<snip>

>>>>> Started looking for scratch page overwrites, and found this little gem:
>>>>> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
>>>>>
>>>>> Looks promising wrt the cause of overwriting random addresses -- and
>>>>> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
>>>>> with gnome shell, piglit is happy, but I suspect it is all due to
>>>>> placement and so will only occur at random.
>>>>
>>>> If you can give me a list of commits to cherry-pick then I can prepare
>>>> a Fedora 5.10.y kernel with those added for the group of Fedora users
>>>> who are hitting this to test.
>>>
>>> e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
>>> d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
>>> 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")
>>
>> Thanks, the test-kernel is building now. I will let you know when I have
>> heard back from the Fedora users (this will likely take 1-2 days).
> 
> I've heard back from 2 of the reporters who were seeing issues with 5.10.9+
> 
> And I'm happy to report 5.10.15 + the 3 commits mentioned above cherry-picked
> on top fixes the graphics glitches for them.
> 
> So if we can get these 3 commits into 5.10.y and 5.11.y then this should be
> resolved.

Unfortunately I just got a report that 5.10.15 + the 3 extra fixes mentioned
above is still causing issues for one user with a
"thinkpad x230 with i5-3320M (HD Graphics 4000)"

The user describes the problem as: "still have some minor black squares popping
up while scrolling on Firefox."

I've asked this user to test 5.10.14 + the 3 reverts mentioned earlier in the
thread and that kernel does not have this issue.

Chris, any ideas / more fixes to cherry pick for testing ?

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-11 10:49               ` Chris Wilson
@ 2021-02-25 11:52                 ` Hans de Goede
  -1 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2021-02-25 11:52 UTC (permalink / raw)
  To: Chris Wilson, Greg Kroah-Hartman, intel-gfx, stable

Hi Chris,

On 2/11/21 11:49 AM, Chris Wilson wrote:
> Quoting Hans de Goede (2021-02-11 10:36:13)
>> Hi,
>>
>> On 2/10/21 1:48 PM, Chris Wilson wrote:
>>> Quoting Hans de Goede (2021-02-10 10:37:19)
>>>> Hi,
>>>>
>>>> On 2/10/21 12:07 AM, Chris Wilson wrote:
>>>>> Quoting Hans de Goede (2021-02-09 11:46:46)
>>>>>> Hi,
>>>>>>
>>>>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
>>>>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
>>>>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
>>>>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
>>>>>>>
>>>>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
>>>>>>> did not report any differences with and without mitigations. I have yet
>>>>>>> to test other platforms. So I don't yet have an alternative.
>>>>>>
>>>>>> Note the original / first reporter of:
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
>>>>>>
>>>>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
>>>>>> the mitigations on ivy-bridge / bay-trail but that there actually is
>>>>>> a regression on devices where the WA worked fine before...
>>>>>
>>>>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
>>>>> cases the ACTHD has been in the first page. This strongly suggests that
>>>>> the w/a is scribbling over address 0. And there's then a very good
>>>>> chance that
>>>>>
>>>>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
>>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Date:   Mon Jan 25 12:50:33 2021 +0000
>>>>>
>>>>>     drm/i915/gt: Always try to reserve GGTT address 0x0
>>>>>     
>>>>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
>>>>>
>>>>> in v5.10.14 is sufficient to hide the issue.
>>>>
>>>> That one actually is already in v5.10.13 and the various reporters of these
>>>> issues have already tested 5.10.13. They did mention that it took longer
>>>> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
>>>>
>>>> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
>>>
>>> Started looking for scratch page overwrites, and found this little gem:
>>> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
>>>
>>> Looks promising wrt the cause of overwriting random addresses -- and
>>> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
>>> with gnome shell, piglit is happy, but I suspect it is all due to
>>> placement and so will only occur at random.
>>
>> If you can give me a list of commits to cherry-pick then I can prepare
>> a Fedora 5.10.y kernel with those added for the group of Fedora users
>> who are hitting this to test.
> 
> e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
> d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
> 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")

The bug reports for this keep coming in. Here is the full list of bugs that I'm
aware of which all have this as root cause (the ones without links are all
bugzilla.redhat.com bugs):

   1843274 - i915 GPU Hang with kernel 5.7 on Haswell (Acer C720P Chromebook)
   1922511 - Recent upgrades caused smearing/tearing
   1925346 - Screen glitches after updating to Kernel 5.10.10
   1925903 - Flickering UI elements, screen instability (Wayland)
   1931065 - Frequent i915 hangs
   https://gitlab.freedesktop.org/drm/intel/-/issues/3099

Testing by various reporters shows that this appears to be fully resolved for all
reporters except one by the quoted 3 commits from -next above.

For the one reporter who is still seeing some rendering glitches things are
much improved, so I think they are also hitting a different issue.

I wanted to send cherry-picks of the 3 quoted commits to stable@vger, but
2 of the 3 are not in Linus' master yet; and I'm also not seeing these
in any drm -fixes branches yet :(
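
A check like this can be scripted. The sketch below is only an illustration,
not the procedure used for the statement above: it assumes a local clone with
the relevant branches fetched, and the helper names are hypothetical. It relies
on `git merge-base --is-ancestor`, which exits 0 when the commit is reachable
from the branch and 1 when it is not. Note that it cannot detect a fix that
landed in the branch as a cherry-pick under a different hash.

```python
import subprocess


def is_ancestor(commit, branch, repo_dir="."):
    """True if commit is reachable from branch in the clone at repo_dir."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", commit, branch],
        cwd=repo_dir,
    )
    # Exit status 0 means "is an ancestor", 1 means "is not"; anything else
    # is an error such as an unknown commit or branch name.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git merge-base failed for {commit} on {branch}")
    return result.returncode == 0


def missing_from(commits, branch, check=is_ancestor):
    """Return the commits that check() reports as not reachable from branch."""
    return [commit for commit in commits if not check(commit, branch)]
```

For example, missing_from(["e627d5923cae", "d30bbd62b1bf", "1914911f4aa0"],
"origin/master") would list whichever of the three hashes are not yet in that
branch of the local clone.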

Chris, can you please get d30bbd62b1bf and 1914911f4aa0 on their way to Linus?

Regards,

Hans

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
  2021-02-15 14:26                     ` Greg Kroah-Hartman
@ 2021-03-01 14:10                       ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-01 14:10 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Chris Wilson, intel-gfx, stable

On Mon, Feb 15, 2021 at 03:26:59PM +0100, Greg Kroah-Hartman wrote:
> On Sun, Feb 14, 2021 at 05:00:44PM +0100, Hans de Goede wrote:
> > Hi,
> > 
> > On 2/11/21 1:26 PM, Hans de Goede wrote:
> > > Hi,
> > > 
> > > On 2/11/21 11:49 AM, Chris Wilson wrote:
> > >> Quoting Hans de Goede (2021-02-11 10:36:13)
> > >>> Hi,
> > >>>
> > >>> On 2/10/21 1:48 PM, Chris Wilson wrote:
> > >>>> Quoting Hans de Goede (2021-02-10 10:37:19)
> > >>>>> Hi,
> > >>>>>
> > >>>>> On 2/10/21 12:07 AM, Chris Wilson wrote:
> > >>>>>> Quoting Hans de Goede (2021-02-09 11:46:46)
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
> > >>>>>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
> > >>>>>>>>> Hi All,
> > >>>>>>>>>
> > >>>>>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> > >>>>>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> > >>>>>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> > >>>>>>>>
> > >>>>>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> > >>>>>>>> did not report any differences with and without mitigations. I have yet
> > >>>>>>>> to test other platforms. So I don't yet have an alternative.
> > >>>>>>>
> > >>>>>>> Note the original / first reporter of:
> > >>>>>>>
> > >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> > >>>>>>>
> > >>>>>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> > >>>>>>> the mitigations on ivy-bridge / bay-trail but that there actually is
> > >>>>>>> a regression on devices where the WA worked fine before...
> > >>>>>>
> > >>>>>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
> > >>>>>> cases the ACTHD has been in the first page. This strongly suggests that
> > >>>>>> the w/a is scribbling over address 0. And there's then a very good
> > >>>>>> chance that
> > >>>>>>
> > >>>>>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> > >>>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
> > >>>>>> Date:   Mon Jan 25 12:50:33 2021 +0000
> > >>>>>>
> > >>>>>>     drm/i915/gt: Always try to reserve GGTT address 0x0
> > >>>>>>     
> > >>>>>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> > >>>>>>
> > >>>>>> in v5.10.14 is sufficient to hide the issue.
> > >>>>>
> > >>>>> That one actually is already in v5.10.13 and the various reporters of these
> > >>>>> issues have already tested 5.10.13. They did mention that it took longer
> > >>>>> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> > >>>>>
> > >>>>> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> > >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
> > >>>>
> > >>>> Started looking for scratch page overwrites, and found this little gem:
> > >>>> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
> > >>>>
> > >>>> Looks promising wrt the cause of overwriting random addresses -- and
> > >>>> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
> > >>>> with gnome shell, piglit is happy, but I suspect it is all due to
> > >>>> placement and so will only occur at random.
> > >>>
> > >>> If you can give me a list of commits to cherry-pick then I can prepare
> > >>> a Fedora 5.10.y kernel with those added for the group of Fedora users
> > >>> who are hitting this to test.
> > >>
> > >> e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
> > >> d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
> > >> 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")
> > > 
> > > Thanks, the test-kernel is building now. I will let you know when I have
> > > heard back from the Fedora users (this will likely take 1-2 days).
> > 
> > I've heard back from 2 of the reporters who were seeing issues with 5.10.9+
> > 
> > And I'm happy to report 5.10.15 + the 3 commits mentioned above cherry-picked
> > on top fixes the graphics glitches for them.
> > 
> > So if we can get these 3 commits into 5.10.y and 5.11.y then this should be
> > resolved.
> 
> Great!
> 
> Hopefully these will show up in Linus's tree soon...

I think I have the needed 3 commits now.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
@ 2021-03-01 14:10                       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-01 14:10 UTC (permalink / raw)
  To: Hans de Goede; +Cc: intel-gfx, stable, Chris Wilson

On Mon, Feb 15, 2021 at 03:26:59PM +0100, Greg Kroah-Hartman wrote:
> On Sun, Feb 14, 2021 at 05:00:44PM +0100, Hans de Goede wrote:
> > Hi,
> > 
> > On 2/11/21 1:26 PM, Hans de Goede wrote:
> > > Hi,
> > > 
> > > On 2/11/21 11:49 AM, Chris Wilson wrote:
> > >> Quoting Hans de Goede (2021-02-11 10:36:13)
> > >>> Hi,
> > >>>
> > >>> On 2/10/21 1:48 PM, Chris Wilson wrote:
> > >>>> Quoting Hans de Goede (2021-02-10 10:37:19)
> > >>>>> Hi,
> > >>>>>
> > >>>>> On 2/10/21 12:07 AM, Chris Wilson wrote:
> > >>>>>> Quoting Hans de Goede (2021-02-09 11:46:46)
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
> > >>>>>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
> > >>>>>>>>> Hi All,
> > >>>>>>>>>
> > >>>>>>>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> > >>>>>>>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> > >>>>>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> > >>>>>>>>
> > >>>>>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> > >>>>>>>> did not report any differences with and without mitigations. I have yet
> > >>>>>>>> to test other platforms. So I don't yet have an alternative.
> > >>>>>>>
> > >>>>>>> Note the original / first reporter of:
> > >>>>>>>
> > >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> > >>>>>>>
> > >>>>>>> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> > >>>>>>> the mitigations on ivy-bridge / bay-trail but that there actually is
> > >>>>>>> a regression on devices where the WA worked fine before...
> > >>>>>>
> > >>>>>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
> > >>>>>> cases the ACTHD has been in the first page. This strongly suggests that
> > >>>>>> the w/a is scribbling over address 0. And there's then a very good
> > >>>>>> chance that
> > >>>>>>
> > >>>>>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> > >>>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
> > >>>>>> Date:   Mon Jan 25 12:50:33 2021 +0000
> > >>>>>>
> > >>>>>>     drm/i915/gt: Always try to reserve GGTT address 0x0
> > >>>>>>     
> > >>>>>>     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> > >>>>>>
> > >>>>>> in v5.10.14 is sufficient to hide the issue.
> > >>>>>
> > >>>>> That one actually is already in v5.10.13 and the various reporters of these
> > >>>>> issues have already tested 5.10.13. They did mention that it took longer
> > >>>>> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> > >>>>>
> > >>>>> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> > >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
> > >>>>
> > >>>> Started looking for scratch page overwrites, and found this little gem:
> > >>>> https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1
> > >>>>
> > >>>> Looks promising wrt the cause of overwriting random addresses -- and
> > >>>> I hope that is the explanation for the glitches/hangs. I have a hsw gt2
> > >>>> with gnome shell, piglit is happy, but I suspect it is all due to
> > >>>> placement and so will only occur at random.
> > >>>
> > >>> If you can give me a list of commits to cherry-pick then I can prepare
> > >>> a Fedora 5.10.y kernel with those added for the group of Fedora users
> > >>> who are hitting this to test.
> > >>
> > >> e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
> > >> d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
> > >> 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")
> > > 
> > > Thanks, the test-kernel is building now. I will let you know when I have
> > > heard back from the Fedora users (this will likely take 1-2 days).
> > 
> > I've heard back from 2 of the reporters who were seeing issues with 5.10.9+
> > 
> > And I'm happy to report 5.10.15 + the 3 commits mentioned above cherry-picked
> > on top fixes the graphics glitches for them.
> > 
> > So if we can get these 3 commits into 5.10.y and 5.11.y then this should be
> > resolved.
> 
> Great!
> 
> Hopefully these will show up in Linus's tree soon...

I think I have the needed 3 commits now.


end of thread, other threads:[~2021-03-01 14:12 UTC | newest]

Thread overview: 34+ messages
2021-02-08 20:38 [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues Hans de Goede
2021-02-08 20:38 ` [Intel-gfx] " Hans de Goede
2021-02-08 23:27 ` Chris Wilson
2021-02-08 23:27   ` Chris Wilson
2021-02-09 11:46   ` Hans de Goede
2021-02-09 11:46     ` Hans de Goede
2021-02-09 11:55     ` Chris Wilson
2021-02-09 11:55       ` Chris Wilson
2021-02-09 23:07     ` Chris Wilson
2021-02-09 23:07       ` Chris Wilson
2021-02-10 10:37       ` Hans de Goede
2021-02-10 10:37         ` Hans de Goede
2021-02-10 12:48         ` Chris Wilson
2021-02-10 12:48           ` Chris Wilson
2021-02-11 10:36           ` Hans de Goede
2021-02-11 10:36             ` Hans de Goede
2021-02-11 10:49             ` Chris Wilson
2021-02-11 10:49               ` Chris Wilson
2021-02-11 12:26               ` Hans de Goede
2021-02-11 12:26                 ` Hans de Goede
2021-02-14 16:00                 ` Hans de Goede
2021-02-14 16:00                   ` Hans de Goede
2021-02-15 14:26                   ` Greg Kroah-Hartman
2021-02-15 14:26                     ` Greg Kroah-Hartman
2021-03-01 14:10                     ` Greg Kroah-Hartman
2021-03-01 14:10                       ` Greg Kroah-Hartman
2021-02-18 14:04                   ` Hans de Goede
2021-02-18 14:04                     ` Hans de Goede
2021-02-25 11:52               ` Hans de Goede
2021-02-25 11:52                 ` Hans de Goede
2021-02-11  0:00         ` Chris Wilson
2021-02-11  0:00           ` Chris Wilson
2021-02-09 16:43   ` Hans de Goede
2021-02-09 16:43     ` Hans de Goede
