linux-kernel.vger.kernel.org archive mirror
* 5.13 i915/PAT regression on Braswell, adding nopat to the kernel command line works around this
@ 2021-05-12  9:57 Hans de Goede
From: Hans de Goede @ 2021-05-12  9:57 UTC (permalink / raw)
  To: intel-gfx; +Cc: Linux Kernel Mailing List, x86

Hi All,

I'm not sure if this is an i915 bug or caused by changes elsewhere in the kernel,
so I thought it would be best to just send out an email and then see from there.

With 5.13-rc1 gdm fails to show and dmesg contains:

[   38.504613] x86/PAT: Xwayland:683 map pfn RAM range req write-combining for [mem 0x23883000-0x23883fff], got write-back
<repeated lots of times for different ranges>
[   39.484766] x86/PAT: gnome-shell:632 map pfn RAM range req write-combining for [mem 0x1c6a3000-0x1c6a3fff], got write-back
<repeated lots of times for different ranges>
[   54.314858] Asynchronous wait on fence 0000:00:02.0:gnome-shell[632]:a timed out (hint:intel_cursor_plane_create [i915])
[   58.339769] i915 0000:00:02.0: [drm] GPU HANG: ecode 8:1:86dfdffb, in gnome-shell [632]
[   58.341161] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
[   58.341267] i915 0000:00:02.0: [drm] gnome-shell[632] context reset due to GPU hang

Because of the PAT errors, I tried adding "nopat" to the kernel command line
and I'm happy to report that this works around the problem.

Any hints on how to debug this further (without doing a full git bisect) would be
appreciated.

Regards,

Hans



* Re: 5.13 i915/PAT regression on Braswell, adding nopat to the kernel command line works around this
@ 2021-05-12 11:15 ` Peter Zijlstra
From: Peter Zijlstra @ 2021-05-12 11:15 UTC (permalink / raw)
  To: Hans de Goede; +Cc: intel-gfx, Linux Kernel Mailing List, x86, hch

On Wed, May 12, 2021 at 11:57:02AM +0200, Hans de Goede wrote:
> Hi All,
> 
> I'm not sure if this is an i915 bug or caused by changes elsewhere in the kernel,
> so I thought it would be best to just send out an email and then see from there.
> 
> With 5.13-rc1 gdm fails to show and dmesg contains:
> 
> [   38.504613] x86/PAT: Xwayland:683 map pfn RAM range req write-combining for [mem 0x23883000-0x23883fff], got write-back
> <repeated lots of times for different ranges>
> [   39.484766] x86/PAT: gnome-shell:632 map pfn RAM range req write-combining for [mem 0x1c6a3000-0x1c6a3fff], got write-back
> <repeated lots of times for different ranges>
> [   54.314858] Asynchronous wait on fence 0000:00:02.0:gnome-shell[632]:a timed out (hint:intel_cursor_plane_create [i915])
> [   58.339769] i915 0000:00:02.0: [drm] GPU HANG: ecode 8:1:86dfdffb, in gnome-shell [632]
> [   58.341161] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
> [   58.341267] i915 0000:00:02.0: [drm] gnome-shell[632] context reset due to GPU hang
> 
> Because of the PAT errors, I tried adding "nopat" to the kernel command line
> and I'm happy to report that this works around the problem.
> 
> Any hints on how to debug this further (without doing a full git bisect) would be
> appreciated.

IIRC it's because of 74ffa5a3e685 ("mm: add remap_pfn_range_notrack"),
which added a sanity check to make sure expectations were met. It turns
out they were not.

The bug is not new, the warning is. AFAIK the i915 team is aware, but
other than that I've not followed.


* Re: 5.13 i915/PAT regression on Braswell, adding nopat to the kernel command line works around this
@ 2021-05-12 11:57   ` Christoph Hellwig
From: Christoph Hellwig @ 2021-05-12 11:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hans de Goede, intel-gfx, Linux Kernel Mailing List, x86, hch

On Wed, May 12, 2021 at 01:15:03PM +0200, Peter Zijlstra wrote:
> IIRC it's because of 74ffa5a3e685 ("mm: add remap_pfn_range_notrack"),
> which added a sanity check to make sure expectations were met. It turns
> out they were not.
> 
> The bug is not new, the warning is. AFAIK the i915 team is aware, but
> other than that I've not followed.


The actual culprit is b12d691ea5e0 ("i915: fix remap_io_sg to verify the
pgprot"), but otherwise agreed. Somehow the i915 maintainers all seem
to be on vacation, as the previous report did not manage to trigger any
kind of reply.



* Re: 5.13 i915/PAT regression on Braswell, adding nopat to the kernel command line works around this
@ 2021-05-12 12:36     ` Hans de Goede
From: Hans de Goede @ 2021-05-12 12:36 UTC (permalink / raw)
  To: Christoph Hellwig, Peter Zijlstra
  Cc: intel-gfx, Linux Kernel Mailing List, x86

Hi,

On 5/12/21 1:57 PM, Christoph Hellwig wrote:
> On Wed, May 12, 2021 at 01:15:03PM +0200, Peter Zijlstra wrote:
>> IIRC it's because of 74ffa5a3e685 ("mm: add remap_pfn_range_notrack"),
>> which added a sanity check to make sure expectations were met. It turns
>> out they were not.
>>
>> The bug is not new, the warning is. AFAIK the i915 team is aware, but
>> other than that I've not followed.
> 
> 
> The actual culprit is b12d691ea5e0 ("i915: fix remap_io_sg to verify the
> pgprot"), but otherwise agreed. Somehow the i915 maintainers all seem
> to be on vacation, as the previous report did not manage to trigger any
> kind of reply.

I can confirm that reverting that commit restores i915 functionality with
5.13-rc1 on the Braswell machine on which I have been testing this.

Regards,

Hans



* Re: [Intel-gfx] 5.13 i915/PAT regression on Braswell, adding nopat to the kernel command line works around this
@ 2021-05-18  9:28     ` Jani Nikula
From: Jani Nikula @ 2021-05-18  9:28 UTC (permalink / raw)
  To: Christoph Hellwig, Peter Zijlstra
  Cc: intel-gfx, x86, Linux Kernel Mailing List, hch

On Wed, 12 May 2021, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, May 12, 2021 at 01:15:03PM +0200, Peter Zijlstra wrote:
>> IIRC it's because of 74ffa5a3e685 ("mm: add remap_pfn_range_notrack"),
>> which added a sanity check to make sure expectations were met. It turns
>> out they were not.
>> 
>> The bug is not new, the warning is. AFAIK the i915 team is aware, but
>> other than that I've not followed.
>
>
> The actual culprit is b12d691ea5e0 ("i915: fix remap_io_sg to verify the
> pgprot"), but otherwise agreed. Somehow the i915 maintainers all seem
> to be on vacation, as the previous report did not manage to trigger any
> kind of reply.

We are aware. I've been rattling the cages to get more attention.


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center
