* [REGRESSION] Panic in gen8_ggtt_insert_entries() with v6.5
From: Oleksandr Natalenko @ 2023-09-02 16:14 UTC
  To: linux-kernel
  Cc: intel-gfx, dri-devel, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter, Andi Shyti,
	Andrzej Hajda, Matthew Auld, Matt Roper, Aravind Iddamsetty,
	Fei Yang, Thomas Hellström, Nathan Chancellor, Chris Wilson,
	Daniele Ceraolo Spurio

Hello.

Since kernel v6.5, the following hardware:

* Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
* Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)

has been crashing as soon as KDE is started on either X11 or Wayland:

i915 0000:00:02.0: enabling device (0006 -> 0007)
i915 0000:00:02.0: vgaarb: deactivate vga console
i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
[drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
fbcon: i915drmfb (fb0) is primary device
i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
…
memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
BUG: unable to handle page fault for address: ffffb422c2800000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 100000067 P4D 100000067 PUD 1001df067 PMD 10d1cf067 PTE 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
…
Call Trace:
 <TASK>
 intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
 i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
 i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
 __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
 i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
 __intel_context_do_pin_ww+0x312/0x700 [i915 a83fdc6539431252dba13053979a8b680af86836]
 i915_gem_do_execbuffer+0xfc6/0x2720 [i915 a83fdc6539431252dba13053979a8b680af86836]
 i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 a83fdc6539431252dba13053979a8b680af86836]
 drm_ioctl_kernel+0xca/0x170
 drm_ioctl+0x30f/0x580
 __x64_sys_ioctl+0x94/0xd0
 do_syscall_64+0x5d/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
…
note: kwin_wayland[674] exited with irqs disabled

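For readers less used to oops output, the two "#PF:" lines above are a decoding of error_code(0x0002). The following is a minimal Python sketch of that decoding, assuming the standard x86 page-fault error-code bit layout; the helper name decode_pf_error_code is hypothetical and is not a kernel or library function.

```python
# x86 page-fault error-code bits, as reported in "#PF: error_code(...)".
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor (kernel) mode, 1: user mode
PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # fault was caused by an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Hypothetical helper: split an x86 #PF error code into readable fields."""
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    # 0x0002 is the value from the oops above.
    print(decode_pf_error_code(0x0002))
```

For 0x0002 this gives a supervisor-mode write to a not-present page, which matches the two "#PF:" lines in the log.
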
The RIP seems to resolve to this source location:

$ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o gen8_ggtt_insert_entries+0xc2
gen8_ggtt_insert_entries+0xc2/0x150:
writeq at /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
(inlined by) gen8_set_pte at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
(inlined by) gen8_ggtt_insert_entries at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300

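The symbol+offset/size notation in the RIP and call-trace lines gives the byte offset into the function and the function's total size, and the symbol+offset part is what is passed to faddr2line above. The following is a small Python sketch for pulling those fields out of an oops line; the helper name parse_symbol_offset and the regular expression are assumptions made for illustration and are not part of any kernel tooling.

```python
import re

# Matches e.g. "gen8_ggtt_insert_entries+0xc2/0x140 [i915]".
# The trailing "[module ...]" part is optional (built-in symbols have none).
SYM_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>\w+))?"
)


def parse_symbol_offset(text: str):
    """Hypothetical helper: return symbol, offset, size and module, or None."""
    m = SYM_RE.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes from function start
        "size": int(m.group("size"), 16),      # total function size in bytes
        "module": m.group("module"),           # None for built-in symbols
    }


if __name__ == "__main__":
    line = "RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]"
    info = parse_symbol_offset(line)
    # Rebuild the argument that faddr2line expects.
    print(f"{info['symbol']}+{info['offset']:#x}")
```
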
The recent PTE-related changes to this file are probably relevant:

$ git log --oneline --no-merges v6.4..v6.5 -- drivers/gpu/drm/i915/gt/intel_ggtt.c
3532e75dfadcf drm/i915/uc: perma-pin firmwares
4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks
9275277d53248 drm/i915: use pat_index instead of cache_level
5e352e32aec23 drm/i915: preparation for using PAT index
341ad0e8e2542 drm/i915/mtl: Add PTE encode function

Also note that a Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 01) is not affected by this issue.

The full dmesg with DRM debugging enabled is available in the bug report I filed earlier [1]. I'm sending this email to make the issue more visible.

Please help.

Thanks.

[1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

-- 
Oleksandr Natalenko (post-factum)


* [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
From: Oleksandr Natalenko @ 2023-09-19 8:26 UTC
  To: linux-kernel
  Cc: intel-gfx, dri-devel, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter, Andi Shyti,
	Andrzej Hajda, Matthew Auld, Matt Roper, Aravind Iddamsetty,
	Fei Yang, Thomas Hellström, Nathan Chancellor, Chris Wilson,
	Daniele Ceraolo Spurio, Matthew Wilcox, Andrew Morton, linux-mm

/cc Matthew Wilcox and Andrew Morton because of folios (please see below).

On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> Hello.
> 
> Since kernel v6.5, the following hardware:
> 
> * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> 
> has been crashing as soon as KDE is started on either X11 or Wayland:
> 
> i915 0000:00:02.0: enabling device (0006 -> 0007)
> i915 0000:00:02.0: vgaarb: deactivate vga console
> i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> fbcon: i915drmfb (fb0) is primary device
> i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> …
> memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> BUG: unable to handle page fault for address: ffffb422c2800000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 100000067 P4D 100000067 PUD 1001df067 PMD 10d1cf067 PTE 0
> Oops: 0002 [#1] PREEMPT SMP PTI
> CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
> RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> …
> Call Trace:
>  <TASK>
>  intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  __intel_context_do_pin_ww+0x312/0x700 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  drm_ioctl_kernel+0xca/0x170
>  drm_ioctl+0x30f/0x580
>  __x64_sys_ioctl+0x94/0xd0
>  do_syscall_64+0x5d/0x90
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> …
> note: kwin_wayland[674] exited with irqs disabled
> 
> The RIP seems to resolve to this source location:
> 
> $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o gen8_ggtt_insert_entries+0xc2
> gen8_ggtt_insert_entries+0xc2/0x150:
> writeq at /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> (inlined by) gen8_set_pte at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> (inlined by) gen8_ggtt_insert_entries at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> 
> The recent PTE-related changes to this file are probably relevant:
> 
> $ git log --oneline --no-merges v6.4..v6.5 -- drivers/gpu/drm/i915/gt/intel_ggtt.c
> 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks
> 9275277d53248 drm/i915: use pat_index instead of cache_level
> 5e352e32aec23 drm/i915: preparation for using PAT index
> 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> 
> Also note that a Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 01) is not affected by this issue.
> 
> The full dmesg with DRM debugging enabled is available in the bug report I filed earlier [1]. I'm sending this email to make the issue more visible.
> 
> Please help.
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Matthew,

Andrzej asked me to try reverting commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting them fixed the i915 crash for me. Commits e0b72c14d8dc and 1e0877d58b1e look like mere prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.

Could you please check this?

My conversation with Andrzej is available on the drm-intel GitLab [1].

Thanks.

[1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

-- 
Oleksandr Natalenko (post-factum)


* Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
From: Oleksandr Natalenko @ 2023-09-19 13:23 UTC
  To: linux-kernel
  Cc: intel-gfx, dri-devel, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter, Andi Shyti,
	Andrzej Hajda, Matthew Auld, Matt Roper, Aravind Iddamsetty,
	Fei Yang, Thomas Hellström, Nathan Chancellor, Chris Wilson,
	Daniele Ceraolo Spurio, Matthew Wilcox, Andrew Morton, linux-mm,
	Bagas Sanjaya, Linux Regressions

/cc Bagas as well (see below).

On Tuesday, 19 September 2023 10:26:42 CEST Oleksandr Natalenko wrote:
> /cc Matthew Wilcox and Andrew Morton because of folios (please see below).
> 
> On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> > Hello.
> > 
> > Since kernel v6.5, the following hardware:
> > 
> > * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> > * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> > 
> > has been crashing as soon as KDE is started on either X11 or Wayland:
> > 
> > i915 0000:00:02.0: enabling device (0006 -> 0007)
> > i915 0000:00:02.0: vgaarb: deactivate vga console
> > i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> > i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> > [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> > fbcon: i915drmfb (fb0) is primary device
> > i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> > …
> > memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> > BUG: unable to handle page fault for address: ffffb422c2800000
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 100000067 P4D 100000067 PUD 1001df067 PMD 10d1cf067 PTE 0
> > Oops: 0002 [#1] PREEMPT SMP PTI
> > CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> > Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
> > RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> > …
> > Call Trace:
> >  <TASK>
> >  intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  __intel_context_do_pin_ww+0x312/0x700 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  drm_ioctl_kernel+0xca/0x170
> >  drm_ioctl+0x30f/0x580
> >  __x64_sys_ioctl+0x94/0xd0
> >  do_syscall_64+0x5d/0x90
> >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > …
> > note: kwin_wayland[674] exited with irqs disabled
> > 
> > The RIP seems to resolve to this source location:
> > 
> > $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o gen8_ggtt_insert_entries+0xc2
> > gen8_ggtt_insert_entries+0xc2/0x150:
> > writeq at /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> > (inlined by) gen8_set_pte at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> > (inlined by) gen8_ggtt_insert_entries at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> > 
> > The recent PTE-related changes to this file are probably relevant:
> > 
> > $ git log --oneline --no-merges v6.4..v6.5 -- drivers/gpu/drm/i915/gt/intel_ggtt.c
> > 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> > 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks
> > 9275277d53248 drm/i915: use pat_index instead of cache_level
> > 5e352e32aec23 drm/i915: preparation for using PAT index
> > 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> > 
> > Also note that a Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 01) is not affected by this issue.
> > 
> > The full dmesg with DRM debugging enabled is available in the bug report I filed earlier [1]. I'm sending this email to make the issue more visible.
> > 
> > Please help.
> > 
> > Thanks.
> > 
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 
> Matthew,
> 
> Andrzej asked me to try reverting commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting them fixed the i915 crash for me. Commits e0b72c14d8dc and 1e0877d58b1e look like mere prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.
> 
> Could you please check this?
> 
> Our conversation with Andrzej is available at drm-intel GitLab [1].
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Bagas,

would you mind adding this to the regression tracker please?

Thanks.

-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19 13:23     ` Oleksandr Natalenko
  (?)
@ 2023-09-19 14:03       ` Bagas Sanjaya
  -1 siblings, 0 replies; 37+ messages in thread
From: Bagas Sanjaya @ 2023-09-19 14:03 UTC (permalink / raw)
  To: Oleksandr Natalenko, Linux Kernel Mailing List
  Cc: Andrzej Hajda, Linux DRI Development, Chris Wilson, linux-mm,
	David Airlie, Linux Regressions, Matthew Wilcox, Matthew Auld,
	Thomas Hellström, Linux Intel Graphics, Nathan Chancellor,
	Rodrigo Vivi, Matt Roper, Daniel Vetter, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 4952 bytes --]

On Tue, Sep 19, 2023 at 03:23:28PM +0200, Oleksandr Natalenko wrote:
> /cc Bagas as well (see below).
> 
> On Tuesday, 19 September 2023 10:26:42 CEST Oleksandr Natalenko wrote:
> > /cc Matthew Wilcox and Andrew Morton because of folios (please see below).
> > 
> > On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> > > Hello.
> > > 
> > > Since v6.5 kernel the following HW:
> > > 
> > > * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> > > * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> > > 
> > > is affected by the following crash once KDE on either X11 or Wayland is started:
> > > 
> > > i915 0000:00:02.0: enabling device (0006 -> 0007)
> > > i915 0000:00:02.0: vgaarb: deactivate vga console
> > > i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> > > i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> > > [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> > > fbcon: i915drmfb (fb0) is primary device
> > > i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> > > …
> > > memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> > > BUG: unable to handle page fault for address: ffffb422c2800000
> > > #PF: supervisor write access in kernel mode
> > > #PF: error_code(0x0002) - not-present page
> > > PGD 100000067 P4D 100000067 PUD 1001df067 PMD 10d1cf067 PTE 0
> > > Oops: 0002 [#1] PREEMPT SMP PTI
> > > CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> > > Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
> > > RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> > > …
> > > Call Trace:
> > >  <TASK>
> > >  intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  __intel_context_do_pin_ww+0x312/0x700 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > >  drm_ioctl_kernel+0xca/0x170
> > >  drm_ioctl+0x30f/0x580
> > >  __x64_sys_ioctl+0x94/0xd0
> > >  do_syscall_64+0x5d/0x90
> > >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > > …
> > > note: kwin_wayland[674] exited with irqs disabled
> > > 
> > > RIP seems to translate into this:
> > > 
> > > $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o gen8_ggtt_insert_entries+0xc2
> > > gen8_ggtt_insert_entries+0xc2/0x150:
> > > writeq at /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> > > (inlined by) gen8_set_pte at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> > > (inlined by) gen8_ggtt_insert_entries at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> > > 
> > > Probably, recent PTE-related changes are relevant:
> > > 
> > > $ git log --oneline --no-merges v6.4..v6.5 -- drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> > > 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks
> > > 9275277d53248 drm/i915: use pat_index instead of cache_level
> > > 5e352e32aec23 drm/i915: preparation for using PAT index
> > > 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> > > 
> > > Also note Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 01) is not affected by this issue.
> > > 
> > > Full dmesg with DRM debug enabled is available in the bugreport I've reported earlier [1]. I'm sending this email to make the issue more visible.
> > > 
> > > Please help.
> > > 
> > > Thanks.
> > > 
> > > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> > 
> > Matthew,
> > 
> > Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting those fixed the i915 crash for me. The e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.
> > 
> > Could you please check this?
> > 
> > Our conversation with Andrzej is available at drm-intel GitLab [1].
> > 
> > Thanks.
> > 
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 
> Bagas,
> 
> would you mind adding this to the regression tracker please?
> 

Will add shortly, thanks!

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [REGRESSION] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-02 16:14 ` Oleksandr Natalenko
  (?)
@ 2023-09-19 14:08   ` Bagas Sanjaya
  -1 siblings, 0 replies; 37+ messages in thread
From: Bagas Sanjaya @ 2023-09-19 14:08 UTC (permalink / raw)
  To: Oleksandr Natalenko, Linux Kernel Mailing List
  Cc: Thomas Hellström, Matt Roper, Linux Intel Graphics,
	Chris Wilson, Nathan Chancellor, Andrzej Hajda,
	Linux DRI Development, Daniel Vetter, Rodrigo Vivi, David Airlie,
	Matthew Auld, Linux Regressions

[-- Attachment #1: Type: text/plain, Size: 3998 bytes --]

On Sat, Sep 02, 2023 at 06:14:12PM +0200, Oleksandr Natalenko wrote:
> Hello.
> 
> Since v6.5 kernel the following HW:
> 
> * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> 
> is affected by the following crash once KDE on either X11 or Wayland is started:
> 
> i915 0000:00:02.0: enabling device (0006 -> 0007)
> i915 0000:00:02.0: vgaarb: deactivate vga console
> i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> fbcon: i915drmfb (fb0) is primary device
> i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> …
> memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> BUG: unable to handle page fault for address: ffffb422c2800000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 100000067 P4D 100000067 PUD 1001df067 PMD 10d1cf067 PTE 0
> Oops: 0002 [#1] PREEMPT SMP PTI
> CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
> RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> …
> Call Trace:
>  <TASK>
>  intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  __intel_context_do_pin_ww+0x312/0x700 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 a83fdc6539431252dba13053979a8b680af86836]
>  drm_ioctl_kernel+0xca/0x170
>  drm_ioctl+0x30f/0x580
>  __x64_sys_ioctl+0x94/0xd0
>  do_syscall_64+0x5d/0x90
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> …
> note: kwin_wayland[674] exited with irqs disabled
> 
> RIP seems to translate into this:
> 
> $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o gen8_ggtt_insert_entries+0xc2
> gen8_ggtt_insert_entries+0xc2/0x150:
> writeq at /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> (inlined by) gen8_set_pte at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> (inlined by) gen8_ggtt_insert_entries at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> 
> Probably, recent PTE-related changes are relevant:
> 
> $ git log --oneline --no-merges v6.4..v6.5 -- drivers/gpu/drm/i915/gt/intel_ggtt.c
> 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks
> 9275277d53248 drm/i915: use pat_index instead of cache_level
> 5e352e32aec23 drm/i915: preparation for using PAT index
> 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> 
> Also note Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 01) is not affected by this issue.
> 
> Full dmesg with DRM debug enabled is available in the bugreport I've reported earlier [1]. I'm sending this email to make the issue more visible.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 

Thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: 0b62af28f249b9
#regzbot title: gen8_ggtt_insert_entries() panic on Lenovo T14s (Tiger Lake) due to folio_batch() on shmem_sg_free_table()
#regzbot link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19 14:03       ` Bagas Sanjaya
  (?)
@ 2023-09-19 14:14         ` Oleksandr Natalenko
  -1 siblings, 0 replies; 37+ messages in thread
From: Oleksandr Natalenko @ 2023-09-19 14:14 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Bagas Sanjaya
  Cc: Andrzej Hajda, Linux DRI Development, Chris Wilson, linux-mm,
	David Airlie, Linux Regressions, Matthew Wilcox, Matthew Auld,
	Thomas Hellström, Linux Intel Graphics, Nathan Chancellor,
	Rodrigo Vivi, Matt Roper, Daniel Vetter, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 479 bytes --]

On Tuesday, 19 September 2023 16:03:03 CEST Bagas Sanjaya wrote:
> …
> > > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> > 
> > Bagas,
> > 
> > would you mind adding this to the regression tracker please?
> > 
> 
> Will add shortly, thanks!

Thank you.

Please consider correcting the title, though. The Lenovo T14s (Tiger Lake) is not affected. The affected machines are the Lenovo T460s (Skylake) and the Lenovo T490s (WhiskeyLake).

-- 
Oleksandr Natalenko (post-factum)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19  8:26   ` Oleksandr Natalenko
  (?)
@ 2023-09-19 15:43     ` Matthew Wilcox
  -1 siblings, 0 replies; 37+ messages in thread
From: Matthew Wilcox @ 2023-09-19 15:43 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: linux-kernel, intel-gfx, dri-devel, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
	Andi Shyti, Andrzej Hajda, Matthew Auld, Matt Roper,
	Aravind Iddamsetty, Fei Yang, Thomas Hellström,
	Nathan Chancellor, Chris Wilson, Daniele Ceraolo Spurio,
	Andrew Morton, linux-mm

On Tue, Sep 19, 2023 at 10:26:42AM +0200, Oleksandr Natalenko wrote:
> Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting those fixed the i915 crash for me. The e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.
> 
> Could you please check this?
> 
> Our conversation with Andrzej is available at drm-intel GitLab [1].
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Wow, that is some great debugging.  Thanks for all the time & effort
you and others have invested.  Sorry for breaking your system.

You're almost right about the "prerequisites", but it's in the other
direction; 0b62af28f249 is a prerequisite for the later two cleanups,
so reverting all three is necessary to test 0b62af28f249.

It seems to me that you've isolated the problem to constructing overly
long sg lists.  I didn't realise that was going to be a problem, so
that's my fault.

Could I ask you to try this patch?  I'll follow up with another patch
later because I think I made another assumption that may not be valid.

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..73a4a4eb29e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 	st->nents = 0;
 	for (i = 0; i < page_count; i++) {
 		struct folio *folio;
+		unsigned long nr_pages;
 		const unsigned int shrink[] = {
 			I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
 			0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 			}
 		} while (1);
 
+		nr_pages = min_t(unsigned long,
+				folio_nr_pages(folio), page_count - i);
 		if (!i ||
 		    sg->length >= max_segment ||
 		    folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 				sg = sg_next(sg);
 
 			st->nents++;
-			sg_set_folio(sg, folio, folio_size(folio), 0);
+			sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
 		} else {
 			/* XXX: could overflow? */
-			sg->length += folio_size(folio);
+			sg->length += nr_pages * PAGE_SIZE;
 		}
-		next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
-		i += folio_nr_pages(folio) - 1;
+		next_pfn = folio_pfn(folio) + nr_pages;
+		i += nr_pages - 1;
 
 		/* Check that the i965g/gm workaround works. */
 		GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
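
The effect of the clamp in the patch above can be sketched in userspace.
This is a simplified model, not the driver code: FOLIO_PAGES and sg_pages()
are hypothetical names, every folio is assumed to have the same size, and
the last folio is assumed to extend past the end of the object.

```c
#include <stdio.h>

/* Hypothetical stand-in for folio_nr_pages(): every folio spans 4 pages. */
#define FOLIO_PAGES 4UL

/*
 * Count how many pages the sg list would describe for an object of
 * page_count pages. With clamp == 0 each folio contributes its full size
 * (the pre-patch accounting); with clamp != 0 the contribution is limited
 * to the pages that remain in the object (the accounting in the patch).
 */
static unsigned long sg_pages(unsigned long page_count, int clamp)
{
	unsigned long total = 0;
	unsigned long i;

	for (i = 0; i < page_count; i++) {
		unsigned long nr_pages = FOLIO_PAGES;

		if (clamp && nr_pages > page_count - i)
			nr_pages = page_count - i;

		total += nr_pages;
		/* Skip the rest of this folio; the loop's i++ adds the last step. */
		i += nr_pages - 1;
	}

	return total;
}
```

Under these assumptions, a 6-page object gives sg_pages(6, 0) == 8 without
the clamp, two pages more than the object holds, and sg_pages(6, 1) == 6
with it.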

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19 15:43     ` Matthew Wilcox
  (?)
@ 2023-09-19 16:02       ` Matthew Wilcox
  -1 siblings, 0 replies; 37+ messages in thread
From: Matthew Wilcox @ 2023-09-19 16:02 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: linux-kernel, intel-gfx, dri-devel, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
	Andi Shyti, Andrzej Hajda, Matthew Auld, Matt Roper,
	Aravind Iddamsetty, Fei Yang, Thomas Hellström,
	Nathan Chancellor, Chris Wilson, Daniele Ceraolo Spurio,
	Andrew Morton, linux-mm

On Tue, Sep 19, 2023 at 04:43:41PM +0100, Matthew Wilcox wrote:
> Could I ask you to try this patch?  I'll follow up with another patch
> later because I think I made another assumption that may not be valid.

Ah, no, never mind.  I thought we could start in the middle of a folio,
but we always start constructing the sg list from index 0 of the file,
so we always start at the first page of a folio.  If this patch solves
your problem, then I think we're done.
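
The argument above can be sketched as a small model. Everything here is
hypothetical (struct folio_span, lookup() and walk_hits_only_heads() are
illustrative names, and the layout is made up); it only shows that a walk
which starts at index 0 and advances by whole folios lands on the first
page of each folio, whereas a walk started in the middle of a folio does
not.

```c
#include <stddef.h>

/* Hypothetical model of a folio: first file index and length in pages. */
struct folio_span {
	unsigned long start;
	unsigned long nr_pages;
};

/* Return the folio that contains file index idx, or NULL if none does. */
static const struct folio_span *lookup(const struct folio_span *f, size_t n,
		unsigned long idx)
{
	size_t k;

	for (k = 0; k < n; k++)
		if (idx >= f[k].start && idx - f[k].start < f[k].nr_pages)
			return &f[k];
	return NULL;
}

/*
 * Walk the file the way the sg-building loop does, advancing by a whole
 * folio each time. Returns 1 if every visited index was the first page of
 * its folio, 0 otherwise.
 */
static int walk_hits_only_heads(const struct folio_span *f, size_t n,
		unsigned long first, unsigned long page_count)
{
	unsigned long i;

	for (i = first; i < page_count; i++) {
		const struct folio_span *folio = lookup(f, n, i);

		if (!folio || folio->start != i)
			return 0;
		i += folio->nr_pages - 1;
	}
	return 1;
}
```

For a made-up layout of three folios { {0, 4}, {4, 1}, {5, 2} } covering
7 pages, the walk from index 0 returns 1, while a walk started at index 2
returns 0 because index 2 is inside the first folio.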

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-02 16:14 ` Oleksandr Natalenko
                   ` (3 preceding siblings ...)
  (?)
@ 2023-09-19 16:12 ` Patchwork
  2023-09-19 16:23   ` Matthew Wilcox
  -1 siblings, 1 reply; 37+ messages in thread
From: Patchwork @ 2023-09-19 16:12 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: intel-gfx

== Series Details ==

Series: Panic in gen8_ggtt_insert_entries() with v6.5
URL   : https://patchwork.freedesktop.org/series/123922/
State : warning

== Summary ==

Error: dim checkpatch failed
9c7e506f4584 Panic in gen8_ggtt_insert_entries() with v6.5
-:7: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#7: 
> Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting those fixed the i915 crash for me. The e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.

-:7: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")'
#7: 
> Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting those fixed the i915 crash for me. The e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.

-:21: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")'
#21: 
direction; 0b62af28f249 is a prerequisite for the later two cleanups,

-:48: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#48: FILE: drivers/gpu/drm/i915/gem/i915_gem_shmem.c:155:
+		nr_pages = min_t(unsigned long,
+				folio_nr_pages(folio), page_count - i);

-:69: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 3 errors, 1 warnings, 1 checks, 32 lines checked



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-02 16:14 ` Oleksandr Natalenko
                   ` (4 preceding siblings ...)
  (?)
@ 2023-09-19 16:12 ` Patchwork
  -1 siblings, 0 replies; 37+ messages in thread
From: Patchwork @ 2023-09-19 16:12 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: intel-gfx

== Series Details ==

Series: Panic in gen8_ggtt_insert_entries() with v6.5
URL   : https://patchwork.freedesktop.org/series/123922/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.CHECKPATCH: warning for Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19 16:12 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
@ 2023-09-19 16:23   ` Matthew Wilcox
  0 siblings, 0 replies; 37+ messages in thread
From: Matthew Wilcox @ 2023-09-19 16:23 UTC (permalink / raw)
  To: intel-gfx

On Tue, Sep 19, 2023 at 04:12:46PM -0000, Patchwork wrote:
> -:7: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
> #7: 
> > Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting those fixed the i915 crash for me. The e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.

This is just a parsing fail.

> -:48: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
> #48: FILE: drivers/gpu/drm/i915/gem/i915_gem_shmem.c:155:
> +		nr_pages = min_t(unsigned long,
> +				folio_nr_pages(folio), page_count - i);

This is bullshit.  Aligning to open paren is an antipattern that leads
to significant unnecessary code churn.  I will not be part of such a
stupid system.
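
For readers unfamiliar with the two layouts at issue, here is a minimal
sketch. The helper names (min_ul, clamp_aligned, clamp_fixed) are
hypothetical and not taken from the patch; both functions compute the same
value and differ only in how continuation lines are indented. In the
aligned form, renaming min_ul() to something longer or shorter would also
require re-indenting the continuation line, which is one way such a rule
can cause churn.

```c
/* Hypothetical helper standing in for min_t(unsigned long, ...). */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Continuation lines aligned to the open parenthesis (checkpatch's CHECK). */
static unsigned long clamp_aligned(unsigned long folio_pages,
				   unsigned long page_count,
				   unsigned long i)
{
	unsigned long nr_pages = min_ul(folio_pages,
					page_count - i);

	return nr_pages;
}

/* Continuation lines indented by a fixed amount, as in the patch. */
static unsigned long clamp_fixed(unsigned long folio_pages,
		unsigned long page_count, unsigned long i)
{
	unsigned long nr_pages = min_ul(folio_pages,
			page_count - i);

	return nr_pages;
}
```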

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-02 16:14 ` Oleksandr Natalenko
                   ` (5 preceding siblings ...)
  (?)
@ 2023-09-19 16:30 ` Patchwork
  -1 siblings, 0 replies; 37+ messages in thread
From: Patchwork @ 2023-09-19 16:30 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 8891 bytes --]

== Series Details ==

Series: Panic in gen8_ggtt_insert_entries() with v6.5
URL   : https://patchwork.freedesktop.org/series/123922/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13651 -> Patchwork_123922v1
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_123922v1 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_123922v1, please notify your bug team (lgci.bug.filing@intel.com) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/index.html

Participating hosts (38 -> 38)
------------------------------

  Additional (2): bat-dg2-8 bat-rpls-2 
  Missing    (2): bat-adlm-1 fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_123922v1:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_module_load@load:
    - bat-mtlp-8:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13651/bat-mtlp-8/igt@i915_module_load@load.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-mtlp-8/igt@i915_module_load@load.html

  * igt@runner@aborted:
    - bat-rpls-2:         NOTRUN -> [FAIL][3]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-rpls-2/igt@runner@aborted.html

  
Known issues
------------

  Here are the changes found in Patchwork_123922v1 that come from known issues:

### CI changes ###

#### Issues hit ####

  * boot:
    - fi-hsw-4770:        [PASS][4] -> [FAIL][5] ([i915#8293])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13651/fi-hsw-4770/boot.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/fi-hsw-4770/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@gem_mmap@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][6] ([i915#4083])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@gem_mmap@basic.html

  * igt@gem_mmap_gtt@basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][7] ([i915#4077]) +2 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@gem_mmap_gtt@basic.html

  * igt@gem_tiled_pread_basic:
    - bat-dg2-8:          NOTRUN -> [SKIP][8] ([i915#4079]) +1 other test skip
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_rps@basic-api:
    - bat-dg2-8:          NOTRUN -> [SKIP][9] ([i915#6621])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@i915_pm_rps@basic-api.html

  * igt@i915_suspend@basic-s3-without-i915:
    - bat-dg2-8:          NOTRUN -> [SKIP][10] ([i915#6645])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][11] ([i915#5190])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][12] ([i915#4215] / [i915#5190])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_addfb_basic@basic-y-tiled-legacy.html

  * igt@kms_addfb_basic@framebuffer-vs-set-tiling:
    - bat-dg2-8:          NOTRUN -> [SKIP][13] ([i915#4212]) +6 other tests skip
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_addfb_basic@framebuffer-vs-set-tiling.html

  * igt@kms_addfb_basic@tile-pitch-mismatch:
    - bat-dg2-8:          NOTRUN -> [SKIP][14] ([i915#4212] / [i915#5608])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_addfb_basic@tile-pitch-mismatch.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - bat-dg2-8:          NOTRUN -> [SKIP][15] ([i915#4103] / [i915#4213] / [i915#5608]) +1 other test skip
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_force_connector_basic@force-load-detect:
    - bat-dg2-8:          NOTRUN -> [SKIP][16] ([fdo#109285])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_force_connector_basic@prune-stale-modes:
    - bat-dg2-8:          NOTRUN -> [SKIP][17] ([i915#5274])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_force_connector_basic@prune-stale-modes.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence:
    - bat-dg2-11:         NOTRUN -> [SKIP][18] ([i915#1845]) +3 other tests skip
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-11/igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence.html

  * igt@kms_psr@cursor_plane_move:
    - bat-dg2-8:          NOTRUN -> [SKIP][19] ([i915#1072]) +3 other tests skip
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_psr@cursor_plane_move.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - bat-dg2-8:          NOTRUN -> [SKIP][20] ([i915#3555])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-fence-flip:
    - bat-dg2-8:          NOTRUN -> [SKIP][21] ([i915#3708])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@prime_vgem@basic-fence-flip.html

  * igt@prime_vgem@basic-fence-mmap:
    - bat-dg2-8:          NOTRUN -> [SKIP][22] ([i915#3708] / [i915#4077]) +1 other test skip
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@prime_vgem@basic-fence-mmap.html

  * igt@prime_vgem@basic-write:
    - bat-dg2-8:          NOTRUN -> [SKIP][23] ([i915#3291] / [i915#3708]) +2 other tests skip
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-8/igt@prime_vgem@basic-write.html

  
#### Possible fixes ####

  * igt@kms_chamelium_edid@hdmi-edid-read:
    - {bat-dg2-13}:       [DMESG-WARN][24] ([i915#7952]) -> [PASS][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13651/bat-dg2-13/igt@kms_chamelium_edid@hdmi-edid-read.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/bat-dg2-13/igt@kms_chamelium_edid@hdmi-edid-read.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#5190]: https://gitlab.freedesktop.org/drm/intel/issues/5190
  [i915#5274]: https://gitlab.freedesktop.org/drm/intel/issues/5274
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#5608]: https://gitlab.freedesktop.org/drm/intel/issues/5608
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#7952]: https://gitlab.freedesktop.org/drm/intel/issues/7952
  [i915#8293]: https://gitlab.freedesktop.org/drm/intel/issues/8293


Build changes
-------------

  * Linux: CI_DRM_13651 -> Patchwork_123922v1

  CI-20190529: 20190529
  CI_DRM_13651: 61b71c3f061a44a6ab1dcf756918886aa03a5480 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7493: 2517e42d612e0c1ca096acf8b5f6177f7ef4bce7 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_123922v1: 61b71c3f061a44a6ab1dcf756918886aa03a5480 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

df5c6e1223ca Panic in gen8_ggtt_insert_entries() with v6.5

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_123922v1/index.html

[-- Attachment #2: Type: text/html, Size: 10407 bytes --]

* Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19 15:43     ` Matthew Wilcox
  (?)
@ 2023-09-19 18:11       ` Oleksandr Natalenko
  -1 siblings, 0 replies; 37+ messages in thread
From: Oleksandr Natalenko @ 2023-09-19 18:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, intel-gfx, dri-devel, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
	Andi Shyti, Andrzej Hajda, Matthew Auld, Matt Roper,
	Aravind Iddamsetty, Fei Yang, Thomas Hellström,
	Nathan Chancellor, Chris Wilson, Daniele Ceraolo Spurio,
	Andrew Morton, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3446 bytes --]

Hello.

On Tuesday, 19 September 2023 17:43:40 CEST Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 10:26:42AM +0200, Oleksandr Natalenko wrote:
> > Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and reverting those fixed the i915 crash for me. The e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.
> > 
> > Could you please check this?
> > 
> > Our conversation with Andrzej is available at drm-intel GitLab [1].
> > 
> > Thanks.
> > 
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 
> Wow, that is some great debugging.  Thanks for all the time & effort
> you and others have invested.  Sorry for breaking your system.
> 
> You're almost right about the "prerequisites", but it's in the other
> direction; 0b62af28f249 is a prerequisite for the later two cleanups,
> so reverting all three is necessary to test 0b62af28f249.
> 
> It seems to me that you've isolated the problem to constructing overly
> long sg lists.  I didn't realise that was going to be a problem, so
> that's my fault.
> 
> Could I ask you to try this patch?  I'll follow up with another patch
> later because I think I made another assumption that may not be valid.

I can confirm this one fixes the issue for me on T460s laptop. Thank you!

Should you submit it, please add:

Fixes: 0b62af28f2 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Cc: stable@vger.kernel.org # 6.5.x
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Link: https://lore.kernel.org/lkml/6287208.lOV4Wx5bFT@natalenko.name/
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>

> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 8f1633c3fb93..73a4a4eb29e0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
>  	st->nents = 0;
>  	for (i = 0; i < page_count; i++) {
>  		struct folio *folio;
> +		unsigned long nr_pages;
>  		const unsigned int shrink[] = {
>  			I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
>  			0,
> @@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
>  			}
>  		} while (1);
>  
> +		nr_pages = min_t(unsigned long,
> +				folio_nr_pages(folio), page_count - i);
>  		if (!i ||
>  		    sg->length >= max_segment ||
>  		    folio_pfn(folio) != next_pfn) {
> @@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
>  				sg = sg_next(sg);
>  
>  			st->nents++;
> -			sg_set_folio(sg, folio, folio_size(folio), 0);
> +			sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
>  		} else {
>  			/* XXX: could overflow? */
> -			sg->length += folio_size(folio);
> +			sg->length += nr_pages * PAGE_SIZE;
>  		}
> -		next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> -		i += folio_nr_pages(folio) - 1;
> +		next_pfn = folio_pfn(folio) + nr_pages;
> +		i += nr_pages - 1;
>  
>  		/* Check that the i965g/gm workaround works. */
>  		GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
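
For readers following the arithmetic in the quoted diff, here is a minimal
userspace sketch of the clamping step. It is not the i915 code: the names
model_sg_bytes(), min_ul() and MODEL_PAGE_SIZE are hypothetical, and the
model assumes that every folio has the same size (at least one page) and
starts exactly at the current page index.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumed page size, for illustration only */

/* Smaller of two unsigned long values, standing in for min_t(). */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model: walk page_count pages backed by folios of
 * folio_pages pages each (folio_pages must be >= 1) and return the
 * total number of bytes that would be added to the sg list.
 * With clamp == 0 the whole folio is always accounted, as before the
 * patch; with clamp != 0 only the pages still requested are accounted,
 * as in the patch.
 */
unsigned long model_sg_bytes(unsigned long page_count,
			     unsigned long folio_pages, int clamp)
{
	unsigned long total = 0;
	unsigned long i;

	for (i = 0; i < page_count; i++) {
		unsigned long nr_pages = folio_pages;

		if (clamp)
			nr_pages = min_ul(folio_pages, page_count - i);

		total += nr_pages * MODEL_PAGE_SIZE;
		i += nr_pages - 1; /* the loop's i++ skips the last page */
	}
	return total;
}
```

With page_count = 5 and folio_pages = 4, the unclamped walk accounts 8 pages
while the clamped walk accounts exactly the 5 pages requested.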

-- 
Oleksandr Natalenko (post-factum)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

* Re: [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19 18:11       ` Oleksandr Natalenko
  (?)
@ 2023-09-19 19:15         ` Matthew Wilcox
  -1 siblings, 0 replies; 37+ messages in thread
From: Matthew Wilcox @ 2023-09-19 19:15 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: linux-kernel, intel-gfx, dri-devel, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
	Andi Shyti, Andrzej Hajda, Matthew Auld, Matt Roper,
	Aravind Iddamsetty, Fei Yang, Thomas Hellström,
	Nathan Chancellor, Chris Wilson, Daniele Ceraolo Spurio,
	Andrew Morton, linux-mm

On Tue, Sep 19, 2023 at 08:11:47PM +0200, Oleksandr Natalenko wrote:
> I can confirm this one fixes the issue for me on T460s laptop. Thank you!

Yay!

> Should you submit it, please add:
> 
> Fixes: 0b62af28f2 ("i915: convert shmem_sg_free_table() to use a folio_batch")

Thanks for collecting all these; you're making my life really easy.
One minor point: the standard format for Fixes: uses the first 12
characters of the commit ID, not 10, e.g.:

Documentation/process/5.Posting.rst:    Fixes: 1f2e3d4c5b6a ("The first line of the commit specified by the first 12 characters of its SHA-1 ID")

I have this in my .gitconfig:

[pretty]
        fixes = Fixes: %h (\"%s\")

and in .git/config,

[core]
        abbrev = 12
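
As a rough illustration of the 12-character convention, a small C helper
could count the hex digits that follow the "Fixes: " prefix. The name
fixes_tag_hash_len() is hypothetical; this is only a sketch and is not part
of git or checkpatch.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: return the number of hex digits in the commit
 * ID that follows a "Fixes: " prefix, or -1 if the prefix is missing.
 */
int fixes_tag_hash_len(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t plen = sizeof(prefix) - 1;
	int n = 0;

	if (strncmp(line, prefix, plen) != 0)
		return -1;

	/* Count consecutive hex digits right after the prefix. */
	for (line += plen; isxdigit((unsigned char)*line); line++)
		n++;

	return n;
}
```

A caller could then compare the result against 12: the tag quoted above
yields 10, while the same tag with 0b62af28f249 yields 12.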

I'm working on the commit message now.

* Re: [REGRESSION] Panic in gen8_ggtt_insert_entries() with v6.5
  2023-09-19 14:08   ` Bagas Sanjaya
  (?)
@ 2023-09-29 13:45     ` Linux regression tracking #update (Thorsten Leemhuis)
  -1 siblings, 0 replies; 37+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-29 13:45 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Linux Intel Graphics, Linux DRI Development, Linux Regressions

On 19.09.23 16:08, Bagas Sanjaya wrote:
> On Sat, Sep 02, 2023 at 06:14:12PM +0200, Oleksandr Natalenko wrote:
>>
>> Since v6.5 kernel the following HW:
>>
>> * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
>> * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> 
> #regzbot ^introduced: 0b62af28f249b9
> #regzbot title: gen8_ggtt_insert_entries() panic on Lenovo T14s (Tiger Lake) due to folio_batch() on shmem_sg_free_table()
> #regzbot link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256

#regzbot fix: i915: Limit the length of an sg list to the requested length
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.



end of thread, other threads:[~2023-09-29 14:11 UTC | newest]

Thread overview: 37+ messages
2023-09-02 16:14 [REGRESSION] Panic in gen8_ggtt_insert_entries() with v6.5 Oleksandr Natalenko
2023-09-02 16:14 ` [Intel-gfx] " Oleksandr Natalenko
2023-09-02 16:14 ` Oleksandr Natalenko
2023-09-19  8:26 ` [REGRESSION] [BISECTED] " Oleksandr Natalenko
2023-09-19  8:26   ` [Intel-gfx] " Oleksandr Natalenko
2023-09-19  8:26   ` Oleksandr Natalenko
2023-09-19 13:23   ` Oleksandr Natalenko
2023-09-19 13:23     ` [Intel-gfx] " Oleksandr Natalenko
2023-09-19 13:23     ` Oleksandr Natalenko
2023-09-19 14:03     ` [Intel-gfx] " Bagas Sanjaya
2023-09-19 14:03       ` Bagas Sanjaya
2023-09-19 14:03       ` Bagas Sanjaya
2023-09-19 14:14       ` Oleksandr Natalenko
2023-09-19 14:14         ` Oleksandr Natalenko
2023-09-19 14:14         ` Oleksandr Natalenko
2023-09-19 15:43   ` Matthew Wilcox
2023-09-19 15:43     ` [Intel-gfx] " Matthew Wilcox
2023-09-19 15:43     ` Matthew Wilcox
2023-09-19 16:02     ` Matthew Wilcox
2023-09-19 16:02       ` [Intel-gfx] " Matthew Wilcox
2023-09-19 16:02       ` Matthew Wilcox
2023-09-19 18:11     ` Oleksandr Natalenko
2023-09-19 18:11       ` [Intel-gfx] " Oleksandr Natalenko
2023-09-19 18:11       ` Oleksandr Natalenko
2023-09-19 19:15       ` Matthew Wilcox
2023-09-19 19:15         ` [Intel-gfx] " Matthew Wilcox
2023-09-19 19:15         ` Matthew Wilcox
2023-09-19 14:08 ` [REGRESSION] " Bagas Sanjaya
2023-09-19 14:08   ` [Intel-gfx] " Bagas Sanjaya
2023-09-19 14:08   ` Bagas Sanjaya
2023-09-29 13:45   ` Linux regression tracking #update (Thorsten Leemhuis)
2023-09-29 13:45     ` [Intel-gfx] " Linux regression tracking #update (Thorsten Leemhuis)
2023-09-29 13:45     ` Linux regression tracking #update (Thorsten Leemhuis)
2023-09-19 16:12 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2023-09-19 16:23   ` Matthew Wilcox
2023-09-19 16:12 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-09-19 16:30 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
