dri-devel.lists.freedesktop.org archive mirror
* [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
@ 2022-10-23  8:04 Thorsten Leemhuis
  2022-10-24 10:26 ` Thomas Zimmermann
  0 siblings, 1 reply; 11+ messages in thread
From: Thorsten Leemhuis @ 2022-10-23  8:04 UTC
  To: Thomas Zimmermann
  Cc: Sasha Levin, regressions, Andreas, Javier Martinez Canillas,
	ML dri-devel, stable, Greg KH

Hi, this is your Linux kernel regression tracker speaking.

I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616  :

>  Andreas 2022-10-22 14:25:32 UTC
> 
> Created attachment 303074
> dmesg
> 
> 6.0.2 works.
> 
> On 6.0.3 the system is very sluggish with graphic glitches all over the place in KDE Plasma Desktop X11 (no graphic glitches when using Wayland, but also sluggish). SDDM works fine.
> 
> Hardware: Lenovo Legion 5 Pro 16ACH6H: AMD Ryzen 7 5800H "Cezanne", hybrid graphics AMD "Green Sardine" (Vega 8 GCN 5.1, AMDGPU) and Nvidia GeForce RTX 3070 Mobile (GA104M, not working with nouveau, I'm not using the proprietary nvidia driver).
> 
> Comment 1 Andreas 2022-10-22 14:27:15 UTC
> 
> Created attachment 303075
> my kernel .config for 6.0.3
> 
> Only CONFIG_HID_TOPRE was added in 6.0.3; otherwise it is identical to my .config for 6.0.2.
> 
> Comment 2 Andreas 2022-10-22 14:51:23 UTC
> 
> In /var/log/Xorg.0.log the only obvious difference is the last line:
> ---- snap
> randr: falling back to unsynchronized pixmap sharing
> ---- snap
> The line is present when I boot with 6.0.3, but isn't when I boot 6.0.2.
> 
> (Obviously this is when I login to KDE with X11, not with Wayland, from SDDM.)
> 
> Comment 3 Andreas 2022-10-22 22:10:19 UTC
> 
> I did a git bisect on the stable kernels, with 6.0.3 as bad and 6.0.2 as good. This is the result:
> 
> cfecfc98a78d97a49807531b5b224459bda877de is the first bad commit
> commit cfecfc98a78d97a49807531b5b224459bda877de (HEAD, refs/bisect/bad)
> Author: Thomas Zimmermann <tzimmermann@suse.de>
> Date:   Mon Jul 18 09:23:18 2022 +0200
> 
>     video/aperture: Disable and unregister sysfb devices via aperture helpers
>     
>     [ Upstream commit 5e01376124309b4dbd30d413f43c0d9c2f60edea ]
>     
>     Call sysfb_disable() before removing conflicting devices in aperture
>     helpers. Fixes sysfb state if fbdev has been disabled.
>     
>     Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>     Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
>     Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers")
> 
> Comment 4 Andreas 2022-10-22 22:11:51 UTC
> 
> Link to the suspect patch:
> 
> https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-8-tzimmermann@suse.de
> (or https://patchwork.freedesktop.org/patch/494608/)
> 
> Comment 5 Andreas 2022-10-22 22:38:14 UTC
> 
> Okay, so I reverted v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch on stable 6.0.3 and the fault is gone.
> 
> I always logged out immediately, which worked (even though everything is very very sluggish). Also, when I killed the X session within a couple of seconds (15 or so), no error was shown (I used "systemctl stop sddm" from another virtual console).
> 
> Noteworthy: I once compiled a kernel from within the Plasma Desktop, while it was sluggish. The kernel compiled alright. When it was finished I moved the mouse to reboot, at which point it completely froze and I had to hard-reset the system.
> 
> While still running, after > 15 seconds, the fault looked like this (dmesg):
> ---- snap ----
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x00000008
> Call Trace:
>  <TASK>
>  ? commit_tail+0xd7/0x130
>  ? drm_atomic_helper_commit+0x126/0x150
>  ? drm_atomic_commit+0xa4/0xe0
>  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>  ? drm_atomic_helper_dirtyfb+0x19e/0x280
>  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? drm_ioctl_kernel+0xc4/0x150
>  ? drm_ioctl+0x246/0x3f0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? __x64_sys_ioctl+0x91/0xd0
>  ? do_syscall_64+0x60/0xd0
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>  </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 29 jiffies s: 165 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x00000008
> Call Trace:
>  <TASK>
>  ? commit_tail+0xd7/0x130
>  ? drm_atomic_helper_commit+0x126/0x150
>  ? drm_atomic_commit+0xa4/0xe0
>  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>  ? drm_atomic_helper_dirtyfb+0x19e/0x280
>  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? drm_ioctl_kernel+0xc4/0x150
>  ? drm_ioctl+0x246/0x3f0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? __x64_sys_ioctl+0x91/0xd0
>  ? do_syscall_64+0x60/0xd0
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>  </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 8 jiffies s: 169 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x0000400e
> Call Trace:
>  <TASK>
>  ? memcpy_toio+0x76/0xc0
>  ? drm_fb_memcpy_toio+0x76/0xb0
>  ? drm_fb_blit_toio+0x75/0x2b0
>  ? simpledrm_simple_display_pipe_update+0x132/0x150
>  ? drm_atomic_helper_commit_planes+0xb6/0x230
>  ? drm_atomic_helper_commit_tail+0x44/0x80
>  ? commit_tail+0xd7/0x130
>  ? drm_atomic_helper_commit+0x126/0x150
>  ? drm_atomic_commit+0xa4/0xe0
>  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>  ? drm_atomic_helper_dirtyfb+0x19e/0x280
>  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? drm_ioctl_kernel+0xc4/0x150
>  ? drm_ioctl+0x246/0x3f0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? __x64_sys_ioctl+0x91/0xd0
>  ? do_syscall_64+0x60/0xd0
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>  </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x0000400e
> Call Trace:
>  <TASK>
>  ? memcpy_toio+0x76/0xc0
>  ? memcpy_toio+0x1b/0xc0
>  ? drm_fb_memcpy_toio+0x76/0xb0
>  ? drm_fb_blit_toio+0x75/0x2b0
>  ? simpledrm_simple_display_pipe_update+0x132/0x150
>  ? drm_atomic_helper_commit_planes+0xb6/0x230
>  ? drm_atomic_helper_commit_tail+0x44/0x80
>  ? commit_tail+0xd7/0x130
>  ? drm_atomic_helper_commit+0x126/0x150
>  ? drm_atomic_commit+0xa4/0xe0
>  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>  ? drm_atomic_helper_dirtyfb+0x19e/0x280
>  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? drm_ioctl_kernel+0xc4/0x150
>  ? drm_ioctl+0x246/0x3f0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? __x64_sys_ioctl+0x91/0xd0
>  ? do_syscall_64+0x60/0xd0
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>  </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 52 jiffies s: 169 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x0000400e
> Call Trace:
>  <TASK>
>  ? memcpy_toio+0x76/0xc0
>  ? memcpy_toio+0x1b/0xc0
>  ? drm_fb_memcpy_toio+0x76/0xb0
>  ? drm_fb_blit_toio+0x75/0x2b0
>  ? simpledrm_simple_display_pipe_update+0x132/0x150
>  ? drm_atomic_helper_commit_planes+0xb6/0x230
>  ? drm_atomic_helper_commit_tail+0x44/0x80
>  ? commit_tail+0xd7/0x130
>  ? drm_atomic_helper_commit+0x126/0x150
>  ? drm_atomic_commit+0xa4/0xe0
>  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>  ? drm_atomic_helper_dirtyfb+0x19e/0x280
>  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? drm_ioctl_kernel+0xc4/0x150
>  ? drm_ioctl+0x246/0x3f0
>  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>  ? __x64_sys_ioctl+0x91/0xd0
>  ? do_syscall_64+0x60/0xd0
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>  </TASK>
> traps: avahi-ml[4447] general protection fault ip:7fdde6a37bc1 sp:7fdde07fc920 error:0 in module-zeroconf-publish.so[7fdde6a37000+3000]
> 

See the ticket for more details.
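
The stall headers quoted above can be summarised mechanically. The following is only a minimal sketch, assuming the log lines look exactly like the ones in the report; the names `parse_stalls` and `longest_per_s` are made up for illustration and are not part of any kernel tooling. It extracts the CPU number, the jiffies count, and the value printed after `s:` from each header.

```python
import re

# Matches the "expedited stalls" headers as they appear in the quoted dmesg.
STALL_RE = re.compile(
    r"rcu_sched detected expedited stalls on CPUs/tasks: "
    r"\{ (?P<cpu>\d+)-\S* \} (?P<jiffies>\d+) jiffies s: (?P<seq>\d+)"
)


def parse_stalls(log_text):
    """Return one (cpu, jiffies, s) tuple per stall header found in log_text."""
    return [
        (int(m["cpu"]), int(m["jiffies"]), int(m["seq"]))
        for m in STALL_RE.finditer(log_text)
    ]


def longest_per_s(stalls):
    """Map each 's:' value to the largest jiffies count reported for it."""
    longest = {}
    for _cpu, jiffies, seq in stalls:
        longest[seq] = max(jiffies, longest.get(seq, 0))
    return longest


# The five stall headers from the report, with the quote prefix removed.
SAMPLE = """\
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 29 jiffies s: 165 root: 0x2000/.
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 8 jiffies s: 169 root: 0x2000/.
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 52 jiffies s: 169 root: 0x2000/.
"""

if __name__ == "__main__":
    stalls = parse_stalls(SAMPLE)
    print("CPUs involved:", sorted({cpu for cpu, _, _ in stalls}))
    print("longest stall per s: value:", longest_per_s(stalls))
```

On the quoted lines this reports a single CPU (13) for every stall, which matches the task dumps all pointing at the X process.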

BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:

#regzbot introduced: cfecfc98a78d9
https://bugzilla.kernel.org/show_bug.cgi?id=216616
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.


* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-23  8:04 [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers" Thorsten Leemhuis
@ 2022-10-24 10:26 ` Thomas Zimmermann
  2022-10-24 10:41   ` Thorsten Leemhuis
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Zimmermann @ 2022-10-24 10:26 UTC
  To: Thorsten Leemhuis
  Cc: Sasha Levin, regressions, Andreas, Javier Martinez Canillas,
	ML dri-devel, stable, Greg KH



Hi

On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
> 
> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> kernel developers don't keep an eye on it, I decided to forward it by
> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
> 
>>   Andreas 2022-10-22 14:25:32 UTC
>>
>> Created attachment 303074
>> dmesg

I've looked at the kernel log and found that simpledrm has been loaded 
*after* amdgpu, which should never happen. The problematic patch has 
been taken from a long list of refactoring work on this code. No wonder 
that it doesn't work as expected.
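
For anyone who wants to check the same ordering in a saved kernel log, here is a rough sketch. It assumes that each DRM driver announces itself with a line of the form `[drm] Initialized <driver> ...`; that format and the sample lines in the code are assumptions for illustration, not copied from the attached dmesg, and the helper names `drm_init_order` and `registered_after` are made up.

```python
import re

# Assumed format of the DRM registration message; verify against your own dmesg.
INIT_RE = re.compile(r"\[drm\] Initialized (?P<driver>\S+) ")


def drm_init_order(dmesg_text):
    """Return DRM driver names in the order their init messages appear."""
    return [m["driver"] for m in INIT_RE.finditer(dmesg_text)]


def registered_after(order, late, early):
    """True if `late` first appears after `early` in the init order."""
    if late not in order or early not in order:
        return False
    return order.index(late) > order.index(early)


# Hypothetical lines for illustration only, NOT taken from attachment 303074.
SAMPLE = (
    "[    3.1] [drm] Initialized amdgpu 3.48.0 20150101 for 0000:06:00.0 on minor 0\n"
    "[    3.9] [drm] Initialized simpledrm 1.0.0 20200625 for simple-framebuffer.0 on minor 1\n"
)

if __name__ == "__main__":
    order = drm_init_order(SAMPLE)
    print("init order:", order)
    print("simpledrm after amdgpu:", registered_after(order, "simpledrm", "amdgpu"))
```

If `registered_after(order, "simpledrm", "amdgpu")` is true for a real log, that log shows the ordering described above.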

Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove 
remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and 
report on the results. It should fix the problem.

Best regards
Thomas



-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev



* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-24 10:26 ` Thomas Zimmermann
@ 2022-10-24 10:41   ` Thorsten Leemhuis
  2022-10-24 11:27     ` Greg KH
  0 siblings, 1 reply; 11+ messages in thread
From: Thorsten Leemhuis @ 2022-10-24 10:41 UTC
  To: Greg KH, Andreas
  Cc: Sasha Levin, regressions, Thomas Zimmermann,
	Javier Martinez Canillas, ML dri-devel, stable

Hi! Thx for the reply.

On 24.10.22 12:26, Thomas Zimmermann wrote:
> On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
>>
>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>> kernel developers don't keep an eye on it, I decided to forward it by
>> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>
>>>   Andreas 2022-10-22 14:25:32 UTC
>>>
>>> Created attachment 303074
>>> dmesg
> 
> I've looked at the kernel log and found that simpledrm has been loaded
> *after* amdgpu, which should never happen. The problematic patch has
> been taken from a long list of refactoring work on this code. No wonder
> that it doesn't work as expected.
> 
> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
> report on the results. It should fix the problem.

Greg, is that enough for you to pick this up? Or do you want Andreas to
test first if it really fixes the reported problem?

Ciao, Thorsten




* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-24 10:41   ` Thorsten Leemhuis
@ 2022-10-24 11:27     ` Greg KH
  2022-10-24 11:31       ` Thomas Zimmermann
  0 siblings, 1 reply; 11+ messages in thread
From: Greg KH @ 2022-10-24 11:27 UTC
  To: Thorsten Leemhuis
  Cc: Sasha Levin, regressions, Thomas Zimmermann, Andreas,
	Javier Martinez Canillas, ML dri-devel, stable

On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
> Hi! Thx for the reply.
> 
> On 24.10.22 12:26, Thomas Zimmermann wrote:
> > On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
> >>
> >> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> >> kernel developers don't keep an eye on it, I decided to forward it by
> >> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
> >>
> >>>   Andreas 2022-10-22 14:25:32 UTC
> >>>
> >>> Created attachment 303074 [details]
> >>> dmesg
> > 
> > I've looked at the kernel log and found that simpledrm has been loaded
> > *after* amdgpu, which should never happen. The problematic patch has
> > been taken from a long list of refactoring work on this code. No wonder
> > that it doesn't work as expected.
> > 
> > Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
> > remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
> > report on the results. It should fix the problem.
> 
> Greg, is that enough for you to pick this up? Or do you want Andreas to
> test first if it really fixes the reported problem?

This should be good enough.  If this does NOT fix the issue, please let
me know.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-24 11:27     ` Greg KH
@ 2022-10-24 11:31       ` Thomas Zimmermann
  2022-10-24 16:19         ` Andreas Thalhammer
  2022-10-24 16:53         ` Andreas Thalhammer
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Zimmermann @ 2022-10-24 11:31 UTC (permalink / raw)
  To: Greg KH, Thorsten Leemhuis
  Cc: Sasha Levin, regressions, Javier Martinez Canillas, stable,
	ML dri-devel, Andreas


[-- Attachment #1.1: Type: text/plain, Size: 1636 bytes --]

Hi

Am 24.10.22 um 13:27 schrieb Greg KH:
> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>> Hi! Thx for the reply.
>>
>> On 24.10.22 12:26, Thomas Zimmermann wrote:
>>> Am 23.10.22 um 10:04 schrieb Thorsten Leemhuis:
>>>>
>>>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>>>> kernel developers don't keep an eye on it, I decided to forward it by
>>>> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>>>
>>>>>    Andreas 2022-10-22 14:25:32 UTC
>>>>>
>>>>> Created attachment 303074 [details]
>>>>> dmesg
>>>
>>> I've looked at the kernel log and found that simpledrm has been loaded
>>> *after* amdgpu, which should never happen. The problematic patch has
>>> been taken from a long list of refactoring work on this code. No wonder
>>> that it doesn't work as expected.
>>>
>>> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
>>> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
>>> report on the results. It should fix the problem.
>>
>> Greg, is that enough for you to pick this up? Or do you want Andreas to
>> test first if it really fixes the reported problem?
> 
> This should be good enough.  If this does NOT fix the issue, please let
> me know.

Thanks a lot. I think I can provide a dedicated fix if the proposed
commit doesn't work.

Best regards
Thomas

> 
> thanks,
> 
> greg k-h

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-24 11:31       ` Thomas Zimmermann
@ 2022-10-24 16:19         ` Andreas Thalhammer
  2022-10-25  8:16           ` Thomas Zimmermann
  2022-10-24 16:53         ` Andreas Thalhammer
  1 sibling, 1 reply; 11+ messages in thread
From: Andreas Thalhammer @ 2022-10-24 16:19 UTC (permalink / raw)
  To: Thomas Zimmermann, Greg KH, Thorsten Leemhuis
  Cc: Sasha Levin, stable, regressions, ML dri-devel, Javier Martinez Canillas

Am 24.10.22 um 13:31 schrieb Thomas Zimmermann:
> Hi
>
> Am 24.10.22 um 13:27 schrieb Greg KH:
>> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>>> Hi! Thx for the reply.
>>>
>>> On 24.10.22 12:26, Thomas Zimmermann wrote:
>>>> Am 23.10.22 um 10:04 schrieb Thorsten Leemhuis:
>>>>>
>>>>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>>>>> kernel developers don't keep an eye on it, I decided to forward it by
>>>>> mail. Quoting from
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>>>>
>>>>>>    Andreas 2022-10-22 14:25:32 UTC
>>>>>>
>>>>>> Created attachment 303074 [details]
>>>>>> dmesg
>>>>
>>>> I've looked at the kernel log and found that simpledrm has been loaded
>>>> *after* amdgpu, which should never happen. The problematic patch has
>>>> been taken from a long list of refactoring work on this code. No wonder
>>>> that it doesn't work as expected.
>>>>
>>>> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
>>>> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
>>>> report on the results. It should fix the problem.
>>>
>>> Greg, is that enough for you to pick this up? Or do you want Andreas to
>>> test first if it really fixes the reported problem?
>>
>> This should be good enough.  If this does NOT fix the issue, please let
>> me know.
>
> Thanks a lot. I think I can provide a dedicated fix if the proposed
> commit doesn't work.
>
> Best regards
> Thomas
>
>>
>> thanks,
>>
>> greg k-h
>

Thanks... In short: the additional patch did NOT fix the problem.

I don't use git and I don't know how to /cherry-pick commit/
9d69ef183815, but I found the patch here:
https://patchwork.freedesktop.org/patch/494609/

I hope that's the right one. I reintegrated
v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
and also applied
v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch,
did a "make mrproper" and thereafter compiled a clean new 6.0.3 kernel
(same .config).

Now the system doesn't even boot to a console. The first boot got me to
an rcu_sched stall on CPUs/tasks, same as above, but this time with:
Workqueue: btrfs-cache btrfs_work_helper

I booted a second time with the same kernel, and it got stuck after
mounting the root btrfs filesystem (what looked like a total freeze, but
when it didn't show a rcu_stall message after ~2 min I got impatient and
wanted to see if I had just busted my root filesystem...)

I booted 6.0.2 and everything is fine. (I'm very glad! I definitely
should update my backup right away!)

I will try 6.1-rc1 next, bear with...


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-24 11:31       ` Thomas Zimmermann
  2022-10-24 16:19         ` Andreas Thalhammer
@ 2022-10-24 16:53         ` Andreas Thalhammer
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Thalhammer @ 2022-10-24 16:53 UTC (permalink / raw)
  To: Thomas Zimmermann, Greg KH, Thorsten Leemhuis
  Cc: Sasha Levin, stable, regressions, ML dri-devel, Javier Martinez Canillas

Just tested with 6.1-rc2 (tarball from kernel.org), which works.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-24 16:19         ` Andreas Thalhammer
@ 2022-10-25  8:16           ` Thomas Zimmermann
  2022-10-25  8:45             ` Andreas Thalhammer
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Zimmermann @ 2022-10-25  8:16 UTC (permalink / raw)
  To: andreas.thalhammer, Greg KH, Thorsten Leemhuis
  Cc: Sasha Levin, stable, regressions, ML dri-devel, Javier Martinez Canillas


[-- Attachment #1.1.1: Type: text/plain, Size: 3413 bytes --]

Hi Andreas

Am 24.10.22 um 18:19 schrieb Andreas Thalhammer:
> Am 24.10.22 um 13:31 schrieb Thomas Zimmermann:
>> Hi
>>
>> Am 24.10.22 um 13:27 schrieb Greg KH:
>>> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>>>> Hi! Thx for the reply.
>>>>
>>>> On 24.10.22 12:26, Thomas Zimmermann wrote:
>>>>> Am 23.10.22 um 10:04 schrieb Thorsten Leemhuis:
>>>>>>
>>>>>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>>>>>> kernel developers don't keep an eye on it, I decided to forward it by
>>>>>> mail. Quoting from
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>>>>>
>>>>>>>    Andreas 2022-10-22 14:25:32 UTC
>>>>>>>
>>>>>>> Created attachment 303074 [details]
>>>>>>> dmesg
>>>>>
>>>>> I've looked at the kernel log and found that simpledrm has been loaded
>>>>> *after* amdgpu, which should never happen. The problematic patch has
>>>>> been taken from a long list of refactoring work on this code. No 
>>>>> wonder
>>>>> that it doesn't work as expected.
>>>>>
>>>>> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
>>>>> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
>>>>> report on the results. It should fix the problem.
>>>>
>>>> Greg, is that enough for you to pick this up? Or do you want Andreas to
>>>> test first if it really fixes the reported problem?
>>>
>>> This should be good enough.  If this does NOT fix the issue, please let
>>> me know.
>>
>> Thanks a lot. I think I can provide a dedicated fix if the proposed
>> commit doesn't work.
>>
>> Best regards
>> Thomas
>>
>>>
>>> thanks,
>>>
>>> greg k-h
>>
> 
> Thanks... In short: the additional patch did NOT fix the problem.

Yeah, it's also part of a larger changeset. But I wouldn't want to 
backport all those changes either.

Attached is a simple patch for linux-stable that adds the necessary fix. 
If this still doesn't work, we should probably revert the problematic patch.

Please test the patch and let me know if it works.

Best regards
Thomas

> 
> I don't use git and I don't know how to /cherry-pick commit/
> 9d69ef183815, but I found the patch here:
> https://patchwork.freedesktop.org/patch/494609/
> 
> I hope that's the right one. I reintegrated
> v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
> and also applied
> v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch,
> did a "make mrproper" and thereafter compiled a clean new 6.0.3 kernel
> (same .config).
> 
> Now the system doesn't even boot to a console. The first boot got me to
> an rcu_sched stall on CPUs/tasks, same as above, but this time with:
> Workqueue: btrfs-cache btrfs_work_helper
> 
> I booted a second time with the same kernel, and it got stuck after
> mounting the root btrfs filesystem (what looked like a total freeze, but
> when it didn't show a rcu_stall message after ~2 min I got impatient and
> wanted to see if I had just busted my root filesystem...)
> 
> I booted 6.0.2 and everything is fine. (I'm very glad! I definitely
> should update my backup right away!)
> 
> I will try 6.1-rc1 next, bear with...
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #1.1.2: 0001-video-aperture-Call-sysfb_disable-before-removing-PC.patch --]
[-- Type: text/x-patch, Size: 4567 bytes --]

From ba55e238e64817a2369a267153a5b980683465a1 Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann@suse.de>
Date: Tue, 25 Oct 2022 09:38:44 +0200
Subject: [PATCH] video/aperture: Call sysfb_disable() before removing PCI
 devices

Call sysfb_disable() from aperture_remove_conflicting_pci_devices()
before removing PCI devices. Without it, simpledrm can still bind to
simple-framebuffer devices after the hardware driver has taken over
the hardware. Both drivers interfere with each other and results are
undefined.

Reported modesetting errors are shown below.

---- snap ----
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x00000008
Call Trace:
 <TASK>
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>
...
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x0000400e
Call Trace:
 <TASK>
 ? memcpy_toio+0x76/0xc0
 ? memcpy_toio+0x1b/0xc0
 ? drm_fb_memcpy_toio+0x76/0xb0
 ? drm_fb_blit_toio+0x75/0x2b0
 ? simpledrm_simple_display_pipe_update+0x132/0x150
 ? drm_atomic_helper_commit_planes+0xb6/0x230
 ? drm_atomic_helper_commit_tail+0x44/0x80
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>

The problem was introduced by backporting commit 5e0137612430
("video/aperture: Disable and unregister sysfb devices via aperture
 helpers") to v6.0.3 and does not exist in the mainline branch.

Reported-by: Andreas Thalhammer <andreas.thalhammer-linux@gmx.net>
Reported-by: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: cfecfc98a78d ("video/aperture: Disable and unregister sysfb devices via aperture helpers")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Changcheng Deng <deng.changcheng@zte.com.cn>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Cc: Sasha Levin <sashal@kernel.org>
Cc: linux-fbdev@vger.kernel.org
Cc: <stable@vger.kernel.org> # v6.0.3+
Link: https://lore.kernel.org/dri-devel/d6afe54b-f8d7-beb2-3609-186e566cbfac@gmx.net/T/#t
---
 drivers/video/aperture.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index d245826a9324d..cc6427a091bc7 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -338,6 +338,17 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
 	resource_size_t base, size;
 	int bar, ret;
 
+	/*
+	 * If a driver asked to unregister a platform device registered by
+	 * sysfb, then can be assumed that this is a driver for a display
+	 * that is set up by the system firmware and has a generic driver.
+	 *
+	 * Drivers for devices that don't have a generic driver will never
+	 * ask for this, so let's assume that a real driver for the display
+	 * was already probed and prevent sysfb to register devices later.
+	 */
+	sysfb_disable();
+
 	/*
 	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
 	 * otherwise the vga fbdev driver falls over.
-- 
2.38.0


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply related	[flat|nested] 11+ messages in thread
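
The ordering problem described in the patch above can be illustrated with a
small userspace toy model. This is only a sketch under stated assumptions: it
is not kernel code, and every name in it (toy_display, toy_sysfb_init,
toy_native_probe, toy_conflict, toy_run) is hypothetical and chosen only to
mirror the terms used in this thread. It assumes that the generic device gets
registered after the hardware driver has probed, which is the order reported
in the dmesg, and shows why setting a "disabled" flag first keeps the generic
driver from binding later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, not the kernel's API. */
struct toy_display {
	bool sysfb_disabled;      /* set once a hardware driver takes over */
	bool simplefb_registered; /* generic simple-framebuffer device exists */
	bool native_bound;        /* hardware driver (e.g. amdgpu) is bound */
};

/* Firmware framebuffer setup: registers the generic device unless disabled. */
void toy_sysfb_init(struct toy_display *d)
{
	assert(d != NULL);
	if (!d->sysfb_disabled)
		d->simplefb_registered = true;
}

/* Hardware driver probe; with_fix selects whether sysfb is disabled first. */
void toy_native_probe(struct toy_display *d, bool with_fix)
{
	assert(d != NULL);
	if (with_fix)
		d->sysfb_disabled = true;   /* models the added call */
	d->simplefb_registered = false;     /* remove a device that already exists */
	d->native_bound = true;
}

/* True if the generic and the hardware driver would share the display. */
bool toy_conflict(const struct toy_display *d)
{
	assert(d != NULL);
	return d->simplefb_registered && d->native_bound;
}

/* Runs the reported order: hardware driver first, generic device later. */
bool toy_run(bool with_fix)
{
	struct toy_display d = { false, false, false };

	toy_native_probe(&d, with_fix);
	toy_sysfb_init(&d);
	return toy_conflict(&d);
}
```

Calling toy_run(false) models the regressed 6.0.3 behaviour, where both
drivers end up on the same hardware, and toy_run(true) models the behaviour
with the patch applied. The real helpers do more than flip a flag, so treat
this purely as an illustration of the ordering argument.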

* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-25  8:16           ` Thomas Zimmermann
@ 2022-10-25  8:45             ` Andreas Thalhammer
  2022-10-25  9:21               ` Thomas Zimmermann
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Thalhammer @ 2022-10-25  8:45 UTC (permalink / raw)
  To: Thomas Zimmermann, Greg KH, Thorsten Leemhuis
  Cc: Sasha Levin, stable, regressions, ML dri-devel, Javier Martinez Canillas

Am 25.10.22 um 10:16 schrieb Thomas Zimmermann:
> Hi Andreas
>
> Am 24.10.22 um 18:19 schrieb Andreas Thalhammer:
>> Am 24.10.22 um 13:31 schrieb Thomas Zimmermann:
>>> Hi
>>>
>>> Am 24.10.22 um 13:27 schrieb Greg KH:
>>>> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>>>>> Hi! Thx for the reply.
>>>>>
>>>>> On 24.10.22 12:26, Thomas Zimmermann wrote:
>>>>>> Am 23.10.22 um 10:04 schrieb Thorsten Leemhuis:
>>>>>>>
>>>>>>> I noticed a regression report in bugzilla.kernel.org. As many
>>>>>>> (most?)
>>>>>>> kernel developers don't keep an eye on it, I decided to forward it by
>>>>>>> mail. Quoting from
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>>>>>>
>>>>>>>>    Andreas 2022-10-22 14:25:32 UTC
>>>>>>>>
>>>>>>>> Created attachment 303074 [details]
>>>>>>>> dmesg
>>>>>>
>>>>>> I've looked at the kernel log and found that simpledrm has been
>>>>>> loaded
>>>>>> *after* amdgpu, which should never happen. The problematic patch has
>>>>>> been taken from a long list of refactoring work on this code. No
>>>>>> wonder
>>>>>> that it doesn't work as expected.
>>>>>>
>>>>>> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
>>>>>> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch
>>>>>> and
>>>>>> report on the results. It should fix the problem.
>>>>>
>>>>> Greg, is that enough for you to pick this up? Or do you want
>>>>> Andreas to
>>>>> test first if it really fixes the reported problem?
>>>>
>>>> This should be good enough.  If this does NOT fix the issue, please let
>>>> me know.
>>>
>>> Thanks a lot. I think I can provide a dedicated fix if the proposed
>>> commit doesn't work.
>>>
>>> Best regards
>>> Thomas
>>>
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>
>>
>> Thanks... In short: the additional patch did NOT fix the problem.
>
> Yeah, it's also part of a larger changeset. But I wouldn't want to
> backport all those changes either.
>
> Attached is a simple patch for linux-stable that adds the necessary fix.
> If this still doesn't work, we should probably revert the problematic
> patch.
>
> Please test the patch and let me know if it works.


Yes, this fixed the problem. I'm running 6.0.3 with your patch now, all
fine.

Thanks!
Andreas

>
> Best regards
> Thomas
>
>>
>> I don't use git and I don't know how to /cherry-pick commit/
>> 9d69ef183815, but I found the patch here:
>> https://patchwork.freedesktop.org/patch/494609/
>>
>> I hope that's the right one. I reintegrated
>> v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
>> and also applied
>> v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch,
>> did a "make mrproper" and thereafter compiled a clean new 6.0.3 kernel
>> (same .config).
>>
>> Now the system doesn't even boot to a console. The first boot got me to
>> an rcu_sched stall on CPUs/tasks, same as above, but this time with:
>> Workqueue: btrfs-cache btrfs_work_helper
>>
>> I booted a second time with the same kernel, and it got stuck after
>> mounting the root btrfs filesystem (what looked like a total freeze, but
>> when it didn't show a rcu_stall message after ~2 min I got impatient and
>> wanted to see if I had just busted my root filesystem...)
>>
>> I booted 6.0.2 and everything is fine. (I'm very glad! I definitely
>> should update my backup right away!)
>>
>> I will try 6.1-rc1 next, bear with...
>>
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-25  8:45             ` Andreas Thalhammer
@ 2022-10-25  9:21               ` Thomas Zimmermann
  2022-10-25 10:25                 ` Greg KH
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Zimmermann @ 2022-10-25  9:21 UTC (permalink / raw)
  To: andreas.thalhammer, Greg KH, Thorsten Leemhuis
  Cc: Sasha Levin, ML dri-devel, regressions, stable, Javier Martinez Canillas


[-- Attachment #1.1: Type: text/plain, Size: 2115 bytes --]

Hi

Am 25.10.22 um 10:45 schrieb Andreas Thalhammer:
[...]
>> Yeah, it's also part of a larger changeset. But I wouldn't want to
>> backport all those changes either.
>>
>> Attached is a simple patch for linux-stable that adds the necessary fix.
>> If this still doesn't work, we should probably revert the problematic
>> patch.
>>
>> Please test the patch and let me know if it works.
> 
> 
> Yes, this fixed the problem. I'm running 6.0.3 with your patch now, all
> fine.

Thanks a lot for testing. If Greg doesn't already pick up the patch from 
this discussion, I'll send it to stable soonish; adding your Tested-by tag.

Best regards
Thomas

> 
> Thanks!
> Andreas
> 
>>
>> Best regards
>> Thomas
>>
>>>
>>> I don't use git and I don't know how to /cherry-pick commit/
>>> 9d69ef183815, but I found the patch here:
>>> https://patchwork.freedesktop.org/patch/494609/
>>>
>>> I hope that's the right one. I reintegrated
>>> v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
>>> and also applied
>>> v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch,
>>> did a "make mrproper" and thereafter compiled a clean new 6.0.3 kernel
>>> (same .config).
>>>
>>> Now the system doesn't even boot to a console. The first boot got me to
>>> an rcu_sched stall on CPUs/tasks, same as above, but this time with:
>>> Workqueue: btrfs-cache btrfs_work_helper
>>>
>>> I booted a second time with the same kernel, and it got stuck after
>>> mounting the root btrfs filesystem (what looked like a total freeze, but
>>> when it didn't show a rcu_stall message after ~2 min I got impatient and
>>> wanted to see if I had just busted my root filesystem...)
>>>
>>> I booted 6.0.2 and everything is fine. (I'm very glad! I definitely
>>> should update my backup right away!)
>>>
>>> I will try 6.1-rc1 next, bear with...
>>>
>>
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
  2022-10-25  9:21               ` Thomas Zimmermann
@ 2022-10-25 10:25                 ` Greg KH
  0 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2022-10-25 10:25 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Sasha Levin, regressions, Javier Martinez Canillas, ML dri-devel,
	Thorsten Leemhuis, stable, andreas.thalhammer

On Tue, Oct 25, 2022 at 11:21:57AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 25.10.22 um 10:45 schrieb Andreas Thalhammer:
> [...]
> > > Yeah, it's also part of a larger changeset. But I wouldn't want to
> > > backport all those changes either.
> > > 
> > > Attached is a simple patch for linux-stable that adds the necessary fix.
> > > If this still doesn't work, we should probably revert the problematic
> > > patch.
> > > 
> > > Please test the patch and let me know if it works.
> > 
> > 
> > Yes, this fixed the problem. I'm running 6.0.3 with your patch now, all
> > fine.
> 
> Thanks a lot for testing. If Greg doesn't already pick up the patch from
> this discussion, I'll send it to stable soonish; adding your Tested-by tag.

Please send it as a real patch.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-10-25 10:24 UTC | newest]

Thread overview: 11+ messages
2022-10-23  8:04 [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers" Thorsten Leemhuis
2022-10-24 10:26 ` Thomas Zimmermann
2022-10-24 10:41   ` Thorsten Leemhuis
2022-10-24 11:27     ` Greg KH
2022-10-24 11:31       ` Thomas Zimmermann
2022-10-24 16:19         ` Andreas Thalhammer
2022-10-25  8:16           ` Thomas Zimmermann
2022-10-25  8:45             ` Andreas Thalhammer
2022-10-25  9:21               ` Thomas Zimmermann
2022-10-25 10:25                 ` Greg KH
2022-10-24 16:53         ` Andreas Thalhammer
