regressions.lists.linux.dev archive mirror
* [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
@ 2021-11-10 20:02 Ilya Trukhanov
  2021-11-10 22:24 ` Ard Biesheuvel
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ilya Trukhanov @ 2021-11-10 20:02 UTC (permalink / raw)
  To: stable
  Cc: regressions, linux-efi, linux-pm, javierm, tzimmermann, ardb,
	rafael, len.brown, pavel

Suspend-to-RAM with elogind under Wayland stopped working in 5.15.

This occurs with 5.15, 5.15.1 and latest master at
89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
fine.

git bisect gives d391c58271072d0b0fad93c82018d495b2633448.

To reproduce:
- Use elogind and Linux 5.15.1 with CONFIG_SYSFB_SIMPLEFB=n.
- Start a Wayland session. I tested sway and weston, neither worked.
- In a terminal emulator (I used alacritty) execute `loginctl suspend`.

Normally after the last step the system would suspend, but it no longer
does so after I upgraded to Linux 5.15. After running `loginctl suspend`
in dmesg I get the following:
[  103.098782] elogind-daemon[2357]: Suspending system...
[  103.098794] PM: suspend entry (deep)
[  103.124621] Filesystems sync: 0.025 seconds

But nothing happens afterwards.

Suspend works as expected if I do any of the following:
- Revert d391c58271072d0b0fad93c82018d495b2633448.
- Build with CONFIG_SYSFB_SIMPLEFB=y.
- Suspend from tty, even if a Wayland session is running in parallel.
- Suspend from under an X11 session.
- Suspend with `echo mem > /sys/power/state`.

If I attach strace to the elogind-daemon process after running
`loginctl suspend`, the system immediately suspends. However, if
I attach strace *prior* to running `loginctl suspend`, the system does not
suspend, and the process gets stuck on a write syscall to `/sys/power/state`.

I "traced" a little bit with printk (sorry, I don't know of a better
way) and the call chain is as follows:
state_store -> pm_suspend -> enter_state -> suspend_prepare
-> pm_prepare_console -> vt_move_to_console -> vt_waitactive
-> __vt_event_wait

__vt_event_wait just waits until wait_event_interruptible completes, but
it never does (not until I attach to elogind-daemon with strace, at
least). I did not follow the chain further.

- Linux version 5.15.1 (lahvuun@lahvuun) (gcc (Gentoo 11.2.0 p1) 11.2.0,
  GNU ld (Gentoo 2.37_p1 p0) 2.37) #51 SMP PREEMPT Tue Nov 9 23:39:25
  EET 2021
- Gentoo Linux 2.8
- x86_64 AuthenticAMD
- dmesg: https://pastebin.com/duj33bY8
- .config: https://pastebin.com/7Hew1g0T

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-10 20:02 [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support Ilya Trukhanov
@ 2021-11-10 22:24 ` Ard Biesheuvel
  2021-11-10 23:21   ` Ilya Trukhanov
  2021-11-10 23:07 ` Javier Martinez Canillas
  2021-11-11  6:11 ` Thorsten Leemhuis
  2 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-11-10 22:24 UTC (permalink / raw)
  To: Ilya Trukhanov
  Cc: # 3.4.x, regressions, linux-efi, Linux PM,
	Javier Martinez Canillas, Thomas Zimmermann, Rafael J. Wysocki,
	Len Brown, pavel

Hi Ilya,

On Wed, 10 Nov 2021 at 21:02, Ilya Trukhanov <lahvuun@gmail.com> wrote:
>
> Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
>
> This occurs with 5.15, 5.15.1 and latest master at
> 89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
> fine.
>
> git bisect gives d391c58271072d0b0fad93c82018d495b2633448.
>
> To reproduce:
> - Use elogind and Linux 5.15.1 with CONFIG_SYSFB_SIMPLEFB=n.
> - Start a Wayland session. I tested sway and weston, neither worked.
> - In a terminal emulator (I used alacritty) execute `loginctl suspend`.
>
> Normally after the last step the system would suspend, but it no longer
> does so after I upgraded to Linux 5.15. After running `loginctl suspend`
> in dmesg I get the following:
> [  103.098782] elogind-daemon[2357]: Suspending system...
> [  103.098794] PM: suspend entry (deep)
> [  103.124621] Filesystems sync: 0.025 seconds
>
> But nothing happens afterwards.
>
> Suspend works as expected if I do any of the following:
> - Revert d391c58271072d0b0fad93c82018d495b2633448.
> - Build with CONFIG_SYSFB_SIMPLEFB=y.

If this solves the issue, what else is there to discuss?



> - Suspend from tty, even if a Wayland session is running in parallel.
> - Suspend from under an X11 session.
> - Suspend with `echo mem > /sys/power/state`.
>
> If I attach strace to the elogind-daemon process after running
> `loginctl suspend`, the system immediately suspends. However, if
> I attach strace *prior* to running `loginctl suspend`, the system does not
> suspend, and the process gets stuck on a write syscall to `/sys/power/state`.
>
> I "traced" a little bit with printk (sorry, I don't know of a better
> way) and the call chain is as follows:
> state_store -> pm_suspend -> enter_state -> suspend_prepare
> -> pm_prepare_console -> vt_move_to_console -> vt_waitactive
> -> __vt_event_wait
>
> __vt_event_wait just waits until wait_event_interruptible completes, but
> it never does (not until I attach to elogind-daemon with strace, at
> least). I did not follow the chain further.
>
> - Linux version 5.15.1 (lahvuun@lahvuun) (gcc (Gentoo 11.2.0 p1) 11.2.0,
>   GNU ld (Gentoo 2.37_p1 p0) 2.37) #51 SMP PREEMPT Tue Nov 9 23:39:25
>   EET 2021
> - Gentoo Linux 2.8
> - x86_64 AuthenticAMD
> - dmesg: https://pastebin.com/duj33bY8
> - .config: https://pastebin.com/7Hew1g0T


* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-10 20:02 [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support Ilya Trukhanov
  2021-11-10 22:24 ` Ard Biesheuvel
@ 2021-11-10 23:07 ` Javier Martinez Canillas
  2021-11-11  0:45   ` Ilya Trukhanov
  2021-11-11  6:11 ` Thorsten Leemhuis
  2 siblings, 1 reply; 12+ messages in thread
From: Javier Martinez Canillas @ 2021-11-10 23:07 UTC (permalink / raw)
  To: Ilya Trukhanov, stable
  Cc: regressions, linux-efi, linux-pm, tzimmermann, ardb, rafael,
	len.brown, pavel, dri-devel

[ adding dri-devel mailing list as Cc ]

Hello Ilya,

On 11/10/21 21:02, Ilya Trukhanov wrote:
> Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
> 
> This occurs with 5.15, 5.15.1 and latest master at
> 89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
> fine.
> 
> git bisect gives d391c58271072d0b0fad93c82018d495b2633448.
>

That's strange, because this patch is just moving code around; there shouldn't
be any functional changes...

> To reproduce:
> - Use elogind and Linux 5.15.1 with CONFIG_SYSFB_SIMPLEFB=n.
> - Start a Wayland session. I tested sway and weston, neither worked.
> - In a terminal emulator (I used alacritty) execute `loginctl suspend`.
> 
> Normally after the last step the system would suspend, but it no longer
> does so after I upgraded to Linux 5.15. After running `loginctl suspend`
> in dmesg I get the following:
> [  103.098782] elogind-daemon[2357]: Suspending system...
> [  103.098794] PM: suspend entry (deep)
> [  103.124621] Filesystems sync: 0.025 seconds
> 
> But nothing happens afterwards.
> 
> Suspend works as expected if I do any of the following:
> - Revert d391c58271072d0b0fad93c82018d495b2633448.
> - Build with CONFIG_SYSFB_SIMPLEFB=y.

Can you please share the kernel boot log for any of these cases too?

> - Suspend from tty, even if a Wayland session is running in parallel.
> - Suspend from under an X11 session.
> - Suspend with `echo mem > /sys/power/state`.
> 
> If I attach strace to the elogind-daemon process after running
> `loginctl suspend`, the system immediately suspends. However, if
> I attach strace *prior* to running `loginctl suspend`, the system does not
> suspend, and the process gets stuck on a write syscall to `/sys/power/state`.
> 
> I "traced" a little bit with printk (sorry, I don't know of a better
> way) and the call chain is as follows:
> state_store -> pm_suspend -> enter_state -> suspend_prepare
> -> pm_prepare_console -> vt_move_to_console -> vt_waitactive
> -> __vt_event_wait
> 
> __vt_event_wait just waits until wait_event_interruptible completes, but
> it never does (not until I attach to elogind-daemon with strace, at
> least). I did not follow the chain further.
> 
> - Linux version 5.15.1 (lahvuun@lahvuun) (gcc (Gentoo 11.2.0 p1) 11.2.0,
>   GNU ld (Gentoo 2.37_p1 p0) 2.37) #51 SMP PREEMPT Tue Nov 9 23:39:25
>   EET 2021
> - Gentoo Linux 2.8
> - x86_64 AuthenticAMD
> - dmesg: https://pastebin.com/duj33bY8
> - .config: https://pastebin.com/7Hew1g0T
> 

Looking at your .config and dmesg output, my guess is that this is related to the
fact that you have both CONFIG_FB_EFI=y and CONFIG_DRM_AMDGPU=y.

The code that adds the "efi-framebuffer" platform device used to be in the
arch/x86/kernel/sysfb.c file but is now in drivers/firmware/sysfb.c, and that
could affect the order in which the device <--> driver matching happens.

From your kernel boot log:

...
[    0.375796] [drm] amdgpu kernel modesetting enabled.
[    0.375819] amdgpu: CRAT table disabled by module option
[    0.375823] amdgpu: Virtual CRAT table created for CPU
[    0.375831] amdgpu: Topology: Add CPU node
[    0.375865] amdgpu 0000:0a:00.0: vgaarb: deactivate vga console
[    0.375911] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x1DA2:0xE376 0xC3).
...
[    0.868997] fbcon: amdgpu (fb0) is primary device
[    1.004397] Console: switching to colour frame buffer device 240x67
[    1.017815] amdgpu 0000:0a:00.0: [drm] fb0: amdgpu frame buffer device
...
[    1.133997] efifb: probing for efifb
[    1.134716] efifb: framebuffer at 0xe0000000, using 8100k, total 8100k
[    1.135438] efifb: mode is 1920x1080x32, linelength=7680, pages=1
[    1.136180] efifb: scrolling: redraw
[    1.136891] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    1.137638] fb1: EFI VGA frame buffer device

Usually efifb is used to provide early framebuffer output before the native DRM
driver probes, but in your case it is the opposite. This wouldn't happen if the
amdgpu driver were built as a module.

Probably, before the mentioned commit, the efifb driver was probed earlier and
then the amdgpu driver would have removed the conflicting efifb framebuffer
before registering its DRM device. But that doesn't happen here, and the efifb
framebuffer is still around since it is registered after the one for amdgpu.

That would explain why it also works with CONFIG_SYSFB_SIMPLEFB=y for you, since
in that case a "simple-framebuffer" platform device is added instead of an
"efi-framebuffer" one. But since neither CONFIG_FB_SIMPLE nor CONFIG_DRM_SIMPLEDRM
is enabled in your kernel config, no device driver will match that device.

This is just a guess, though. It would be good if you could test the following cases:

1) CONFIG_FB_EFI not set
2) CONFIG_FB_EFI=y and CONFIG_DRM_AMDGPU=m
3) CONFIG_SYSFB_SIMPLEFB=y and CONFIG_FB_SIMPLE=y

For each case, check /proc/fb, the kernel boot log, and whether Suspend-to-RAM
works.

If the explanation above is correct, then I would expect (1) and (2) to work and
(3) to also fail.

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat



* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-10 22:24 ` Ard Biesheuvel
@ 2021-11-10 23:21   ` Ilya Trukhanov
  2021-11-10 23:25     ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Trukhanov @ 2021-11-10 23:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: # 3.4.x, regressions, linux-efi, Linux PM,
	Javier Martinez Canillas, Thomas Zimmermann, Rafael J. Wysocki,
	Len Brown, pavel

On Wed, Nov 10, 2021 at 11:24:03PM +0100, Ard Biesheuvel wrote:
> Hi Ilya,
> 
> On Wed, 10 Nov 2021 at 21:02, Ilya Trukhanov <lahvuun@gmail.com> wrote:
> >
> > Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
> >
> > This occurs with 5.15, 5.15.1 and latest master at
> > 89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
> > fine.
> >
> > git bisect gives d391c58271072d0b0fad93c82018d495b2633448.
> >
> > To reproduce:
> > - Use elogind and Linux 5.15.1 with CONFIG_SYSFB_SIMPLEFB=n.
> > - Start a Wayland session. I tested sway and weston, neither worked.
> > - In a terminal emulator (I used alacritty) execute `loginctl suspend`.
> >
> > Normally after the last step the system would suspend, but it no longer
> > does so after I upgraded to Linux 5.15. After running `loginctl suspend`
> > in dmesg I get the following:
> > [  103.098782] elogind-daemon[2357]: Suspending system...
> > [  103.098794] PM: suspend entry (deep)
> > [  103.124621] Filesystems sync: 0.025 seconds
> >
> > But nothing happens afterwards.
> >
> > Suspend works as expected if I do any of the following:
> > - Revert d391c58271072d0b0fad93c82018d495b2633448.
> > - Build with CONFIG_SYSFB_SIMPLEFB=y.
> 
> If this solves the issue, what else is there to discuss?
Sorry, I'm not a kernel developer, but I was under the impression
that this is a regression and should at least be brought to attention.

I also think I'm probably not the last person to encounter this. I'm
fortunate because I had the time to bisect and get the idea to try
enabling that option, but others may not know how to fix it.

The suspend not working is also not the only effect. After you execute
`loginctl suspend`, for example, the compositor just hangs if you try to
exit. Should you kill it with SysRq+I, the system suspends but after
resume doesn't respond to anything and has to be hard reset. I think
this is a pretty serious issue, even if it won't affect most users.

Sorry if I wasn't meant to CC you. The issue reporting guide says that
you should CC maintainers of affected subsystems.
> 
> 
> 
> > - Suspend from tty, even if a Wayland session is running in parallel.
> > - Suspend from under an X11 session.
> > - Suspend with `echo mem > /sys/power/state`.
> >
> > If I attach strace to the elogind-daemon process after running
> > `loginctl suspend`, the system immediately suspends. However, if
> > I attach strace *prior* to running `loginctl suspend`, the system does not
> > suspend, and the process gets stuck on a write syscall to `/sys/power/state`.
> >
> > I "traced" a little bit with printk (sorry, I don't know of a better
> > way) and the call chain is as follows:
> > state_store -> pm_suspend -> enter_state -> suspend_prepare
> > -> pm_prepare_console -> vt_move_to_console -> vt_waitactive
> > -> __vt_event_wait
> >
> > __vt_event_wait just waits until wait_event_interruptible completes, but
> > it never does (not until I attach to elogind-daemon with strace, at
> > least). I did not follow the chain further.
> >
> > - Linux version 5.15.1 (lahvuun@lahvuun) (gcc (Gentoo 11.2.0 p1) 11.2.0,
> >   GNU ld (Gentoo 2.37_p1 p0) 2.37) #51 SMP PREEMPT Tue Nov 9 23:39:25
> >   EET 2021
> > - Gentoo Linux 2.8
> > - x86_64 AuthenticAMD
> > - dmesg: https://pastebin.com/duj33bY8
> > - .config: https://pastebin.com/7Hew1g0T


* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-10 23:21   ` Ilya Trukhanov
@ 2021-11-10 23:25     ` Ard Biesheuvel
  2021-11-11  0:08       ` Ilya Trukhanov
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2021-11-10 23:25 UTC (permalink / raw)
  To: Ilya Trukhanov
  Cc: # 3.4.x, regressions, linux-efi, Linux PM,
	Javier Martinez Canillas, Thomas Zimmermann, Rafael J. Wysocki,
	Len Brown, pavel

On Thu, 11 Nov 2021 at 00:22, Ilya Trukhanov <lahvuun@gmail.com> wrote:
>
> On Wed, Nov 10, 2021 at 11:24:03PM +0100, Ard Biesheuvel wrote:
> > Hi Ilya,
> >
> > On Wed, 10 Nov 2021 at 21:02, Ilya Trukhanov <lahvuun@gmail.com> wrote:
> > >
> > > Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
> > >
> > > This occurs with 5.15, 5.15.1 and latest master at
> > > 89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
> > > fine.
> > >
> > > git bisect gives d391c58271072d0b0fad93c82018d495b2633448.
> > >
> > > To reproduce:
> > > - Use elogind and Linux 5.15.1 with CONFIG_SYSFB_SIMPLEFB=n.
> > > - Start a Wayland session. I tested sway and weston, neither worked.
> > > - In a terminal emulator (I used alacritty) execute `loginctl suspend`.
> > >
> > > Normally after the last step the system would suspend, but it no longer
> > > does so after I upgraded to Linux 5.15. After running `loginctl suspend`
> > > in dmesg I get the following:
> > > [  103.098782] elogind-daemon[2357]: Suspending system...
> > > [  103.098794] PM: suspend entry (deep)
> > > [  103.124621] Filesystems sync: 0.025 seconds
> > >
> > > But nothing happens afterwards.
> > >
> > > Suspend works as expected if I do any of the following:
> > > - Revert d391c58271072d0b0fad93c82018d495b2633448.
> > > - Build with CONFIG_SYSFB_SIMPLEFB=y.
> >
> > If this solves the issue, what else is there to discuss?
> Sorry, I'm not a kernel developer, but I was under the impression
> that this is a regression and should at least be brought to attention.
>
> I also think I'm probably not the last person to encounter this. I'm
> fortunate because I had the time to bisect and get the idea to try
> enabling that option, but others may not know how to fix it.
>
> The suspend not working is also not the only effect. After you execute
> `loginctl suspend`, for example, the compositor just hangs if you try to
> exit. Should you kill it with SysRq+I, the system suspends but after
> resume doesn't respond to anything and has to be hard reset. I think
> this is a pretty serious issue, even if it won't affect most users.
>
> Sorry if I wasn't meant to CC you. The issue reporting guide says that
> you should CC maintainers of affected subsystems.

No worries. You cc'ed the right people, and we appreciate the time you
have spent to track down the root cause.

So can you explain why the solution to this issue is not simply
'enable CONFIG_SYSFB_SIMPLEFB' ?


* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-10 23:25     ` Ard Biesheuvel
@ 2021-11-11  0:08       ` Ilya Trukhanov
  0 siblings, 0 replies; 12+ messages in thread
From: Ilya Trukhanov @ 2021-11-11  0:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: # 3.4.x, regressions, linux-efi, Linux PM,
	Javier Martinez Canillas, Thomas Zimmermann, Rafael J. Wysocki,
	Len Brown, pavel

On Thu, Nov 11, 2021 at 12:25:35AM +0100, Ard Biesheuvel wrote:
> On Thu, 11 Nov 2021 at 00:22, Ilya Trukhanov <lahvuun@gmail.com> wrote:
> >
> > On Wed, Nov 10, 2021 at 11:24:03PM +0100, Ard Biesheuvel wrote:
> > > Hi Ilya,
> > >
> > > On Wed, 10 Nov 2021 at 21:02, Ilya Trukhanov <lahvuun@gmail.com> wrote:
> > > >
> > > > Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
> > > >
> > > > This occurs with 5.15, 5.15.1 and latest master at
> > > > 89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
> > > > fine.
> > > >
> > > > git bisect gives d391c58271072d0b0fad93c82018d495b2633448.
> > > >
> > > > To reproduce:
> > > > - Use elogind and Linux 5.15.1 with CONFIG_SYSFB_SIMPLEFB=n.
> > > > - Start a Wayland session. I tested sway and weston, neither worked.
> > > > - In a terminal emulator (I used alacritty) execute `loginctl suspend`.
> > > >
> > > > Normally after the last step the system would suspend, but it no longer
> > > > does so after I upgraded to Linux 5.15. After running `loginctl suspend`
> > > > in dmesg I get the following:
> > > > [  103.098782] elogind-daemon[2357]: Suspending system...
> > > > [  103.098794] PM: suspend entry (deep)
> > > > [  103.124621] Filesystems sync: 0.025 seconds
> > > >
> > > > But nothing happens afterwards.
> > > >
> > > > Suspend works as expected if I do any of the following:
> > > > - Revert d391c58271072d0b0fad93c82018d495b2633448.
> > > > - Build with CONFIG_SYSFB_SIMPLEFB=y.
> > >
> > > If this solves the issue, what else is there to discuss?
> > Sorry, I'm not a kernel developer, but I was under the impression
> > that this is a regression and should at least be brought to attention.
> >
> > I also think I'm probably not the last person to encounter this. I'm
> > fortunate because I had the time to bisect and get the idea to try
> > enabling that option, but others may not know how to fix it.
> >
> > The suspend not working is also not the only effect. After you execute
> > `loginctl suspend`, for example, the compositor just hangs if you try to
> > exit. Should you kill it with SysRq+I, the system suspends but after
> > resume doesn't respond to anything and has to be hard reset. I think
> > this is a pretty serious issue, even if it won't affect most users.
> >
> > Sorry if I wasn't meant to CC you. The issue reporting guide says that
> > you should CC maintainers of affected subsystems.
> 
> No worries. You cc'ed the right people, and we appreciate the time you
> have spent to track down the root cause.
> 
> So can you explain why the solution to this issue is not simply
> 'enable CONFIG_SYSFB_SIMPLEFB' ?

I'm not sure I understand what you're asking.

I can definitely enable CONFIG_SYSFB_SIMPLEFB and it would be *a*
solution, but only for me. In the future other people with setups
similar to mine will update to 5.15 or later and also face this issue.
They will then have to do everything I did (or at least search through
the mailing list) to get suspend working again. Is this desirable?

Besides, this option existed before (albeit under a different name), and
there was no need to enable it for suspend to work properly. The change
in question did not indicate that this option must now be enabled, it
wasn't even made `default y`. And even if it were, some people might
still have legitimate reasons to ignore the default and disable it, and
then have their suspend not work.


* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-10 23:07 ` Javier Martinez Canillas
@ 2021-11-11  0:45   ` Ilya Trukhanov
  2021-11-11  7:31     ` Javier Martinez Canillas
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Trukhanov @ 2021-11-11  0:45 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: stable, regressions, linux-efi, linux-pm, tzimmermann, ardb,
	rafael, len.brown, pavel, dri-devel

On Thu, Nov 11, 2021 at 12:07:19AM +0100, Javier Martinez Canillas wrote:
> [ adding dri-devel mailing list as Cc ]
> 
> Hello Ilya,
> 
> On 11/10/21 21:02, Ilya Trukhanov wrote:
> > Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
> > 
> > This occurs with 5.15, 5.15.1 and latest master at
> > 89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
> > fine.
> > 
> > git bisect gives d391c58271072d0b0fad93c82018d495b2633448.
> >
> 
> That's strange, because this patch is just moving code around; there shouldn't
> be any functional changes...
> 
> > To reproduce:
> > - Use elogind and Linux 5.15.1 with CONFIG_SYSFB_SIMPLEFB=n.
> > - Start a Wayland session. I tested sway and weston, neither worked.
> > - In a terminal emulator (I used alacritty) execute `loginctl suspend`.
> > 
> > Normally after the last step the system would suspend, but it no longer
> > does so after I upgraded to Linux 5.15. After running `loginctl suspend`
> > in dmesg I get the following:
> > [  103.098782] elogind-daemon[2357]: Suspending system...
> > [  103.098794] PM: suspend entry (deep)
> > [  103.124621] Filesystems sync: 0.025 seconds
> > 
> > But nothing happens afterwards.
> > 
> > Suspend works as expected if I do any of the following:
> > - Revert d391c58271072d0b0fad93c82018d495b2633448.
> > - Build with CONFIG_SYSFB_SIMPLEFB=y.
> 
> Can you please share the kernel boot log for any of these cases too?

revert dmesg: https://pastebin.com/BpnMvV2u
CONFIG_SYSFB_SIMPLEFB=y dmesg: https://pastebin.com/qSUdQygt

> 
> > - Suspend from tty, even if a Wayland session is running in parallel.
> > - Suspend from under an X11 session.
> > - Suspend with `echo mem > /sys/power/state`.
> > 
> > If I attach strace to the elogind-daemon process after running
> > `loginctl suspend`, the system immediately suspends. However, if
> > I attach strace *prior* to running `loginctl suspend`, the system does not
> > suspend, and the process gets stuck on a write syscall to `/sys/power/state`.
> > 
> > I "traced" a little bit with printk (sorry, I don't know of a better
> > way) and the call chain is as follows:
> > state_store -> pm_suspend -> enter_state -> suspend_prepare
> > -> pm_prepare_console -> vt_move_to_console -> vt_waitactive
> > -> __vt_event_wait
> > 
> > __vt_event_wait just waits until wait_event_interruptible completes, but
> > it never does (not until I attach to elogind-daemon with strace, at
> > least). I did not follow the chain further.
> > 
> > - Linux version 5.15.1 (lahvuun@lahvuun) (gcc (Gentoo 11.2.0 p1) 11.2.0,
> >   GNU ld (Gentoo 2.37_p1 p0) 2.37) #51 SMP PREEMPT Tue Nov 9 23:39:25
> >   EET 2021
> > - Gentoo Linux 2.8
> > - x86_64 AuthenticAMD
> > - dmesg: https://pastebin.com/duj33bY8
> > - .config: https://pastebin.com/7Hew1g0T
> > 
> 
> Looking at your .config and dmesg output, my guess is that this is related to the
> fact that you have both CONFIG_FB_EFI=y and CONFIG_DRM_AMDGPU=y.
> 
> The code that adds the "efi-framebuffer" platform device used to be in the
> arch/x86/kernel/sysfb.c file but is now in drivers/firmware/sysfb.c, and that
> could affect the order in which the device <--> driver matching happens.
> 
> From your kernel boot log:
> 
> ...
> [    0.375796] [drm] amdgpu kernel modesetting enabled.
> [    0.375819] amdgpu: CRAT table disabled by module option
> [    0.375823] amdgpu: Virtual CRAT table created for CPU
> [    0.375831] amdgpu: Topology: Add CPU node
> [    0.375865] amdgpu 0000:0a:00.0: vgaarb: deactivate vga console
> [    0.375911] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x1DA2:0xE376 0xC3).
> ...
> [    0.868997] fbcon: amdgpu (fb0) is primary device
> [    1.004397] Console: switching to colour frame buffer device 240x67
> [    1.017815] amdgpu 0000:0a:00.0: [drm] fb0: amdgpu frame buffer device
> ...
> [    1.133997] efifb: probing for efifb
> [    1.134716] efifb: framebuffer at 0xe0000000, using 8100k, total 8100k
> [    1.135438] efifb: mode is 1920x1080x32, linelength=7680, pages=1
> [    1.136180] efifb: scrolling: redraw
> [    1.136891] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> [    1.137638] fb1: EFI VGA frame buffer device
> 
> Usually efifb is used to provide early framebuffer output before the native DRM
> driver probes, but in your case it is the opposite. This wouldn't happen if the
> amdgpu driver were built as a module.
> 
> Probably, before the mentioned commit, the efifb driver was probed earlier and
> then the amdgpu driver would have removed the conflicting efifb framebuffer
> before registering its DRM device. But that doesn't happen here, and the efifb
> framebuffer is still around since it is registered after the one for amdgpu.
> 
> That would explain why it also works with CONFIG_SYSFB_SIMPLEFB=y for you, since
> in that case a "simple-framebuffer" platform device is added instead of an
> "efi-framebuffer" one. But since neither CONFIG_FB_SIMPLE nor CONFIG_DRM_SIMPLEDRM
> is enabled in your kernel config, no device driver will match that device.
> 
> This is just a guess, though. It would be good if you could test the following cases:
> 
> 1) CONFIG_FB_EFI not set

/proc/fb:
0 amdgpu

dmesg: https://pastebin.com/c1BcWLEh

Suspend-to-RAM works.

> 2) CONFIG_FB_EFI=y and CONFIG_DRM_AMDGPU=m

/proc/fb before `modprobe amdgpu`:
0 EFI VGA

after:
0 amdgpu

dmesg: https://pastebin.com/vSsTw2Km

Suspend-to-RAM works.

> 3) CONFIG_SYSFB_SIMPLEFB=y and CONFIG_FB_SIMPLE=y

/proc/fb:
0 amdgpu
1 simple

dmesg: https://pastebin.com/ZSXnpLqQ

Suspend-to-RAM fails.

> 
> For each case, check /proc/fb, the kernel boot log, and whether Suspend-to-RAM
> works.
> 
> If the explanation above is correct, then I would expect (1) and (2) to work and
> (3) to also fail.
> 
> Best regards,
> -- 
> Javier Martinez Canillas
> Linux Engineering
> Red Hat
> 


* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-10 20:02 [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support Ilya Trukhanov
  2021-11-10 22:24 ` Ard Biesheuvel
  2021-11-10 23:07 ` Javier Martinez Canillas
@ 2021-11-11  6:11 ` Thorsten Leemhuis
  2 siblings, 0 replies; 12+ messages in thread
From: Thorsten Leemhuis @ 2021-11-11  6:11 UTC (permalink / raw)
  To: Ilya Trukhanov, regressions



On 10.11.21 21:02, Ilya Trukhanov wrote:
> Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
> 
> This occurs with 5.15, 5.15.1 and latest master at
> 89d714ab6043bca7356b5c823f5335f5dce1f930. 5.14 and earlier releases work
> fine.
> 
> git bisect gives d391c58271072d0b0fad93c82018d495b2633448.

Ilya, thx for CCing the regression list. To be sure this issue doesn't
fall through the cracks unnoticed, I'm adding it to regzbot, the Linux
kernel regression tracking bot:

#regzbot ^introduced d391c58271072d0b0fad93c82018d495b2633448
#regzbot title Suspend-to-RAM with elogind under Wayland stopped working in 5.15.
#regzbot ignore-activity

FYI: I removed everyone else and the other lists from the To or CC to
avoid noise, as this mail is meaningless for them.

Ciao, Thorsten, your Linux kernel regression tracker.

P.S.: If you want to know more about regzbot, check out its
web-interface, the getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

But note, regzbot is doing its first field-testing now and thus still
has some bugs. Adding this regression will help me find them, hence
feel free to ignore this mail or any errors you spot in the web-ui.


* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-11  0:45   ` Ilya Trukhanov
@ 2021-11-11  7:31     ` Javier Martinez Canillas
  2021-11-11  9:24       ` Javier Martinez Canillas
  0 siblings, 1 reply; 12+ messages in thread
From: Javier Martinez Canillas @ 2021-11-11  7:31 UTC (permalink / raw)
  To: Ilya Trukhanov
  Cc: stable, regressions, linux-efi, linux-pm, tzimmermann, ardb,
	rafael, len.brown, pavel, dri-devel

Hello Ilya,

On 11/11/21 01:45, Ilya Trukhanov wrote:

[snip]

>> Can you please share the kernel boot log for any of these cases too?
>

Thanks a lot for the testing and providing the info!
 
>> This is just a guess, though. It would be good if you could test the following cases:
>>
>> 1) CONFIG_FB_EFI not set
> 
> /proc/fb:
> 0 amdgpu
> 
> dmesg: https://pastebin.com/c1BcWLEh
> 
> Suspend-to-RAM works.
> 
>> 2) CONFIG_FB_EFI=y and CONFIG_DRM_AMDGPU=m
> 
> /proc/fb before `modprobe amdgpu`:
> 0 EFI VGA
> 
> after:
> 0 amdgpu
> 
> dmesg: https://pastebin.com/vSsTw2Km
> 
> Suspend-to-RAM works.
> 
>> 3) CONFIG_SYSFB_SIMPLEFB=y and CONFIG_FB_SIMPLE=y
> 
> /proc/fb:
> 0 amdgpu
> 1 simple
> 
> dmesg: https://pastebin.com/ZSXnpLqQ
> 
> Suspend-to-RAM fails.
> 
>>
>> For each case, check /proc/fb, the kernel boot log, and whether Suspend-to-RAM
>> works.
>>
>> If the explanation above is correct, then I would expect (1) and (2) to work and
>> (3) to also fail.
>>

Your testing confirms my assumptions. I'll check how this could be solved to
prevent the efifb driver from being probed if there's already a framebuffer device.

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat

* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-11  7:31     ` Javier Martinez Canillas
@ 2021-11-11  9:24       ` Javier Martinez Canillas
  2021-11-11 10:52         ` Ilya Trukhanov
  0 siblings, 1 reply; 12+ messages in thread
From: Javier Martinez Canillas @ 2021-11-11  9:24 UTC
  To: Ilya Trukhanov
  Cc: stable, regressions, linux-efi, linux-pm, tzimmermann, ardb,
	rafael, len.brown, pavel, dri-devel

On 11/11/21 08:31, Javier Martinez Canillas wrote:

[snip]

>>> And for each, check /proc/fb, the kernel boot log, and whether Suspend-to-RAM works.
>>>
>>> If the explanation above is correct, then I would expect (1) and (2) to work and
>>> (3) to also fail.
>>>
> 
> Your testing confirms my assumptions. I'll check how this could be solved to
> prevent the efifb driver from being probed if there's already a framebuffer device.
> 

I've posted [0] which does this and also for the simplefb driver.

[0]: https://lore.kernel.org/dri-devel/20211111092053.1328304-1-javierm@redhat.com/T/#u
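
The idea behind [0], refusing to probe a generic firmware framebuffer driver once any framebuffer is registered, can be modelled in userspace. This is a hedged sketch with a hypothetical in-memory registry, not the kernel code from the posted patch: `fb_registry`, `native_fb_register()` and `generic_fb_probe()` are invented names, and the `-EBUSY` return value is an assumption. It also models the native driver evicting firmware framebuffers it finds at registration time, which matches case (2), where "EFI VGA" disappeared from /proc/fb after `modprobe amdgpu`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FB 8

/* Hypothetical stand-in for the kernel's list of registered framebuffers. */
struct fb_registry {
    const char *id[MAX_FB];
    bool firmware[MAX_FB];
    size_t count;
};

static int fb_add(struct fb_registry *r, const char *id, bool firmware)
{
    if (r->count == MAX_FB)
        return -ENOSPC;
    r->id[r->count] = id;
    r->firmware[r->count] = firmware;
    r->count++;
    return 0;
}

/* Native driver (amdgpu's role): drop firmware framebuffers that are
 * already registered, then register its own. This only helps when the
 * firmware framebuffer was registered first, as in case (2). */
int native_fb_register(struct fb_registry *r, const char *id)
{
    size_t kept = 0;

    for (size_t i = 0; i < r->count; i++) {
        if (!r->firmware[i]) {
            r->id[kept] = r->id[i];
            r->firmware[kept] = false;
            kept++;
        }
    }
    r->count = kept;
    return fb_add(r, id, false);
}

/* Generic firmware driver probe (efifb/simplefb role). Without the guard,
 * a probe that runs after the native driver adds a second framebuffer for
 * the same hardware, as in case (3). With the guard it backs off. */
int generic_fb_probe(struct fb_registry *r, const char *id, bool guard)
{
    if (guard && r->count > 0)
        return -EBUSY; /* a framebuffer is already registered */
    return fb_add(r, id, true);
}
```

With `guard` false, registering "amdgpu" and then probing "simple" leaves two entries, mirroring case (3); with `guard` true the late generic probe fails and only "amdgpu" remains.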

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat

* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-11  9:24       ` Javier Martinez Canillas
@ 2021-11-11 10:52         ` Ilya Trukhanov
  2021-11-11 11:13           ` Javier Martinez Canillas
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Trukhanov @ 2021-11-11 10:52 UTC
  To: Javier Martinez Canillas
  Cc: stable, regressions, linux-efi, linux-pm, tzimmermann, ardb,
	rafael, len.brown, pavel, dri-devel

On Thu, Nov 11, 2021 at 10:24:56AM +0100, Javier Martinez Canillas wrote:
> On 11/11/21 08:31, Javier Martinez Canillas wrote:
> 
> [snip]
> 
> >>> And for each, check /proc/fb, the kernel boot log, and whether Suspend-to-RAM works.
> >>>
> >>> If the explanation above is correct, then I would expect (1) and (2) to work and
> >>> (3) to also fail.
> >>>
> > 
> > Your testing confirms my assumptions. I'll check how this could be solved to
> > prevent the efifb driver from being probed if there's already a framebuffer device.
> > 
> 
> I've posted [0] which does this and also for the simplefb driver.
> 
> [0]: https://lore.kernel.org/dri-devel/20211111092053.1328304-1-javierm@redhat.com/T/#u

I applied the patch and it fixes the issue for me.
Thank you!

> 
> Best regards,
> -- 
> Javier Martinez Canillas
> Linux Engineering
> Red Hat
> 

* Re: [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support
  2021-11-11 10:52         ` Ilya Trukhanov
@ 2021-11-11 11:13           ` Javier Martinez Canillas
  0 siblings, 0 replies; 12+ messages in thread
From: Javier Martinez Canillas @ 2021-11-11 11:13 UTC
  To: Ilya Trukhanov
  Cc: stable, regressions, linux-efi, linux-pm, tzimmermann, ardb,
	rafael, len.brown, pavel, dri-devel

Hello Ilya,

On 11/11/21 11:52, Ilya Trukhanov wrote:
> On Thu, Nov 11, 2021 at 10:24:56AM +0100, Javier Martinez Canillas wrote:
>> On 11/11/21 08:31, Javier Martinez Canillas wrote:
>>
>> [snip]
>>
>>>>> And for each, check /proc/fb, the kernel boot log, and whether Suspend-to-RAM works.
>>>>>
>>>>> If the explanation above is correct, then I would expect (1) and (2) to work and
>>>>> (3) to also fail.
>>>>>
>>>
>>> Your testing confirms my assumptions. I'll check how this could be solved to
>>> prevent the efifb driver from being probed if there's already a framebuffer device.
>>>
>>
>> I've posted [0] which does this and also for the simplefb driver.
>>
>> [0]: https://lore.kernel.org/dri-devel/20211111092053.1328304-1-javierm@redhat.com/T/#u
> 
> I applied the patch and it fixes the issue for me.
> Thank you!
> 

Great! And thanks for tracking this down.

Feel free to add your Tested-by to v2.

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat

Thread overview: 12+ messages
2021-11-10 20:02 [REGRESSION]: drivers/firmware: move x86 Generic System Framebuffers support Ilya Trukhanov
2021-11-10 22:24 ` Ard Biesheuvel
2021-11-10 23:21   ` Ilya Trukhanov
2021-11-10 23:25     ` Ard Biesheuvel
2021-11-11  0:08       ` Ilya Trukhanov
2021-11-10 23:07 ` Javier Martinez Canillas
2021-11-11  0:45   ` Ilya Trukhanov
2021-11-11  7:31     ` Javier Martinez Canillas
2021-11-11  9:24       ` Javier Martinez Canillas
2021-11-11 10:52         ` Ilya Trukhanov
2021-11-11 11:13           ` Javier Martinez Canillas
2021-11-11  6:11 ` Thorsten Leemhuis