regressions.lists.linux.dev archive mirror
* [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
@ 2023-04-13 19:35 Acid Bong
  2023-04-14  7:51 ` Bagas Sanjaya
  0 siblings, 1 reply; 19+ messages in thread
From: Acid Bong @ 2023-04-13 19:35 UTC (permalink / raw)
  To: regressions; +Cc: stable, linux-acpi

Hi there, hello,

Sometimes when I suspend (by closing the lid or, less often, by pressing
Fn+F1, the sleep key combo) or power off my laptop (both by pressing the
power button and by running "loginctl poweroff"), it enters a state in which
it doesn't respond to opening/closing the lid, the power button, or
Ctrl+Alt+Del, but, unlike in sleep mode, the fan keeps spinning and the
"awake status" LED stays on. I checked /var/log/kern.log, but it didn't
report a suspend at that moment at all: it went straight from [UFW BLOCK] to
"microcode updated" on the forced reboot (marked with an arrow):

	Apr 13 10:40:32 bong kernel: asus_wmi: Unknown key code 0xcf
	Apr 13 10:44:05 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
	Apr 13 10:47:45 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
	Apr 13 10:47:46 bong kernel: ICMPv6: NA: /*router*/ advertised our address /*ipv6*/ on wlan0!
	Apr 13 10:47:48 bong last message buffered 2 times
->	Apr 13 10:49:11 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
	Apr 13 10:52:34 bong kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-12
	Apr 13 10:52:34 bong kernel: Linux version 6.1.23-bong+ (acid@bong) (gcc (Gentoo Hardened 12.2.1_p20230121-r1 p10) 12.2.1 20230121, GNU ld (Gentoo 2.39 p5) 2.39.0) #1 SMP PREEMPT_DYNAMIC Tue Apr 11 15:21:57 EEST 2023
	Apr 13 10:52:34 bong kernel: Command line: root=/dev/genston/root ro loglevel=4 rd.lvm.vg=genston rd.luks.uuid=97d10669-2da1-452d-a372-887e420b2ad4 rd.luks.allow-discards pci=nomsi initrd=\x5cinitramfs-6.1.23-bong+.img
	Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
	Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
	Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'

Normally a suspend starts like this (taken from dmesg to sync with elogind messages):

	[ 7835.869228] elogind-daemon[2033]: Lid closed.
	[ 7835.872875] elogind-daemon[2033]: Suspending...
	[ 7835.873955] elogind-daemon[2033]: Suspending system...
	[ 7835.873970] PM: suspend entry (deep)
	[ 7835.902814] Filesystems sync: 0.028 seconds
	[ 7835.920362] Freezing user space processes
	[ 7835.923030] Freezing user space processes completed (elapsed 0.002 seconds)
	[ 7835.923046] OOM killer disabled.
	[ 7835.923049] Freezing remaining freezable tasks
	[ 7835.924445] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
	[ 7835.924624] printk: Suspending console(s) (use no_console_suspend to debug)

The issue appeared when I was using pf-kernel with genpatches and
updated from 6.1-pf2 to 6.1-pf3 (corresponding to vanilla versions 6.1.3
-> 6.1.6). I used that fork until 6.2-pf2, but since then (early March)
moved to vanilla sources and started following the 6.1.y branch when it
was declared LTS. And the issue was present on all of them.

The hang was last detected 3 days ago on 6.1.22 and today on 6.1.23.

I'd like to bisect it, but it could take ages for a couple of reasons:

1) I don't know the exact pattern it follows. One of the scenarios I've
noticed was this one (sorry if it sounds ridiculous):
	- put the laptop on the nearby couch and simultaneously close
	  the lid; the loose charger jack might disconnect;
	- lay the mouse upside down (so it doesn't wake up when I
	  reconnect the charger),
but it's not a 100% guarantee of the bug and, as I said earlier, the
laptop also misbehaves on shutdown.

2) The issue happens rarely, once in a few days (sometimes up to a week;
I haven't measured it precisely back then).
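
To put rough numbers on "ages": a bisection needs about log2(N) test kernels
for N candidate commits, and each kernel has to run long enough for the hang
to show up. The sketch below is only an illustration. The commit count (500)
and the 7-day wait per kernel are assumed values, not figures taken from the
v6.1.3..v6.1.6 range, and `bisect_steps` and `worst_case_days` are
hypothetical helper names.

```python
import math


def bisect_steps(num_commits: int) -> int:
    """Approximate number of kernels to build and test for num_commits candidates."""
    if num_commits <= 1:
        return 0
    # Each bisection step roughly halves the remaining range.
    return math.ceil(math.log2(num_commits))


def worst_case_days(num_commits: int, days_per_step: float) -> float:
    """Upper bound on elapsed time if every step needs the full waiting period."""
    return bisect_steps(num_commits) * days_per_step


# Assumed values for illustration only.
steps = bisect_steps(500)
days = worst_case_days(500, 7)
print(f"{steps} steps, up to {days} days")
```

With these assumed inputs the bisection takes 9 steps and up to 63 days,
which is why an intermittent bug is so expensive to bisect.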

Hardware: https://tilde.cafe/u/acidbong/kernel/lspci (`lspci -vvnn`)
Config (latest vanilla): https://git.sr.ht/~acid-bong/kernel/tree/806e6639da610952798e1b5d8c0d700062f915de/item/.config
Built with KCFLAGS="-march=native"
Isolated cmdline: root=/dev/genston/root ro loglevel=4 rd.lvm.vg=genston rd.luks.uuid=97d10669-2da1-452d-a372-887e420b2ad4 rd.luks.allow-discards pci=nomsi initrd=\initramfs-6.1.23-bong+.img

# regzbot introduced v6.1.3..v6.1.6

---
Regards,
~acidbong

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-13 19:35 [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward) Acid Bong
@ 2023-04-14  7:51 ` Bagas Sanjaya
  2023-04-14  8:15   ` Linux regression tracking (Thorsten Leemhuis)
  2023-05-01 12:50   ` Bagas Sanjaya
  0 siblings, 2 replies; 19+ messages in thread
From: Bagas Sanjaya @ 2023-04-14  7:51 UTC (permalink / raw)
  To: Acid Bong, regressions; +Cc: stable, linux-acpi, Thorsten Leemhuis

On 4/14/23 02:35, Acid Bong wrote:
> The issue appeared when I was using pf-kernel with genpatches and
> updated from 6.1-pf2 to 6.1-pf3 (corresponding to vanilla versions 6.1.3
> -> 6.1.6). I used that fork until 6.2-pf2, but since then (early March)
> moved to vanilla sources and started following the 6.1.y branch when it
> was declared LTS. And the issue was present on all of them.
> 
> The hang was last detected 3 days ago on 6.1.22 and today on 6.1.23.
> 

Have you tried testing latest mainline to see if commits which are
backported to 6.1.y cause your regression?

> # regzbot introduced v6.1.3..v6.1.6
> 

Anyway, I'm adding this to regzbot:

#regzbot ^introduced v6.1.3..v6.1.6
#regzbot title Asus X541UAK hangs on suspend and poweroff
#regzbot ignore-activity

Thanks.

-- 
An old man doll... just what I always wanted! - Clara



* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  7:51 ` Bagas Sanjaya
@ 2023-04-14  8:15   ` Linux regression tracking (Thorsten Leemhuis)
  2023-04-14  9:07     ` Acid Bong
  2023-04-17  7:37     ` Acid Bong
  2023-05-01 12:50   ` Bagas Sanjaya
  1 sibling, 2 replies; 19+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-04-14  8:15 UTC (permalink / raw)
  To: Bagas Sanjaya, Acid Bong, regressions; +Cc: stable, linux-acpi

On 14.04.23 09:51, Bagas Sanjaya wrote:
> On 4/14/23 02:35, Acid Bong wrote:
>> The issue appeared when I was using pf-kernel with genpatches and
>> updated from 6.1-pf2 to 6.1-pf3 (corresponding to vanilla versions 6.1.3
>> -> 6.1.6). I used that fork until 6.2-pf2, but since then (early March)
>> moved to vanilla sources and started following the 6.1.y branch when it
>> was declared LTS. And the issue was present on all of them.
>>
>> The hang was last detected 3 days ago on 6.1.22 and today on 6.1.23.
> 
> Have you tried testing latest mainline to see if commits which are
> backported to 6.1.y cause your regression?

Well, if it is something that started between v6.1.3 and v6.1.6 it must be
a backported commit from mainline that causes the regression.

But yeah, testing mainline would be wise to differentiate between "this
is something that is caused by a change in mainline" and "this is
something stable specific and might be caused by a bad or incomplete
backport".

It's not totally clear to me, but it seems 6.2 is affected as well?
Well, then it's a mainline issue. Testing latest mainline nevertheless
would be good to know if this maybe was fixed already.

But first something else: acidbong, why do you pass "pci=nomsi" to your
kernel? Maybe that makes your machine run in an unusual configuration
that directly or indirectly leads to your problem (which only worked by
chance earlier).

>> # regzbot introduced v6.1.3..v6.1.6
> 
> Anyway, I'm adding this to regzbot:

Well, the quoted string above already did that. But whatever, a...

> #regzbot ^introduced v6.1.3..v6.1.6

...should do no harm and this...

> #regzbot title Asus X541UAK hangs on suspend and poweroff

... has improved the title (which was derived from the subject
beforehand) somewhat. :-D

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  8:15   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-04-14  9:07     ` Acid Bong
  2023-04-14 18:51       ` Acid Bong
                         ` (3 more replies)
  2023-04-17  7:37     ` Acid Bong
  1 sibling, 4 replies; 19+ messages in thread
From: Acid Bong @ 2023-04-14  9:07 UTC (permalink / raw)
  To: Linux regressions mailing list, Bagas Sanjaya; +Cc: stable, linux-acpi

> Thorsten
> why do you pass pci=nomsi
It's a workaround for another issue I've been facing for about 2 or 3
years, since when I first tried out Linux (started with loading Kubuntu
and Mint live images). Without that workaround Kubuntu didn't boot for
me - on kernel 5.8 it only reached the graphic installer part, but hung
after language selection menu, on 5.4 and 5.11 - didn't even reach the
graphic session. With Mint it was more severe - the screen was flooded
with PCIe errors, like so:

	Apr 10 18:47:08 bong last message buffered 3 times
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
	Apr 10 18:47:08 bong last message buffered 5 times
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
	Apr 10 18:47:08 bong last message buffered 13 times
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
	Apr 10 18:47:08 bong last message buffered 5 times
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
	Apr 10 18:47:08 bong last message buffered 6 times

`pci=nomsi` saved me also during Debian installation - without it the
live ISO just crashed mid-installation.

But it wasn't a complete cure for Debian- and Ubuntu-based distros, and
they still crashed even with this parameter (I don't know how exactly,
at least they didn't flood with PCIe bus errors).

Since I moved to Manjaro, Void, Arch and now to Gentoo (which bases its
config on the Fedora one), PCIe errors were my only trouble, which was
easily mitigated with `pci=nomsi`. Recently I discovered that without it
one of the kernel threads (irq/124-aerdrv) had high CPU load, so it's doubly
useful.

https://forums.linuxmint.com/viewtopic.php?p=2237628
Just googled and there's a guy with a very similar model as me (UVK
instead of UAK) and same issues, but `noaer` and `nomsi` work identically
for me (I found `nomsi` in a different thread).

Since I've been building my own kernel for the last 3 months, I've disabled
MSI in the kernel config - and with that, a big part of the IOMMU code as well:
https://git.sr.ht/~acid-bong/kernel/commit/cac5c09dec0bea919ca071a9b738108b0d8a8ee5
but I did it _after_ I first experienced the issue I described in the
thread head, hoping that it'll save me from these hangs as well. It
didn't.

I'm keeping it in the bootloader config for cases when I boot with a
prebuilt Gentoo kernel, and add every time I'm booting with Arch or Void
live USB for rescue purposes. It's not a constant issue tho, happens every
other time.

---
> Bagas
> Have you tried testing latest mainline?
Just built and will boot in a moment. But we'll gotta wait for a couple
of days, since the hanging is unexpected.


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  9:07     ` Acid Bong
@ 2023-04-14 18:51       ` Acid Bong
  2023-04-15  7:37       ` Bagas Sanjaya
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Acid Bong @ 2023-04-14 18:51 UTC (permalink / raw)
  To: acidbong; +Cc: linux-acpi, regressions, stable

For the readers: here's a copy of the letter as it should've looked (it
looks normal in the regressions archive, but wasn't parsed correctly
in the stable and linux-acpi lore archives):
https://tilde.cafe/u/acidbong/kernel/pci-nomsi.txt


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  9:07     ` Acid Bong
  2023-04-14 18:51       ` Acid Bong
@ 2023-04-15  7:37       ` Bagas Sanjaya
  2023-04-15  8:06       ` Linux regression tracking (Thorsten Leemhuis)
  2023-05-15 20:51       ` Bjorn Helgaas
  3 siblings, 0 replies; 19+ messages in thread
From: Bagas Sanjaya @ 2023-04-15  7:37 UTC (permalink / raw)
  To: Acid Bong, Linux regressions mailing list; +Cc: stable, linux-acpi

On 4/14/23 16:07, Acid Bong wrote:
>> Thorsten
>> why do you pass pci=nomsi
> It's a workaround for another issue I've been facing for about 2 or 3
> years, since when I first tried out Linux (started with loading Kubuntu
> and Mint live images). Without that workaround Kubuntu didn't boot for
> me - on kernel 5.8 it only reached the graphic installer part, but hung
> after language selection menu, on 5.4 and 5.11 - didn't even reach the
> graphic session. With Mint it was more severe - the screen was flooded
> with PCIe errors, like so:
> 
> 	Apr 10 18:47:08 bong last message buffered 3 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask

Hardware issue? Or is this another kernel issue? If it is the latter,
please file a separate report (see Documentation/admin-guide/reporting-issues.rst
for how to report kernel issues).

-- 
An old man doll... just what I always wanted! - Clara



* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  9:07     ` Acid Bong
  2023-04-14 18:51       ` Acid Bong
  2023-04-15  7:37       ` Bagas Sanjaya
@ 2023-04-15  8:06       ` Linux regression tracking (Thorsten Leemhuis)
  2023-05-15 20:51       ` Bjorn Helgaas
  3 siblings, 0 replies; 19+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-04-15  8:06 UTC (permalink / raw)
  To: Acid Bong, Linux regressions mailing list, Bagas Sanjaya
  Cc: stable, linux-acpi



On 14.04.23 11:07, Acid Bong wrote:
>> Thorsten
>> why do you pass pci=nomsi
> It's a workaround for another issue I've been facing for about 2 or 3
> years, since when I first tried out Linux (started with loading Kubuntu
> and Mint live images). Without that workaround Kubuntu didn't boot for
> me - on kernel 5.8 it only reached the graphic installer part, but hung
> after language selection menu, on 5.4 and 5.11 - didn't even reach the
> graphic session. With Mint it was more severe - the screen was flooded
> with PCIe errors, like so:
> 
> 	Apr 10 18:47:08 bong last message buffered 3 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> 	Apr 10 18:47:08 bong last message buffered 5 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> 	Apr 10 18:47:08 bong last message buffered 13 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> 	Apr 10 18:47:08 bong last message buffered 5 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> 	Apr 10 18:47:08 bong last message buffered 6 times
> 
> `pci=nomsi` saved me also during Debian installation - without it the
> live ISO just crashed mid-installation.
> 
> But it wasn't a complete cure for Debian- and Ubuntu-based distros, and
> they still crashed even with this parameter (I don't know how exactly,
> at least they didn't flood with PCIe bus errors).
> 
> Since I moved to Manjaro, Void, Arch and now to Gentoo (which bases its
> config on the Fedora one), PCIe errors were my only trouble, which was
> easily mitigated with `pci=nomsi`. Recently I discovered that without it
> one of the kernel threads (irq/124-aerdrv) had high CPU load, so it's doubly
> useful.
> 
> https://forums.linuxmint.com/viewtopic.php?p=2237628
> Just googled and there's a guy with a very similar model as me (UVK
> instead of UAK) and same issues, but `noaer` and `nomsi` work identically
> for me (I found `nomsi` in a different thread).
> 
> Since I've been building my own kernel for the last 3 months, I've disabled
> MSI in the kernel config - and with that, a big part of the IOMMU code as well:
> https://git.sr.ht/~acid-bong/kernel/commit/cac5c09dec0bea919ca071a9b738108b0d8a8ee5
> but I did it _after_ I first experienced the issue I described in the
> thread head, hoping that it'll save me from these hangs as well. It
> didn't.
> 
> I'm keeping it in the bootloader config for cases when I boot with a
> prebuilt Gentoo kernel, and add every time I'm booting with Arch or Void
> live USB for rescue purposes. It's not a constant issue tho, happens every
> other time.
> 
> ---
>> Bagas
>> Have you tried testing latest mainline?
> Just built and will boot in a moment. But we'll gotta wait for a couple
> of days, since the hanging is unexpected.

This is not my area of expertise, but the pre-existing hardware config
trouble the kernel apparently has makes this a problematic case, as what
causes those problems might directly or indirectly cause the regression
you see by chance -- and might be something that only happens on your
machine.

Maybe we are lucky and some developer of the affected kernel code areas
will see your report and have an idea what might cause the regressions.
But I'd say chances are slim. So unless we are lucky, we likely won't
get any closer to a solution without a bisection.

But I wouldn't take that path; instead I in your place would report and
sort out the hardware config trouble, as the problem might vanish by
solving that.

But as I said, this is not my area of expertise, so maybe it's bad advice.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  8:15   ` Linux regression tracking (Thorsten Leemhuis)
  2023-04-14  9:07     ` Acid Bong
@ 2023-04-17  7:37     ` Acid Bong
  2023-04-17 10:56       ` Linux regression tracking (Thorsten Leemhuis)
  1 sibling, 1 reply; 19+ messages in thread
From: Acid Bong @ 2023-04-17  7:37 UTC (permalink / raw)
  To: regressions; +Cc: acidbong, bagasdotme, linux-acpi, regressions, stable

So, I followed your advice and used the mainline sources (6.3-rc6). I even
compiled two versions: one with my config (cf. head letter) and one with the
Arch Linux config (I'm using Gentoo, but it still fits well), both updated
with `olddefconfig`, just to make sure that the problem is independent of
the config.

Good news: I experienced the hanging 3 times with both kernels
yesterday.

Two of them were on the custom kernel, and they were of the rare kind:
they occurred on shutdown. It goes normally, init disables the services,
unmounts the filesystems, turns off the screen, but then - no response,
and the LED and the fan are still on. Another couple of shutdowns went
normally, so the issue is still irregular.

One happened later on the Arch-based one and after a suspend.

/var/log/kern.log showed nothing specific in all cases.

Bad news: it seems, the fix hasn't arrived yet.

How do I proceed next?

--

P.S. On the `pci=nomsi` case: I don't consider it related to the
issue we're discussing. To me it seems like a hardware issue that can
be bypassed by reconfiguration.


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-17  7:37     ` Acid Bong
@ 2023-04-17 10:56       ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 19+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-04-17 10:56 UTC (permalink / raw)
  To: Acid Bong
  Cc: bagasdotme, linux-acpi, regressions, stable, Bjorn Helgaas,
	Rafael J. Wysocki

On 17.04.23 09:37, Acid Bong wrote:
> So, I followed your advice and used the mainline sources (6.3-rc6). I even
> compiled two versions: one with my config (cf. head letter) and one with the
> Arch Linux config (I'm using Gentoo, but it still fits well), both updated
> with `olddefconfig`, just to make sure that the problem is independent of
> the config.
> 
> Good news: I experienced the hanging 3 times with both kernels
> yesterday.
> 
> Two of them were on the custom kernel, and they were of the rare kind:
> they occurred on shutdown. It goes normally, init disables the services,
> unmounts the filesystems, turns off the screen, but then - no response,
> and the LED and the fan are still on. Another couple of shutdowns went
> normally, so the issue is still irregular.
> 
> One happened later on the Arch-based one and after a suspend.
> 
> /var/log/kern.log showed nothing specific in all cases.
> 
> Bad news: it seems, the fix hasn't arrived yet.
> 
> How do I proceed next?

Ideally you should still try to bisect this to find the change that
causes your problems.

But I'm CCing the ACPI and PCI maintainers nevertheless, now that it's
clear that it happens in vanilla mainline, too. *If* you are lucky they
have an idea what might be wrong and can point you in a direction to
narrow the cause down. But if you are unlucky, they will have no idea
and just ignore this until you bisect the problem.

FWIW, Rafael, Bjorn: the thread starts here:
https://lore.kernel.org/all/CRVU11I7JJWF.367PSO4YAQQEI@bong/

To quote some parts of it:
```
Sometimes when I suspend (by closing the lid or, less often, by pressing
Fn+F1, the sleep key combo) or power off my laptop (both by pressing the
power button and by running "loginctl poweroff"), it enters a state in which
it doesn't respond to opening/closing the lid, the power button, or
Ctrl+Alt+Del, but, unlike in sleep mode, the fan keeps spinning and the
"awake status" LED stays on.
[...]
The issue appeared when I was using pf-kernel with genpatches and
updated from 6.1-pf2 to 6.1-pf3 (corresponding to vanilla versions 6.1.3
-> 6.1.6). I used that fork until 6.2-pf2, but since then (early March)
moved to vanilla sources and started following the 6.1.y branch when it
was declared LTS. And the issue was present on all of them.
```

> P.S. On the `pci=nomsi` case: I don't consider it related to the
> issue we're discussing. To me it seems like a hardware issue that can
> be bypassed by reconfiguration.

I wouldn't be so sure about that.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  7:51 ` Bagas Sanjaya
  2023-04-14  8:15   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-05-01 12:50   ` Bagas Sanjaya
  2023-05-01 21:02     ` Acid Bong
  2023-05-13 10:50     ` Acid Bong
  1 sibling, 2 replies; 19+ messages in thread
From: Bagas Sanjaya @ 2023-05-01 12:50 UTC (permalink / raw)
  To: Acid Bong, regressions
  Cc: stable, linux-acpi, Thorsten Leemhuis, Rafael J. Wysocki


On Fri, Apr 14, 2023 at 02:51:47PM +0700, Bagas Sanjaya wrote:
> On 4/14/23 02:35, Acid Bong wrote:
> > The issue appeared when I was using pf-kernel with genpatches and
> > updated from 6.1-pf2 to 6.1-pf3 (corresponding to vanilla versions 6.1.3
> > -> 6.1.6). I used that fork until 6.2-pf2, but since then (early March)
> > moved to vanilla sources and started following the 6.1.y branch when it
> > was declared LTS. And the issue was present on all of them.
> > 
> > The hang was last detected 3 days ago on 6.1.22 and today on 6.1.23.
> > 
> 
> Have you tried testing latest mainline to see if commits which are
> backported to 6.1.y cause your regression?
> 

#regzbot poke

Acid Bong, have you successfully bisected to find the culprit commit?
How about swapping the hardware? I'm poking because the thread has looked
stale for a while.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara



* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-05-01 12:50   ` Bagas Sanjaya
@ 2023-05-01 21:02     ` Acid Bong
  2023-05-03  4:31       ` Bagas Sanjaya
  2023-06-09 11:09       ` Acid Bong
  2023-05-13 10:50     ` Acid Bong
  1 sibling, 2 replies; 19+ messages in thread
From: Acid Bong @ 2023-05-01 21:02 UTC (permalink / raw)
  To: Bagas Sanjaya, regressions
  Cc: stable, linux-acpi, Thorsten Leemhuis, Rafael J. Wysocki

Hi there, and thank you for the reminder.

Bisecting, unfortunately, takes a long time: I'm only trying out the 7th
commit, 15e7433e1dc2 (the previous 6 were marked as bad). The bug, as noted
in the head, doesn't follow any (strict) pattern and takes a random amount
of time to show up: some kernels hung the day after compilation, one took 5
days. I'm not excluding the possibility that I might've got the versions
wrong and the bug occurred on the update from 6.1-pf1 to 6.1-pf2 (6.1 and
6.1.3; could be unrelated, but I saw a bunch of commits related to i915
and Skylake).

I also checked my package manager log, no programs related to kernel
compilation (glibc, gcc, archivers and such) were updated until I
updated to the problematic version, and for about two weeks after the
upgrade (the first occurrence happened soon after it).

What exactly do you mean by "swapping the hardware"? I'm already sure
it's not related to my storage, because a month ago I replaced my faulty
HDD with an SSD, but the bug still remained. Unfortunately, I don't have
spare PCs or resources to purchase new hardware.


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-05-01 21:02     ` Acid Bong
@ 2023-05-03  4:31       ` Bagas Sanjaya
  2023-06-09 11:09       ` Acid Bong
  1 sibling, 0 replies; 19+ messages in thread
From: Bagas Sanjaya @ 2023-05-03  4:31 UTC (permalink / raw)
  To: Acid Bong, regressions
  Cc: stable, linux-acpi, Thorsten Leemhuis, Rafael J. Wysocki


On Tue, May 02, 2023 at 12:02:26AM +0300, Acid Bong wrote:
> Hi there, and thank you for the reminder.
> 
> Bisecting, unfortunately, takes a long time: I'm only trying out the 7th
> commit, 15e7433e1dc2 (the previous 6 were marked as bad). The bug, as noted
> in the head, doesn't follow any (strict) pattern and takes a random amount
> of time to show up: some kernels hung the day after compilation, one took 5
> days. I'm not excluding the possibility that I might've got the versions
> wrong and the bug occurred on the update from 6.1-pf1 to 6.1-pf2 (6.1 and
> 6.1.3; could be unrelated, but I saw a bunch of commits related to i915
> and Skylake).

OK, please keep us updated on the bisection progress.

> What exactly do you mean by "swapping the hardware"? I'm already sure
> it's not related to my storage, because a month ago I replaced my faulty
> HDD with an SSD, but the bug still remained. Unfortunately, I don't have
> spare PCs or resources to purchase new hardware.

In the case of laptops, I mean buying a new laptop (maybe with similar
hardware specs to your current one) and trying to reproduce the regression
there.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara



* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-05-01 12:50   ` Bagas Sanjaya
  2023-05-01 21:02     ` Acid Bong
@ 2023-05-13 10:50     ` Acid Bong
  1 sibling, 0 replies; 19+ messages in thread
From: Acid Bong @ 2023-05-13 10:50 UTC (permalink / raw)
  To: Bagas Sanjaya, regressions
  Cc: stable, linux-acpi, Thorsten Leemhuis, Rafael J. Wysocki

Hi there, hello. A little mid-update.

I bisected almost the whole range after 6.1.3, and the only untested commits
left are unrelated to my hardware (AMD-specific stuff). I spent a week
with a 6.1.1 kernel and haven't experienced a single hang since, which
leads me to a couple of conclusions:

1) it's not a hardware issue after all, since certain versions don't
produce the bug
2) (this one's more an assumption) I might've got the version range
wrong.

I'm gonna try 6.1.2 and 6.1.3 as well (up to 7 more days for each), and,
depending on the output, bisect in a different range (now I regret not
doing it in the beginning).
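
How long a kernel has to run before it can be called "good" can be estimated
under a simple model. The sketch below assumes hangs arrive independently at
a constant average rate (a Poisson process); that is an assumption, not
something measured in this thread. The 3-day mean interval is a guess based
on "once in a few days", and `p_no_hang` and `days_needed` are hypothetical
helper names.

```python
import math


def p_no_hang(test_days: float, mean_days_between_hangs: float) -> float:
    """Probability that a *bad* kernel shows no hang during test_days."""
    return math.exp(-test_days / mean_days_between_hangs)


def days_needed(mean_days_between_hangs: float, max_false_good: float) -> float:
    """Test length after which a bad kernel is mislabeled 'good' with
    probability at most max_false_good."""
    return -mean_days_between_hangs * math.log(max_false_good)


# Assumed mean interval of 3 days between hangs on an affected kernel.
print(f"chance a bad kernel survives 7 days: {p_no_hang(7, 3):.3f}")
print(f"days for <5% false 'good': {days_needed(3, 0.05):.1f}")
```

Under these assumptions a bad kernel survives a 7-day test roughly 10% of
the time, and about 9 days are needed to push that below 5%, so a single
hang-free week is suggestive but not conclusive.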

At the moment the earliest *tested* commit is:
```
[15e7433e1dc202] arm64: dts: qcom: sc8280xp: fix UFS DMA coherency.
```
and it's marked as "bad".

Thank you for your patience.


* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-04-14  9:07     ` Acid Bong
                         ` (2 preceding siblings ...)
  2023-04-15  8:06       ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-05-15 20:51       ` Bjorn Helgaas
  2023-05-16 10:26         ` Acid Bong
  3 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2023-05-15 20:51 UTC (permalink / raw)
  To: Acid Bong
  Cc: Linux regressions mailing list, Bagas Sanjaya, stable,
	linux-acpi, Rafael J. Wysocki, linux-pci

[+cc linux-pci; thread at
https://lore.kernel.org/r/CRVU11I7JJWF.367PSO4YAQQEI@bong]

On Fri, Apr 14, 2023 at 12:07:42PM +0300, Acid Bong wrote:
> > Thorsten
> > why do you pass pci=nomsi
>
> It's a workaround for another issue I've been facing for about 2 or 3
> years, since when I first tried out Linux (started with loading Kubuntu
> and Mint live images). Without that workaround Kubuntu didn't boot for
> me - on kernel 5.8 it only reached the graphic installer part, but hung
> after language selection menu, on 5.4 and 5.11 - didn't even reach the
> graphic session. With Mint it was more severe - the screen was flooded
> with PCIe errors, like so:
> 
> 	Apr 10 18:47:08 bong last message buffered 3 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask
> 	Apr 10 18:47:08 bong last message buffered 5 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask
> 	Apr 10 18:47:08 bong last message buffered 13 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask
> 	Apr 10 18:47:08 bong last message buffered 5 times
> 	Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask
> 	Apr 10 18:47:08 bong last message buffered 6 times
> 
> `pci=nomsi` saved me also during Debian installation - without it the
> live ISO just crashed mid-installation.

Likely "pci=nomsi" or "pci=noaer" are not related to the
suspend/poweroff issue, but I'd really like to fix the AER problem
anyway.

Can you collect the complete dmesg log and output of "sudo lspci -vv"
and post them somewhere (https://bugzilla.kernel.org is a good place)?

Ideally the dmesg would be from the most recent kernel you have.

Bjorn

* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-05-15 20:51       ` Bjorn Helgaas
@ 2023-05-16 10:26         ` Acid Bong
  2023-05-16 19:32           ` Bjorn Helgaas
  0 siblings, 1 reply; 19+ messages in thread
From: Acid Bong @ 2023-05-16 10:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linux regressions mailing list, Bagas Sanjaya, stable,
	linux-acpi, Rafael J. Wysocki, linux-pci

>Can you collect the complete dmesg log and output of "sudo lspci -vv"
>and post them somewhere (https://bugzilla.kernel.org is a good place)?
`lspci -vvnn` output is linked in the head of the thread. Append .txt to make it readable in the browser (I only understood it after the upload).

>Ideally the dmesg would be from the most recent kernel you have.
Speaking of that, a couple of questions:

1) Should I post them with or without pci=nomsi/noaer? The problem with disabling it is that it floods the logs so fast, that they reach 700M in 5-7 minutes, and, when rotation is enabled (my parameters are default, up to 10 copies 10M each), all pre-flood data is lost instantly.

Also I'm currently bisecting the kernel with MSI disabled in the config. But I'm keeping the parameter in the bootloader for cases when I'm using Gentoo's prebuilt kernel.

2) Can I delete messages by ufw? They contain MACs of my router, laptop and cellphone and I don't really wanna share them

3) I'm not savvy in logs, how exactly should I share dmesg? `dmesg > file`? /var/log/syslog? I already know kern.log doesn't contain logind and some other messages that are present in dmesg

4) Should we continue in this thread or rather start a new one?

* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-05-16 10:26         ` Acid Bong
@ 2023-05-16 19:32           ` Bjorn Helgaas
  0 siblings, 0 replies; 19+ messages in thread
From: Bjorn Helgaas @ 2023-05-16 19:32 UTC (permalink / raw)
  To: Acid Bong
  Cc: Linux regressions mailing list, Bagas Sanjaya, stable,
	linux-acpi, Rafael J. Wysocki, linux-pci

On Tue, May 16, 2023 at 01:26:23PM +0300, Acid Bong wrote:
> >Can you collect the complete dmesg log and output of "sudo lspci -vv"
> >and post them somewhere (https://bugzilla.kernel.org is a good place)?
> `lspci -vvnn` output is linked in the head of the thread. Append .txt to make it readable in the browser (I only understood it after the upload).
> 
> >Ideally the dmesg would be from the most recent kernel you have.
>
> Speaking of that, a couple of questions:
> 
> 1) Should I post them with or without pci=nomsi/noaer? The problem
> with disabling it is that it floods the logs so fast, that they
> reach 700M in 5-7 minutes, and, when rotation is enabled (my
> parameters are default, up to 10 copies 10M each), all pre-flood
> data is lost instantly.

You're seeing AER logging, and that's what I'm interested in, so if
you could do one quick boot *without* "pci=nomsi" and "pci=noaer",
that would be great.  Then turn it off again so you don't drown in
logs.

The snippet from [1] shows a few messages related to 00:1c.5, and it
would be useful to know if there are errors related to other devices
as well.

Something like "head -c500K /var/log/dmesg > file" should be plenty.
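
To check whether the errors involve devices other than 00:1c.5, the captured log can be tallied per PCI address. A minimal Python sketch follows; the line format is assumed from the kern.log snippet quoted earlier in this thread, and the function name is hypothetical:

```python
import re
from collections import Counter

# Assumed format, taken from the lines quoted earlier in this thread:
#   "... kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask"
PCIEPORT_RE = re.compile(
    r"pcieport ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
)

def count_pcieport_devices(lines):
    """Return a Counter mapping PCI address -> number of pcieport log lines."""
    counts = Counter()
    for line in lines:
        match = PCIEPORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample input copied from the snippet above (hypothetical stand-in for a file).
sample = [
    "Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask",
    "Apr 10 18:47:08 bong last message buffered 5 times",
    "Apr 10 18:47:08 bong kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask",
]

for address, n in count_pcieport_devices(sample).most_common():
    print(address, n)
```

Since the logger collapses repeats into "last message buffered N times", the counts are a lower bound; the point is only to see which addresses show up at all.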

> Also I'm currently bisecting the kernel with MSI disabled in the
> config. But I'm keeping the parameter in the bootloader for cases
> when I'm using Gentoo's prebuilt kernel.
> 
> 2) Can I delete messages by ufw? They contain MACs of my router,
> laptop and cellphone and I don't really wanna share them

Sure, delete those.
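
One way to strip those lines before posting, as a small Python sketch. It assumes every firewall message carries a "[UFW " tag, as the "[UFW BLOCK]" lines in the original report do; the function name is hypothetical:

```python
# Assumption: ufw messages are tagged like "[UFW BLOCK]", so matching the
# "[UFW " prefix is enough to identify them.
UFW_MARKER = "[UFW "

def drop_ufw_lines(lines):
    """Return the log lines that do not contain a ufw tag."""
    return [line for line in lines if UFW_MARKER not in line]

# Sample lines copied from the original report.
sample = [
    "Apr 13 10:40:32 bong kernel: asus_wmi: Unknown key code 0xcf",
    "Apr 13 10:44:05 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/",
    "Apr 13 10:47:48 bong last message buffered 2 times",
]

for line in drop_ufw_lines(sample):
    print(line)
```

It is worth searching the result for any remaining "MAC=" before uploading, in case some message uses a tag this filter does not cover.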

> 3) I'm not savvy in logs, how exactly should I share dmesg? `dmesg >
> file`? /var/log/syslog? I already know kern.log doesn't contain
> logind and some other messages that are present in dmesg
> 
> 4) Should we continue in this thread or rather start a new one?

Good point, a new thread would probably be better.

Bjorn

[1] https://lore.kernel.org/all/CRWCUOAB4JKZ.3EKQN1TFFMVQL@bong/

* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-05-01 21:02     ` Acid Bong
  2023-05-03  4:31       ` Bagas Sanjaya
@ 2023-06-09 11:09       ` Acid Bong
  2023-06-09 16:55         ` Bjorn Helgaas
  1 sibling, 1 reply; 19+ messages in thread
From: Acid Bong @ 2023-06-09 11:09 UTC (permalink / raw)
  To: acidbong
  Cc: bagasdotme, linux-acpi, rafael, regressions, regressions, stable,
	helgaas

Hi there, hello.

This seems to be my final update.

About a week ago I returned to using Gajim, which, as I remember from
earlier, also seemed to be responsible for these hangings, and they got
more frequent (I haven't updated any software for the last 2 months). I
decided to move to the kernel version 6.1.1, which I earlier marked as
"good", and my laptop hung last evening during the shutdown. As always,
nothing in the logs.

I tried to compile some versions from 5.15.y branch, but either I had a
bad luck, or the commits weren't properly compatible with GCC 12 yet,
but they (.48 and .78) emitted warnings, so I never used them (or I
broke the repo, who knows).

Due to the fact that software does have impact on this behaviour, and
due to my health issues and potential conscription (cuz our army doesn't
care about health), which will cut me from my laptop for a long-long
time, I give up on bisecting. I'll just update all my software (there's
also a GCC upgrade in the repos) and hope for the best.

Sorry for inconvenience and have a great day. Thank you very much.

* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-06-09 11:09       ` Acid Bong
@ 2023-06-09 16:55         ` Bjorn Helgaas
  2023-06-14  8:53           ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2023-06-09 16:55 UTC (permalink / raw)
  To: Acid Bong
  Cc: bagasdotme, linux-acpi, rafael, regressions, regressions, stable

On Fri, Jun 09, 2023 at 02:09:17PM +0300, Acid Bong wrote:
> Hi there, hello.
> 
> This seems to be my final update.
> 
> About a week ago I returned to using Gajim, which, as I remember from
> earlier, also seemed to be responsible for these hangings, and they got
> more frequent (I haven't updated any software for the last 2 months). I
> decided to move to the kernel version 6.1.1, which I earlier marked as
> "good", and my laptop hung last evening during the shutdown. As always,
> nothing in the logs.
> 
> I tried to compile some versions from 5.15.y branch, but either I had a
> bad luck, or the commits weren't properly compatible with GCC 12 yet,
> but they (.48 and .78) emitted warnings, so I never used them (or I
> broke the repo, who knows).
> 
> Due to the fact that software does have impact on this behaviour, and
> due to my health issues and potential conscription (cuz our army doesn't
> care about health), which will cut me from my laptop for a long-long
> time, I give up on bisecting. I'll just update all my software (there's
> also a GCC upgrade in the repos) and hope for the best.
> 
> Sorry for inconvenience and have a great day. Thank you very much.

No inconvenience on our side; your help is invaluable, especially for
intermittent problems like this one.  They are really hard to find and
debug, and I'm sorry that we didn't get this one resolved.

Bjorn

* Re: [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward)
  2023-06-09 16:55         ` Bjorn Helgaas
@ 2023-06-14  8:53           ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 19+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-06-14  8:53 UTC (permalink / raw)
  To: Bjorn Helgaas, Acid Bong
  Cc: bagasdotme, linux-acpi, rafael, regressions, stable

On 09.06.23 18:55, Bjorn Helgaas wrote:
> On Fri, Jun 09, 2023 at 02:09:17PM +0300, Acid Bong wrote:
>> Hi there, hello.
>>
>> About a week ago I returned to using Gajim, which, as I remember from
>> earlier, also seemed to be responsible for these hangings, and they got
>> more frequent (I haven't updated any software for the last 2 months). I
>> decided to move to the kernel version 6.1.1, which I earlier marked as
>> "good", and my laptop hung last evening during the shutdown. As always,
>> nothing in the logs.
>>
>> I tried to compile some versions from 5.15.y branch, but either I had a
>> bad luck, or the commits weren't properly compatible with GCC 12 yet,
>> but they (.48 and .78) emitted warnings, so I never used them (or I
>> broke the repo, who knows).
>>
>> Due to the fact that software does have impact on this behaviour, and
>> due to my health issues and potential conscription (cuz our army doesn't
>> care about health), which will cut me from my laptop for a long-long
>> time, I give up on bisecting. I'll just update all my software (there's
>> also a GCC upgrade in the repos) and hope for the best.
>>
>> Sorry for inconvenience and have a great day. Thank you very much.
> 
> No inconvenience on our side; your help is invaluable, especially for
> intermittent problems like this one.  They are really hard to find and
> debug, and I'm sorry that we didn't get this one resolved.

+1

Then let me remove this from the regression tracking, too.

#regzbot inconclusive: ignored, reporter for various real life reasons
unfortunately will be unable to bisect/debug
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


Thread overview: 19+ messages
2023-04-13 19:35 [REGRESSION] Asus X541UAK hangs on suspend and poweroff (v6.1.6 onward) Acid Bong
2023-04-14  7:51 ` Bagas Sanjaya
2023-04-14  8:15   ` Linux regression tracking (Thorsten Leemhuis)
2023-04-14  9:07     ` Acid Bong
2023-04-14 18:51       ` Acid Bong
2023-04-15  7:37       ` Bagas Sanjaya
2023-04-15  8:06       ` Linux regression tracking (Thorsten Leemhuis)
2023-05-15 20:51       ` Bjorn Helgaas
2023-05-16 10:26         ` Acid Bong
2023-05-16 19:32           ` Bjorn Helgaas
2023-04-17  7:37     ` Acid Bong
2023-04-17 10:56       ` Linux regression tracking (Thorsten Leemhuis)
2023-05-01 12:50   ` Bagas Sanjaya
2023-05-01 21:02     ` Acid Bong
2023-05-03  4:31       ` Bagas Sanjaya
2023-06-09 11:09       ` Acid Bong
2023-06-09 16:55         ` Bjorn Helgaas
2023-06-14  8:53           ` Linux regression tracking #update (Thorsten Leemhuis)
2023-05-13 10:50     ` Acid Bong
