regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [Regression] acpi: laptop panics early in boot
@ 2023-03-06 21:45 Pierre Asselin
  2023-03-07  9:33 ` Uwe Kleine-König
  2023-04-17 11:41 ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 2 replies; 14+ messages in thread
From: Pierre Asselin @ 2023-03-06 21:45 UTC (permalink / raw)
  To: linux-acpi
  Cc: regressions, "Uwe Kleine-König", Rafael J. Wysocki

[First bug report, apologies if I break etiquette, pointers appreciated.]

Laptop crashes when booting on AC power.

The laptop is on Gentoo; upgrading from 5.15.88 to 6.1.12, the kernel
panics before reaching early userspace.  I find no other report like
this on the Gentoo, arch, mint, or Ubuntu forums, on kernel.org, or out
of duckduckgo.  This is a regression (because, you know, the kernel used
to boot) but so far it's only me, and only on my 16-year old laptop.
Also, I have a workaround.

Image of the panic screen at
https://www.panix.com/~pa/linux-6.1.12-crash/boot-failure.jpg .

Seeing acpi_ac_notify() near the top of the traceback, I tried booting on
battery power.  Works fine.  The panic occurs only when booting on AC power,
whether or not a battery is present.  Sometimes, rarely, it boots fine
even on AC, but that is the exception not the rule.

Based on comments in drivers/acpi/ac.c I tried a trivial patch,

	--- drivers/acpi/ac.c.orig	2022-12-11 17:15:18.000000000 -0500
	+++ drivers/acpi/ac.c	2023-02-19 17:38:06.958733031 -0500
	@@ -47,7 +47,7 @@ static int acpi_ac_resume(struct device
	 #endif
	 static SIMPLE_DEV_PM_OPS(acpi_ac_pm, NULL, acpi_ac_resume);

	-static int ac_sleep_before_get_state_ms;
	+static int ac_sleep_before_get_state_ms = 1000;
	 static int ac_only;

	 static struct acpi_driver acpi_ac_driver = {

and it works.  This is probably not the right fix but it will be my
workaround for the time being.

Problem is still present in 6.3-rc1 .  Bisecting, the first bad commit
is d6fb6ee1820c .

The starting config (from 5.15.88) is at
https://www.panix.com/~pa/linux-6.1.12-crash/config-base .
The test is make olddefconfig, make bzImage, then kexec the bzImage with an
initramfs to see if it reaches the rescueshell prompt.  Has to succeed
three times to be declared good.

In case I forgot something, here is my post on the Gentoo forums:
https://forums.gentoo.org/viewtopic-t-1161834-highlight-.html

#regzbot introduced d6fb6ee1820c

Also in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=217152


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-06 21:45 [Regression] acpi: laptop panics early in boot Pierre Asselin
@ 2023-03-07  9:33 ` Uwe Kleine-König
  2023-03-07 14:25   ` Pierre Asselin
  2023-04-17 11:41 ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 14+ messages in thread
From: Uwe Kleine-König @ 2023-03-07  9:33 UTC (permalink / raw)
  To: Pierre Asselin; +Cc: linux-acpi, regressions, Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 1564 bytes --]

Hello,

> Problem is still present in 6.3-rc1 .  Bisecting, the first bad commit
> is d6fb6ee1820c .

There is a small timeframe where the (now deleted) struct
acpi_device::driver is different from acpi_device::dev->driver. That's
because the latter is assigned to in drivers/base/dd.c's really_probe()
before the acpi probe function is called, while the former is assigned
only later.

So acpi_drv->ops.add(acpi_dev) is called with the driver set while that
was not true before d6fb6ee1820c.

To test if this is the culprit, can you please try a 5.19 kernel plus
the following patch?:

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index e2db1bdd9dd2..657a2520a3e1 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1022,6 +1022,8 @@ static int acpi_device_probe(struct device *dev)
 	struct acpi_driver *acpi_drv = to_acpi_driver(dev->driver);
 	int ret;
 
+	acpi_dev->driver = acpi_drv;
+
 	if (acpi_dev->handler && !acpi_is_pnp_device(acpi_dev))
 		return -EINVAL;
 
@@ -1032,8 +1034,6 @@ static int acpi_device_probe(struct device *dev)
 	if (ret)
 		return ret;
 
-	acpi_dev->driver = acpi_drv;
-
 	pr_debug("Driver [%s] successfully bound to device [%s]\n",
 		 acpi_drv->name, acpi_dev->pnp.bus_id);
 

If the above patch breaks 5.19 that's the trail we have to follow. That
this is timing-related is strange though.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-07  9:33 ` Uwe Kleine-König
@ 2023-03-07 14:25   ` Pierre Asselin
  2023-03-07 16:10     ` Uwe Kleine-König
  0 siblings, 1 reply; 14+ messages in thread
From: Pierre Asselin @ 2023-03-07 14:25 UTC (permalink / raw)
  To: Uwe Kleine-Koenig
  Cc: Pierre Asselin, linux-acpi, regressions, Rafael J. Wysocki


5.19: good.
5.19+Uwe's patch: bad.

Slightly different traceback,
https://www.panix.com/~pa/linux-6.1.12-crash/boot-failure-5.19-patched.jpg



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-07 14:25   ` Pierre Asselin
@ 2023-03-07 16:10     ` Uwe Kleine-König
  2023-03-07 19:31       ` Pierre Asselin
  0 siblings, 1 reply; 14+ messages in thread
From: Uwe Kleine-König @ 2023-03-07 16:10 UTC (permalink / raw)
  To: Pierre Asselin; +Cc: linux-acpi, regressions, Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 1595 bytes --]

On Tue, Mar 07, 2023 at 09:25:09AM -0500, Pierre Asselin wrote:
> 
> 5.19: good.
> 5.19+Uwe's patch: bad.
> 
> Slightly different traceback,
> https://www.panix.com/~pa/linux-6.1.12-crash/boot-failure-5.19-patched.jpg

Maybe the following patch helps (on top of v6.3-rc1):

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 9531dd0fef50..a5a8f82981ce 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
 	if (!adev)
 		goto err;
 
-	if (adev->dev.driver) {
+	if (device_is_bound(&adev->dev)) {
 		struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
 
 		if (driver && driver->ops.notify &&

This might be broken because usually you have to hold the device lock
and this isn't given here I think.

On the other hand, I don't know if you could just grab the lock here.
You'd probably have to hold it while calling driver->ops.notify which
might or might not be safe. I think there was a race already before
commit d6fb6ee1820c ("ACPI: bus: Drop driver member of struct
acpi_device"), it just wasn't hit that reliably. (Consider the state
before d6fb6ee1820c and the device being unbound just between
acpi_bus_notify()'s driver = adev->driver; and the call to
driver->ops.notify. So conceptually it was necessary already before
d6fb6ee1820c to hold the device lock?)

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-07 16:10     ` Uwe Kleine-König
@ 2023-03-07 19:31       ` Pierre Asselin
  2023-03-23 12:31         ` Linux regression tracking (Thorsten Leemhuis)
       [not found]         ` <20230307200843.yxhvnb23tpecjjng@pengutronix.de>
  0 siblings, 2 replies; 14+ messages in thread
From: Pierre Asselin @ 2023-03-07 19:31 UTC (permalink / raw)
  To: "Uwe Kleine-König"
  Cc: Pierre Asselin, linux-acpi, regressions, Rafael J. Wysocki

> Maybe the following patch helps (on top of v6.3-rc1):
>
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 9531dd0fef50..a5a8f82981ce 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32
> type, void *data)
>  	if (!adev)
>  		goto err;
>
> -	if (adev->dev.driver) {
> +	if (device_is_bound(&adev->dev)) {
>  		struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
>
>  		if (driver && driver->ops.notify &&
>

It does indeed "fix" 6.3-rc1.  Modulo locking issues, which I am
not qualified to evaluate.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-07 19:31       ` Pierre Asselin
@ 2023-03-23 12:31         ` Linux regression tracking (Thorsten Leemhuis)
  2023-03-23 13:16           ` Uwe Kleine-König
       [not found]         ` <20230307200843.yxhvnb23tpecjjng@pengutronix.de>
  1 sibling, 1 reply; 14+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-03-23 12:31 UTC (permalink / raw)
  To: Pierre Asselin, Uwe Kleine-König
  Cc: linux-acpi, regressions, Rafael J. Wysocki,
	Linux kernel regressions list

On 07.03.23 20:31, Pierre Asselin wrote:
>> Maybe the following patch helps (on top of v6.3-rc1):
>>
>> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
>> index 9531dd0fef50..a5a8f82981ce 100644
>> --- a/drivers/acpi/bus.c
>> +++ b/drivers/acpi/bus.c
>> @@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32
>> type, void *data)
>>  	if (!adev)
>>  		goto err;
>>
>> -	if (adev->dev.driver) {
>> +	if (device_is_bound(&adev->dev)) {
>>  		struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
>>
>>  		if (driver && driver->ops.notify &&
>>
> 
> It does indeed "fix" 6.3-rc1.  Modulo locking issues, which I am
> not qualified to evaluate.

Uwe, what happens to this regression fix? It looks like it didn't make
any progress towards mainline, but maybe I missed something in my brief
search on lore.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-23 12:31         ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-03-23 13:16           ` Uwe Kleine-König
  2023-03-23 17:20             ` Pierre Asselin
  0 siblings, 1 reply; 14+ messages in thread
From: Uwe Kleine-König @ 2023-03-23 13:16 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Pierre Asselin, linux-acpi, Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

On Thu, Mar 23, 2023 at 01:31:37PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 07.03.23 20:31, Pierre Asselin wrote:
> >> Maybe the following patch helps (on top of v6.3-rc1):
> >>
> >> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> >> index 9531dd0fef50..a5a8f82981ce 100644
> >> --- a/drivers/acpi/bus.c
> >> +++ b/drivers/acpi/bus.c
> >> @@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32
> >> type, void *data)
> >>  	if (!adev)
> >>  		goto err;
> >>
> >> -	if (adev->dev.driver) {
> >> +	if (device_is_bound(&adev->dev)) {
> >>  		struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
> >>
> >>  		if (driver && driver->ops.notify &&
> >>
> > 
> > It does indeed "fix" 6.3-rc1.  Modulo locking issues, which I am
> > not qualified to evaluate.
> 
> Uwe, what happens to this regression fix? It looks like it didn't make
> any progress towards mainline, but maybe I missed something in my brief
> search on lore.

If you missed something then so did I. From my POV the patch is not
known to lead to a correct handling, but for sure it's better than the
status quo.

I didn't create a proper patch because I thought someone might want to
do a deeper dive and check the logging. If that doesn't happen, I can
followup with a patch.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-23 13:16           ` Uwe Kleine-König
@ 2023-03-23 17:20             ` Pierre Asselin
  0 siblings, 0 replies; 14+ messages in thread
From: Pierre Asselin @ 2023-03-23 17:20 UTC (permalink / raw)
  To: "Uwe Kleine-König"
  Cc: Linux regressions mailing list, Pierre Asselin, linux-acpi,
	Rafael J. Wysocki

> On Thu, Mar 23, 2023 at 01:31:37PM +0100, Linux regression tracking
> (Thorsten Leemhuis) wrote:
>
> [snip]
>
> I didn't create a proper patch because I thought someone might want to
> do a deeper dive and check the logging. If that doesn't happen, I can
> followup with a patch.
>

I will test any patch anyone cares to throw at me.
--PA


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
       [not found]         ` <20230307200843.yxhvnb23tpecjjng@pengutronix.de>
@ 2023-03-23 21:22           ` Uwe Kleine-König
  2023-03-24 12:12             ` Rafael J. Wysocki
  2023-03-24 15:02             ` Pierre Asselin
  2023-03-24 11:54           ` Rafael J. Wysocki
  1 sibling, 2 replies; 14+ messages in thread
From: Uwe Kleine-König @ 2023-03-23 21:22 UTC (permalink / raw)
  To: Pierre Asselin, Rafael J. Wysocki
  Cc: linux-acpi, regressions, Bjorn Helgaas, Robert Święcki

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

Hello Pierre,

On Tue, Mar 07, 2023 at 09:08:55PM +0100, Uwe Kleine-König wrote:
> On Tue, Mar 07, 2023 at 02:31:49PM -0500, Pierre Asselin wrote:
> > > Maybe the following patch helps (on top of v6.3-rc1):
> > >
> > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > index 9531dd0fef50..a5a8f82981ce 100644
> > > --- a/drivers/acpi/bus.c
> > > +++ b/drivers/acpi/bus.c
> > > @@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32
> > > type, void *data)
> > >  	if (!adev)
> > >  		goto err;
> > >
> > > -	if (adev->dev.driver) {
> > > +	if (device_is_bound(&adev->dev)) {
> > >  		struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
> > >
> > >  		if (driver && driver->ops.notify &&
> > >
> > 
> > It does indeed "fix" 6.3-rc1.  Modulo locking issues, which I am
> > not qualified to evaluate.
> 
> Thanks for your prompt test feedback.
> 
> The locked variant could look as follows:
> 
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 9531dd0fef50..fddca263ac40 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -518,13 +518,15 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
>  	if (!adev)
>  		goto err;
>  
> -	if (adev->dev.driver) {
> +	device_lock(&adev->dev);
> +	if (device_is_bound(&adev->dev)) {
>  		struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
>  
>  		if (driver && driver->ops.notify &&
>  		    (driver->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS))
>  			driver->ops.notify(adev, type);
>  	}
> +	device_unlock(&adev->dev);
>  
>  	if (!hotplug_event) {
>  		acpi_put_acpi_dev(adev);
> 
> Pierre: If you want to test that, I suggest to also enable
> PROVE_LOCKING.
> 
> Rafael: I don't know if this would work and I hope you're able to judge
> this better than I do. The change without the lock is for sure better
> than the status quo.
> 
> I did a similar conversion as the blamed commit for pci that got
> reverted for similiar reasons. See
> 68da4e0eaaab421f228eac57cbe7505b136464af. (Added Bjorn and Robert to
> Cc:) I think the pci code suffers from a similar race even after
> reverting my change. If someone is able to find the right fix for one of
> them, that might be transferable to the other?!

This mail didn't make it into the archive on lore.kernel.org. Did
someone receive it? If not, that would explain why I didn't get any
feedback on it ...

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
       [not found]         ` <20230307200843.yxhvnb23tpecjjng@pengutronix.de>
  2023-03-23 21:22           ` Uwe Kleine-König
@ 2023-03-24 11:54           ` Rafael J. Wysocki
  1 sibling, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2023-03-24 11:54 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Pierre Asselin, Rafael J. Wysocki, linux-acpi, regressions,
	Bjorn Helgaas, Robert Święcki

On Thu, Mar 23, 2023 at 11:14 PM Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
>
> Hello Pierre,
>
> On Tue, Mar 07, 2023 at 02:31:49PM -0500, Pierre Asselin wrote:
> > > Maybe the following patch helps (on top of v6.3-rc1):
> > >
> > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > index 9531dd0fef50..a5a8f82981ce 100644
> > > --- a/drivers/acpi/bus.c
> > > +++ b/drivers/acpi/bus.c
> > > @@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32
> > > type, void *data)
> > >     if (!adev)
> > >             goto err;
> > >
> > > -   if (adev->dev.driver) {
> > > +   if (device_is_bound(&adev->dev)) {
> > >             struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
> > >
> > >             if (driver && driver->ops.notify &&
> > >
> >
> > It does indeed "fix" 6.3-rc1.  Modulo locking issues, which I am
> > not qualified to evaluate.
>
> Thanks for your prompt test feedback.
>
> The locked variant could look as follows:
>
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 9531dd0fef50..fddca263ac40 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -518,13 +518,15 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
>         if (!adev)
>                 goto err;
>
> -       if (adev->dev.driver) {
> +       device_lock(&adev->dev);
> +       if (device_is_bound(&adev->dev)) {
>                 struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
>
>                 if (driver && driver->ops.notify &&
>                     (driver->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS))
>                         driver->ops.notify(adev, type);
>         }
> +       device_unlock(&adev->dev);
>
>         if (!hotplug_event) {
>                 acpi_put_acpi_dev(adev);
>
> Pierre: If you want to test that, I suggest to also enable
> PROVE_LOCKING.
>
> Rafael: I don't know if this would work and I hope you're able to judge
> this better than I do. The change without the lock is for sure better
> than the status quo.

It is better than the status quo, but it is not ideal either.

It appears to me that it is a mistake to invoke the driver's
->notify() callback from acpi_bus_notify(), because there is no
synchronization between this and the driver probe/remove, so basically
all drivers setting ACPI_DRIVER_ALL_NOTIFY_EVENTS in flags need to
synchronize their ->notify() callbacks with probe and remove by hand.
The AC driver evidently doesn't do that and I have not looked at the
other ones yet, but chances are that they don't do that either.

However, this is readily avoidable if the handler installed by
acpi_device_install_notify_handler() is installed for all events.

I'll send a patch along these lines shortly.

> I did a similar conversion as the blamed commit for pci that got
> reverted for similiar reasons. See
> 68da4e0eaaab421f228eac57cbe7505b136464af. (Added Bjorn and Robert to
> Cc:) I think the pci code suffers from a similar race even after
> reverting my change.

It is very likely that there is a race, but it may be very hard to trigger.

> If someone is able to find the right fix for one of
> them, that might be transferable to the other?!

I don't think so.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-23 21:22           ` Uwe Kleine-König
@ 2023-03-24 12:12             ` Rafael J. Wysocki
  2023-03-24 12:27               ` Uwe Kleine-König
  2023-03-24 15:02             ` Pierre Asselin
  1 sibling, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2023-03-24 12:12 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Pierre Asselin, Rafael J. Wysocki, linux-acpi, regressions,
	Bjorn Helgaas, Robert Święcki

On Thu, Mar 23, 2023 at 10:22 PM Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
>
> Hello Pierre,
>
> On Tue, Mar 07, 2023 at 09:08:55PM +0100, Uwe Kleine-König wrote:
> > On Tue, Mar 07, 2023 at 02:31:49PM -0500, Pierre Asselin wrote:
> > > > Maybe the following patch helps (on top of v6.3-rc1):
> > > >
> > > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > > index 9531dd0fef50..a5a8f82981ce 100644
> > > > --- a/drivers/acpi/bus.c
> > > > +++ b/drivers/acpi/bus.c
> > > > @@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32
> > > > type, void *data)
> > > >   if (!adev)
> > > >           goto err;
> > > >
> > > > - if (adev->dev.driver) {
> > > > + if (device_is_bound(&adev->dev)) {
> > > >           struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
> > > >
> > > >           if (driver && driver->ops.notify &&
> > > >
> > >
> > > It does indeed "fix" 6.3-rc1.  Modulo locking issues, which I am
> > > not qualified to evaluate.
> >
> > Thanks for your prompt test feedback.
> >
> > The locked variant could look as follows:
> >
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index 9531dd0fef50..fddca263ac40 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -518,13 +518,15 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
> >       if (!adev)
> >               goto err;
> >
> > -     if (adev->dev.driver) {
> > +     device_lock(&adev->dev);
> > +     if (device_is_bound(&adev->dev)) {
> >               struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
> >
> >               if (driver && driver->ops.notify &&
> >                   (driver->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS))
> >                       driver->ops.notify(adev, type);
> >       }
> > +     device_unlock(&adev->dev);
> >
> >       if (!hotplug_event) {
> >               acpi_put_acpi_dev(adev);
> >
> > Pierre: If you want to test that, I suggest to also enable
> > PROVE_LOCKING.
> >
> > Rafael: I don't know if this would work and I hope you're able to judge
> > this better than I do. The change without the lock is for sure better
> > than the status quo.
> >
> > I did a similar conversion as the blamed commit for pci that got
> > reverted for similiar reasons. See
> > 68da4e0eaaab421f228eac57cbe7505b136464af. (Added Bjorn and Robert to
> > Cc:) I think the pci code suffers from a similar race even after
> > reverting my change. If someone is able to find the right fix for one of
> > them, that might be transferable to the other?!
>
> This mail didn't make it into the archive on lore.kernel.org. Did
> someone receive it?

I have received it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-24 12:12             ` Rafael J. Wysocki
@ 2023-03-24 12:27               ` Uwe Kleine-König
  0 siblings, 0 replies; 14+ messages in thread
From: Uwe Kleine-König @ 2023-03-24 12:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pierre Asselin, Rafael J. Wysocki, linux-acpi, regressions,
	Bjorn Helgaas, Robert Święcki

[-- Attachment #1: Type: text/plain, Size: 3315 bytes --]

Hello Rafael,

On Fri, Mar 24, 2023 at 01:12:35PM +0100, Rafael J. Wysocki wrote:
> On Thu, Mar 23, 2023 at 10:22 PM Uwe Kleine-König
> <u.kleine-koenig@pengutronix.de> wrote:
> >
> > Hello Pierre,
> >
> > On Tue, Mar 07, 2023 at 09:08:55PM +0100, Uwe Kleine-König wrote:
> > > On Tue, Mar 07, 2023 at 02:31:49PM -0500, Pierre Asselin wrote:
> > > > > Maybe the following patch helps (on top of v6.3-rc1):
> > > > >
> > > > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > > > index 9531dd0fef50..a5a8f82981ce 100644
> > > > > --- a/drivers/acpi/bus.c
> > > > > +++ b/drivers/acpi/bus.c
> > > > > @@ -518,7 +518,7 @@ static void acpi_bus_notify(acpi_handle handle, u32
> > > > > type, void *data)
> > > > >   if (!adev)
> > > > >           goto err;
> > > > >
> > > > > - if (adev->dev.driver) {
> > > > > + if (device_is_bound(&adev->dev)) {
> > > > >           struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
> > > > >
> > > > >           if (driver && driver->ops.notify &&
> > > > >
> > > >
> > > > It does indeed "fix" 6.3-rc1.  Modulo locking issues, which I am
> > > > not qualified to evaluate.
> > >
> > > Thanks for your prompt test feedback.
> > >
> > > The locked variant could look as follows:
> > >
> > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > index 9531dd0fef50..fddca263ac40 100644
> > > --- a/drivers/acpi/bus.c
> > > +++ b/drivers/acpi/bus.c
> > > @@ -518,13 +518,15 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
> > >       if (!adev)
> > >               goto err;
> > >
> > > -     if (adev->dev.driver) {
> > > +     device_lock(&adev->dev);
> > > +     if (device_is_bound(&adev->dev)) {
> > >               struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
> > >
> > >               if (driver && driver->ops.notify &&
> > >                   (driver->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS))
> > >                       driver->ops.notify(adev, type);
> > >       }
> > > +     device_unlock(&adev->dev);
> > >
> > >       if (!hotplug_event) {
> > >               acpi_put_acpi_dev(adev);
> > >
> > > Pierre: If you want to test that, I suggest to also enable
> > > PROVE_LOCKING.
> > >
> > > Rafael: I don't know if this would work and I hope you're able to judge
> > > this better than I do. The change without the lock is for sure better
> > > than the status quo.
> > >
> > > I did a similar conversion as the blamed commit for pci that got
> > > reverted for similiar reasons. See
> > > 68da4e0eaaab421f228eac57cbe7505b136464af. (Added Bjorn and Robert to
> > > Cc:) I think the pci code suffers from a similar race even after
> > > reverting my change. If someone is able to find the right fix for one of
> > > them, that might be transferable to the other?!
> >
> > This mail didn't make it into the archive on lore.kernel.org. Did
> > someone receive it?
> 
> I have received it.

I bounced it to the list and Pierre. This was enough to make it appear
in the archive on lore.kernel.org. If you got it yesterday, that was my
bounce.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-23 21:22           ` Uwe Kleine-König
  2023-03-24 12:12             ` Rafael J. Wysocki
@ 2023-03-24 15:02             ` Pierre Asselin
  1 sibling, 0 replies; 14+ messages in thread
From: Pierre Asselin @ 2023-03-24 15:02 UTC (permalink / raw)
  To: "Uwe Kleine-König"
  Cc: Pierre Asselin, Rafael J. Wysocki, linux-acpi, regressions,
	Bjorn Helgaas, "Robert Święcki"

> This mail didn't make it into the archive on lore.kernel.org. Did
> someone receive it? If not, that would explain why I didn't get any
> feedback on it ...

First time I see it.

>> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
>> index 9531dd0fef50..fddca263ac40 100644
>> --- a/drivers/acpi/bus.c
>> +++ b/drivers/acpi/bus.c
>> @@ -518,13 +518,15 @@ static void acpi_bus_notify(acpi_handle handle,
>> u32 type, void *data)
>>  	if (!adev)
>>  		goto err;
>>
>> -	if (adev->dev.driver) {
>> +	device_lock(&adev->dev);
>> +	if (device_is_bound(&adev->dev)) {
>>  		struct acpi_driver *driver = to_acpi_driver(adev->dev.driver);
>>
>>  		if (driver && driver->ops.notify &&
>>  		    (driver->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS))
>>  			driver->ops.notify(adev, type);
>>  	}
>> +	device_unlock(&adev->dev);
>>
>>  	if (!hotplug_event) {
>>  		acpi_put_acpi_dev(adev);
>>

This patch works, in that the kernel reaches early userspace reliably.
I would want to test it more extensively, i.e. install the modules and
run it for real instead of just kexec-ing the bzImage.  I'll test Rafael's
patch first.

Full disclosure: I made small changes to the starting .config, see below.
6.3-rc1 is still bad with the new .config, the patch makes it good.
The changes:
  1) PROVE_LOCKING=y
  2) CONFIG_DEBUG_ATOMIC_SLEEP=y  (why not, what could go wrong :)
  3) radeon driver modular instead of built-in.
I don't think it matters.  Here is the diffconfig, just in case.

-EXTRA_FIRMWARE_DIR "/lib/firmware"
 CEC_CORE y -> m
 DEBUG_ATOMIC_SLEEP n -> y
 DEBUG_LOCK_ALLOC n -> y
 DEBUG_MUTEXES n -> y
 DEBUG_RT_MUTEXES n -> y
 DEBUG_RWSEMS n -> y
 DEBUG_SPINLOCK n -> y
 DEBUG_WW_MUTEX_SLOWPATH n -> y
 DRM_DISPLAY_HELPER y -> m
 DRM_KMS_HELPER y -> m
 DRM_RADEON y -> m
 DRM_TTM y -> m
 DRM_TTM_HELPER y -> m
 EXTRA_FIRMWARE "radeon/R300_cp.bin" -> ""
 FB_CFB_COPYAREA y -> m
 FB_CFB_FILLRECT y -> m
 FB_CFB_IMAGEBLIT y -> m
 FB_SYS_COPYAREA y -> m
 FB_SYS_FILLRECT y -> m
 FB_SYS_FOPS y -> m
 FB_SYS_IMAGEBLIT y -> m
 I2C_ALGOBIT y -> m
 PROVE_LOCKING n -> y
+DEBUG_LOCKDEP n
+LOCKDEP y
+LOCKDEP_BITS 15
+LOCKDEP_CHAINS_BITS 16
+LOCKDEP_CIRCULAR_QUEUE_BITS 12
+LOCKDEP_STACK_TRACE_BITS 19
+LOCKDEP_STACK_TRACE_HASH_BITS 14
+PREEMPTIRQ_TRACEPOINTS y
+PROVE_RAW_LOCK_NESTING n
+PROVE_RCU y
+TRACE_IRQFLAGS y
+TRACE_IRQFLAGS_NMI y



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Regression] acpi: laptop panics early in boot
  2023-03-06 21:45 [Regression] acpi: laptop panics early in boot Pierre Asselin
  2023-03-07  9:33 ` Uwe Kleine-König
@ 2023-04-17 11:41 ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 0 replies; 14+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-04-17 11:41 UTC (permalink / raw)
  To: linux-acpi; +Cc: regressions

[TLDR: This mail in primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 06.03.23 22:45, Pierre Asselin wrote:
> 
> Laptop crashes when booting on AC power.
> 
> The laptop is on Gentoo; upgrading from 5.15.88 to 6.1.12, the kernel
> panics before reaching early userspace.  I find no other report like
> this on the Gentoo, arch, mint, or Ubuntu forums, on kernel.org, or out
> of duckduckgo.  This is a regression (because, you know, the kernel used
> to boot) but so far it's only me, and only on my 16-year old laptop.
> Also, I have a workaround.
> [...]
> #regzbot introduced d6fb6ee1820c

regzbot accidentally missed the fix for this, hence let me point to it
manually:

#regzbot fix: c56610a869bce03490
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-04-17 11:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-06 21:45 [Regression] acpi: laptop panics early in boot Pierre Asselin
2023-03-07  9:33 ` Uwe Kleine-König
2023-03-07 14:25   ` Pierre Asselin
2023-03-07 16:10     ` Uwe Kleine-König
2023-03-07 19:31       ` Pierre Asselin
2023-03-23 12:31         ` Linux regression tracking (Thorsten Leemhuis)
2023-03-23 13:16           ` Uwe Kleine-König
2023-03-23 17:20             ` Pierre Asselin
     [not found]         ` <20230307200843.yxhvnb23tpecjjng@pengutronix.de>
2023-03-23 21:22           ` Uwe Kleine-König
2023-03-24 12:12             ` Rafael J. Wysocki
2023-03-24 12:27               ` Uwe Kleine-König
2023-03-24 15:02             ` Pierre Asselin
2023-03-24 11:54           ` Rafael J. Wysocki
2023-04-17 11:41 ` Linux regression tracking #update (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).