All of lore.kernel.org
 help / color / mirror / Atom feed
* ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
@ 2021-01-14 21:55 Pierre-Louis Bossart
  2021-01-14 23:34 ` Hans de Goede
  0 siblings, 1 reply; 20+ messages in thread
From: Pierre-Louis Bossart @ 2021-01-14 21:55 UTC (permalink / raw)
  To: ACPI Devel Mailing List
  Cc: Hans de Goede, Mika Westerberg, Rafael J. Wysocki,
	Andy Shevchenko, Rafael J. Wysocki, Pierre-Louis Bossart

Hi,
My primary test device for SOF on Cherrytrail no longer boots with 
v5.11-rc3 and the sof-dev branch, nothing happens after the 'loading 
initial ramdisk'. It's a 'Zotac' headless device derived from the 
Cherrytrail FFD design, so likely there are other devices hit by this 
problem.

A long bisect points to the commit 71da201f38dfb ('ACPI: scan: Defer 
enumeration of devices with _DEP lists').

Reverting the two commits below solves the boot issue.

I have absolutely no idea what these two patches do, but they sure have 
a large impact. Please let me know what sort of information or tests 
might help root-cause this problem.

FWIW I uploaded the alsa-info here:
http://alsa-project.org/db/?f=963685fa2c775b98b60835da3e61d9dd60056841

Thanks for your help
-Pierre

git revert 0de7fb7c8687048299305529d17f6a1e98ae658c
git revert 71da201f38dfb0c3a3d33bbe3168ea9112299dde

71da201f38dfb0c3a3d33bbe3168ea9112299dde is the first bad commit
commit 71da201f38dfb0c3a3d33bbe3168ea9112299dde
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Dec 14 21:27:27 2020 +0100

     ACPI: scan: Defer enumeration of devices with _DEP lists

     In some cases ACPI control methods used during device enumeration
     (such as _HID or _STA) may rely on Operation Region handlers
     supplied by the drivers of other devices [1]:

      An example of this is the Acer Switch 10E SW3-016 model. The _HID
      method of the ACPI node for the UART attached Bluetooth, reads
      GPIOs to detect the installed wifi chip and update the _HID for the
      Bluetooth's ACPI node accordingly. The current ACPI scan code calls
      _HID before the GPIO controller's OpRegions are available, leading
      to the wrong _HID being used and Bluetooth not working.

     In principle, in those cases there should be a _DEP control method
     under the device object with OpRegion enumeration dependencies, so
     deferring the enumeration of devices with _DEP returning a non-empty
     list of suppliers of OpRegions depended on by the given device
     (modulo some known exceptions that don't really supply any OpRegions
     and are listed by _DEP for other reasons irrelevant for Linux) should
     at least address the first-order dependencies by allowing the OpRegion
     suppliers to be enumerated before their consumers.

     Implement the above idea by modifying acpi_bus_scan() to enumerate
     devices in the given scope of the ACPI namespace in two passes,
     where the first pass covers the devices without "significant" lists
     of dependencies coming from _DEP only and the second pass covers
     all of the devices that were not enumerated in the first pass.

     Take _DEP into account only for device objects with _HID, mostly in
     order to avoid deferring the creation of ACPI device objects that
     represent PCI devices and must be present during the enumeration
     of the PCI bus (which takes place during the processing of the ACPI
     device object that represents the host bridge), so that they can
     be properly associated with the corresponding PCI devices.

     Link: 
https://lore.kernel.org/linux-acpi/20201121203040.146252-1-hdegoede@redhat.com/ 
# [1]
     Reported-by: Hans de Goede <hdegoede@redhat.com>
     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
     Reviewed-by: Hans de Goede <hdegoede@redhat.com>
     Tested-by: Hans de Goede <hdegoede@redhat.com>
     Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

  drivers/acpi/scan.c | 103 
+++++++++++++++++++++++++++++++++++++++-------------
  1 file changed, 78 insertions(+), 25 deletions(-)

git bisect start
# bad: [7c53f6b671f4aba70ff15e1b05148b10d58c2837] Linux 5.11-rc3
git bisect bad 7c53f6b671f4aba70ff15e1b05148b10d58c2837
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# good: [3db1a3fa98808aa90f95ec3e0fa2fc7abf28f5c9] Merge tag 
'staging-5.11-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good 3db1a3fa98808aa90f95ec3e0fa2fc7abf28f5c9
# good: [9805529ec544ea7a82d891d5239a8ebd3dbb2a3e] Merge tag 
'arm-soc-dt-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 9805529ec544ea7a82d891d5239a8ebd3dbb2a3e
# good: [f4a2f7866faaf89ea1595b136e01fcb336b46aab] Merge tag 'rtc-5.11' 
of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
git bisect good f4a2f7866faaf89ea1595b136e01fcb336b46aab
# bad: [ef2c8b81b88868f042579b9dd021cc9edbc2d0c6] Merge tag 
'drm-next-2020-12-24' of git://anongit.freedesktop.org/drm/drm
git bisect bad ef2c8b81b88868f042579b9dd021cc9edbc2d0c6
# good: [8653b778e454a7708847aeafe689bce07aeeb94e] Merge tag 
'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect good 8653b778e454a7708847aeafe689bce07aeeb94e
# good: [4960821a4d80781fd3e63cd71fb1b38c2dadb915] Merge tag 
'pm-5.11-rc1-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 4960821a4d80781fd3e63cd71fb1b38c2dadb915
# good: [418eddef050d5f6393c303a94e3173847ab85466] vdpa: Use simpler 
version of ida allocation
git bisect good 418eddef050d5f6393c303a94e3173847ab85466
# bad: [58cf05f597b03a8212d9ecf2c79ee046d3ee8ad9] Merge tag 
'sound-fix-5.11-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 58cf05f597b03a8212d9ecf2c79ee046d3ee8ad9
# bad: [6755f4563144e38f375f43dbb01926fd4ce08620] Merge tag 
'linux-watchdog-5.11-rc1' of git://www.linux-watchdog.org/linux-watchdog
git bisect bad 6755f4563144e38f375f43dbb01926fd4ce08620
# good: [6f733cb2e7db38f8141b14740bcde577844a03b7] watchdog: Fix 
potential dereferencing of null pointer
git bisect good 6f733cb2e7db38f8141b14740bcde577844a03b7
# bad: [538fcf57aaee6ad78a05f52b69a99baa22b33418] Merge branches 
'acpi-scan', 'acpi-pnp' and 'acpi-sleep'
git bisect bad 538fcf57aaee6ad78a05f52b69a99baa22b33418
# bad: [9272e97ae9e9b95e0805c690404a0df9fb03055f] ACPI: scan: Add Intel 
Baytrail Mailbox Device to acpi_ignore_dep_ids
git bisect bad 9272e97ae9e9b95e0805c690404a0df9fb03055f
# bad: [71da201f38dfb0c3a3d33bbe3168ea9112299dde] ACPI: scan: Defer 
enumeration of devices with _DEP lists
git bisect bad 71da201f38dfb0c3a3d33bbe3168ea9112299dde
# good: [6fc250887cbe14a350d472516f2e0118240c5d68] ACPI: scan: Evaluate 
_DEP before adding the device
git bisect good 6fc250887cbe14a350d472516f2e0118240c5d68
# first bad commit: [71da201f38dfb0c3a3d33bbe3168ea9112299dde] ACPI: 
scan: Defer enumeration of devices with _DEP lists

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-14 21:55 ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3 Pierre-Louis Bossart
@ 2021-01-14 23:34 ` Hans de Goede
  2021-01-15  0:49   ` Pierre-Louis Bossart
  0 siblings, 1 reply; 20+ messages in thread
From: Hans de Goede @ 2021-01-14 23:34 UTC (permalink / raw)
  To: Pierre-Louis Bossart, ACPI Devel Mailing List
  Cc: Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko, Rafael J. Wysocki

Hi,

On 1/14/21 10:55 PM, Pierre-Louis Bossart wrote:
> Hi,
> My primary test device for SOF on Cherrytrail no longer boots with v5.11-rc3 and the sof-dev branch, nothing happens after the 'loading initial ramdisk'. It's a 'Zotac' headless device derived from the Cherrytrail FFD design, so likely there are other devices hit by this problem.
> 
> A long bisect points to the commit 71da201f38dfb ('ACPI: scan: Defer enumeration of devices with _DEP lists').
> 
> Reverting the two commits below solves the boot issue.
> 
> I have absolutely no idea what these two patches do, but they sure have a large impact. Please let me know what sort of information or tests might help root-cause this problem.

Heh, I was just about to answer your other (off-list) email about your
CHT test device booting with a suggestion that you should try reverting that
exact commit as it is the only commit that I'm aware of which went into 5.11
which might cause this...

So I just boot 5.11-rc3 on a Acer Aspire Switch 10E SW3-016 (x5-Z8300 CHT
based) myself and that booted fine.`

Next I tried a MINIX NEO Z83-4 (x5-Z8300) which is a Mini PC and as such
probably the closest to the Zotac box which you are using which I have
at hand to test on, and I can somewhat reproduce it there.

It seems that the new code somehow causes us to hit a race somewhere, so
the NEO Z83-4 will boot most of the times but not always, it get past
the loading initrd phase for me and then it threw the following error
and after that the boot hung (waiting for the rootfs to show up)

platform device 80860F14: Resources present before probing

As I already told Rafael in a previous email, I did see something
similar when my personal tree was still 5.10 based, with the ACPI
scan rework patches cherry-picked for testing. In that case I got
a backtrace (followed by a hang) during boot about a kernel NULL
pointer deref triggered by sysfs_seq_file_read or some such. But
this problem went away with 5.11-rc1, so I stopped looking into
it. I do have a tag of my broken 5.10 + cherry-picks tree, so
I should be able to reproduce that issue.

So I see 2 possible theories here:

1. We have 2 probes of the same device racing somehow
2. The struct device memory is getting corrupted somehow.

Pierre-Louis, can you see if the following hack helps? :

--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1939,7 +1939,6 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
 		/* Bail out if the number of recorded dependencies is not 0. */
 		if (count > 0) {
 			acpi_bus_scan_second_pass = true;
-			return AE_CTRL_DEPTH;
 		}
 	}
 
@@ -1948,8 +1947,7 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
 		return AE_CTRL_DEPTH;
 
 	acpi_scan_init_hotplug(device);
-	if (!check_dep)
-		acpi_scan_dep_init(device);
+	acpi_scan_dep_init(device);
 
 out:
 	if (!*adev_p)

And can you collect an acpidump from the device and either send it to me and Rafael
offlist, or upload it somewhere and send us a link ?

Regards,

Hans


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-14 23:34 ` Hans de Goede
@ 2021-01-15  0:49   ` Pierre-Louis Bossart
  2021-01-15  8:54     ` Hans de Goede
  2021-01-15 13:54     ` Rafael J. Wysocki
  0 siblings, 2 replies; 20+ messages in thread
From: Pierre-Louis Bossart @ 2021-01-15  0:49 UTC (permalink / raw)
  To: Hans de Goede, ACPI Devel Mailing List
  Cc: Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko, Rafael J. Wysocki

Thanks Hans for your reply, much appreciated.

> Pierre-Louis, can you see if the following hack helps? :
> 
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1939,7 +1939,6 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
>   		/* Bail out if the number of recorded dependencies is not 0. */
>   		if (count > 0) {
>   			acpi_bus_scan_second_pass = true;
> -			return AE_CTRL_DEPTH;
>   		}
>   	}
>   
> @@ -1948,8 +1947,7 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
>   		return AE_CTRL_DEPTH;
>   
>   	acpi_scan_init_hotplug(device);
> -	if (!check_dep)
> -		acpi_scan_dep_init(device);
> +	acpi_scan_dep_init(device);
>   
>   out:
>   	if (!*adev_p)

Yep, those 'hacks' solve the boot problem on my device. I tried multiple 
times and it's completely reproducible.

> And can you collect an acpidump from the device and either send it to me and Rafael
> offlist, or upload it somewhere and send us a link ?

will do


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15  0:49   ` Pierre-Louis Bossart
@ 2021-01-15  8:54     ` Hans de Goede
  2021-01-15 13:38       ` Rafael J. Wysocki
  2021-01-15 13:54     ` Rafael J. Wysocki
  1 sibling, 1 reply; 20+ messages in thread
From: Hans de Goede @ 2021-01-15  8:54 UTC (permalink / raw)
  To: Pierre-Louis Bossart, ACPI Devel Mailing List
  Cc: Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko, Rafael J. Wysocki

Hi,

On 1/15/21 1:49 AM, Pierre-Louis Bossart wrote:
> Thanks Hans for your reply, much appreciated.
> 
>> Pierre-Louis, can you see if the following hack helps? :
>>
>> --- a/drivers/acpi/scan.c
>> +++ b/drivers/acpi/scan.c
>> @@ -1939,7 +1939,6 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
>>           /* Bail out if the number of recorded dependencies is not 0. */
>>           if (count > 0) {
>>               acpi_bus_scan_second_pass = true;
>> -            return AE_CTRL_DEPTH;
>>           }
>>       }
>>   @@ -1948,8 +1947,7 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
>>           return AE_CTRL_DEPTH;
>>         acpi_scan_init_hotplug(device);
>> -    if (!check_dep)
>> -        acpi_scan_dep_init(device);
>> +    acpi_scan_dep_init(device);
>>     out:
>>       if (!*adev_p)
> 
> Yep, those 'hacks' solve the boot problem on my device. I tried multiple times and it's completely reproducible.

Ok, so this confirms my earlier findings (with my personal 5.10 + cherry pick tree)
that the problem is not doing 2 scan "rounds" and thus calling e.g.
acpi_bus_attach twice. But the problem is rather with deferring the device-creation
of some devices to the second step.

>> And can you collect an acpidump from the device and either send it to me and Rafael
>> offlist, or upload it somewhere and send us a link ?
> 
> will do

2 more questions, for me on the device where I can recreate this the problem only
happens intermittently. Since you did a successful bisect, I assume that when the
boot fails, it fails every boot, right?   Do you have any special kernel debugging
options enabled, e.g. CONFIG_PAGE_POISONING, which might explain why for you it is
100% reproducable while for me it is intermittent ?  

Also may I ask what the exact model is of the Zotac device you are seeing this on?
(the DMI strings are not helpful with this)

Regards,

Hans


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15  8:54     ` Hans de Goede
@ 2021-01-15 13:38       ` Rafael J. Wysocki
  0 siblings, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2021-01-15 13:38 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Pierre-Louis Bossart, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko, Rafael J. Wysocki

On Fri, Jan 15, 2021 at 9:56 AM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 1/15/21 1:49 AM, Pierre-Louis Bossart wrote:
> > Thanks Hans for your reply, much appreciated.
> >
> >> Pierre-Louis, can you see if the following hack helps? :
> >>
> >> --- a/drivers/acpi/scan.c
> >> +++ b/drivers/acpi/scan.c
> >> @@ -1939,7 +1939,6 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
> >>           /* Bail out if the number of recorded dependencies is not 0. */
> >>           if (count > 0) {
> >>               acpi_bus_scan_second_pass = true;

It would be good to add

+ acpi_handle_info(handle, "Enumeration deferred\n");

here on top of the previous debug changes so we know which device(s) to look at.

> >> -            return AE_CTRL_DEPTH;
> >>           }
> >>       }
> >>   @@ -1948,8 +1947,7 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
> >>           return AE_CTRL_DEPTH;
> >>         acpi_scan_init_hotplug(device);
> >> -    if (!check_dep)
> >> -        acpi_scan_dep_init(device);
> >> +    acpi_scan_dep_init(device);

I think that this change can be made without issues anyway.

AFAICS, this part of acpi_bus_check_add() will only be called once for
every given device anyway and initializing dep_unmet for the devices
for which acpi_scan_check_dep() above returns 0 shouldn't hurt.

> >>     out:
> >>       if (!*adev_p)
> >
> > Yep, those 'hacks' solve the boot problem on my device. I tried multiple times and it's completely reproducible.
>
> Ok, so this confirms my earlier findings (with my personal 5.10 + cherry pick tree)
> that the problem is not doing 2 scan "rounds" and thus calling e.g.
> acpi_bus_attach twice. But the problem is rather with deferring the device-creation
> of some devices to the second step.

That would be my expectation.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15  0:49   ` Pierre-Louis Bossart
  2021-01-15  8:54     ` Hans de Goede
@ 2021-01-15 13:54     ` Rafael J. Wysocki
  2021-01-15 14:52       ` Pierre-Louis Bossart
  1 sibling, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2021-01-15 13:54 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Hans de Goede, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko, Rafael J. Wysocki

On Friday, January 15, 2021 1:49:17 AM CET Pierre-Louis Bossart wrote:
> Thanks Hans for your reply, much appreciated.
> 
> > Pierre-Louis, can you see if the following hack helps? :
> > 
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -1939,7 +1939,6 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
> >   		/* Bail out if the number of recorded dependencies is not 0. */
> >   		if (count > 0) {
> >   			acpi_bus_scan_second_pass = true;
> > -			return AE_CTRL_DEPTH;
> >   		}
> >   	}
> >   
> > @@ -1948,8 +1947,7 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
> >   		return AE_CTRL_DEPTH;
> >   
> >   	acpi_scan_init_hotplug(device);
> > -	if (!check_dep)
> > -		acpi_scan_dep_init(device);
> > +	acpi_scan_dep_init(device);
> >   
> >   out:
> >   	if (!*adev_p)
> 
> Yep, those 'hacks' solve the boot problem on my device. I tried multiple 
> times and it's completely reproducible.
> 
> > And can you collect an acpidump from the device and either send it to me and Rafael
> > offlist, or upload it somewhere and send us a link ?
> 
> will do

In addition to what Hans asked for, can you please build the kernel with the
debug patch below applied (instead of the Hans' debug patch), try to boot
the affected machine with it and see what is missing with respect to booting
the kernel with the two problematic commits reverted?

Also please send me the outout of "dmesg | grep "Enumeration" from the debug
kernel if possible.

---
 drivers/acpi/scan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/acpi/scan.c
===================================================================
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1951,7 +1951,7 @@ static acpi_status acpi_bus_check_add(ac
 		u32 count = acpi_scan_check_dep(handle);
 		/* Bail out if the number of recorded dependencies is not 0. */
 		if (count > 0) {
-			acpi_bus_scan_second_pass = true;
+			acpi_handle_info(handle, "Enumeration skipped\n");
 			return AE_CTRL_DEPTH;
 		}
 	}




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 13:54     ` Rafael J. Wysocki
@ 2021-01-15 14:52       ` Pierre-Louis Bossart
  2021-01-15 14:55         ` Rafael J. Wysocki
  0 siblings, 1 reply; 20+ messages in thread
From: Pierre-Louis Bossart @ 2021-01-15 14:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Hans de Goede, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko, Rafael J. Wysocki


>> Yep, those 'hacks' solve the boot problem on my device. I tried multiple
>> times and it's completely reproducible.
>>
>>> And can you collect an acpidump from the device and either send it to me and Rafael
>>> offlist, or upload it somewhere and send us a link ?
>>
>> will do
> 
> In addition to what Hans asked for, can you please build the kernel with the
> debug patch below applied (instead of the Hans' debug patch), try to boot
> the affected machine with it and see what is missing with respect to booting
> the kernel with the two problematic commits reverted?

Sorry, not following. Are you asking me to apply the patch below as well 
as revert the two problematic commits? Or just the patch below? the boot 
process is stuck without the reverts and I don't have a serial link to 
see what happens (closed form-factor).

> Also please send me the outout of "dmesg | grep "Enumeration" from the debug
> kernel if possible. >
> ---
>   drivers/acpi/scan.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -1951,7 +1951,7 @@ static acpi_status acpi_bus_check_add(ac
>   		u32 count = acpi_scan_check_dep(handle);
>   		/* Bail out if the number of recorded dependencies is not 0. */
>   		if (count > 0) {
> -			acpi_bus_scan_second_pass = true;
> +			acpi_handle_info(handle, "Enumeration skipped\n");
>   			return AE_CTRL_DEPTH;
>   		}
>   	}
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 14:52       ` Pierre-Louis Bossart
@ 2021-01-15 14:55         ` Rafael J. Wysocki
  2021-01-15 15:09           ` Pierre-Louis Bossart
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2021-01-15 14:55 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, Hans de Goede, ACPI Devel Mailing List,
	Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko,
	Rafael J. Wysocki

On Fri, Jan 15, 2021 at 3:52 PM Pierre-Louis Bossart
<pierre-louis.bossart@linux.intel.com> wrote:
>
>
> >> Yep, those 'hacks' solve the boot problem on my device. I tried multiple
> >> times and it's completely reproducible.
> >>
> >>> And can you collect an acpidump from the device and either send it to me and Rafael
> >>> offlist, or upload it somewhere and send us a link ?
> >>
> >> will do
> >
> > In addition to what Hans asked for, can you please build the kernel with the
> > debug patch below applied (instead of the Hans' debug patch), try to boot
> > the affected machine with it and see what is missing with respect to booting
> > the kernel with the two problematic commits reverted?
>
> Sorry, not following. Are you asking me to apply the patch below as well
> as revert the two problematic commits? Or just the patch below?

Just the patch below.

> the boot process is stuck without the reverts and I don't have a serial link to
> see what happens (closed form-factor).

The point is that the patch below may unstuck it, in which case it
should be possible to find out what is missing with respect to the
full successful boot.

> > Also please send me the outout of "dmesg | grep "Enumeration" from the debug
> > kernel if possible. >
> > ---
> >   drivers/acpi/scan.c |    2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/acpi/scan.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/scan.c
> > +++ linux-pm/drivers/acpi/scan.c
> > @@ -1951,7 +1951,7 @@ static acpi_status acpi_bus_check_add(ac
> >               u32 count = acpi_scan_check_dep(handle);
> >               /* Bail out if the number of recorded dependencies is not 0. */
> >               if (count > 0) {
> > -                     acpi_bus_scan_second_pass = true;
> > +                     acpi_handle_info(handle, "Enumeration skipped\n");
> >                       return AE_CTRL_DEPTH;
> >               }
> >       }
> >
> >
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 14:55         ` Rafael J. Wysocki
@ 2021-01-15 15:09           ` Pierre-Louis Bossart
  2021-01-15 15:22             ` Rafael J. Wysocki
  0 siblings, 1 reply; 20+ messages in thread
From: Pierre-Louis Bossart @ 2021-01-15 15:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Hans de Goede, ACPI Devel Mailing List,
	Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko




>>> In addition to what Hans asked for, can you please build the kernel with the
>>> debug patch below applied (instead of the Hans' debug patch), try to boot
>>> the affected machine with it and see what is missing with respect to booting
>>> the kernel with the two problematic commits reverted?
>>
>> Sorry, not following. Are you asking me to apply the patch below as well
>> as revert the two problematic commits? Or just the patch below?
> 
> Just the patch below.
> 
>> the boot process is stuck without the reverts and I don't have a serial link to
>> see what happens (closed form-factor).
> 
> The point is that the patch below may unstuck it, in which case it
> should be possible to find out what is missing with respect to the
> full successful boot.

No luck. I tried twice with the patch below only, and the device is 
still stuck after the 'Loading initial ramdisk ...'.

>>> Also please send me the outout of "dmesg | grep "Enumeration" from the debug
>>> kernel if possible. >
>>> ---
>>>    drivers/acpi/scan.c |    2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Index: linux-pm/drivers/acpi/scan.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/acpi/scan.c
>>> +++ linux-pm/drivers/acpi/scan.c
>>> @@ -1951,7 +1951,7 @@ static acpi_status acpi_bus_check_add(ac
>>>                u32 count = acpi_scan_check_dep(handle);
>>>                /* Bail out if the number of recorded dependencies is not 0. */
>>>                if (count > 0) {
>>> -                     acpi_bus_scan_second_pass = true;
>>> +                     acpi_handle_info(handle, "Enumeration skipped\n");
>>>                        return AE_CTRL_DEPTH;
>>>                }
>>>        }
>>>
>>>
>>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 15:09           ` Pierre-Louis Bossart
@ 2021-01-15 15:22             ` Rafael J. Wysocki
  2021-01-15 15:38               ` Pierre-Louis Bossart
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2021-01-15 15:22 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, Hans de Goede, ACPI Devel Mailing List,
	Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko

On Friday, January 15, 2021 4:09:05 PM CET Pierre-Louis Bossart wrote:
> 
> >>> In addition to what Hans asked for, can you please build the kernel with the
> >>> debug patch below applied (instead of the Hans' debug patch), try to boot
> >>> the affected machine with it and see what is missing with respect to booting
> >>> the kernel with the two problematic commits reverted?
> >>
> >> Sorry, not following. Are you asking me to apply the patch below as well
> >> as revert the two problematic commits? Or just the patch below?
> > 
> > Just the patch below.
> > 
> >> the boot process is stuck without the reverts and I don't have a serial link to
> >> see what happens (closed form-factor).
> > 
> > The point is that the patch below may unstuck it, in which case it
> > should be possible to find out what is missing with respect to the
> > full successful boot.
> 
> No luck. I tried twice with the patch below only, and the device is 
> still stuck after the 'Loading initial ramdisk ...'.

Thanks!

This means that skipping the enumeration of a certain device alone is
problematic which is a surprise of sorts.

Let's see what device that may be.

Because the machine booted with the debug patch from Hans, it should also boot
with the one below, so please do that and send the output of

$ dmesg | grep Dependencies

---
 drivers/acpi/scan.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/acpi/scan.c
===================================================================
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1950,10 +1950,8 @@ static acpi_status acpi_bus_check_add(ac
 	if (type == ACPI_BUS_TYPE_DEVICE && check_dep) {
 		u32 count = acpi_scan_check_dep(handle);
 		/* Bail out if the number of recorded dependencies is not 0. */
-		if (count > 0) {
-			acpi_bus_scan_second_pass = true;
-			return AE_CTRL_DEPTH;
-		}
+		if (count > 0)
+			acpi_handle_info(handle, "Dependencies found\n");
 	}
 
 	acpi_add_single_object(&device, handle, type, sta);
@@ -1961,8 +1959,7 @@ static acpi_status acpi_bus_check_add(ac
 		return AE_CTRL_DEPTH;
 
 	acpi_scan_init_hotplug(device);
-	if (!check_dep)
-		acpi_scan_dep_init(device);
+	acpi_scan_dep_init(device);
 
 out:
 	if (!*adev_p)




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 15:22             ` Rafael J. Wysocki
@ 2021-01-15 15:38               ` Pierre-Louis Bossart
  2021-01-15 16:05                 ` Hans de Goede
  0 siblings, 1 reply; 20+ messages in thread
From: Pierre-Louis Bossart @ 2021-01-15 15:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Hans de Goede, ACPI Devel Mailing List,
	Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko


> This means that skipping the enumeration of a certain device alone is
> problematic which is a surprise of sorts.
> 
> Let's see what device that may be.
> 
> Because the machine booted with the debug patch from Hans, it should also boot
> with the one below, so please do that and send the output of
> 
> $ dmesg | grep Dependencies

Yep, boot success with that patch :-)

root@plb-Zotac:~# dmesg | grep Dependencies
[    0.426722] ACPI: \_SB_.PCI0.SDHB: Dependencies found
[    0.427927] ACPI: \_SB_.PCI0.SDHB.BRCM: Dependencies found
[    0.431863] ACPI: \_SB_.PCI0.SDHC: Dependencies found
[    0.433128] ACPI: \_SB_.PCI0.SHC1: Dependencies found
[    0.466328] ACPI: \_SB_.PCI0.I2C1.BATC: Dependencies found
[    0.478490] ACPI: \_SB_.PCI0.I2C3.TIDR: Dependencies found
[    0.479851] ACPI: \_SB_.PCI0.I2C3.ABAT: Dependencies found
[    0.480756] ACPI: \_SB_.PCI0.I2C4: Dependencies found
[    0.482605] ACPI: \_SB_.PCI0.I2C4.CA10: Dependencies found
[    0.484464] ACPI: \_SB_.PCI0.I2C4.CAM9: Dependencies found
[    0.485769] ACPI: \_SB_.PCI0.I2C4.CAM3: Dependencies found
[    0.487187] ACPI: \_SB_.PCI0.I2C4.CAM4: Dependencies found
[    0.490563] ACPI: \_SB_.PCI0.I2C6.TCS0: Dependencies found
[    0.492673] ACPI: \_SB_.PCI0.I2C6.SYN1: Dependencies found
[    0.494923] ACPI: \_SB_.PCI0.I2C7.PMI1: Dependencies found
[    0.496528] ACPI: \_SB_.PCI0.I2C7.PMI2: Dependencies found
[    0.498111] ACPI: \_SB_.PCI0.I2C7.PMI5: Dependencies found
[    0.499909] ACPI: \_SB_.PCI0.I2C7.PMIF: Dependencies found
[    0.500891] ACPI: \_SB_.PCI0.I2C7.PMIC: Dependencies found
[    0.502822] ACPI: \_SB_.PCI0.I2C7.BMDR: Dependencies found
[    0.504333] ACPI: \_SB_.PCI0.I2C7.WIDR: Dependencies found
[    0.505689] ACPI: \_SB_.PCI0.I2C7.BATC: Dependencies found
[    0.509734] ACPI: \_SB_.PCI0.AMCR: Dependencies found
[    0.510715] ACPI: \_SB_.PCI0.TIMC: Dependencies found
[    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
[    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
[    0.531940] ACPI: \_SB_.WLCH: Dependencies found
[    0.532866] ACPI: \_SB_.WCH2: Dependencies found
[    0.533927] ACPI: \_SB_.FLDM: Dependencies found
[    0.541033] ACPI: \_SB_.BTNS: Dependencies found
[    0.543079] ACPI: \_SB_.TBAD: Dependencies found
[    0.549169] ACPI: \_SB_.UBTC: Dependencies found

I can run more tests as needed.

> ---
>   drivers/acpi/scan.c |    9 +++------
>   1 file changed, 3 insertions(+), 6 deletions(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -1950,10 +1950,8 @@ static acpi_status acpi_bus_check_add(ac
>   	if (type == ACPI_BUS_TYPE_DEVICE && check_dep) {
>   		u32 count = acpi_scan_check_dep(handle);
>   		/* Bail out if the number of recorded dependencies is not 0. */
> -		if (count > 0) {
> -			acpi_bus_scan_second_pass = true;
> -			return AE_CTRL_DEPTH;
> -		}
> +		if (count > 0)
> +			acpi_handle_info(handle, "Dependencies found\n");
>   	}
>   
>   	acpi_add_single_object(&device, handle, type, sta);
> @@ -1961,8 +1959,7 @@ static acpi_status acpi_bus_check_add(ac
>   		return AE_CTRL_DEPTH;
>   
>   	acpi_scan_init_hotplug(device);
> -	if (!check_dep)
> -		acpi_scan_dep_init(device);
> +	acpi_scan_dep_init(device);
>   
>   out:
>   	if (!*adev_p)
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 15:38               ` Pierre-Louis Bossart
@ 2021-01-15 16:05                 ` Hans de Goede
  2021-01-15 16:22                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 20+ messages in thread
From: Hans de Goede @ 2021-01-15 16:05 UTC (permalink / raw)
  To: Pierre-Louis Bossart, Rafael J. Wysocki
  Cc: Rafael J. Wysocki, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko

Hi,

On 1/15/21 4:38 PM, Pierre-Louis Bossart wrote:
> 
>> This means that skipping the enumeration of a certain device alone is
>> problematic which is a surprise of sorts.
>>
>> Let's see what device that may be.
>>
>> Because the machine booted with the debug patch from Hans, it should also boot
>> with the one below, so please do that and send the output of
>>
>> $ dmesg | grep Dependencies
> 
> Yep, boot success with that patch :-)
> 
> root@plb-Zotac:~# dmesg | grep Dependencies
> [    0.426722] ACPI: \_SB_.PCI0.SDHB: Dependencies found
> [    0.427927] ACPI: \_SB_.PCI0.SDHB.BRCM: Dependencies found
> [    0.431863] ACPI: \_SB_.PCI0.SDHC: Dependencies found
> [    0.433128] ACPI: \_SB_.PCI0.SHC1: Dependencies found
> [    0.466328] ACPI: \_SB_.PCI0.I2C1.BATC: Dependencies found
> [    0.478490] ACPI: \_SB_.PCI0.I2C3.TIDR: Dependencies found
> [    0.479851] ACPI: \_SB_.PCI0.I2C3.ABAT: Dependencies found
> [    0.480756] ACPI: \_SB_.PCI0.I2C4: Dependencies found
> [    0.482605] ACPI: \_SB_.PCI0.I2C4.CA10: Dependencies found
> [    0.484464] ACPI: \_SB_.PCI0.I2C4.CAM9: Dependencies found
> [    0.485769] ACPI: \_SB_.PCI0.I2C4.CAM3: Dependencies found
> [    0.487187] ACPI: \_SB_.PCI0.I2C4.CAM4: Dependencies found
> [    0.490563] ACPI: \_SB_.PCI0.I2C6.TCS0: Dependencies found
> [    0.492673] ACPI: \_SB_.PCI0.I2C6.SYN1: Dependencies found
> [    0.494923] ACPI: \_SB_.PCI0.I2C7.PMI1: Dependencies found
> [    0.496528] ACPI: \_SB_.PCI0.I2C7.PMI2: Dependencies found
> [    0.498111] ACPI: \_SB_.PCI0.I2C7.PMI5: Dependencies found
> [    0.499909] ACPI: \_SB_.PCI0.I2C7.PMIF: Dependencies found
> [    0.500891] ACPI: \_SB_.PCI0.I2C7.PMIC: Dependencies found
> [    0.502822] ACPI: \_SB_.PCI0.I2C7.BMDR: Dependencies found
> [    0.504333] ACPI: \_SB_.PCI0.I2C7.WIDR: Dependencies found
> [    0.505689] ACPI: \_SB_.PCI0.I2C7.BATC: Dependencies found
> [    0.509734] ACPI: \_SB_.PCI0.AMCR: Dependencies found
> [    0.510715] ACPI: \_SB_.PCI0.TIMC: Dependencies found
> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found

Ah, that is enlightening, that is not supposed to happen, that device
has both an _ADR and an _HID method which is not allowed according
to the spec.

Can you try a clean 5.11 kernel (so none of the previous
debug patches) with the following change added:

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 1f27f74cc83c..93954ac3bfcc 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
 	 * 2. ACPI nodes describing USB ports.
 	 * Still, checking for _HID catches more then just these cases ...
 	 */
-	if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
+	if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
+	    acpi_has_method(handle, "_ADR"))
 		return 0;
 
 	status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);


> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found

And idem. for this one.

That might very well fix this.

Regards,

Hans





> [    0.531940] ACPI: \_SB_.WLCH: Dependencies found
> [    0.532866] ACPI: \_SB_.WCH2: Dependencies found
> [    0.533927] ACPI: \_SB_.FLDM: Dependencies found
> [    0.541033] ACPI: \_SB_.BTNS: Dependencies found
> [    0.543079] ACPI: \_SB_.TBAD: Dependencies found
> [    0.549169] ACPI: \_SB_.UBTC: Dependencies found
> 
> I can run more tests as needed.
> 
>> ---
>>   drivers/acpi/scan.c |    9 +++------
>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> Index: linux-pm/drivers/acpi/scan.c
>> ===================================================================
>> --- linux-pm.orig/drivers/acpi/scan.c
>> +++ linux-pm/drivers/acpi/scan.c
>> @@ -1950,10 +1950,8 @@ static acpi_status acpi_bus_check_add(ac
>>       if (type == ACPI_BUS_TYPE_DEVICE && check_dep) {
>>           u32 count = acpi_scan_check_dep(handle);
>>           /* Bail out if the number of recorded dependencies is not 0. */
>> -        if (count > 0) {
>> -            acpi_bus_scan_second_pass = true;
>> -            return AE_CTRL_DEPTH;
>> -        }
>> +        if (count > 0)
>> +            acpi_handle_info(handle, "Dependencies found\n");
>>       }
>>         acpi_add_single_object(&device, handle, type, sta);
>> @@ -1961,8 +1959,7 @@ static acpi_status acpi_bus_check_add(ac
>>           return AE_CTRL_DEPTH;
>>         acpi_scan_init_hotplug(device);
>> -    if (!check_dep)
>> -        acpi_scan_dep_init(device);
>> +    acpi_scan_dep_init(device);
>>     out:
>>       if (!*adev_p)
>>
>>
>>
> 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 16:05                 ` Hans de Goede
@ 2021-01-15 16:22                   ` Rafael J. Wysocki
  2021-01-15 16:41                     ` Pierre-Louis Bossart
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2021-01-15 16:22 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Pierre-Louis Bossart, Rafael J. Wysocki, Rafael J. Wysocki,
	ACPI Devel Mailing List, Mika Westerberg, Rafael J. Wysocki,
	Andy Shevchenko

On Fri, Jan 15, 2021 at 5:05 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 1/15/21 4:38 PM, Pierre-Louis Bossart wrote:
> >
> >> This means that skipping the enumeration of a certain device alone is
> >> problematic which is a surprise of sorts.
> >>
> >> Let's see what device that may be.
> >>
> >> Because the machine booted with the debug patch from Hans, it should also boot
> >> with the one below, so please do that and send the output of
> >>
> >> $ dmesg | grep Dependencies
> >
> > Yep, boot success with that patch :-)
> >
> > root@plb-Zotac:~# dmesg | grep Dependencies
> > [    0.426722] ACPI: \_SB_.PCI0.SDHB: Dependencies found
> > [    0.427927] ACPI: \_SB_.PCI0.SDHB.BRCM: Dependencies found
> > [    0.431863] ACPI: \_SB_.PCI0.SDHC: Dependencies found
> > [    0.433128] ACPI: \_SB_.PCI0.SHC1: Dependencies found
> > [    0.466328] ACPI: \_SB_.PCI0.I2C1.BATC: Dependencies found
> > [    0.478490] ACPI: \_SB_.PCI0.I2C3.TIDR: Dependencies found
> > [    0.479851] ACPI: \_SB_.PCI0.I2C3.ABAT: Dependencies found
> > [    0.480756] ACPI: \_SB_.PCI0.I2C4: Dependencies found
> > [    0.482605] ACPI: \_SB_.PCI0.I2C4.CA10: Dependencies found
> > [    0.484464] ACPI: \_SB_.PCI0.I2C4.CAM9: Dependencies found
> > [    0.485769] ACPI: \_SB_.PCI0.I2C4.CAM3: Dependencies found
> > [    0.487187] ACPI: \_SB_.PCI0.I2C4.CAM4: Dependencies found
> > [    0.490563] ACPI: \_SB_.PCI0.I2C6.TCS0: Dependencies found
> > [    0.492673] ACPI: \_SB_.PCI0.I2C6.SYN1: Dependencies found
> > [    0.494923] ACPI: \_SB_.PCI0.I2C7.PMI1: Dependencies found
> > [    0.496528] ACPI: \_SB_.PCI0.I2C7.PMI2: Dependencies found
> > [    0.498111] ACPI: \_SB_.PCI0.I2C7.PMI5: Dependencies found
> > [    0.499909] ACPI: \_SB_.PCI0.I2C7.PMIF: Dependencies found
> > [    0.500891] ACPI: \_SB_.PCI0.I2C7.PMIC: Dependencies found
> > [    0.502822] ACPI: \_SB_.PCI0.I2C7.BMDR: Dependencies found
> > [    0.504333] ACPI: \_SB_.PCI0.I2C7.WIDR: Dependencies found
> > [    0.505689] ACPI: \_SB_.PCI0.I2C7.BATC: Dependencies found
> > [    0.509734] ACPI: \_SB_.PCI0.AMCR: Dependencies found
> > [    0.510715] ACPI: \_SB_.PCI0.TIMC: Dependencies found
> > [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
>
> Ah, that is enlightening, that is not supposed to happen, that device
> has both an _ADR and an _HID method which is not allowed according
> to the spec.
>
> Can you try a clean 5.11 kernel (so none of the previous
> debug patches) with the following change added:
>
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 1f27f74cc83c..93954ac3bfcc 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
>          * 2. ACPI nodes describing USB ports.
>          * Still, checking for _HID catches more then just these cases ...
>          */
> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
> +           acpi_has_method(handle, "_ADR"))
>                 return 0;
>
>         status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
>
>
> > [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
>
> And idem. for this one.
>
> That might very well fix this.

It might.

That said, there are "interesting" dependencies in those ACPI tables
(like on the PMIC which is deferred, because it depends on I2C7 and
GP01 and even some power resources depend on it).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 16:22                   ` Rafael J. Wysocki
@ 2021-01-15 16:41                     ` Pierre-Louis Bossart
  2021-01-15 19:01                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 20+ messages in thread
From: Pierre-Louis Bossart @ 2021-01-15 16:41 UTC (permalink / raw)
  To: Rafael J. Wysocki, Hans de Goede
  Cc: Rafael J. Wysocki, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko, Bossart, Pierre-louis


>>> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
>>
>> Ah, that is enlightening, that is not supposed to happen, that device
>> has both an _ADR and an _HID method which is not allowed according
>> to the spec.

it's not an uncommon issue for audio codecs, here's three examples:

             Device (RTK1)
             {
                 Name (_ADR, Zero)  // _ADR: Address
                 Name (_HID, "10EC5670")  // _HID: Hardware ID
                 Name (_CID, "10EC5670")  // _CID: Compatible ID
                 Name (_DDN, "ALC5642")  // _DDN: DOS Device Name

         Device (MAXM)
         {
             Name (_ADR, Zero)  // _ADR: Address
             Name (_HID, "193C9890")  // _HID: Hardware ID
             Name (_CID, "193C9890")  // _CID: Compatible ID
             Name (_DDN, "Maxim 98090 Codec  ")  // _DDN: DOS Device Name

         Device (TISW)
         {
             Name (_ADR, Zero)  // _ADR: Address
             Name (_HID, "104C227E")  // _HID: Hardware ID
             Name (_CID, "104C227E")  // _CID: Compatible ID

It's been that way for years...

>> Can you try a clean 5.11 kernel (so none of the previous
>> debug patches) with the following change added:
>>
>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>> index 1f27f74cc83c..93954ac3bfcc 100644
>> --- a/drivers/acpi/scan.c
>> +++ b/drivers/acpi/scan.c
>> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
>>           * 2. ACPI nodes describing USB ports.
>>           * Still, checking for _HID catches more then just these cases ...
>>           */
>> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
>> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
>> +           acpi_has_method(handle, "_ADR"))
>>                  return 0;
>>
>>          status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
>>
>>
>>> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
>>
>> And idem. for this one.
>>
>> That might very well fix this.

Nope, no luck with this patch. boot still stuck.

> It might.
> 
> That said, there are "interesting" dependencies in those ACPI tables
> (like on the PMIC which is deferred, because it depends on I2C7 and
> GP01 and even some power resources depend on it).
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 16:41                     ` Pierre-Louis Bossart
@ 2021-01-15 19:01                       ` Rafael J. Wysocki
  2021-01-15 19:06                         ` Rafael J. Wysocki
                                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2021-01-15 19:01 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, Hans de Goede, ACPI Devel Mailing List,
	Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko, Bossart,
	Pierre-louis

On Friday, January 15, 2021 5:41:57 PM CET Pierre-Louis Bossart wrote:
> 
> >>> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
> >>
> >> Ah, that is enlightening, that is not supposed to happen, that device
> >> has both an _ADR and an _HID method which is not allowed according
> >> to the spec.
> 
> it's not an uncommon issue for audio codecs, here's three examples:
> 
>              Device (RTK1)
>              {
>                  Name (_ADR, Zero)  // _ADR: Address
>                  Name (_HID, "10EC5670")  // _HID: Hardware ID
>                  Name (_CID, "10EC5670")  // _CID: Compatible ID
>                  Name (_DDN, "ALC5642")  // _DDN: DOS Device Name
> 
>          Device (MAXM)
>          {
>              Name (_ADR, Zero)  // _ADR: Address
>              Name (_HID, "193C9890")  // _HID: Hardware ID
>              Name (_CID, "193C9890")  // _CID: Compatible ID
>              Name (_DDN, "Maxim 98090 Codec  ")  // _DDN: DOS Device Name
> 
>          Device (TISW)
>          {
>              Name (_ADR, Zero)  // _ADR: Address
>              Name (_HID, "104C227E")  // _HID: Hardware ID
>              Name (_CID, "104C227E")  // _CID: Compatible ID
> 
> It's been that way for years...
> 
> >> Can you try a clean 5.11 kernel (so none of the previous
> >> debug patches) with the following change added:
> >>
> >> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> >> index 1f27f74cc83c..93954ac3bfcc 100644
> >> --- a/drivers/acpi/scan.c
> >> +++ b/drivers/acpi/scan.c
> >> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
> >>           * 2. ACPI nodes describing USB ports.
> >>           * Still, checking for _HID catches more then just these cases ...
> >>           */
> >> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
> >> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
> >> +           acpi_has_method(handle, "_ADR"))
> >>                  return 0;
> >>
> >>          status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
> >>
> >>
> >>> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
> >>
> >> And idem. for this one.
> >>
> >> That might very well fix this.
> 
> Nope, no luck with this patch. boot still stuck.

OK, thanks!

Now, there is a theory to test and some more debug work to do.

First, the kernel should not crash outright if some ACPI device objects are
missing which evidently happens here.  There may be some problems resulting
from that, but the crash indicates a code bug in the kernel.

Apparently, something expects the device objects to be there so badly, that it
crashes right away when they aren't there.  One of the issues that may cause
that to happen are mistakes around the acpi_bus_get_device() usage and I found
two of them, so below is a patch to test.

Please apply to plain 5.11-rc3 (or -rc4 when it is out) and see if that makes
any difference.

---
 drivers/acpi/scan.c         |    3 +--
 drivers/usb/core/usb-acpi.c |    3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/scan.c
===================================================================
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -2120,8 +2120,7 @@ void acpi_walk_dep_device_list(acpi_hand
 	mutex_lock(&acpi_dep_list_lock);
 	list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
 		if (dep->supplier == handle) {
-			acpi_bus_get_device(dep->consumer, &adev);
-			if (!adev)
+			if (acpi_bus_get_device(dep->consumer, &adev))
 				continue;
 
 			adev->dep_unmet--;
Index: linux-pm/drivers/usb/core/usb-acpi.c
===================================================================
--- linux-pm.orig/drivers/usb/core/usb-acpi.c
+++ linux-pm/drivers/usb/core/usb-acpi.c
@@ -163,10 +163,9 @@ usb_acpi_get_companion_for_port(struct u
 	} else {
 		parent_handle = usb_get_hub_port_acpi_handle(udev->parent,
 							     udev->portnum);
-		if (!parent_handle)
+		if (!parent_handle || acpi_bus_get_device(parent_handle, &adev))
 			return NULL;
 
-		acpi_bus_get_device(parent_handle, &adev);
 		port1 = port_dev->portnum;
 	}
 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 19:01                       ` Rafael J. Wysocki
@ 2021-01-15 19:06                         ` Rafael J. Wysocki
  2021-01-15 20:48                         ` Hans de Goede
  2021-01-15 21:57                         ` Hans de Goede
  2 siblings, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2021-01-15 19:06 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, Hans de Goede, ACPI Devel Mailing List,
	Mika Westerberg, Rafael J. Wysocki, Andy Shevchenko, Bossart,
	Pierre-louis

On Friday, January 15, 2021 8:01:19 PM CET Rafael J. Wysocki wrote:
> On Friday, January 15, 2021 5:41:57 PM CET Pierre-Louis Bossart wrote:
> > 
> > >>> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
> > >>
> > >> Ah, that is enlightening, that is not supposed to happen, that device
> > >> has both an _ADR and an _HID method which is not allowed according
> > >> to the spec.
> > 
> > it's not an uncommon issue for audio codecs, here's three examples:
> > 
> >              Device (RTK1)
> >              {
> >                  Name (_ADR, Zero)  // _ADR: Address
> >                  Name (_HID, "10EC5670")  // _HID: Hardware ID
> >                  Name (_CID, "10EC5670")  // _CID: Compatible ID
> >                  Name (_DDN, "ALC5642")  // _DDN: DOS Device Name
> > 
> >          Device (MAXM)
> >          {
> >              Name (_ADR, Zero)  // _ADR: Address
> >              Name (_HID, "193C9890")  // _HID: Hardware ID
> >              Name (_CID, "193C9890")  // _CID: Compatible ID
> >              Name (_DDN, "Maxim 98090 Codec  ")  // _DDN: DOS Device Name
> > 
> >          Device (TISW)
> >          {
> >              Name (_ADR, Zero)  // _ADR: Address
> >              Name (_HID, "104C227E")  // _HID: Hardware ID
> >              Name (_CID, "104C227E")  // _CID: Compatible ID
> > 
> > It's been that way for years...
> > 
> > >> Can you try a clean 5.11 kernel (so none of the previous
> > >> debug patches) with the following change added:
> > >>
> > >> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > >> index 1f27f74cc83c..93954ac3bfcc 100644
> > >> --- a/drivers/acpi/scan.c
> > >> +++ b/drivers/acpi/scan.c
> > >> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
> > >>           * 2. ACPI nodes describing USB ports.
> > >>           * Still, checking for _HID catches more then just these cases ...
> > >>           */
> > >> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
> > >> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
> > >> +           acpi_has_method(handle, "_ADR"))
> > >>                  return 0;
> > >>
> > >>          status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
> > >>
> > >>
> > >>> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
> > >>
> > >> And idem. for this one.
> > >>
> > >> That might very well fix this.
> > 
> > Nope, no luck with this patch. boot still stuck.
> 
> OK, thanks!
> 
> Now, there is a theory to test and some more debug work to do.
> 
> First, the kernel should not crash outright if some ACPI device objects are
> missing which evidently happens here.  There may be some problems resulting
> from that, but the crash indicates a code bug in the kernel.
> 
> Apparently, something expects the device objects to be there so badly, that it
> crashes right away when they aren't there.  One of the issues that may cause
> that to happen are mistakes around the acpi_bus_get_device() usage and I found
> two of them, so below is a patch to test.
> 
> Please apply to plain 5.11-rc3 (or -rc4 when it is out) and see if that makes
> any difference.
> 
> ---
>  drivers/acpi/scan.c         |    3 +--
>  drivers/usb/core/usb-acpi.c |    3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -2120,8 +2120,7 @@ void acpi_walk_dep_device_list(acpi_hand
>  	mutex_lock(&acpi_dep_list_lock);
>  	list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
>  		if (dep->supplier == handle) {
> -			acpi_bus_get_device(dep->consumer, &adev);
> -			if (!adev)
> +			if (acpi_bus_get_device(dep->consumer, &adev))
>  				continue;
>  
>  			adev->dep_unmet--;
> Index: linux-pm/drivers/usb/core/usb-acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/usb/core/usb-acpi.c
> +++ linux-pm/drivers/usb/core/usb-acpi.c
> @@ -163,10 +163,9 @@ usb_acpi_get_companion_for_port(struct u
>  	} else {
>  		parent_handle = usb_get_hub_port_acpi_handle(udev->parent,
>  							     udev->portnum);
> -		if (!parent_handle)
> +		if (!parent_handle || acpi_bus_get_device(parent_handle, &adev))
>  			return NULL;
>  
> -		acpi_bus_get_device(parent_handle, &adev);
>  		port1 = port_dev->portnum;
>  	}
>  

The above need to be fixed even if they don't make any difference in your case.

Anyway, if they don't make any difference for you, please boot plain 5.11-rc[34]
with the debug patch below applied and grep the dmesg output for "Dependencies"
again.

The check added to acpi_scan_check_dep() reduces the number of devices that may
be affected quite a bit.

---
 drivers/acpi/scan.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/acpi/scan.c
===================================================================
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1867,7 +1867,8 @@ static u32 acpi_scan_check_dep(acpi_hand
 	 * 2. ACPI nodes describing USB ports.
 	 * Still, checking for _HID catches more then just these cases ...
 	 */
-	if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
+	if (!acpi_has_method(handle, "_DEP") || acpi_has_method(handle, "_ADR")
+	    || !acpi_has_method(handle, "_HID"))
 		return 0;
 
 	status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
@@ -1950,10 +1951,8 @@ static acpi_status acpi_bus_check_add(ac
 	if (type == ACPI_BUS_TYPE_DEVICE && check_dep) {
 		u32 count = acpi_scan_check_dep(handle);
 		/* Bail out if the number of recorded dependencies is not 0. */
-		if (count > 0) {
-			acpi_bus_scan_second_pass = true;
-			return AE_CTRL_DEPTH;
-		}
+		if (count > 0)
+			acpi_handle_info(handle, "Dependencies found\n");
 	}
 
 	acpi_add_single_object(&device, handle, type, sta);
@@ -1961,8 +1960,7 @@ static acpi_status acpi_bus_check_add(ac
 		return AE_CTRL_DEPTH;
 
 	acpi_scan_init_hotplug(device);
-	if (!check_dep)
-		acpi_scan_dep_init(device);
+	acpi_scan_dep_init(device);
 
 out:
 	if (!*adev_p)




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 19:01                       ` Rafael J. Wysocki
  2021-01-15 19:06                         ` Rafael J. Wysocki
@ 2021-01-15 20:48                         ` Hans de Goede
  2021-01-15 21:57                         ` Hans de Goede
  2 siblings, 0 replies; 20+ messages in thread
From: Hans de Goede @ 2021-01-15 20:48 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko, Bossart, Pierre-louis

Hi,

On 1/15/21 8:01 PM, Rafael J. Wysocki wrote:
> On Friday, January 15, 2021 5:41:57 PM CET Pierre-Louis Bossart wrote:
>>
>>>>> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
>>>>
>>>> Ah, that is enlightening, that is not supposed to happen, that device
>>>> has both an _ADR and an _HID method which is not allowed according
>>>> to the spec.
>>
>> it's not an uncommon issue for audio codecs, here's three examples:
>>
>>              Device (RTK1)
>>              {
>>                  Name (_ADR, Zero)  // _ADR: Address
>>                  Name (_HID, "10EC5670")  // _HID: Hardware ID
>>                  Name (_CID, "10EC5670")  // _CID: Compatible ID
>>                  Name (_DDN, "ALC5642")  // _DDN: DOS Device Name
>>
>>          Device (MAXM)
>>          {
>>              Name (_ADR, Zero)  // _ADR: Address
>>              Name (_HID, "193C9890")  // _HID: Hardware ID
>>              Name (_CID, "193C9890")  // _CID: Compatible ID
>>              Name (_DDN, "Maxim 98090 Codec  ")  // _DDN: DOS Device Name
>>
>>          Device (TISW)
>>          {
>>              Name (_ADR, Zero)  // _ADR: Address
>>              Name (_HID, "104C227E")  // _HID: Hardware ID
>>              Name (_CID, "104C227E")  // _CID: Compatible ID
>>
>> It's been that way for years...
>>
>>>> Can you try a clean 5.11 kernel (so none of the previous
>>>> debug patches) with the following change added:
>>>>
>>>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>>>> index 1f27f74cc83c..93954ac3bfcc 100644
>>>> --- a/drivers/acpi/scan.c
>>>> +++ b/drivers/acpi/scan.c
>>>> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
>>>>           * 2. ACPI nodes describing USB ports.
>>>>           * Still, checking for _HID catches more then just these cases ...
>>>>           */
>>>> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
>>>> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
>>>> +           acpi_has_method(handle, "_ADR"))
>>>>                  return 0;
>>>>
>>>>          status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
>>>>
>>>>
>>>>> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
>>>>
>>>> And idem. for this one.
>>>>
>>>> That might very well fix this.
>>
>> Nope, no luck with this patch. boot still stuck.
> 
> OK, thanks!
> 
> Now, there is a theory to test and some more debug work to do.
> 
> First, the kernel should not crash outright if some ACPI device objects are
> missing which evidently happens here.  There may be some problems resulting
> from that, but the crash indicates a code bug in the kernel.
> 
> Apparently, something expects the device objects to be there so badly, that it
> crashes right away when they aren't there.  One of the issues that may cause
> that to happen are mistakes around the acpi_bus_get_device() usage and I found
> two of them, so below is a patch to test.
> 
> Please apply to plain 5.11-rc3 (or -rc4 when it is out) and see if that makes
> any difference.
> 
> ---
>  drivers/acpi/scan.c         |    3 +--
>  drivers/usb/core/usb-acpi.c |    3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -2120,8 +2120,7 @@ void acpi_walk_dep_device_list(acpi_hand
>  	mutex_lock(&acpi_dep_list_lock);
>  	list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
>  		if (dep->supplier == handle) {
> -			acpi_bus_get_device(dep->consumer, &adev);
> -			if (!adev)
> +			if (acpi_bus_get_device(dep->consumer, &adev))
>  				continue;
>  
>  			adev->dep_unmet--;

Oh, OOOhh, good catch I've been staring at these exact lines multiple times,
my "spidey sense" telling me that the problem was likely something like this:

1. The addition of the acpi_device gets deferred because of the _DEP
list, this means that there are now entries for it on the acpi_dep_list

2. Later during the first pass, or before the handle is checked again
during the second pass, acpi_walk_dep_device_list() gets called because
the _DEP is now resolved.

3. My theory was this would lead to doing driver attach twice or some
such, but that is not possible...

But instead we are following a pointer which points to whatever the memory
used by the:

        struct acpi_device *adev;

local variable on the stack points to; and it seems, at least with
my compiler / kernel .config that the stack layout is such that the
stack memory does contain a valid pointer (from some previous functions
stack frame) and then whatever that points to gets used as an acpi_device
and likely gets mangled a bit. Which explains the memory-corruption like
behavior which I've been seeing.

Specifically I think that this happening when the MMC controller
addition gets deferred to the second step:

[    0.426722] ACPI: \_SB_.PCI0.SDHB: Dependencies found
[    0.427927] ACPI: \_SB_.PCI0.SDHB.BRCM: Dependencies found
[    0.431863] ACPI: \_SB_.PCI0.SDHC: Dependencies found
[    0.433128] ACPI: \_SB_.PCI0.SHC1: Dependencies found


I'll verify that this fixes both my reproducers (5.10 + backport on one device,
5.11-rc3 on another dev) by seeing if I can now boot 10 times in a row
successfully. But I'm pretty hopeful that this will fix them.

I'm also hopeful that this will fix Pierre-Louis' case too.


> Index: linux-pm/drivers/usb/core/usb-acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/usb/core/usb-acpi.c
> +++ linux-pm/drivers/usb/core/usb-acpi.c
> @@ -163,10 +163,9 @@ usb_acpi_get_companion_for_port(struct u
>  	} else {
>  		parent_handle = usb_get_hub_port_acpi_handle(udev->parent,
>  							     udev->portnum);
> -		if (!parent_handle)
> +		if (!parent_handle || acpi_bus_get_device(parent_handle, &adev))
>  			return NULL;
>  
> -		acpi_bus_get_device(parent_handle, &adev);
>  		port1 = port_dev->portnum;
>  	}
>  

I'm pretty sure that we are not actually hitting this one, but we should
definitely still fix it.

Regards,

Hans


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 19:01                       ` Rafael J. Wysocki
  2021-01-15 19:06                         ` Rafael J. Wysocki
  2021-01-15 20:48                         ` Hans de Goede
@ 2021-01-15 21:57                         ` Hans de Goede
       [not found]                           ` <56b732f4-9a24-688e-7cc7-6c2522d173c9@linux.intel.com>
  2021-01-16 12:26                           ` Hans de Goede
  2 siblings, 2 replies; 20+ messages in thread
From: Hans de Goede @ 2021-01-15 21:57 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko, Bossart, Pierre-louis

Hi,

On 1/15/21 8:01 PM, Rafael J. Wysocki wrote:
> On Friday, January 15, 2021 5:41:57 PM CET Pierre-Louis Bossart wrote:
>>
>>>>> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
>>>>
>>>> Ah, that is enlightening, that is not supposed to happen, that device
>>>> has both an _ADR and an _HID method which is not allowed according
>>>> to the spec.
>>
>> it's not an uncommon issue for audio codecs, here's three examples:
>>
>>              Device (RTK1)
>>              {
>>                  Name (_ADR, Zero)  // _ADR: Address
>>                  Name (_HID, "10EC5670")  // _HID: Hardware ID
>>                  Name (_CID, "10EC5670")  // _CID: Compatible ID
>>                  Name (_DDN, "ALC5642")  // _DDN: DOS Device Name
>>
>>          Device (MAXM)
>>          {
>>              Name (_ADR, Zero)  // _ADR: Address
>>              Name (_HID, "193C9890")  // _HID: Hardware ID
>>              Name (_CID, "193C9890")  // _CID: Compatible ID
>>              Name (_DDN, "Maxim 98090 Codec  ")  // _DDN: DOS Device Name
>>
>>          Device (TISW)
>>          {
>>              Name (_ADR, Zero)  // _ADR: Address
>>              Name (_HID, "104C227E")  // _HID: Hardware ID
>>              Name (_CID, "104C227E")  // _CID: Compatible ID
>>
>> It's been that way for years...
>>
>>>> Can you try a clean 5.11 kernel (so none of the previous
>>>> debug patches) with the following change added:
>>>>
>>>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>>>> index 1f27f74cc83c..93954ac3bfcc 100644
>>>> --- a/drivers/acpi/scan.c
>>>> +++ b/drivers/acpi/scan.c
>>>> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
>>>>           * 2. ACPI nodes describing USB ports.
>>>>           * Still, checking for _HID catches more then just these cases ...
>>>>           */
>>>> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
>>>> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
>>>> +           acpi_has_method(handle, "_ADR"))
>>>>                  return 0;
>>>>
>>>>          status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
>>>>
>>>>
>>>>> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
>>>>
>>>> And idem. for this one.
>>>>
>>>> That might very well fix this.
>>
>> Nope, no luck with this patch. boot still stuck.
> 
> OK, thanks!
> 
> Now, there is a theory to test and some more debug work to do.
> 
> First, the kernel should not crash outright if some ACPI device objects are
> missing which evidently happens here.  There may be some problems resulting
> from that, but the crash indicates a code bug in the kernel.
> 
> Apparently, something expects the device objects to be there so badly, that it
> crashes right away when they aren't there.  One of the issues that may cause
> that to happen are mistakes around the acpi_bus_get_device() usage and I found
> two of them, so below is a patch to test.
> 
> Please apply to plain 5.11-rc3 (or -rc4 when it is out) and see if that makes
> any difference.
> 
> ---
>  drivers/acpi/scan.c         |    3 +--
>  drivers/usb/core/usb-acpi.c |    3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -2120,8 +2120,7 @@ void acpi_walk_dep_device_list(acpi_hand
>  	mutex_lock(&acpi_dep_list_lock);
>  	list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
>  		if (dep->supplier == handle) {
> -			acpi_bus_get_device(dep->consumer, &adev);
> -			if (!adev)
> +			if (acpi_bus_get_device(dep->consumer, &adev))
>  				continue;
>  
>  			adev->dep_unmet--;
> Index: linux-pm/drivers/usb/core/usb-acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/usb/core/usb-acpi.c
> +++ linux-pm/drivers/usb/core/usb-acpi.c
> @@ -163,10 +163,9 @@ usb_acpi_get_companion_for_port(struct u
>  	} else {
>  		parent_handle = usb_get_hub_port_acpi_handle(udev->parent,
>  							     udev->portnum);
> -		if (!parent_handle)
> +		if (!parent_handle || acpi_bus_get_device(parent_handle, &adev))
>  			return NULL;
>  
> -		acpi_bus_get_device(parent_handle, &adev);
>  		port1 = port_dev->portnum;
>  	}
>  

I can confirm that these changes fix the intermittent boot issue I had with
5.11-rc3 on the Minix Neo z83-4. It is getting a bit late here, so I will
test my second (also intermittent) reproducer tomorrow.

Regards,

Hans



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
       [not found]                           ` <56b732f4-9a24-688e-7cc7-6c2522d173c9@linux.intel.com>
@ 2021-01-16 11:18                             ` Hans de Goede
  0 siblings, 0 replies; 20+ messages in thread
From: Hans de Goede @ 2021-01-16 11:18 UTC (permalink / raw)
  To: Pierre-Louis Bossart, Rafael J. Wysocki, Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko

Hi,

On 1/15/21 11:43 PM, Pierre-Louis Bossart wrote:
>>> OK, thanks!
>>>
>>> Now, there is a theory to test and some more debug work to do.
>>>
>>> First, the kernel should not crash outright if some ACPI device objects are
>>> missing which evidently happens here.  There may be some problems resulting
>>> from that, but the crash indicates a code bug in the kernel.
>>>
>>> Apparently, something expects the device objects to be there so badly, that it
>>> crashes right away when they aren't there.  One of the issues that may cause
>>> that to happen are mistakes around the acpi_bus_get_device() usage and I found
>>> two of them, so below is a patch to test.
>>>
>>> Please apply to plain 5.11-rc3 (or -rc4 when it is out) and see if that makes
>>> any difference.
>>>
>>> ---
>>>  drivers/acpi/scan.c         |    3 +--
>>>  drivers/usb/core/usb-acpi.c |    3 +--
>>>  2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> Index: linux-pm/drivers/acpi/scan.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/acpi/scan.c
>>> +++ linux-pm/drivers/acpi/scan.c
>>> @@ -2120,8 +2120,7 @@ void acpi_walk_dep_device_list(acpi_hand
>>>  	mutex_lock(&acpi_dep_list_lock);
>>>  	list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
>>>  		if (dep->supplier == handle) {
>>> -			acpi_bus_get_device(dep->consumer, &adev);
>>> -			if (!adev)
>>> +			if (acpi_bus_get_device(dep->consumer, &adev))
>>>  				continue;
>>>  
>>>  			adev->dep_unmet--;
>>> Index: linux-pm/drivers/usb/core/usb-acpi.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/usb/core/usb-acpi.c
>>> +++ linux-pm/drivers/usb/core/usb-acpi.c
>>> @@ -163,10 +163,9 @@ usb_acpi_get_companion_for_port(struct u
>>>  	} else {
>>>  		parent_handle = usb_get_hub_port_acpi_handle(udev->parent,
>>>  							     udev->portnum);
>>> -		if (!parent_handle)
>>> +		if (!parent_handle || acpi_bus_get_device(parent_handle, &adev))
>>>  			return NULL;
>>>  
>>> -		acpi_bus_get_device(parent_handle, &adev);
>>>  		port1 = port_dev->portnum;
>>>  	}
>>>  
>> I can confirm that these changes fix the intermittent boot issue I had with
>> 5.11-rc3 on the Minix Neo z83-4. It is getting a bit late here, so I will
>> test my second (also intermittent) reproducer tomorrow.
> 
> Success on my side as well. I can boot and here is the updated list of dependencies (shorter than for the last test)

It is shorter because of this part of the debugging patch which Rafael send:

--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
 	 * 2. ACPI nodes describing USB ports.
 	 * Still, checking for _HID catches more then just these cases ...
 	 */
-	if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
+	if (!acpi_has_method(handle, "_DEP") || acpi_has_method(handle, "_ADR")
+	    || !acpi_has_method(handle, "_HID"))
 		return 0;
 
 	status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);


So that is expected.

> I attached the diff I tested to make sure I didn't miss anything.

Erm, it looks like you also applied the debug patch from Rafael's last email, that
was only intended to be applied in case things still did not work I believe.

This bit from the debug-patch:

@@ -1937,10 +1938,8 @@ static acpi_status acpi_bus_check_add(acpi_handle handle, bool check_dep,
 	if (type == ACPI_BUS_TYPE_DEVICE && check_dep) {
 		u32 count = acpi_scan_check_dep(handle);
 		/* Bail out if the number of recorded dependencies is not 0. */
-		if (count > 0) {
-			acpi_bus_scan_second_pass = true;
-			return AE_CTRL_DEPTH;
-		}
+		if (count > 0)
+			acpi_handle_info(handle, "Dependencies found\n");
 	}
 
 	acpi_add_single_object(&device, handle, type, sta);

Will cause all devices to be added during the first scan pass, like my
initial hack/test patch. Can you drop the debug patch and test with just the 2
changes from Rafael's previous mail please ?

Regards,

Hans




> 
> Thanks
> 
> -Pierre
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3
  2021-01-15 21:57                         ` Hans de Goede
       [not found]                           ` <56b732f4-9a24-688e-7cc7-6c2522d173c9@linux.intel.com>
@ 2021-01-16 12:26                           ` Hans de Goede
  1 sibling, 0 replies; 20+ messages in thread
From: Hans de Goede @ 2021-01-16 12:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pierre-Louis Bossart
  Cc: Rafael J. Wysocki, ACPI Devel Mailing List, Mika Westerberg,
	Rafael J. Wysocki, Andy Shevchenko, Bossart, Pierre-louis

Hi,

On 1/15/21 10:57 PM, Hans de Goede wrote:
> Hi,
> 
> On 1/15/21 8:01 PM, Rafael J. Wysocki wrote:
>> On Friday, January 15, 2021 5:41:57 PM CET Pierre-Louis Bossart wrote:
>>>
>>>>>> [    0.516336] ACPI: \_SB_.PCI0.BRCM: Dependencies found
>>>>>
>>>>> Ah, that is enlightening, that is not supposed to happen, that device
>>>>> has both an _ADR and an _HID method which is not allowed according
>>>>> to the spec.
>>>
>>> it's not an uncommon issue for audio codecs, here's three examples:
>>>
>>>              Device (RTK1)
>>>              {
>>>                  Name (_ADR, Zero)  // _ADR: Address
>>>                  Name (_HID, "10EC5670")  // _HID: Hardware ID
>>>                  Name (_CID, "10EC5670")  // _CID: Compatible ID
>>>                  Name (_DDN, "ALC5642")  // _DDN: DOS Device Name
>>>
>>>          Device (MAXM)
>>>          {
>>>              Name (_ADR, Zero)  // _ADR: Address
>>>              Name (_HID, "193C9890")  // _HID: Hardware ID
>>>              Name (_CID, "193C9890")  // _CID: Compatible ID
>>>              Name (_DDN, "Maxim 98090 Codec  ")  // _DDN: DOS Device Name
>>>
>>>          Device (TISW)
>>>          {
>>>              Name (_ADR, Zero)  // _ADR: Address
>>>              Name (_HID, "104C227E")  // _HID: Hardware ID
>>>              Name (_CID, "104C227E")  // _CID: Compatible ID
>>>
>>> It's been that way for years...
>>>
>>>>> Can you try a clean 5.11 kernel (so none of the previous
>>>>> debug patches) with the following change added:
>>>>>
>>>>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>>>>> index 1f27f74cc83c..93954ac3bfcc 100644
>>>>> --- a/drivers/acpi/scan.c
>>>>> +++ b/drivers/acpi/scan.c
>>>>> @@ -1854,7 +1854,8 @@ static u32 acpi_scan_check_dep(acpi_handle handle)
>>>>>           * 2. ACPI nodes describing USB ports.
>>>>>           * Still, checking for _HID catches more then just these cases ...
>>>>>           */
>>>>> -       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID"))
>>>>> +       if (!acpi_has_method(handle, "_DEP") || !acpi_has_method(handle, "_HID") ||
>>>>> +           acpi_has_method(handle, "_ADR"))
>>>>>                  return 0;
>>>>>
>>>>>          status = acpi_evaluate_reference(handle, "_DEP", NULL, &dep_devices);
>>>>>
>>>>>
>>>>>> [    0.517490] ACPI: \_SB_.PCI0.LNPW: Dependencies found
>>>>>
>>>>> And idem. for this one.
>>>>>
>>>>> That might very well fix this.
>>>
>>> Nope, no luck with this patch. boot still stuck.
>>
>> OK, thanks!
>>
>> Now, there is a theory to test and some more debug work to do.
>>
>> First, the kernel should not crash outright if some ACPI device objects are
>> missing which evidently happens here.  There may be some problems resulting
>> from that, but the crash indicates a code bug in the kernel.
>>
>> Apparently, something expects the device objects to be there so badly, that it
>> crashes right away when they aren't there.  One of the issues that may cause
>> that to happen are mistakes around the acpi_bus_get_device() usage and I found
>> two of them, so below is a patch to test.
>>
>> Please apply to plain 5.11-rc3 (or -rc4 when it is out) and see if that makes
>> any difference.
>>
>> ---
>>  drivers/acpi/scan.c         |    3 +--
>>  drivers/usb/core/usb-acpi.c |    3 +--
>>  2 files changed, 2 insertions(+), 4 deletions(-)
>>
>> Index: linux-pm/drivers/acpi/scan.c
>> ===================================================================
>> --- linux-pm.orig/drivers/acpi/scan.c
>> +++ linux-pm/drivers/acpi/scan.c
>> @@ -2120,8 +2120,7 @@ void acpi_walk_dep_device_list(acpi_hand
>>  	mutex_lock(&acpi_dep_list_lock);
>>  	list_for_each_entry_safe(dep, tmp, &acpi_dep_list, node) {
>>  		if (dep->supplier == handle) {
>> -			acpi_bus_get_device(dep->consumer, &adev);
>> -			if (!adev)
>> +			if (acpi_bus_get_device(dep->consumer, &adev))
>>  				continue;
>>  
>>  			adev->dep_unmet--;
>> Index: linux-pm/drivers/usb/core/usb-acpi.c
>> ===================================================================
>> --- linux-pm.orig/drivers/usb/core/usb-acpi.c
>> +++ linux-pm/drivers/usb/core/usb-acpi.c
>> @@ -163,10 +163,9 @@ usb_acpi_get_companion_for_port(struct u
>>  	} else {
>>  		parent_handle = usb_get_hub_port_acpi_handle(udev->parent,
>>  							     udev->portnum);
>> -		if (!parent_handle)
>> +		if (!parent_handle || acpi_bus_get_device(parent_handle, &adev))
>>  			return NULL;
>>  
>> -		acpi_bus_get_device(parent_handle, &adev);
>>  		port1 = port_dev->portnum;
>>  	}
>>  
> 
> I can confirm that these changes fix the intermittent boot issue I had with
> 5.11-rc3 on the Minix Neo z83-4. It is getting a bit late here, so I will
> test my second (also intermittent) reproducer tomorrow.

Good news, I can confirm that these 2 changes fix this on my other reproducer too.

FYI: My other reporducer was 5.10.2 with the ACPI scan changes backported for testing
on a Lenovo Yoga Tablet 2 1051L.

Regards,

Hans


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-01-16 12:27 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-14 21:55 ACPI scan regression -> Boot fail on Cherrytrail w/ 5.11-rc3 Pierre-Louis Bossart
2021-01-14 23:34 ` Hans de Goede
2021-01-15  0:49   ` Pierre-Louis Bossart
2021-01-15  8:54     ` Hans de Goede
2021-01-15 13:38       ` Rafael J. Wysocki
2021-01-15 13:54     ` Rafael J. Wysocki
2021-01-15 14:52       ` Pierre-Louis Bossart
2021-01-15 14:55         ` Rafael J. Wysocki
2021-01-15 15:09           ` Pierre-Louis Bossart
2021-01-15 15:22             ` Rafael J. Wysocki
2021-01-15 15:38               ` Pierre-Louis Bossart
2021-01-15 16:05                 ` Hans de Goede
2021-01-15 16:22                   ` Rafael J. Wysocki
2021-01-15 16:41                     ` Pierre-Louis Bossart
2021-01-15 19:01                       ` Rafael J. Wysocki
2021-01-15 19:06                         ` Rafael J. Wysocki
2021-01-15 20:48                         ` Hans de Goede
2021-01-15 21:57                         ` Hans de Goede
     [not found]                           ` <56b732f4-9a24-688e-7cc7-6c2522d173c9@linux.intel.com>
2021-01-16 11:18                             ` Hans de Goede
2021-01-16 12:26                           ` Hans de Goede

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.