linux-pci.vger.kernel.org archive mirror
* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
       [not found] <623c13ec.1c69fb81.8cbdb.5a7a@mx.google.com>
@ 2022-03-24 17:52 ` Mark Brown
  2022-03-24 20:34   ` Hans de Goede
                     ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Mark Brown @ 2022-03-24 17:52 UTC (permalink / raw)
  To: Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki, Mika Westerberg
  Cc: kernelci-results, bot, gtucker, linux-pci

On Wed, Mar 23, 2022 at 11:47:08PM -0700, KernelCI bot wrote:

The KernelCI bisection bot has identified commit 5949965ec9340cfc0e
("x86/PCI: Preserve host bridge windows completely covered by E820")
as causing a boot regression in next on asus-C523NA-A20057-coral (a
Chromebook AIUI).  Unfortunately there's no useful output when starting
the kernel.  I've left the full report below including links to the web
dashboard.

The last successful boot in -next had this log:

   https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html

I'd also note that the machine hp-x360-12b-n4000-octopus appears to have
started failing at the same time with similar symptoms, failing log:

   https://storage.kernelci.org/next/master/next-20220324/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-hp-x360-12b-n4000-octopus.html

and passing log:

   https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-hp-x360-12b-n4000-octopus.html

though we didn't get a bisect for that yet.  That's also a Chromebook.

> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on asus-C523NA-A20057-coral
> 
> Summary:
>   Start:      f8833a2b2356 Add linux-next specific files for 20220322
>   Plain log:  https://storage.kernelci.org/next/master/next-20220322/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20220322/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
>   Result:     5949965ec934 x86/PCI: Preserve host bridge windows completely covered by E820
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     asus-C523NA-A20057-coral
>   CPU arch:   x86_64
>   Lab:        lab-collabora
>   Compiler:   gcc-10
>   Config:     x86_64_defconfig+x86-chromebook
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> -------------------------------------------------------------------------------
> commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu Mar 3 18:03:30 2022 -0600
> 
>     x86/PCI: Preserve host bridge windows completely covered by E820
>     
>     Many folks have reported PCI devices not working.  It could affect any
>     device, but most reports are for Thunderbolt controllers on Lenovo Yoga and
>     Clevo Barebone laptops and the touchpad on Lenovo IdeaPads.
>     
>     In every report, a region in the E820 table entirely encloses a PCI host
>     bridge window from _CRS, and because of 4dc2287c1805 ("x86: avoid E820
>     regions when allocating address space"), we ignore the entire window,
>     preventing us from assigning space to PCI devices.
>     
>     For example, the dmesg log [2] from bug report [1] shows:
>     
>       BIOS-e820: [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>       pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>       pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit]
>     
>     The efi=debug dmesg log [3] from the same report shows the EFI memory map
>     entries that created the E820 map:
>     
>       efi: mem47: [Reserved |   |WB|WT|WC|UC] range=[0x4bc50000-0x5fffffff]
>       efi: mem48: [Reserved |   |WB|  |  |UC] range=[0x60000000-0x60ffffff]
>       efi: mem49: [Reserved |   |  |  |  |  ] range=[0x61000000-0x653fffff]
>       efi: mem50: [MMIO     |RUN|  |  |  |UC] range=[0x65400000-0xcfffffff]
>     
>     4dc2287c1805 ("x86: avoid E820 regions when allocating address space")
>     works around issues where _CRS contains non-window address space that can't
>     be used for PCI devices.  It does this by removing E820 regions from host
>     bridge windows.  But in these reports, the E820 region covers the entire
>     window, so 4dc2287c1805 makes it completely unusable.
>     
>     Per UEFI v2.8, sec 7.2, the EfiMemoryMappedIO type means:
>     
>       Used by system firmware to request that a memory-mapped IO region be
>       mapped by the OS to a virtual address so it can be accessed by EFI
>       runtime services.
>     
>     A host bridge window is definitely a memory-mapped IO region, and EFI
>     runtime services may need to access it, so I don't think we can argue that
>     this is a firmware defect.
>     
>     Instead, change the 4dc2287c1805 strategy so it only removes E820 regions
>     when they overlap *part* of a host bridge window on the assumption that a
>     partial overlap is really register space, not part of the window proper.
>     
>     If an E820 region covers the entire window from _CRS, assume the _CRS
>     window is correct and do nothing.
>     
>     [1] https://bugzilla.redhat.com/show_bug.cgi?id=1868899
>     [2] https://bugzilla.redhat.com/attachment.cgi?id=1711424
>     [3] https://bugzilla.redhat.com/attachment.cgi?id=1861407
>     
>     BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
>     BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=214259
>     BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
>     BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
>     BugLink: https://bugs.launchpad.net/bugs/1878279
>     BugLink: https://bugs.launchpad.net/bugs/1931715
>     BugLink: https://bugs.launchpad.net/bugs/1932069
>     BugLink: https://bugs.launchpad.net/bugs/1921649
>     Fixes: 4dc2287c1805 ("x86: avoid E820 regions when allocating address space")
>     Link: https://lore.kernel.org/r/20220228105259.230903-1-hdegoede@redhat.com
>     Based-on-patch-by: Hans de Goede <hdegoede@redhat.com>
>     Link: https://lore.kernel.org/r/20220304035110.988712-4-helgaas@kernel.org
>     Reported-by: Benoit Grégoire <benoitg@coeus.ca>   # BZ 206459
>     Reported-by: wse@tuxedocomputers.com              # BZ 214259
>     Tested-by: Matt Hansen <2lprbe78@duck.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>     Reviewed-by: Hans de Goede <hdegoede@redhat.com>
>     Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>     Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index 7378ea146976..90203217c359 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -39,6 +39,21 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>  		e820_start = entry->addr;
>  		e820_end = entry->addr + entry->size - 1;
>  
> +		/*
> +		 * If an E820 entry covers just part of the resource, we
> +		 * assume E820 is telling us about something like host
> +		 * bridge register space that is unavailable for PCI
> +		 * devices.  But if it covers the *entire* resource, it's
> +		 * more likely just telling us that this is MMIO space, and
> +		 * that doesn't need to be removed.
> +		 */
> +		if (e820_start <= avail->start && avail->end <= e820_end) {
> +			dev_info(dev, "resource %pR fully covered by e820 entry [mem %#010Lx-%#010Lx]\n",
> +				 avail, e820_start, e820_end);
> +
> +			continue;
> +		}
> +
>  		resource_clip(avail, e820_start, e820_end);
>  		if (orig.start != avail->start || orig.end != avail->end) {
>  			dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
> -------------------------------------------------------------------------------
> 
> 
> Git bisection log:
> 
> -------------------------------------------------------------------------------
> git bisect start
> # good: [5628b8de1228436d47491c662dc521bc138a3d43] Merge tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
> git bisect good 5628b8de1228436d47491c662dc521bc138a3d43
> # bad: [f8833a2b23562be2dae91775127c8014c44d8566] Add linux-next specific files for 20220322
> git bisect bad f8833a2b23562be2dae91775127c8014c44d8566
> # bad: [d2de72259f3d22054272217eac92e624835bfc3b] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> git bisect bad d2de72259f3d22054272217eac92e624835bfc3b
> # bad: [5920db3e4b50218dcf2101f3d87c3b69a1120981] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git
> git bisect bad 5920db3e4b50218dcf2101f3d87c3b69a1120981
> # bad: [b579dc07dce4637b7f2a3fb84394ebbd6666a81f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
> git bisect bad b579dc07dce4637b7f2a3fb84394ebbd6666a81f
> # bad: [b5475dd9fab03e6867abad00ecb98e0d3827ad31] Merge branch 'for-next' of git://git.armlinux.org.uk/~rmk/linux-arm.git
> git bisect bad b5475dd9fab03e6867abad00ecb98e0d3827ad31
> # good: [7b72f3bb0907319e15765ae9dcf1f15fdd112bcf] Merge remote-tracking branch 'asoc/for-5.17' into asoc-linus
> git bisect good 7b72f3bb0907319e15765ae9dcf1f15fdd112bcf
> # bad: [077dc6bc0658177057bfd69ef3a990e6d8d32146] Merge branch 'gpio/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
> git bisect bad 077dc6bc0658177057bfd69ef3a990e6d8d32146
> # good: [fd11727eec0dd95ee1b7d8f9f10ee60678eecc29] crypto: hisilicon/qm - fix memset during queues clearing
> git bisect good fd11727eec0dd95ee1b7d8f9f10ee60678eecc29
> # good: [646b907e1559f006c79a752ee3eebe220ceb983d] Merge tag 'asoc-v5.18' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
> git bisect good 646b907e1559f006c79a752ee3eebe220ceb983d
> # bad: [f8ed0b7c999405bd12ab9ebb0765e2baa7eb6184] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git
> git bisect bad f8ed0b7c999405bd12ab9ebb0765e2baa7eb6184
> # good: [9fd75b66b8f68498454d685dc4ba13192ae069b0] ax25: Fix refcount leaks caused by ax25_cb_del()
> git bisect good 9fd75b66b8f68498454d685dc4ba13192ae069b0
> # good: [3fd177beee75eb2d7e5b19992e8c90eb1a141432] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
> git bisect good 3fd177beee75eb2d7e5b19992e8c90eb1a141432
> # good: [09005bef55291a99b491a47ce676dfb4f40f8edd] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
> git bisect good 09005bef55291a99b491a47ce676dfb4f40f8edd
> # good: [d13f73e9108a75209d03217d60462f51092499fe] x86/PCI: Log host bridge window clipping for E820 regions
> git bisect good d13f73e9108a75209d03217d60462f51092499fe
> # bad: [5949965ec9340cfc0e65f7d8a576b660b26e2535] x86/PCI: Preserve host bridge windows completely covered by E820
> git bisect bad 5949965ec9340cfc0e65f7d8a576b660b26e2535
> # first bad commit: [5949965ec9340cfc0e65f7d8a576b660b26e2535] x86/PCI: Preserve host bridge windows completely covered by E820
> -------------------------------------------------------------------------------


* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-24 17:52 ` next/master bisection: baseline.login on asus-C523NA-A20057-coral Mark Brown
@ 2022-03-24 20:34   ` Hans de Goede
  2022-03-24 22:19     ` Mark Brown
  2022-03-24 23:08     ` Bjorn Helgaas
  2022-03-29 22:14   ` Bjorn Helgaas
  2022-04-05 23:53   ` Bjorn Helgaas
  2 siblings, 2 replies; 22+ messages in thread
From: Hans de Goede @ 2022-03-24 20:34 UTC (permalink / raw)
  To: Mark Brown, Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg
  Cc: kernelci-results, bot, gtucker, linux-pci

Hi Mark,

Thank you for the report.

On 3/24/22 18:52, Mark Brown wrote:
> On Wed, Mar 23, 2022 at 11:47:08PM -0700, KernelCI bot wrote:
> 
> The KernelCI bisection bot has identified commit 5949965ec9340cfc0e
> ("x86/PCI: Preserve host bridge windows completely covered by E820")
> as causing a boot regression in next on asus-C523NA-A20057-coral (a
> Chromebook AIUI).  Unfortunately there's no useful output when starting
> the kernel.  I've left the full report below including links to the web
> dashboard.
> 
> The last successful boot in -next had this log:
> 
>    https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html

So the interesting bits from this log are:

 1839 17:54:41.406548  <6>[    0.000000] BIOS-provided physical RAM map:
 1840 17:54:41.413121  <6>[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16
 1841 17:54:41.419712  <6>[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
 1842 17:54:41.430192  <6>[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
 1843 17:54:41.436207  <6>[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable
 1844 17:54:41.446353  <6>[    0.000000] BIOS-e820: [mem 0x0000000010000000-0x0000000012150fff] reserved
 1845 17:54:41.453290  <6>[    0.000000] BIOS-e820: [mem 0x0000000012151000-0x000000007a9fcfff] usable
 1846 17:54:41.459966  <6>[    0.000000] BIOS-e820: [mem 0x000000007a9fd000-0x000000007affffff] type 16
 1847 17:54:41.469549  <6>[    0.000000] BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
 1848 17:54:41.476685  <6>[    0.000000] BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
 1849 17:54:41.486439  <6>[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
 1850 17:54:41.492994  <6>[    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed17fff] reserved
 1851 17:54:41.503008  <6>[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
...
 2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
 2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]

Since the main [mem 0x7b800000-0xe0000000 window] is not fully covered by a single e820 entry, for that
resource there should be no change.

But the ISA MMIO window: [mem 0x000a0000-0x000bffff window] is fully covered by:

BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved

So that will now become available as memory to assign some resources to, where before it was
not.
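To make the comparison above easy to re-check, here is a minimal userspace sketch (not kernel code) that runs the same "fully covered" comparison as the new check in 5949965ec934 over the E820 entries and the two root bus windows from the log. The names struct range, e820_map, covering_entry() and check_windows() are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, matching how the boot log prints [mem start-end]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* All BIOS-e820 entries from the passing log, in log order. */
const struct range e820_map[] = {
	{ 0x0000000000000000, 0x0000000000000fff },	/* type 16 */
	{ 0x0000000000001000, 0x000000000009ffff },	/* usable */
	{ 0x00000000000a0000, 0x00000000000fffff },	/* reserved */
	{ 0x0000000000100000, 0x000000000fffffff },	/* usable */
	{ 0x0000000010000000, 0x0000000012150fff },	/* reserved */
	{ 0x0000000012151000, 0x000000007a9fcfff },	/* usable */
	{ 0x000000007a9fd000, 0x000000007affffff },	/* type 16 */
	{ 0x000000007b000000, 0x000000007fffffff },	/* reserved */
	{ 0x00000000d0000000, 0x00000000d0ffffff },	/* reserved */
	{ 0x00000000e0000000, 0x00000000efffffff },	/* reserved */
	{ 0x00000000fed10000, 0x00000000fed17fff },	/* reserved */
	{ 0x0000000100000000, 0x000000017fffffff },	/* usable */
};

/* The two root bus memory windows from the same log. */
const struct range isa_window  = { 0x000a0000, 0x000bffff };
const struct range main_window = { 0x7b800000, 0xe0000000 };

/*
 * Return the first E820 entry that covers the whole window, or NULL if
 * no single entry does.  The comparison mirrors the one added by the
 * commit: e820_start <= avail->start && avail->end <= e820_end.
 */
const struct range *covering_entry(const struct range *win)
{
	size_t n = sizeof(e820_map) / sizeof(e820_map[0]);

	for (size_t i = 0; i < n; i++) {
		const struct range *e = &e820_map[i];

		if (e->start <= win->start && win->end <= e->end)
			return e;
	}
	return NULL;
}

/* Self-check encoding the two observations made above. */
void check_windows(void)
{
	/* ISA window: covered by the reserved 0xa0000-0xfffff entry. */
	assert(covering_entry(&isa_window) == &e820_map[2]);
	/* Main window: no single entry spans it. */
	assert(covering_entry(&main_window) == NULL);
}
```

Calling check_windows() from a small main() should pass silently if the table was copied correctly.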

So I guess we should try adding a patch to skip the "fully covered" tests for ISA MMIO space
and see if that helps ?
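Purely as an illustration of the idea, and not an actual test patch, the exemption could be modelled in userspace like this. Everything here is an assumption made for the sketch: the ISA_MMIO_LIMIT name and its 1 MiB value, the struct range type, and the clip() helper, which only approximates my understanding of how resource_clip() keeps the larger leftover piece. The E820 table is the one from the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: anything ending below 1 MiB is ISA MMIO. */
#define ISA_MMIO_LIMIT 0x100000ULL

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* BIOS-e820 entries from the passing log, in log order. */
static const struct range e820_map[] = {
	{ 0x00000000, 0x00000fff }, { 0x00001000, 0x0009ffff },
	{ 0x000a0000, 0x000fffff }, { 0x00100000, 0x0fffffff },
	{ 0x10000000, 0x12150fff }, { 0x12151000, 0x7a9fcfff },
	{ 0x7a9fd000, 0x7affffff }, { 0x7b000000, 0x7fffffff },
	{ 0xd0000000, 0xd0ffffff }, { 0xe0000000, 0xefffffff },
	{ 0xfed10000, 0xfed17fff }, { 0x100000000, 0x17fffffff },
};

/* Hypothetical model of clipping: keep the larger piece outside [start, end]. */
void clip(struct range *res, uint64_t start, uint64_t end)
{
	uint64_t low = 0, high = 0;

	if (res->end < start || res->start > end)
		return;			/* no overlap */
	if (res->start < start)
		low = start - res->start;
	if (res->end > end)
		high = res->end - end;
	if (low > high)
		res->end = start - 1;
	else
		res->start = end + 1;	/* may leave start > end, i.e. empty */
}

/* Clip avail against every entry, preserving fully covered non-ISA windows. */
void remove_e820_regions_sketch(struct range *avail)
{
	size_t n = sizeof(e820_map) / sizeof(e820_map[0]);

	for (size_t i = 0; i < n; i++) {
		const struct range *e = &e820_map[i];
		bool covered = e->start <= avail->start && avail->end <= e->end;
		bool isa_mmio = avail->end < ISA_MMIO_LIMIT;

		/* The proposed exemption: ISA MMIO is always clipped. */
		if (covered && !isa_mmio)
			continue;

		clip(avail, e->start, e->end);
	}
}

/* Self-check for the two windows from the log. */
void check_isa_exemption(void)
{
	struct range isa = { 0x000a0000, 0x000bffff };
	struct range main_win = { 0x7b800000, 0xe0000000 };

	remove_e820_regions_sketch(&isa);
	remove_e820_regions_sketch(&main_win);

	assert(isa.start > isa.end);	/* clipped away, as before the commit */
	assert(main_win.start == 0x80000000 && main_win.end == 0xcfffffff);
}
```

In this model the ISA window ends up empty again, while the main window is only trimmed by the partially overlapping reserved entries, which is the same result it gets without the exemption.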

Bjorn, do you agree?

Mark, if one of us writes a test patch, can you get that Asus machine to boot a
kernel build from next + the test patch?

Regards,

Hans







* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-24 20:34   ` Hans de Goede
@ 2022-03-24 22:19     ` Mark Brown
  2022-03-28 12:54       ` Hans de Goede
  2022-03-24 23:08     ` Bjorn Helgaas
  1 sibling, 1 reply; 22+ messages in thread
From: Mark Brown @ 2022-03-24 22:19 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:

> Mark, if one of us writes a test patch, can you get that Asus machine to boot a
> kernel build from next + the test patch?

I can't directly unfortunately as the board is in Collabora's lab but
Guillaume (who's already CCed) ought to be able to, and I can generally
prod and try to get that done.


* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-24 20:34   ` Hans de Goede
  2022-03-24 22:19     ` Mark Brown
@ 2022-03-24 23:08     ` Bjorn Helgaas
  1 sibling, 0 replies; 22+ messages in thread
From: Bjorn Helgaas @ 2022-03-24 23:08 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Mark Brown, Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
> Hi Mark,
> 
> Thank you for the report.
> 
> On 3/24/22 18:52, Mark Brown wrote:
> > On Wed, Mar 23, 2022 at 11:47:08PM -0700, KernelCI bot wrote:
> > 
> > The KernelCI bisection bot has identified commit 5949965ec9340cfc0e
> > ("x86/PCI: Preserve host bridge windows completely covered by E820")
> > as causing a boot regression in next on asus-C523NA-A20057-coral (a
> > Chromebook AIUI).  Unfortunately there's no useful output when starting
> > the kernel.  I've left the full report below including links to the web
> > dashboard.
> > 
> > The last successful boot in -next had this log:
> > 
> >    https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
> 
> So the interesting bits from this log are:
> 
>  1839 17:54:41.406548  <6>[    0.000000] BIOS-provided physical RAM map:
>  1840 17:54:41.413121  <6>[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16
>  1841 17:54:41.419712  <6>[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
>  1842 17:54:41.430192  <6>[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>  1843 17:54:41.436207  <6>[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable
>  1844 17:54:41.446353  <6>[    0.000000] BIOS-e820: [mem 0x0000000010000000-0x0000000012150fff] reserved
>  1845 17:54:41.453290  <6>[    0.000000] BIOS-e820: [mem 0x0000000012151000-0x000000007a9fcfff] usable
>  1846 17:54:41.459966  <6>[    0.000000] BIOS-e820: [mem 0x000000007a9fd000-0x000000007affffff] type 16
>  1847 17:54:41.469549  <6>[    0.000000] BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>  1848 17:54:41.476685  <6>[    0.000000] BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>  1849 17:54:41.486439  <6>[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>  1850 17:54:41.492994  <6>[    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed17fff] reserved
>  1851 17:54:41.503008  <6>[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
> ...
>  2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>  2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
> 
> Since the main [mem 0x7b800000-0xe0000000 window] is not fully
> covered by a single e820 entry, for that resource there should be no
> change.
> 
> But the ISA MMIO window: [mem 0x000a0000-0x000bffff window] is fully
> covered by:
> 
> BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
> 
> So that will now become available as memory to assign some resources
> to, where before it was not.
> 
> So I guess we should try adding a patch to skip the "fully covered"
> tests for ISA MMIO space and see if that helps ?
> 
> Bjorn do you agree?

Maybe.  Certainly 5949965ec934 should only make a difference if
there's an E820 entry that completely covers a resource.  In that
"completely covered" case we used to clip and now we don't.

I didn't try to work out all the possible clipping cases.  The
[mem 0x000a0000-0x000bffff window] is certainly one, but there could
be others.  remove_e820_regions() was added by 4dc2287c1805 ("x86:
avoid E820 regions when allocating address space"), and it was not
intended to protect the 0xa0000-0xbffff region, so I expect there
should be another reason why we don't allocate from that area.
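
For reference, the "completely covered" test can be tried outside the kernel. The following is only a userspace sketch: `struct range`, `fully_covered()` and `check_coral_windows()` are names made up for this illustration. Only the comparison itself is taken from the check in remove_e820_regions(), and the addresses are the ones from the next-20220310 Asus log quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inclusive [start, end] address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if the single E820 entry spans the whole window. */
bool fully_covered(struct range win, struct range e820)
{
	return e820.start <= win.start && win.end <= e820.end;
}

/* Check the two root bus windows reported in the Asus coral log. */
void check_coral_windows(void)
{
	struct range isa_win  = { 0x000a0000, 0x000bffff };
	struct range main_win = { 0x7b800000, 0xe0000000 };
	struct range e820_isa = { 0x000a0000, 0x000fffff };
	struct range e820_low = { 0x7b000000, 0x7fffffff };

	/* The ISA window sits entirely inside one reserved entry. */
	assert(fully_covered(isa_win, e820_isa));
	/* The large window extends past the entry overlapping its start. */
	assert(!fully_covered(main_win, e820_low));
}
```

So under this sketch the ISA window is the only one of those two that takes the "fully covered" path.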

I assume that since the bot bisected this, there should be successful
boot logs from the commit preceding 5949965ec9340cfc0e, right?

  # good: [d13f73e9108a75209d03217d60462f51092499fe] x86/PCI: Log host bridge window clipping for E820 regions

Those logs should show all the places we clip, and then we could work
out which place(s) are affected by 5949965ec934.

It's unfortunate that 4dc2287c1805 ("x86: avoid E820 regions
when allocating address space") put remove_e820_regions() in the
generic allocation path, when I think what we really wanted was just
to clip PCI host bridge windows.

I'll be on vacation until Monday, so won't be able to spend much time
until then.

Bjorn

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-24 22:19     ` Mark Brown
@ 2022-03-28 12:54       ` Hans de Goede
  2022-03-29 18:44         ` Guillaume Tucker
  2022-03-30 11:35         ` Bjorn Helgaas
  0 siblings, 2 replies; 22+ messages in thread
From: Hans de Goede @ 2022-03-28 12:54 UTC (permalink / raw)
  To: Mark Brown
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

Hi,

On 3/24/22 23:19, Mark Brown wrote:
> On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
> 
>> Mark, if one of us writes a test patch, can you get that Asus machine to boot a
>> kernel build from next + the test patch ?
> 
> I can't directly unfortunately as the board is in Collabora's lab but
> Guillaume (who's already CCed) ought to be able to, and I can generally
> prod and try to get that done.

Ok, Guillaume, can you please try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
("x86/PCI: Preserve host bridge windows completely covered by E820") and the
attached patch applied on top, on the asus-C523NA-A20057-coral machine,
and see if that makes it boot again?

Regards,

Hans

[-- Attachment #2: 0001-x86-PCI-Limit-e820-entry-fully-covers-window-check-t.patch --]
[-- Type: text/x-patch, Size: 1809 bytes --]

From b8080a6d2d889847900e1408f71d0c01c73f5c94 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Mon, 28 Mar 2022 14:47:41 +0200
Subject: [PATCH] x86/PCI: Limit "e820 entry fully covers window" check to non
 ISA MMIO addresses

Commit FIXME ("x86/PCI: Preserve host bridge windows completely
covered by E820") added a check to skip e820 table entries which
fully cover a PCI bridge's memory window when clipping PCI bridge
memory windows.

This check also caused ISA MMIO windows to not get clipped when
fully covered, which is causing some coreboot based Chromebooks
to not boot.

Modify the fully covered check to not apply to ISA MMIO windows.

Fixes: FIXME ("x86/PCI: Preserve host bridge windows completely covered by E820")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 arch/x86/kernel/resource.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
index 6be82e16e5f4..d9ec913619c3 100644
--- a/arch/x86/kernel/resource.c
+++ b/arch/x86/kernel/resource.c
@@ -46,8 +46,12 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
 		 * devices.  But if it covers the *entire* resource, it's
 		 * more likely just telling us that this is MMIO space, and
 		 * that doesn't need to be removed.
+		 * Note this *entire* resource covering check is only
+		 * intended for 32-bit memory resources; for the 16-bit
+		 * ISA window we always apply the e820 entries.
 		 */
-		if (e820_start <= avail->start && avail->end <= e820_end) {
+		if (avail->start >= ISA_END_ADDRESS &&
+		    e820_start <= avail->start && avail->end <= e820_end) {
 			dev_info(dev, "resource %pR fully covered by e820 entry [mem %#010Lx-%#010Lx]\n",
 				 avail, e820_start, e820_end);
 			continue;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-28 12:54       ` Hans de Goede
@ 2022-03-29 18:44         ` Guillaume Tucker
  2022-04-04 19:44           ` Guillaume Tucker
       [not found]           ` <16E2C910B4947F17.5433@groups.io>
  2022-03-30 11:35         ` Bjorn Helgaas
  1 sibling, 2 replies; 22+ messages in thread
From: Guillaume Tucker @ 2022-03-29 18:44 UTC (permalink / raw)
  To: Hans de Goede, Mark Brown
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

On 28/03/2022 13:54, Hans de Goede wrote:
> Hi,
> 
> On 3/24/22 23:19, Mark Brown wrote:
>> On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
>>
>>> Mark, if one of us writes a test patch, can you get that Asus machine to boot a
>>> kernel build from next + the test patch ?
>>
>> I can't directly unfortunately as the board is in Collabora's lab but
>> Guillaume (who's already CCed) ought to be able to, and I can generally
>> prod and try to get that done.
> 
> Ok, Guillaume, can you please try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
> ("x86/PCI: Preserve host bridge windows completely covered by E820") and the
> attached patch applied on top, on the asus-C523NA-A20057-coral machine,
> and see if that makes it boot again?

Sorry I've been busy with a conference.  Sure, will put that
through KernelCI tomorrow and let you know the outcome.

Thanks,
Guillaume


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-24 17:52 ` next/master bisection: baseline.login on asus-C523NA-A20057-coral Mark Brown
  2022-03-24 20:34   ` Hans de Goede
@ 2022-03-29 22:14   ` Bjorn Helgaas
  2022-04-05 23:53   ` Bjorn Helgaas
  2 siblings, 0 replies; 22+ messages in thread
From: Bjorn Helgaas @ 2022-03-29 22:14 UTC (permalink / raw)
  To: Mark Brown
  Cc: Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, kernelci-results, bot, gtucker, linux-pci

On Thu, Mar 24, 2022 at 05:52:19PM +0000, Mark Brown wrote:
> On Wed, Mar 23, 2022 at 11:47:08PM -0700, KernelCI bot wrote:
> 
> The KernelCI bisection bot has identified commit 5949965ec9340cfc0e
> ("x86/PCI: Preserve host bridge windows completely covered by E820")
> as causing a boot regression in next on asus-C523NA-A20057-coral (a
> Chromebook AIUI).  Unfortunately there's no useful output when starting
> the kernel.  I've left the full report below including links to the web
> dashboard.

Details for the archives, since I got these via private email while
traveling (thanks, Guillaume!):

> On 28/03/2022 09:08, Guillaume Tucker wrote:
>> On 27/03/2022 21:38, Bjorn Helgaas wrote:
>>> I dropped other recipients because I'm traveling and can't easily
>>> send plain text email.
>>>
>>> If there are logs of the last good commit from these bisects, could
>>> you add links to the thread?
>>
>> The logs from each bisection step aren't kept in KernelCI but they
>> could be found in the test lab archives directly, I'll take a look.
>>
>> Otherwise, details for this regression can be found here:
>>
>>   https://linux.kernelci.org/test/case/id/6239d0afe9d42800692172dd/
>
> Actually here's all the test jobs for this bisection:
>
>  https://lava.collabora.co.uk/scheduler/device_type/asus-C523NA-A20057-coral?dt_search=lava-bisection-161
>
> The last passing one is from iteration 13:
>
>  https://lava.collabora.co.uk/scheduler/job/5937945
>
> I've attached the full log as a text file since the web UI for
> the lab is sometimes very slow.  Hope this helps.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-28 12:54       ` Hans de Goede
  2022-03-29 18:44         ` Guillaume Tucker
@ 2022-03-30 11:35         ` Bjorn Helgaas
  2022-04-04  8:45           ` Hans de Goede
  1 sibling, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2022-03-30 11:35 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Mark Brown, Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

On Mon, Mar 28, 2022 at 02:54:42PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 3/24/22 23:19, Mark Brown wrote:
> > On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
> > 
> >> Mark, if one of us writes a test patch, can you get that Asus machine to boot a
> >> kernel build from next + the test patch ?
> > 
> > I can't directly unfortunately as the board is in Collabora's lab but
> > Guillaume (who's already CCed) ought to be able to, and I can generally
> > prod and try to get that done.
> 
> > Ok, Guillaume, can you please try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
> > ("x86/PCI: Preserve host bridge windows completely covered by E820") and the
> > attached patch applied on top, on the asus-C523NA-A20057-coral machine,
> > and see if that makes it boot again?
> 
> Regards,
> 
> Hans

> From b8080a6d2d889847900e1408f71d0c01c73f5c94 Mon Sep 17 00:00:00 2001
> From: Hans de Goede <hdegoede@redhat.com>
> Date: Mon, 28 Mar 2022 14:47:41 +0200
> Subject: [PATCH] x86/PCI: Limit "e820 entry fully covers window" check to non
>  ISA MMIO addresses
> 
> Commit FIXME ("x86/PCI: Preserve host bridge windows completely
> covered by E820") added a check to skip e820 table entries which
> fully cover a PCI bridge's memory window when clipping PCI bridge
> memory windows.
> 
> This check also caused ISA MMIO windows to not get clipped when
> fully covered, which is causing some coreboot based Chromebooks
> to not boot.
> 
> Modify the fully covered check to not apply to ISA MMIO windows.

I'd like to include URLs to the kernelci results unless they are
ephemeral.  There's a lot of valuable information in these:

  Asus C523NA-A20057-coral with the last good commit:
  https://lava.collabora.co.uk/scheduler/job/5937945

  https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
  https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-hp-x360-12b-n4000-octopus.html

> Fixes: FIXME ("x86/PCI: Preserve host bridge windows completely covered by E820")
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  arch/x86/kernel/resource.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index 6be82e16e5f4..d9ec913619c3 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -46,8 +46,12 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>  		 * devices.  But if it covers the *entire* resource, it's
>  		 * more likely just telling us that this is MMIO space, and
>  		 * that doesn't need to be removed.
> +		 * Note this *entire* resource covering check is only
> +		 * intended for 32-bit memory resources; for the 16-bit
> +		 * ISA window we always apply the e820 entries.
>  		 */
> -		if (e820_start <= avail->start && avail->end <= e820_end) {
> +		if (avail->start >= ISA_END_ADDRESS &&

What is the justification for needing to check ISA_END_ADDRESS here?
The commit log basically says "this makes it work", which isn't very
satisfying.

The Asus log of the last good commit shows:

  PCI: 00:0d.0 [8086/5a92] enabled
  constrain_resources: PCI: 00:0d.0 10 base d0000000 limit d0ffffff mem (fixed)
  ...
  BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
  BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
  BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
  ...
  acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
  acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
  acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
  acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
  acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])

It looks like _CRS gave us [mem 0x80000000-0xe0000000 window], which
is one byte too big (it should end at 0xdfffffff).

From the firmware part of the log, it looks like 00:0d.0 is a hidden
device that consumes [mem 0xd0000000-0xd0ffffff].  Linux doesn't
enumerate 00:0d.0, so firmware should have carved that out of the [mem
0x80000000-0xe0000000 window] in _CRS.

We don't have a log with 5949965ec934 ("x86/PCI: Preserve host bridge
windows completely covered by E820") applied, but I think it would
show this:

  acpi PNP0A08:00: resource [mem 0x000a0000-0x000bffff window] fully covered by e820 entry [mem 0x000a0000-0x000fffff]
  acpi PNP0A08:00: resource [mem 0x7b800000-0x7fffffff window] fully covered by e820 entry [mem 0x7b000000-0x7fffffff]

instead of clipping those windows.  But none of the devices we
enumerate appears to be using either of those windows.

We do have this:

  pci 0000:00:18.2: reg 0x10: [mem 0xde000000-0xde000fff 64bit]
  pci 0000:00:18.2: reg 0x18: [mem 0xc2b31000-0xc2b31fff 64bit]
  pci 0000:00:18.2: can't claim BAR 0 [mem 0xde000000-0xde000fff 64bit]: no compatible bridge window
  pci 0000:00:18.2: BAR 0: assigned [mem 0x80000000-0x80000fff 64bit]

Where the original [mem 0xde000000-0xde000fff 64bit] assignment was
perfectly legal, but we clipped [mem 0x80000000-0xe0000000 window] to
[mem 0x80000000-0xcfffffff window] instead of just punching a hole for
the 00:0d.0 carve-out.
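
To make the difference between clipping and "punching a hole" concrete, here is a small userspace sketch. It is not existing kernel code: `struct range` and `punch_hole()` are hypothetical names, and the window end is taken as 0xdfffffff, i.e. the value _CRS should have reported.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive [start, end] address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Split win around hole instead of clipping it.
 * Writes up to two surviving pieces to out[] and returns how many.
 */
int punch_hole(struct range win, struct range hole, struct range out[2])
{
	int n = 0;

	/* Both ranges must be well formed for the arithmetic below. */
	assert(win.start <= win.end && hole.start <= hole.end);

	/* No overlap: the window survives unchanged. */
	if (hole.end < win.start || hole.start > win.end) {
		out[n++] = win;
		return n;
	}

	/* Piece below the hole, if any. */
	if (win.start < hole.start)
		out[n++] = (struct range){ win.start, hole.start - 1 };

	/* Piece above the hole, if any. */
	if (win.end > hole.end)
		out[n++] = (struct range){ hole.end + 1, win.end };

	return n;
}
```

With the values from the log, carving [0xd0000000-0xd0ffffff] out of [0x80000000-0xdfffffff] leaves an upper piece starting at 0xd1000000, which still contains the firmware's original 00:18.2 BAR 0 at 0xde000000.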

Maybe 5949965ec934 puts 00:18.2 BAR 0 somewhere that doesn't work,
or maybe the clipping to [mem 0x00100000-0x000bffff window] or
[mem 0x80000000-0x7fffffff window] doesn't work as expected?
They are supposed to be interpreted as "empty", but certainly
resource_size([0x00100000-0x000bffff]) is != 0.
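
The resource_size() point can be checked with a few lines of userspace C. The helper below assumes the usual `end - start + 1` definition and a 64-bit resource_size_t; `range_size()` and `check_clipped_windows()` are made-up names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror resource_size(): end - start + 1, unsigned arithmetic. */
uint64_t range_size(uint64_t start, uint64_t end)
{
	return end - start + 1;
}

/* Sizes of the two "empty" windows produced by the clipping in the log. */
void check_clipped_windows(void)
{
	/* end is exactly start - 1, so the size wraps around to 0. */
	assert(range_size(0x80000000, 0x7fffffff) == 0);

	/* end is 0x40001 below start, so the size wraps to a huge value. */
	assert(range_size(0x00100000, 0x000bffff) == 0xfffffffffffc0000ULL);
}
```

Under these assumptions only [0x80000000-0x7fffffff] really has size 0; the clipped ISA window does not, so any code that relies on resource_size() to detect an empty window would treat the two differently.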

> +		    e820_start <= avail->start && avail->end <= e820_end) {
>  			dev_info(dev, "resource %pR fully covered by e820 entry [mem %#010Lx-%#010Lx]\n",
>  				 avail, e820_start, e820_end);
>  			continue;
> -- 
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-30 11:35         ` Bjorn Helgaas
@ 2022-04-04  8:45           ` Hans de Goede
  2022-04-06  0:19             ` Bjorn Helgaas
  0 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2022-04-04  8:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mark Brown, Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

Hi,

On 3/30/22 13:35, Bjorn Helgaas wrote:
> On Mon, Mar 28, 2022 at 02:54:42PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 3/24/22 23:19, Mark Brown wrote:
>>> On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
>>>
>>>> Mark, if one of us writes a test patch, can you get that Asus machine to boot a
>>>> kernel build from next + the test patch ?
>>>
>>> I can't directly unfortunately as the board is in Collabora's lab but
>>> Guillaume (who's already CCed) ought to be able to, and I can generally
>>> prod and try to get that done.
>>
>> Ok, Guillaume, can you please try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
>> ("x86/PCI: Preserve host bridge windows completely covered by E820") and the
>> attached patch applied on top, on the asus-C523NA-A20057-coral machine,
>> and see if that makes it boot again?
>>
>> Regards,
>>
>> Hans
> 
>> From b8080a6d2d889847900e1408f71d0c01c73f5c94 Mon Sep 17 00:00:00 2001
>> From: Hans de Goede <hdegoede@redhat.com>
>> Date: Mon, 28 Mar 2022 14:47:41 +0200
>> Subject: [PATCH] x86/PCI: Limit "e820 entry fully covers window" check to non
>>  ISA MMIO addresses
>>
>> Commit FIXME ("x86/PCI: Preserve host bridge windows completely
>> covered by E820") added a check to skip e820 table entries which
>> fully cover a PCI bridge's memory window when clipping PCI bridge
>> memory windows.
>>
>> This check also caused ISA MMIO windows to not get clipped when
>> fully covered, which is causing some coreboot based Chromebooks
>> to not boot.
>>
>> Modify the fully covered check to not apply to ISA MMIO windows.
> 
> I'd like to include URLs to the kernelci results unless they are
> ephemeral.  There's a lot of valuable information in these:
> 
>   Asus C523NA-A20057-coral with the last good commit:
>   https://lava.collabora.co.uk/scheduler/job/5937945
> 
>   https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
>   https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-hp-x360-12b-n4000-octopus.html

Ok, I'll include this in any future patches for this.

> 
>> Fixes: FIXME ("x86/PCI: Preserve host bridge windows completely covered by E820")
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>>  arch/x86/kernel/resource.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>> index 6be82e16e5f4..d9ec913619c3 100644
>> --- a/arch/x86/kernel/resource.c
>> +++ b/arch/x86/kernel/resource.c
>> @@ -46,8 +46,12 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>>  		 * devices.  But if it covers the *entire* resource, it's
>>  		 * more likely just telling us that this is MMIO space, and
>>  		 * that doesn't need to be removed.
>> +		 * Note this *entire* resource covering check is only
>> +		 * intended for 32-bit memory resources; for the 16-bit
>> +		 * ISA window we always apply the e820 entries.
>>  		 */
>> -		if (e820_start <= avail->start && avail->end <= e820_end) {
>> +		if (avail->start >= ISA_END_ADDRESS &&
> 
> What is the justification for needing to check ISA_END_ADDRESS here?
> The commit log basically says "this makes it work", which isn't very
> satisfying.

I did not have a log with the:

>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]

messages. Instead I was looking at this log:

https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html

With the following messages (as I quoted higher up in the email-thread):

"""
 1839 17:54:41.406548  <6>[    0.000000] BIOS-provided physical RAM map:
 1840 17:54:41.413121  <6>[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16
 1841 17:54:41.419712  <6>[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
 1842 17:54:41.430192  <6>[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
 1843 17:54:41.436207  <6>[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable
 1844 17:54:41.446353  <6>[    0.000000] BIOS-e820: [mem 0x0000000010000000-0x0000000012150fff] reserved
 1845 17:54:41.453290  <6>[    0.000000] BIOS-e820: [mem 0x0000000012151000-0x000000007a9fcfff] usable
 1846 17:54:41.459966  <6>[    0.000000] BIOS-e820: [mem 0x000000007a9fd000-0x000000007affffff] type 16
 1847 17:54:41.469549  <6>[    0.000000] BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
 1848 17:54:41.476685  <6>[    0.000000] BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
 1849 17:54:41.486439  <6>[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
 1850 17:54:41.492994  <6>[    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed17fff] reserved
 1851 17:54:41.503008  <6>[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
...
 2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
 2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
"""

###

What I find weird here is that this boot with a somewhat earlier kernel has:

 2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
 2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]

Whereas the boot with the clipped messages has:

<6>[    0.313705] acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
<6>[    0.314702] acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
<6>[    0.315747] PCI host bridge to bus 0000:00
<6>[    0.316118] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
<6>[    0.316703] pci_bus 0000:00: root bus resource [io  0x1000-0xffff window]
<6>[    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
<6>[    0.317703] pci_bus 0000:00: root bus resource [bus 00-ff]

So in the boot with the clipped messages we are getting 3 windows from _CRS,
whereas before we were getting only 2?  I know that we are now applying
the clipping directly when we are parsing the resources.  So I guess that
before, we somehow also merged the 2 back-to-back resources
before the "root bus resource" messages got printed.  This caused me to
see just "root bus resource [mem 0x7b800000-0xe0000000 window]", which is
not fully covered, which is why I focused on the ISA MMIO window.

> The Asus log of the last good commit shows:
> 
>   PCI: 00:0d.0 [8086/5a92] enabled
>   constrain_resources: PCI: 00:0d.0 10 base d0000000 limit d0ffffff mem (fixed)
>   ...
>   BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>   BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>   BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>   ...
>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
>   acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>   acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
> 
> It looks like _CRS gave us [mem 0x80000000-0xe0000000 window], which
> is one byte too big (it should end at 0xdfffffff).

Yeah but that gets clipped off anyways, so that should not matter.

>
> From the firmware part of the log, it looks like 00:0d.0 is a hidden
> device that consumes [mem 0xd0000000-0xd0ffffff].  Linux doesn't
> enumerate 00:0d.0, so firmware should have carved that out of the [mem
> 0x80000000-0xe0000000 window] in _CRS.
> 
> We don't have a log with 5949965ec934 ("x86/PCI: Preserve host bridge
> windows completely covered by E820") applied, but I think it would
> show this:
> 
>   acpi PNP0A08:00: resource [mem 0x000a0000-0x000bffff window] fully covered by e820 entry [mem 0x000a0000-0x000fffff]
>   acpi PNP0A08:00: resource [mem 0x7b800000-0x7fffffff window] fully covered by e820 entry [mem 0x7b000000-0x7fffffff]
> 
> instead of clipping those windows.  But none of the devices we
> enumerate appears to be using either of those windows.

Not with a working kernel, no, because they are clipped off.  But
with the "don't clip fully-covered _CRS windows" change, the
[mem 0x7b800000-0x7fffffff] window all of a sudden becomes fair game
to assign BARs to.

I agree that we will get a fully-covered msg for that one with
the patch, which would change:

[    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]

to:

[    0.317298] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]

and I believe that likely is our culprit.

So to fix this I guess that we first need to merge back-to-back
windows coming from _CRS into a single window, before calling
remove_e820_regions()

That would pass [mem 0x7b800000-0xe0000000 window] to remove_e820_regions()
in a single call (as I expected from the logs), which should result in
both the top and the bottom still getting clipped as before.
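
The merge-then-clip idea can be simulated in userspace. This is only a sketch under stated assumptions: `clip()` follows my reading of resource_clip() in arch/x86/kernel/resource.c (keep the larger leftover side), `merge_adjacent()` is a hypothetical helper (I could not find an existing one), and only the three reserved E820 entries near the windows are considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive [start, end] address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical helper: absorb b into a if b starts right after a ends. */
bool merge_adjacent(struct range *a, const struct range *b)
{
	assert(a->start <= a->end && b->start <= b->end);
	if (a->end + 1 != b->start)
		return false;
	a->end = b->end;
	return true;
}

/* Clip res against one E820 entry, keeping the larger leftover side. */
void clip(struct range *res, uint64_t start, uint64_t end)
{
	uint64_t low = 0, high = 0;

	if (res->end < start || res->start > end)
		return;			/* no overlap */
	if (res->start < start)
		low = start - res->start;
	if (res->end > end)
		high = res->end - end;
	if (low > high)
		res->end = start - 1;
	else
		res->start = end + 1;
}

/* Skip entries that fully cover the window, clip against the rest. */
void remove_regions(struct range *res, const struct range *e820, int n)
{
	for (int i = 0; i < n; i++) {
		if (e820[i].start <= res->start && res->end <= e820[i].end)
			continue;	/* fully covered: leave window alone */
		clip(res, e820[i].start, e820[i].end);
	}
}
```

Fed with [mem 0x7b800000-0x7fffffff] and [mem 0x80000000-0xe0000000], the merged window is no longer fully covered by the [mem 0x7b000000-0x7fffffff] entry, and in this sketch it ends up clipped to [mem 0x80000000-0xcfffffff], the same root bus resource as in the last good log.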

I've been looking around in the code a bit, but I could not quickly find
a helper to do the back-to-back resource merging before calling
remove_e820_regions().  Any suggestions for this?

Regards,

Hans

> 
> We do have this:
> 
>   pci 0000:00:18.2: reg 0x10: [mem 0xde000000-0xde000fff 64bit]
>   pci 0000:00:18.2: reg 0x18: [mem 0xc2b31000-0xc2b31fff 64bit]
>   pci 0000:00:18.2: can't claim BAR 0 [mem 0xde000000-0xde000fff 64bit]: no compatible bridge window
>   pci 0000:00:18.2: BAR 0: assigned [mem 0x80000000-0x80000fff 64bit]
> 
> Where the original [mem 0xde000000-0xde000fff 64bit] assignment was
> perfectly legal, but we clipped [mem 0x80000000-0xe0000000 window] to
> [mem 0x80000000-0xcfffffff window] instead of just punching a hole for
> the 00:0d.0 carve-out.
> 
> Maybe 5949965ec934 puts 00:18.2 BAR 0 somewhere that doesn't work,
> or maybe the clipping to [mem 0x00100000-0x000bffff window] or
> [mem 0x80000000-0x7fffffff window] doesn't work as expected?
> They are supposed to be interpreted as "empty", but certainly
> resource_size([0x00100000-0x000bffff]) is != 0.
> 
>> +		    e820_start <= avail->start && avail->end <= e820_end) {
>>  			dev_info(dev, "resource %pR fully covered by e820 entry [mem %#010Lx-%#010Lx]\n",
>>  				 avail, e820_start, e820_end);
>>  			continue;
>> -- 
>> 2.35.1
>>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-29 18:44         ` Guillaume Tucker
@ 2022-04-04 19:44           ` Guillaume Tucker
  2022-04-05  8:13             ` Hans de Goede
  2022-04-05 17:57             ` Bjorn Helgaas
       [not found]           ` <16E2C910B4947F17.5433@groups.io>
  1 sibling, 2 replies; 22+ messages in thread
From: Guillaume Tucker @ 2022-04-04 19:44 UTC (permalink / raw)
  To: Hans de Goede, Mark Brown
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg, linux-pci,
	kernelci, kernelci-results

+kernelci

On 29/03/2022 19:44, Guillaume Tucker wrote:
> On 28/03/2022 13:54, Hans de Goede wrote:
>> Hi,
>>
>> On 3/24/22 23:19, Mark Brown wrote:
>>> On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
>>>
>>>> Mark, if one of us writes a test patch, can you get that Asus machine to boot a
>>>> kernel build from next + the test patch ?
>>>
>>> I can't directly unfortunately as the board is in Collabora's lab but
>>> Guillaume (who's already CCed) ought to be able to, and I can generally
>>> prod and try to get that done.
>>
>> Ok, Guillaume, can you please try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
>> ("x86/PCI: Preserve host bridge windows completely covered by E820") and the
>> attached patch applied on top, on the asus-C523NA-A20057-coral machine,
>> and see if that makes it boot again?
> 
> Sorry I've been busy with a conference.  Sure, will put that
> through KernelCI tomorrow and let you know the outcome.

Well the issue seems to have been fixed on mainline, unless it's
intermittent.  In any case, next-20220404 is booting fine:

  https://linux.kernelci.org/test/plan/id/624aed811a5acd09adae071e/

Last time it was seen to fail was next-20220330:

  https://linux.kernelci.org/test/plan/id/62442f68e30d6f89a4ae06b7/


Ironically, the KernelCI staging linux-next job with the patches
mentioned in your previous email applied is now failing:

  https://staging.kernelci.org/test/plan/id/624b2d3b923f532dc305f4c7/

The kernel branch being used is:

  https://github.com/kernelci/linux/commits/staging-next


I haven't checked the logs or investigated any further, this is
just a quick summary based on the boot test results.

Please let us know if we should drop these patches or try
anything else.  I'll be on holiday for the rest of the week but
others can pick things up if needed.

Thanks,
Guillaume

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
       [not found]           ` <16E2C910B4947F17.5433@groups.io>
@ 2022-04-04 19:48             ` Guillaume Tucker
  0 siblings, 0 replies; 22+ messages in thread
From: Guillaume Tucker @ 2022-04-04 19:48 UTC (permalink / raw)
  To: Hans de Goede, Mark Brown
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg, linux-pci,
	kernelci, kernelci-results

On 04/04/2022 20:44, Guillaume Tucker wrote:
> Well the issue seems to have been fixed on mainline

Sorry, I meant linux-next.  It is also booting on mainline but I
don't think the regression ever went further than linux-next:

  https://linux.kernelci.org/test/plan/id/6246654e60a1cb470cae0680/

Guillaume

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-04 19:44           ` Guillaume Tucker
@ 2022-04-05  8:13             ` Hans de Goede
  2022-04-05 17:57             ` Bjorn Helgaas
  1 sibling, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2022-04-05  8:13 UTC (permalink / raw)
  To: Guillaume Tucker, Mark Brown
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg, linux-pci,
	kernelci, kernelci-results

Hi All,

On 4/4/22 21:44, Guillaume Tucker wrote:
> +kernelci
> 
> On 29/03/2022 19:44, Guillaume Tucker wrote:
>> On 28/03/2022 13:54, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 3/24/22 23:19, Mark Brown wrote:
>>>> On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
>>>>
>>>>> Mark, if one of us writes a test patch, can you get that Asus machine to boot a
>>>>> kernel build from next + the test patch ?
>>>>
>>>> I can't directly unfortunately as the board is in Collabora's lab but
>>>> Guillaume (who's already CCed) ought to be able to, and I can generally
>>>> prod and try to get that done.
>>>
>>> Ok, Guillaume, can you try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
>>> ("x86/PCI: Preserve host bridge windows completely covered by E820") + the 
>>> attached patch added on top on the asus-C523NA-A20057-coral machine please
>>> and see if that makes it boot again ?
>>
>> Sorry I've been busy with a conference.  Sure, will put that
>> through KernelCI tomorrow and let you know the outcome.
> 
> Well the issue seems to have been fixed on mainline, unless it's
> intermittent.  In any case, next-20220404 is booting fine:
> 
>   https://linux.kernelci.org/test/plan/id/624aed811a5acd09adae071e/
> 
> Last time it was seen to fail was next-20220330:
> 
>   https://linux.kernelci.org/test/plan/id/62442f68e30d6f89a4ae06b7/
> 
> 
> Ironically, the KernelCI staging linux-next job with the patches
> mentioned in your previous email applied is now failing:
> 
>   https://staging.kernelci.org/test/plan/id/624b2d3b923f532dc305f4c7/
> 
> The kernel branch being used is:
> 
>   https://github.com/kernelci/linux/commits/staging-next
> 
> 
> I haven't checked the logs or investigated any further, this is
> just a quick summary based on the boot test results.
> 
> Please let us know if we should drop these patches or try
> anything else.  I'll be on holiday for the rest of the week but
> others can pick things up if needed.

The reason next and mainline are booting now is that
the patch the bisect pointed out never made it into mainline
and Bjorn has dropped it from -next.

I fully expect that -next with 

https://lore.kernel.org/linux-acpi/20220304035110.988712-4-helgaas@kernel.org/

or mainline with the entire series from that link applied will
still not boot.

But we do need that last patch to fix various issues on
other boards.

See my previous reply in this thread:
https://lore.kernel.org/linux-pci/76c5de03-a3a4-8444-d7f6-496fa119d830@redhat.com/

for some further analysis of what I think is happening here.

As mentioned there I believe this can be fixed by merging
back-to-back resources into a single resource before calling
remove_e820_regions() but I could not find a good example / helper
code to do the merging of the resources.

If someone can give me some pointers wrt this I can try to
come up with something and then provide a set of patches
for testing on the asus-C523NA-A20057-coral .

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-04 19:44           ` Guillaume Tucker
  2022-04-05  8:13             ` Hans de Goede
@ 2022-04-05 17:57             ` Bjorn Helgaas
  1 sibling, 0 replies; 22+ messages in thread
From: Bjorn Helgaas @ 2022-04-05 17:57 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Hans de Goede, Mark Brown, Bjorn Helgaas, Rafael J . Wysocki,
	Mika Westerberg, linux-pci, kernelci, kernelci-results

On Mon, Apr 04, 2022 at 08:44:41PM +0100, Guillaume Tucker wrote:
> On 29/03/2022 19:44, Guillaume Tucker wrote:
> > On 28/03/2022 13:54, Hans de Goede wrote:
> >> On 3/24/22 23:19, Mark Brown wrote:
> >>> On Thu, Mar 24, 2022 at 09:34:30PM +0100, Hans de Goede wrote:
> >> Ok, Guillaume, can you try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
> >> ("x86/PCI: Preserve host bridge windows completely covered by E820") + the 
> >> attached patch added on top on the asus-C523NA-A20057-coral machine please
> >> and see if that makes it boot again ?
> > 
> > Sorry I've been busy with a conference.  Sure, will put that
> > through KernelCI tomorrow and let you know the outcome.
> 
> Well the issue seems to have been fixed on mainline, unless it's
> intermittent.  In any case, next-20220404 is booting fine:
> 
>   https://linux.kernelci.org/test/plan/id/624aed811a5acd09adae071e/
> 
> Last time it was seen to fail was next-20220330:
> 
>   https://linux.kernelci.org/test/plan/id/62442f68e30d6f89a4ae06b7/

This is because I dropped 5949965ec934 ("x86/PCI: Preserve host bridge
windows completely covered by E820") from the PCI tree starting with
next-20220401 because it causes the regression.  So I expect
next-20220404 to boot fine (next-20220401 should boot fine as well; I
don't know whether that was tested).

The gory details:

  20220330 should fail; it includes:
    5949965ec934 ("x86/PCI: Preserve host bridge windows completely covered by E820")
    d13f73e9108a ("x86/PCI: Log host bridge window clipping for E820 regions")
    9c253994c5ba ("x86/PCI: Eliminate remove_e820_regions() common subexpressions")
    ffb217a13a2e ("Linux 5.17-rc7")

  20220331 should fail; it includes:
    18146f25ac66 ("PCI: hv: Remove unused hv_set_msi_entry_from_desc()")
    5949965ec934 ("x86/PCI: Preserve host bridge windows completely covered by E820")
    d13f73e9108a ("x86/PCI: Log host bridge window clipping for E820 regions")
    9c253994c5ba ("x86/PCI: Eliminate remove_e820_regions() common subexpressions")
    ffb217a13a2e ("Linux 5.17-rc7")

  20220401 should boot; it includes:
    1c6cec4ab487 ("x86/PCI: Log host bridge window clipping for E820 regions")
    b2922e67d233 ("x86/PCI: Eliminate remove_e820_regions() common subexpressions")
    22ef7ee3eeb2 ("PCI: hv: Remove unused hv_set_msi_entry_from_desc()")
    148a65047695 ("Merge tag 'pci-v5.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci")

  20220404 should boot; it includes:
    22ef7ee3eeb2 ("PCI: hv: Remove unused hv_set_msi_entry_from_desc()")
    148a65047695 ("Merge tag 'pci-v5.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci")

> Ironically, the KernelCI staging linux-next job with the patches
> mentioned in your previous email applied is now failing:
> 
>   https://staging.kernelci.org/test/plan/id/624b2d3b923f532dc305f4c7/

This says we tested commit 1aceacc82d3f, which I guess is the
staging-next-20220404.1 tag at https://github.com/kernelci/linux.git.
It took me a while to find the commit history, but
https://github.com/kernelci/linux/commits/1aceacc82d3f says this
includes:

  0a0c05a90278 x86/PCI: Limit "e820 entry fully covers window" check to non ISA MMIO
  b5fd57109d22 x86/PCI: Preserve host bridge windows completely covered by E820

So the proposed fix (0a0c05a90278) apparently didn't work.

Bjorn

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-03-24 17:52 ` next/master bisection: baseline.login on asus-C523NA-A20057-coral Mark Brown
  2022-03-24 20:34   ` Hans de Goede
  2022-03-29 22:14   ` Bjorn Helgaas
@ 2022-04-05 23:53   ` Bjorn Helgaas
  2022-04-06 18:59     ` Bjorn Helgaas
  2 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2022-04-05 23:53 UTC (permalink / raw)
  To: Mark Brown
  Cc: Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, kernelci-results, bot, gtucker, linux-pci

On Thu, Mar 24, 2022 at 05:52:19PM +0000, Mark Brown wrote:
> On Wed, Mar 23, 2022 at 11:47:08PM -0700, KernelCI bot wrote:
> 
> The KernelCI bisection bot has identified commit 5949965ec9340cfc0e
> ("x86/PCI: Preserve host bridge windows completely covered by E820")
> as causing a boot regression in next on asus-C523NA-A20057-coral (a
> Chromebook AIUI).  Unfortunately there's no useful output when starting
> the kernel.  I've left the full report below including links to the web
> dashboard.
> 
> The last successful boot in -next had this log:
> 
>    https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
> 
> I'd also note that the machine hp-x360-12b-n4000-octopus appears to have
> started failing at the same time with similar symptoms, failing log:
> 
>    https://storage.kernelci.org/next/master/next-20220324/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-hp-x360-12b-n4000-octopus.html

Is there any way to get the contents of:

  /sys/firmware/acpi/tables/DSDT
  /sys/firmware/acpi/tables/SSDT*

from these Chromebooks?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-04  8:45           ` Hans de Goede
@ 2022-04-06  0:19             ` Bjorn Helgaas
  2022-04-11  9:54               ` Hans de Goede
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2022-04-06  0:19 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Mark Brown, Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

On Mon, Apr 04, 2022 at 10:45:10AM +0200, Hans de Goede wrote:
> On 3/30/22 13:35, Bjorn Helgaas wrote:
> > On Mon, Mar 28, 2022 at 02:54:42PM +0200, Hans de Goede wrote:

> >> Ok, Guillaume, can you try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
> >> ("x86/PCI: Preserve host bridge windows completely covered by E820") + the 
> >> attached patch added on top on the asus-C523NA-A20057-coral machine please
> >> and see if that makes it boot again ?

> >> From b8080a6d2d889847900e1408f71d0c01c73f5c94 Mon Sep 17 00:00:00 2001
> >> From: Hans de Goede <hdegoede@redhat.com>
> >> Date: Mon, 28 Mar 2022 14:47:41 +0200
> >> Subject: [PATCH] x86/PCI: Limit "e820 entry fully covers window" check to non
> >>  ISA MMIO addresses
> >>
> >> Commit FIXME ("x86/PCI: Preserve host bridge windows completely
> >> covered by E820") added a check to skip e820 table entries which
> >> fully cover a PCI bridge's memory window when clipping PCI bridge
> >> memory windows.
> >>
> >> This check also caused ISA MMIO windows to not get clipped when
> >> fully covered, which is causing some coreboot based Chromebooks
> >> to not boot.
> >>
> >> Modify the fully covered check to not apply to ISA MMIO windows.

> >> Fixes: FIXME ("x86/PCI: Preserve host bridge windows completely covered by E820")
> >> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> >> ---
> >>  arch/x86/kernel/resource.c | 6 +++++-
> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> >> index 6be82e16e5f4..d9ec913619c3 100644
> >> --- a/arch/x86/kernel/resource.c
> >> +++ b/arch/x86/kernel/resource.c
> >> @@ -46,8 +46,12 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
> >>  		 * devices.  But if it covers the *entire* resource, it's
> >>  		 * more likely just telling us that this is MMIO space, and
> >>  		 * that doesn't need to be removed.
> >> +		 * Note this *entire* resource covering check is only
> >> +		 * intended for 32 bit memory resources for the 16 bit
> >> +		 * isa window we always apply the e820 entries.
> >>  		 */
> >> -		if (e820_start <= avail->start && avail->end <= e820_end) {
> >> +		if (avail->start >= ISA_END_ADDRESS &&
> > 
> > What is the justification for needing to check ISA_END_ADDRESS here?
> > The commit log basically says "this makes it work", which isn't very
> > satisfying.
> 
> I did not have a log with the:
> 
> >   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
> >   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
> >   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
> 
> messages. Instead I was looking at this log:
> 
> https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
> 
> With the following messages (as I quoted higher up in the email-thread):
> 
> """
>  1839 17:54:41.406548  <6>[    0.000000] BIOS-provided physical RAM map:
>  1840 17:54:41.413121  <6>[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16
>  1841 17:54:41.419712  <6>[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
>  1842 17:54:41.430192  <6>[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>  1843 17:54:41.436207  <6>[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable
>  1844 17:54:41.446353  <6>[    0.000000] BIOS-e820: [mem 0x0000000010000000-0x0000000012150fff] reserved
>  1845 17:54:41.453290  <6>[    0.000000] BIOS-e820: [mem 0x0000000012151000-0x000000007a9fcfff] usable
>  1846 17:54:41.459966  <6>[    0.000000] BIOS-e820: [mem 0x000000007a9fd000-0x000000007affffff] type 16
>  1847 17:54:41.469549  <6>[    0.000000] BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>  1848 17:54:41.476685  <6>[    0.000000] BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>  1849 17:54:41.486439  <6>[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>  1850 17:54:41.492994  <6>[    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed17fff] reserved
>  1851 17:54:41.503008  <6>[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
> ...
>  2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>  2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
> """
> 
> ###
> 
> What I find weird here is that this boot with a somewhat earlier kernel has:
> 
>  2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>  2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
> 
> Whereas the boot with the clipped messages has:
> 
> <6>[    0.313705] acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
> <6>[    0.314702] acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
> <6>[    0.315747] PCI host bridge to bus 0000:00
> <6>[    0.316118] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> <6>[    0.316703] pci_bus 0000:00: root bus resource [io  0x1000-0xffff window]
> <6>[    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
> <6>[    0.317703] pci_bus 0000:00: root bus resource [bus 00-ff]
> 
> So in the boot with the clipped messages we are getting 3 windows from _CRS
> whereas before we were getting only 2?  I know that we are now applying
> the clipping directly when we are parsing the resources. So I guess that
> before we somehow also merged the 2 resources which are back to back together
> before the "root bus resource" messages get printed. This caused me to just
> see the "root bus resource [mem 0x7b800000-0xe0000000 window]" which is
> not fully covered which is why I focused on the ISA MMIO window.

Yes, we do merge adjacent windows together.  See 7c3855c423b1 ("PCI:
Coalesce host bridge contiguous apertures") [1].  This is because our
BAR assignment isn't smart enough to assign space from two adjacent
resources to one BAR.

We have (at least) three apertures, and the latter two would be merged
together:

  acpi PNP0A08:00: ... [mem 0x000a0000-0x000bffff window] ...
  acpi PNP0A08:00: ... [mem 0x7b800000-0x7fffffff window] ...
  acpi PNP0A08:00: ... [mem 0x80000000-0xe0000000 window] ...

The boot at [2] was with 5.17.0-rc7-next-20220310, which includes
7f7b4236f204 ("x86/PCI: Ignore E820 reservations for bridge windows on
newer systems") [3], so we ignored E820 completely and we found two
windows (the VGA framebuffer and the big merged window):

  Linux version 5.17.0-rc7-next-20220310 (KernelCI@build-j608383-x86-64-gcc-10-x86-64-defconfig-x86-chromebooc26pc) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT Fri Mar 11 17:23:28 UTC 2022
  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]

The boot at [4] was with d13f73e9108a ("x86/PCI: Log host bridge
window clipping for E820 regions") [5].  In addition to logging,
d13f73e9108a also does the clipping *before* the merging:

  Linux version 5.17.0-rc7 (KernelCI@0bd4b548bde7) (gcc (Debian 10.2.1-6) 
  acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
  acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
  acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
  pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]

Here we clipped the VGA framebuffer and [mem 0x7b800000-0x7fffffff]
completely out, so we ignored them, and we clipped the big window to
avoid [mem 0xd0000000-0xd0ffffff], so all we have left is
[mem 0x80000000-0xcfffffff].
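
The clip-before-merge arithmetic above can be reproduced with a small
user-space model.  This is only a sketch in Python, not kernel code: the
"keep the larger remaining side" rule in clip() is an assumption inferred
from the "clipped ... to ..." log lines, and the E820 list contains only
the four reserved entries quoted in this thread.

```python
# User-space model of per-window E820 clipping (not kernel code).
# All ranges are inclusive (start, end) pairs copied from the logs.

E820 = [
    (0x000a0000, 0x000fffff),
    (0x7b000000, 0x7fffffff),
    (0xd0000000, 0xd0ffffff),
    (0xe0000000, 0xefffffff),
]

CRS_WINDOWS = [
    (0x000a0000, 0x000bffff),
    (0x7b800000, 0x7fffffff),
    (0x80000000, 0xe0000000),
]


def clip(window, entry):
    """Remove entry from window, keeping the larger side (assumed rule)."""
    start, end = window
    e_start, e_end = entry
    if end < e_start or start > e_end:
        return window                      # no overlap, nothing to do
    low = max(e_start - start, 0)          # space left below the entry
    high = max(end - e_end, 0)             # space left above the entry
    if low > high:
        return (start, e_start - 1)
    return (e_end + 1, end)


def clip_all(window, entries):
    """Apply every E820 entry to one window, in table order."""
    for entry in entries:
        window = clip(window, entry)
    return window


def usable(window):
    """A window whose start is past its end has been clipped away."""
    return window[0] <= window[1]


clipped = [clip_all(w, E820) for w in CRS_WINDOWS]
surviving = [w for w in clipped if usable(w)]

for before, after in zip(CRS_WINDOWS, clipped):
    print(f"[mem {before[0]:#010x}-{before[1]:#010x}] -> "
          f"[mem {after[0]:#010x}-{after[1]:#010x}]"
          f"{'' if usable(after) else ' (ignored)'}")
```

In this model the first two windows end up with start > end, which
corresponds to the "ignoring host bridge window" messages, and only one
window survives.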

> > The Asus log of the last good commit shows:
> > 
> >   PCI: 00:0d.0 [8086/5a92] enabled
> >   constrain_resources: PCI: 00:0d.0 10 base d0000000 limit d0ffffff mem (fixed)
> >   ...
> >   BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
> >   BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
> >   BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
> >   ...
> >   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
> >   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
> >   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
> >   acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
> >   acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])

> > From the firmware part of the log, it looks like 00:0d.0 is a hidden
> > device that consumes [mem d0000000-0xd0ffffff].  Linux doesn't
> > enumerate 00:0d.0, so firmware should have carved that out of the [mem
> > 0x80000000-0xe0000000 window] in _CRS.
> > 
> > We don't have a log with 5949965ec934 ("x86/PCI: Preserve host bridge
> > windows completely covered by E820") applied, but I think it would
> > show this:
> > 
> >   acpi PNP0A08:00: resource [mem 0x000a0000-0x000bffff window] fully covered by e820 entry [mem 0x000a0000-0x000fffff]
> >   acpi PNP0A08:00: resource [mem 0x7b800000-0x7fffffff window] fully covered by e820 entry [mem 0x7b000000-0x7fffffff]
> > 
> > instead of clipping those windows.  But none of the devices we
> > enumerate appears to be using either of those windows.
> 
> Not with a working kernel no, because they are clipped off, but
> with the don't clip fully-covered _CRS windows change, the 
> [mem 0x7b000000-0x7fffffff] all of a sudden becomes fair game
> to assign BARs to.
> 
> I agree that we will get a fully-covered msg for that one with
> the patch, which would change:
> 
> [    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
> 
> to:
> 
> [    0.317298] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
> 
> and I believe that likely is our culprit.

I think you're probably right.  We started with this:

  BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
  BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
  BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
  acpi PNP0A08:00: ... [mem 0x000a0000-0x000bffff window] ...
  acpi PNP0A08:00: ... [mem 0x7b800000-0x7fffffff window] ...
  acpi PNP0A08:00: ... [mem 0x80000000-0xe0000000 window] ...

After 5949965ec934, clipping will give us this:

  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  pci_bus 0000:00: root bus resource [mem 0x7b800000-0x7fffffff window]
  pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]

and merging will give us this:

  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]

BIOS left a 00:18.2 BAR here [6]:

  pci 0000:00:18.2: reg 0x10: [mem 0xde000000-0xde000fff 64bit]

That BAR is outside the windows we know about, so we'll move it,
probably to 0x7b800000 and maybe it doesn't work there.
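
As a rough check of the reasoning above, here is a user-space Python
sketch (not kernel code) of "skip fully covered windows, clip the rest,
then merge adjacent windows".  The clipping rule and the adjacency test
(end + 1 == next start) are assumptions made for illustration; the
ranges and the 00:18.2 BAR are the ones quoted in this thread.

```python
# Model of clipping with the "fully covered" exception, followed by
# merging of adjacent windows (not kernel code; inclusive ranges).

E820 = [
    (0x000a0000, 0x000fffff),
    (0x7b000000, 0x7fffffff),
    (0xd0000000, 0xd0ffffff),
    (0xe0000000, 0xefffffff),
]

CRS_WINDOWS = [
    (0x000a0000, 0x000bffff),
    (0x7b800000, 0x7fffffff),
    (0x80000000, 0xe0000000),
]

BAR_00_18_2 = (0xde000000, 0xde000fff)


def clip_unless_covered(window, entry):
    """Clip entry out of window unless the entry covers all of it."""
    start, end = window
    e_start, e_end = entry
    if end < e_start or start > e_end:
        return window                      # no overlap
    if e_start <= start and end <= e_end:
        return window                      # fully covered: preserve
    low = max(e_start - start, 0)
    high = max(end - e_end, 0)
    if low > high:                         # keep the larger side (assumed)
        return (start, e_start - 1)
    return (e_end + 1, end)


def clip_all(window, entries):
    for entry in entries:
        window = clip_unless_covered(window, entry)
    return window


def coalesce(windows):
    """Merge windows that are exactly back to back."""
    merged = []
    for start, end in sorted(windows):
        if merged and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def inside_any(rng, windows):
    return any(w[0] <= rng[0] and rng[1] <= w[1] for w in windows)


clipped = [clip_all(w, E820) for w in CRS_WINDOWS]
root_windows = coalesce(clipped)
bar_is_covered = inside_any(BAR_00_18_2, root_windows)

for w in root_windows:
    print(f"root bus resource [mem {w[0]:#010x}-{w[1]:#010x} window]")
print("00:18.2 BAR inside a window:", bar_is_covered)
```

With these inputs the model ends up with the two windows listed above,
and the BAR at 0xde000000 falls outside both of them, which is why it
would have to be reassigned.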

> So to fix this I guess that we first need to merge back-to-back
> windows coming from _CRS into a single window, before calling
> remove_e820_regions()
> 
> That would pass [mem 0x7b800000-0xe0000000 window] to
> remove_e820_regions() in a single call (as I expected from the
> logs), which should result in both the top and the bottom still
> getting clipped as before.

So I think the progression is:

  1) Remove anything mentioned in E820 from _CRS (4dc2287c1805 [7]).
     This worked around some issues on Dell systems.

  2) Remove things mentioned in E820 unless they cover the entire
     window (5949965ec934 [8])

  3) Coalesce adjacent _CRS windows, *then* remove things mentioned in
     E820 unless they cover the entire (coalesced) window (current
     proposal)

Even 3) leaves us with the 00:18.2 BAR above that will be moved when
it doesn't need to be.  That could lead us to something like this:

  4) Coalesce adjacent _CRS windows, *then* remove things mentioned in
     E820 unless they cover the entire (coalesced) window (current
     proposal), but punch holes instead of lopping entire sections, so 
     we would end up with these windows:

      pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
      pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
      pci_bus 0000:00: root bus resource [mem 0xd0100000-0xdfffffff window]

But I don't think this is leading to a maintainable result.  We
shouldn't be using E820 at all in an ACPI system (and again, the fact
that we *do* use it is my fault, and I'll take my beatings).  We need
to *reduce* or at least contain that E820 usage instead of expanding
it.
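
For reference, option 3) can be modelled in user space by swapping the
order of the two steps.  This Python sketch is not kernel code and not a
proposed implementation; the clipping rule, the adjacency test and the
helper names are assumptions chosen for illustration, and the ranges are
the ones quoted earlier in this message.

```python
# Model of option 3: coalesce adjacent _CRS windows first, then clip
# against E820 unless an entry covers the whole (coalesced) window.

E820 = [
    (0x000a0000, 0x000fffff),
    (0x7b000000, 0x7fffffff),
    (0xd0000000, 0xd0ffffff),
    (0xe0000000, 0xefffffff),
]

CRS_WINDOWS = [
    (0x000a0000, 0x000bffff),
    (0x7b800000, 0x7fffffff),
    (0x80000000, 0xe0000000),
]

BAR_00_18_2 = (0xde000000, 0xde000fff)


def coalesce(windows):
    """Merge windows that are exactly back to back."""
    merged = []
    for start, end in sorted(windows):
        if merged and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def clip_unless_covered(window, entry):
    """Clip entry out of window unless the entry covers all of it."""
    start, end = window
    e_start, e_end = entry
    if end < e_start or start > e_end:
        return window
    if e_start <= start and end <= e_end:
        return window
    low = max(e_start - start, 0)
    high = max(end - e_end, 0)
    if low > high:                         # keep the larger side (assumed)
        return (start, e_start - 1)
    return (e_end + 1, end)


def clip_all(window, entries):
    for entry in entries:
        window = clip_unless_covered(window, entry)
    return window


coalesced = coalesce(CRS_WINDOWS)
option3 = [clip_all(w, E820) for w in coalesced]
bar_is_covered = any(w[0] <= BAR_00_18_2[0] and BAR_00_18_2[1] <= w[1]
                     for w in option3)

for w in option3:
    print(f"root bus resource [mem {w[0]:#010x}-{w[1]:#010x} window]")
print("00:18.2 BAR inside a window:", bar_is_covered)
```

In this model the big window is clipped at both ends again, so
[mem 0x7b800000-0x7fffffff] is no longer offered for BAR assignment, but
the 00:18.2 BAR still lies outside every window.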

[1] https://git.kernel.org/linus/7c3855c423b1
[2] https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html#L2030
[3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/patch/?id=7f7b4236f204
[4] https://lava.collabora.co.uk/scheduler/job/5937945#L2023
[5] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/patch/?id=d13f73e9108a
[6] https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html#L2084
[7] https://git.kernel.org/linus/4dc2287c1805
[8] https://lore.kernel.org/all/20220304035110.988712-4-helgaas@kernel.org/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-05 23:53   ` Bjorn Helgaas
@ 2022-04-06 18:59     ` Bjorn Helgaas
  2022-04-06 19:37       ` Mark Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2022-04-06 18:59 UTC (permalink / raw)
  To: Mark Brown
  Cc: Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, kernelci-results, bot, gtucker, linux-pci

On Tue, Apr 05, 2022 at 06:53:17PM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 24, 2022 at 05:52:19PM +0000, Mark Brown wrote:
> > On Wed, Mar 23, 2022 at 11:47:08PM -0700, KernelCI bot wrote:
> > 
> > The KernelCI bisection bot has identified commit 5949965ec9340cfc0e
> > ("x86/PCI: Preserve host bridge windows completely covered by E820")
> > as causing a boot regression in next on asus-C523NA-A20057-coral (a
> > Chromebook AIUI).  Unfortunately there's no useful output when starting
> > the kernel.  I've left the full report below including links to the web
> > dashboard.
> > 
> > The last successful boot in -next had this log:
> > 
> >    https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
> > 
> > I'd also note that the machine hp-x360-12b-n4000-octopus appears to have
> > started failing at the same time with similar symptoms, failing log:
> > 
> >    https://storage.kernelci.org/next/master/next-20220324/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-hp-x360-12b-n4000-octopus.html
> 
> Is there any way to get the contents of:
> 
>   /sys/firmware/acpi/tables/DSDT
>   /sys/firmware/acpi/tables/SSDT*
> 
> from these Chromebooks?

Is there hope for this, or should I look for another way to get this
information?

Thanks,
  Bjorn

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-06 18:59     ` Bjorn Helgaas
@ 2022-04-06 19:37       ` Mark Brown
  2022-04-06 20:11         ` Guillaume Tucker
  2022-04-06 20:56         ` Guenter Roeck
  0 siblings, 2 replies; 22+ messages in thread
From: Mark Brown @ 2022-04-06 19:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, kernelci-results, bot, gtucker, linux-pci,
	Guenter Roeck

[-- Attachment #1: Type: text/plain, Size: 521 bytes --]

On Wed, Apr 06, 2022 at 01:59:31PM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 05, 2022 at 06:53:17PM -0500, Bjorn Helgaas wrote:

> > Is there any way to get the contents of:

> >   /sys/firmware/acpi/tables/DSDT
> >   /sys/firmware/acpi/tables/SSDT*

> > from these Chromebooks?

> Is there hope for this, or should I look for another way to get this
> information?

I believe Guillaume is out of office this week.  Copying in Guenter as
well since he's on the ChromeOS team in case he can help or knows
someone who can.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-06 19:37       ` Mark Brown
@ 2022-04-06 20:11         ` Guillaume Tucker
  2022-04-07 15:17           ` Denys Fedoryshchenko
  2022-04-06 20:56         ` Guenter Roeck
  1 sibling, 1 reply; 22+ messages in thread
From: Guillaume Tucker @ 2022-04-06 20:11 UTC (permalink / raw)
  To: Mark Brown, Bjorn Helgaas
  Cc: Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, kernelci-results, linux-pci, Guenter Roeck,
	kernelci, Michał Gałka, Denys

+kernelci +Michał +Denys

On 06/04/2022 20:37, Mark Brown wrote:
> On Wed, Apr 06, 2022 at 01:59:31PM -0500, Bjorn Helgaas wrote:
>> On Tue, Apr 05, 2022 at 06:53:17PM -0500, Bjorn Helgaas wrote:
> 
>>> Is there any way to get the contents of:
> 
>>>   /sys/firmware/acpi/tables/DSDT
>>>   /sys/firmware/acpi/tables/SSDT*
> 
>>> from these Chromebooks?
> 
>> Is there hope for this, or should I look for another way to get this
>> information?
> 
> I believe Guillaume is out of office this week.  Copying in Guenter as
> well since he's on the ChromeOS team in case he can help or knows
> someone who can.

Someone with access to the Collabora LAVA lab can also send a
custom job to try and get this information.

I'm back to work on Monday.

Thanks,
Guillaume

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-06 19:37       ` Mark Brown
  2022-04-06 20:11         ` Guillaume Tucker
@ 2022-04-06 20:56         ` Guenter Roeck
  1 sibling, 0 replies; 22+ messages in thread
From: Guenter Roeck @ 2022-04-06 20:56 UTC (permalink / raw)
  To: Mark Brown
  Cc: Bjorn Helgaas, Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, kernelci-results, bot, gtucker, linux-pci

On Wed, Apr 06, 2022 at 08:37:26PM +0100, Mark Brown wrote:
> On Wed, Apr 06, 2022 at 01:59:31PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 05, 2022 at 06:53:17PM -0500, Bjorn Helgaas wrote:
> 
> > > Is there any way to get the contents of:
> 
> > >   /sys/firmware/acpi/tables/DSDT
> > >   /sys/firmware/acpi/tables/SSDT*
> 
> > > from these Chromebooks?
> 
> > Is there hope for this, or should I look for another way to get this
> > information?
> 
> I believe Guillaume is out of office this week.  Copying in Guenter as
> well since he's on the ChromeOS team in case he can help or knows
> someone who can.

I _think_ the source should be in
https://chromium.googlesource.com/chromiumos/third_party/coreboot,
branch firmware-coral-10068.B,
in src/mainboard/google/reef/variants/coral/devicetree.cb.

Does this help, or do you need the actual binary devicetree file(s) ?

Guenter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-06 20:11         ` Guillaume Tucker
@ 2022-04-07 15:17           ` Denys Fedoryshchenko
  0 siblings, 0 replies; 22+ messages in thread
From: Denys Fedoryshchenko @ 2022-04-07 15:17 UTC (permalink / raw)
  To: Guillaume Tucker, Mark Brown, Bjorn Helgaas
  Cc: Bjorn Helgaas, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, kernelci-results, linux-pci, Guenter Roeck,
	kernelci, Michał Gałka

[-- Attachment #1: Type: text/plain, Size: 1074 bytes --]

On Wed, 2022-04-06 at 21:11 +0100, Guillaume Tucker wrote:
> +kernelci +Michał +Denys
> 
> On 06/04/2022 20:37, Mark Brown wrote:
> > On Wed, Apr 06, 2022 at 01:59:31PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 05, 2022 at 06:53:17PM -0500, Bjorn Helgaas wrote:
> > 
> > > > Is there any way to get the contents of:
> > 
> > > >   /sys/firmware/acpi/tables/DSDT
> > > >   /sys/firmware/acpi/tables/SSDT*
> > 
> > > > from these Chromebooks?
> > 
> > > Is there hope for this, or should I look for another way to get
> > > this
> > > information?
> > 
> > I believe Guillaume is out of office this week.  Copying in Guenter
> > as
> > well since he's on the ChromeOS team in case he can help or knows
> > someone who can.
> 
> Someone with access to the Collabora LAVA lab can also send a
> custom job to try and get this information.
> 
> I'm back to work on Monday.
> 
> Thanks,
> Guillaume
Hi All

Device-type: asus-C523NA-A20057-coral
That's all I got; there is only one SSDT table.

Please let me know if anything else is needed.



Best regards,
Denys Fedoryshchenko


[-- Attachment #2: acpi.tgz --]
[-- Type: application/x-compressed-tar, Size: 7150 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-06  0:19             ` Bjorn Helgaas
@ 2022-04-11  9:54               ` Hans de Goede
  2022-04-11  9:57                 ` Hans de Goede
  0 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2022-04-11  9:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mark Brown, Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

Hi Bjorn,

On 4/6/22 02:19, Bjorn Helgaas wrote:
> On Mon, Apr 04, 2022 at 10:45:10AM +0200, Hans de Goede wrote:
>> On 3/30/22 13:35, Bjorn Helgaas wrote:
>>> On Mon, Mar 28, 2022 at 02:54:42PM +0200, Hans de Goede wrote:
> 
>>>> Ok, Guillaume, can you try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
>>>> ("x86/PCI: Preserve host bridge windows completely covered by E820") + the 
>>>> attached patch added on top on the asus-C523NA-A20057-coral machine please
>>>> and see if that makes it boot again ?
> 
>>>> From b8080a6d2d889847900e1408f71d0c01c73f5c94 Mon Sep 17 00:00:00 2001
>>>> From: Hans de Goede <hdegoede@redhat.com>
>>>> Date: Mon, 28 Mar 2022 14:47:41 +0200
>>>> Subject: [PATCH] x86/PCI: Limit "e820 entry fully covers window" check to non
>>>>  ISA MMIO addresses
>>>>
>>>> Commit FIXME ("x86/PCI: Preserve host bridge windows completely
>>>> covered by E820") added a check to skip e820 table entries which
>>>> fully cover a PCI bridge's memory window when clipping PCI bridge
>>>> memory windows.
>>>>
>>>> This check also caused ISA MMIO windows to not get clipped when
>>>> fully covered, which is causing some coreboot based Chromebooks
>>>> to not boot.
>>>>
>>>> Modify the fully covered check to not apply to ISA MMIO windows.
> 
>>>> Fixes: FIXME ("x86/PCI: Preserve host bridge windows completely covered by E820")
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>> ---
>>>>  arch/x86/kernel/resource.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>>>> index 6be82e16e5f4..d9ec913619c3 100644
>>>> --- a/arch/x86/kernel/resource.c
>>>> +++ b/arch/x86/kernel/resource.c
>>>> @@ -46,8 +46,12 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>>>>  		 * devices.  But if it covers the *entire* resource, it's
>>>>  		 * more likely just telling us that this is MMIO space, and
>>>>  		 * that doesn't need to be removed.
>>>> +		 * Note this *entire* resource covering check is only
>>>> +		 * intended for 32 bit memory resources for the 16 bit
>>>> +		 * isa window we always apply the e820 entries.
>>>>  		 */
>>>> -		if (e820_start <= avail->start && avail->end <= e820_end) {
>>>> +		if (avail->start >= ISA_END_ADDRESS &&
>>>
>>> What is the justification for needing to check ISA_END_ADDRESS here?
>>> The commit log basically says "this makes it work", which isn't very
>>> satisfying.
>>
>> I did not have a log with the:
>>
>>>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>>>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>>>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
>>
>> messages. Instead I was looking at this log:
>>
>> https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
>>
>> With the following messages (as I quoted higher up in the email-thread):
>>
>> """
>>  1839 17:54:41.406548  <6>[    0.000000] BIOS-provided physical RAM map:
>>  1840 17:54:41.413121  <6>[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16
>>  1841 17:54:41.419712  <6>[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
>>  1842 17:54:41.430192  <6>[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>>  1843 17:54:41.436207  <6>[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable
>>  1844 17:54:41.446353  <6>[    0.000000] BIOS-e820: [mem 0x0000000010000000-0x0000000012150fff] reserved
>>  1845 17:54:41.453290  <6>[    0.000000] BIOS-e820: [mem 0x0000000012151000-0x000000007a9fcfff] usable
>>  1846 17:54:41.459966  <6>[    0.000000] BIOS-e820: [mem 0x000000007a9fd000-0x000000007affffff] type 16
>>  1847 17:54:41.469549  <6>[    0.000000] BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>>  1848 17:54:41.476685  <6>[    0.000000] BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>>  1849 17:54:41.486439  <6>[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>>  1850 17:54:41.492994  <6>[    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed17fff] reserved
>>  1851 17:54:41.503008  <6>[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
>> ...
>>  2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>  2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
>> """
>>
>> ###
>>
>> What I find weird here is that this boot with a somewhat earlier kernel has:
>>
>>  2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>  2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
>>
>> Where as the boot with the clipped messages has:
>>
>> <6>[    0.313705] acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>> <6>[    0.314702] acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>> <6>[    0.315747] PCI host bridge to bus 0000:00
>> <6>[    0.316118] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
>> <6>[    0.316703] pci_bus 0000:00: root bus resource [io  0x1000-0xffff window]
>> <6>[    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
>> <6>[    0.317703] pci_bus 0000:00: root bus resource [bus 00-ff]
>>
>> So in the boot with the clipped messages we are getting 3 windows from _CRS
>> where as before we were getting only 2?  I know that we are now applying
>> the clipping directly when we are parsing the resources. So I guess that
>> before we somehow also merged the 2 resources which are back to back together
>> before the "root bus resource" messages get printed. This caused me to just
>> see the "root bus resource [mem 0x7b800000-0xe0000000 window]" which is
>> not fully covered which is why I focused on the ISA MMIO window.
> 
> Yes, we do merge adjacent windows together.  See 7c3855c423b1 ("PCI:
> Coalesce host bridge contiguous apertures") [1].  This is because our
> BAR assignment isn't smart enough to assign space from two adjacent
> resources to one BAR.
> 
> We have (at least) three apertures, and the latter two would be merged
> together:
> 
>   acpi PNP0A08:00: ... [mem 0x000a0000-0x000bffff window] ...
>   acpi PNP0A08:00: ... [mem 0x7b800000-0x7fffffff window] ...
>   acpi PNP0A08:00: ... [mem 0x80000000-0xe0000000 window] ...
> 
> The boot at [2] was with 5.17.0-rc7-next-20220310, which includes
> 7f7b4236f204 ("x86/PCI: Ignore E820 reservations for bridge windows on
> newer systems") [3], so we ignored E820 completely and we found two
> windows (the VGA framebuffer and the big merged window):
> 
>   Linux version 5.17.0-rc7-next-20220310 (KernelCI@build-j608383-x86-64-gcc-10-x86-64-defconfig-x86-chromebooc26pc) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT Fri Mar 11 17:23:28 UTC 2022
>   pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>   pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
> 
> The boot at [4] was with d13f73e9108a ("x86/PCI: Log host bridge
> window clipping for E820 regions") [5].  In addition to logging,
> d13f73e9108a also does the clipping *before* the merging:
> 
>   Linux version 5.17.0-rc7 (KernelCI@0bd4b548bde7) (gcc (Debian 10.2.1-6) 
>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
> 
> Here we clipped the VGA framebuffer and [mem 0x7b800000-0x7fffffff]
> completely out, so we ignored them, and we clipped the big window to
> avoid [mem 0xd0000000-0xd0ffffff], so all we have left is
> [mem 0x80000000-0xcfffffff].
> 
>>> The Asus log of the last good commit shows:
>>>
>>>   PCI: 00:0d.0 [8086/5a92] enabled
>>>   constrain_resources: PCI: 00:0d.0 10 base d0000000 limit d0ffffff mem (fixed)
>>>   ...
>>>   BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>>>   BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>>>   BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>>>   ...
>>>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>>>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>>>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
>>>   acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>>>   acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
> 
>>> From the firmware part of the log, it looks like 00:0d.0 is a hidden
>>> device that consumes [mem d0000000-0xd0ffffff].  Linux doesn't
>>> enumerate 00:0d.0, so firmware should have carved that out of the [mem
>>> 0x80000000-0xe0000000 window] in _CRS.
>>>
>>> We don't have a log with 5949965ec934 ("x86/PCI: Preserve host bridge
>>> windows completely covered by E820") applied, but I think it would
>>> show this:
>>>
>>>   acpi PNP0A08:00: resource [mem 0x000a0000-0x000bffff window] fully covered by e820 entry [mem 0x000a0000-0x000fffff]
>>>   acpi PNP0A08:00: resource [mem 0x7b800000-0x7fffffff window] fully covered by e820 entry [mem 0x7b000000-0x7fffffff]
>>>
>>> instead of clipping those windows.  But none of the devices we
>>> enumerate appears to be using either of those windows.
>>
>> Not with a working kernel, no, because they are clipped off, but
>> with the don't clip fully-covered _CRS windows change, the 
>> [mem 0x7b000000-0x7fffffff] all of a sudden becomes fair game
>> to assign BARs to.
>>
>> I agree that we will get a fully-covered msg for that one with
>> the patch, which would change:
>>
>> [    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
>>
>> to:
>>
>> [    0.317298] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
>>
>> and I believe that likely is our culprit.
> 
> I think you're probably right.  We started with this:
> 
>   BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>   BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>   BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>   BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>   acpi PNP0A08:00: ... [mem 0x000a0000-0x000bffff window] ...
>   acpi PNP0A08:00: ... [mem 0x7b800000-0x7fffffff window] ...
>   acpi PNP0A08:00: ... [mem 0x80000000-0xe0000000 window] ...
> 
> After 5949965ec934, clipping will give us this:
> 
>   pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>   pci_bus 0000:00: root bus resource [mem 0x7b800000-0x7fffffff window]
>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
> 
> and merging will give us this:
> 
>   pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>   pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
> 
> BIOS left a 00:18.2 BAR here [6]:
> 
>   pci 0000:00:18.2: reg 0x10: [mem 0xde000000-0xde000fff 64bit]
> 
> That BAR is outside the windows we know about, so we'll move it,
> probably to 0x7b800000 and maybe it doesn't work there.
> 
>> So to fix this I guess that we first need to merge back-to-back
>> windows coming from _CRS into a single window, before calling
>> remove_e820_regions()
>>
>> That would pass [mem 0x7b800000-0xe0000000 window] to
>> remove_e820_regions() in a single call (as I expected from the
>> logs), which should result in both the top and the bottom still
>> getting clipped as before.
> 
> So I think the progression is:
> 
>   1) Remove anything mentioned in E820 from _CRS (4dc2287c1805 [7]).
>      This worked around some issues on Dell systems.
> 
>   2) Remove things mentioned in E820 unless they cover the entire
>      window (5949965ec934 [8])
> 
>   3) Coalesce adjacent _CRS windows, *then* remove things mentioned in
>      E820 unless they cover the entire (coalesced) window (current
>      proposal)
> 
> Even 3) leaves us with the 00:18.2 BAR above that will be moved when
> it doesn't need to be.

Right, but we already move it today, right? So this would not
be a regression?
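
As a quick sanity check of that claim, here is a minimal userspace
sketch (not kernel code) that tests whether a BAR lies inside any root
bus window. struct range and bar_fits() are hypothetical names made up
for this illustration; the addresses are the ones quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, used for both BARs and host bridge windows. */
struct range { uint64_t start, end; };

/* Return true if the BAR lies entirely inside one of the n windows. */
static bool bar_fits(const struct range *bar, const struct range *win, size_t n)
{
	assert(bar->start <= bar->end);	/* the BAR itself must be well formed */

	for (size_t i = 0; i < n; i++)
		if (win[i].start <= bar->start && bar->end <= win[i].end)
			return true;
	return false;
}
```

With the 00:18.2 BAR [mem 0xde000000-0xde000fff], bar_fits() is false
for the clipped window [mem 0x80000000-0xcfffffff] and true for the
unclipped [mem 0x7b800000-0xe0000000], so the BAR is already outside
the windows as soon as the top is clipped at 0xd0000000.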

> That could lead us to something like this:
> 
>   4) Coalesce adjacent _CRS windows, *then* remove things mentioned in
>      E820 unless they cover the entire (coalesced) window (current
>      proposal), but punch holes instead of lopping entire sections, so 
>      we would end up with these windows:
> 
>       pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>       pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
>       pci_bus 0000:00: root bus resource [mem 0xd0100000-0xdfffffff window]
> 
> But I don't think this is leading to a maintainable result.  We
> shouldn't be using E820 at all in an ACPI system (and again, the fact
> that we *do* use it is my fault, and I'll take my beatings).  We need
> to *reduce* or at least contain that E820 usage instead of expanding
> it.

The problem is that, as both the Lenovo X1 Carbon 3rd gen (IIRC)
regression and this regression show, not taking the E820
reservations into account at all leads to regressions left
and right.

So it seems that not removing them is not really an option.

Also note that:

>   2) Remove things mentioned in E820 unless they cover the entire
>      window (5949965ec934 [8])
> 

and:

>   3) Coalesce adjacent _CRS windows, *then* remove things mentioned in
>      E820 unless they cover the entire (coalesced) window (current
>      proposal)

both make us use the E820 reservations less, since we now skip them in
the covers-the-entire-window case. So this does follow your direction
of reducing E820 usage, but in a fine-grained manner so as not to
cause regressions.
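
To make the ordering difference between 2) and 3) concrete, here is a
rough userspace sketch. It is not the kernel code: struct range,
clip_window(), coalesce() and the two policy helpers are names made up
for illustration, and the clipping rule (keep the larger side of an
overlap) is my simplified reading of what remove_e820_regions() does.
Only the reserved E820 entries quoted in this thread are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, used for _CRS windows and E820 entries. */
struct range { uint64_t start, end; };

/* Reserved E820 entries quoted in this thread (assumed complete enough). */
static const struct range e820[] = {
	{ 0x000a0000, 0x000fffff },
	{ 0x7b000000, 0x7fffffff },
	{ 0xd0000000, 0xd0ffffff },
	{ 0xe0000000, 0xefffffff },
};

/*
 * Clip one window against every E820 entry.  With skip_covered set,
 * an entry that covers the whole window is ignored.  A window that is
 * clipped away completely ends up with start > end.
 */
static void clip_window(struct range *w, bool skip_covered)
{
	for (size_t i = 0; i < sizeof(e820) / sizeof(e820[0]); i++) {
		uint64_t s = e820[i].start, e = e820[i].end;
		uint64_t low = 0, high = 0;

		assert(s <= e);		/* E820 entries must be well formed */
		if (w->end < s || w->start > e)
			continue;	/* no overlap */
		if (skip_covered && s <= w->start && w->end <= e)
			continue;	/* entry covers the entire window */
		if (w->start < s)
			low = s - w->start;
		if (w->end > e)
			high = w->end - e;
		/* Keep whichever side of the overlap is larger. */
		if (low > high)
			w->end = s - 1;
		else
			w->start = e + 1;
	}
}

/* Drop empty windows and merge back-to-back ones (sorted input). */
static size_t coalesce(struct range *w, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (w[i].start > w[i].end)
			continue;	/* clipped to nothing */
		if (out > 0 && w[out - 1].end + 1 == w[i].start)
			w[out - 1].end = w[i].end;
		else
			w[out++] = w[i];
	}
	return out;
}

/* Clip each _CRS window on its own, then coalesce what is left. */
static size_t clip_then_coalesce(struct range *w, size_t n, bool skip_covered)
{
	for (size_t i = 0; i < n; i++)
		clip_window(&w[i], skip_covered);
	return coalesce(w, n);
}

/* Coalesce adjacent _CRS windows first, then clip the merged windows. */
static size_t coalesce_then_clip(struct range *w, size_t n, bool skip_covered)
{
	n = coalesce(w, n);
	for (size_t i = 0; i < n; i++)
		clip_window(&w[i], skip_covered);
	return coalesce(w, n);
}
```

Fed the three _CRS windows from this thread with skip_covered set,
clip_then_coalesce() leaves [mem 0x000a0000-0x000bffff] and
[mem 0x7b800000-0xcfffffff], while coalesce_then_clip() leaves
[mem 0x000a0000-0x000bffff] and [mem 0x80000000-0xcfffffff], i.e.
the bottom of the big window gets clipped again.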

Regards,

Hans




> 
> [1] https://git.kernel.org/linus/7c3855c423b1
> [2] https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html#L2030
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/patch/?id=7f7b4236f204
> [4] https://lava.collabora.co.uk/scheduler/job/5937945#L2023
> [5] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/patch/?id=d13f73e9108a
> [6] https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html#L2084
> [7] https://git.kernel.org/linus/4dc2287c1805
> [8] https://lore.kernel.org/all/20220304035110.988712-4-helgaas@kernel.org/
> 



* Re: next/master bisection: baseline.login on asus-C523NA-A20057-coral
  2022-04-11  9:54               ` Hans de Goede
@ 2022-04-11  9:57                 ` Hans de Goede
  0 siblings, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2022-04-11  9:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mark Brown, Bjorn Helgaas, Rafael J . Wysocki, Mika Westerberg,
	kernelci-results, bot, gtucker, linux-pci

Hi,

On 4/11/22 11:54, Hans de Goede wrote:
> Hi Bjorn,
> 
> On 4/6/22 02:19, Bjorn Helgaas wrote:
>> On Mon, Apr 04, 2022 at 10:45:10AM +0200, Hans de Goede wrote:
>>> On 3/30/22 13:35, Bjorn Helgaas wrote:
>>>> On Mon, Mar 28, 2022 at 02:54:42PM +0200, Hans de Goede wrote:
>>
>>>>> Ok, Guillaume, can you try a kernel with commit 5949965ec9340cfc0e65f7d8a576b660b26e2535
>>>>> ("x86/PCI: Preserve host bridge windows completely covered by E820") + the 
>>>>> attached patch added on top a try on the asus-C523NA-A20057-coral machine please
>>>>> and see if that makes it boot again ?
>>
>>>>> From b8080a6d2d889847900e1408f71d0c01c73f5c94 Mon Sep 17 00:00:00 2001
>>>>> From: Hans de Goede <hdegoede@redhat.com>
>>>>> Date: Mon, 28 Mar 2022 14:47:41 +0200
>>>>> Subject: [PATCH] x86/PCI: Limit "e820 entry fully covers window" check to non
>>>>>  ISA MMIO addresses
>>>>>
>>>>> Commit FIXME ("x86/PCI: Preserve host bridge windows completely
>>>>> covered by E820") added a check to skip e820 table entries which
>>>>> fully cover a PCI bridge's memory window when clipping PCI bridge
>>>>> memory windows.
>>>>>
>>>>> This check also caused ISA MMIO windows to not get clipped when
>>>>> fully covered, which is causing some coreboot based Chromebooks
>>>>> to not boot.
>>>>>
>>>>> Modify the fully covered check to not apply to ISA MMIO windows.
>>
>>>>> Fixes: FIXME ("x86/PCI: Preserve host bridge windows completely covered by E820")
>>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>>> ---
>>>>>  arch/x86/kernel/resource.c | 6 +++++-
>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>>>>> index 6be82e16e5f4..d9ec913619c3 100644
>>>>> --- a/arch/x86/kernel/resource.c
>>>>> +++ b/arch/x86/kernel/resource.c
>>>>> @@ -46,8 +46,12 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>>>>>  		 * devices.  But if it covers the *entire* resource, it's
>>>>>  		 * more likely just telling us that this is MMIO space, and
>>>>>  		 * that doesn't need to be removed.
>>>>> +		 * Note this *entire* resource covering check is only
>>>>> +		 * intended for 32 bit memory resources for the 16 bit
>>>>> +		 * isa window we always apply the e820 entries.
>>>>>  		 */
>>>>> -		if (e820_start <= avail->start && avail->end <= e820_end) {
>>>>> +		if (avail->start >= ISA_END_ADDRESS &&
>>>>
>>>> What is the justification for needing to check ISA_END_ADDRESS here?
>>>> The commit log basically says "this makes it work", which isn't very
>>>> satisfying.
>>>
>>> I did not have a log with the:
>>>
>>>>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>>>>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>>>>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
>>>
>>> messages. Instead I was looking at this log:
>>>
>>> https://storage.kernelci.org/next/master/next-20220310/x86_64/x86_64_defconfig+x86-chromebook/gcc-10/lab-collabora/baseline-asus-C523NA-A20057-coral.html
>>>
>>> With the following messages (as I quoted higher up in the email-thread):
>>>
>>> """
>>>  1839 17:54:41.406548  <6>[    0.000000] BIOS-provided physical RAM map:
>>>  1840 17:54:41.413121  <6>[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16
>>>  1841 17:54:41.419712  <6>[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
>>>  1842 17:54:41.430192  <6>[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>>>  1843 17:54:41.436207  <6>[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable
>>>  1844 17:54:41.446353  <6>[    0.000000] BIOS-e820: [mem 0x0000000010000000-0x0000000012150fff] reserved
>>>  1845 17:54:41.453290  <6>[    0.000000] BIOS-e820: [mem 0x0000000012151000-0x000000007a9fcfff] usable
>>>  1846 17:54:41.459966  <6>[    0.000000] BIOS-e820: [mem 0x000000007a9fd000-0x000000007affffff] type 16
>>>  1847 17:54:41.469549  <6>[    0.000000] BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>>>  1848 17:54:41.476685  <6>[    0.000000] BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>>>  1849 17:54:41.486439  <6>[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>>>  1850 17:54:41.492994  <6>[    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed17fff] reserved
>>>  1851 17:54:41.503008  <6>[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
>>> ...
>>>  2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>>  2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
>>> """
>>>
>>> ###
>>>
>>> What I find weird here is that this boot with a somewhat earlier kernel has:
>>>
>>>  2030 17:54:42.809183  <6>[    0.313771] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>>  2031 17:54:42.819092  <6>[    0.314424] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
>>>
>>> Where as the boot with the clipped messages has:
>>>
>>> <6>[    0.313705] acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>>> <6>[    0.314702] acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>>> <6>[    0.315747] PCI host bridge to bus 0000:00
>>> <6>[    0.316118] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
>>> <6>[    0.316703] pci_bus 0000:00: root bus resource [io  0x1000-0xffff window]
>>> <6>[    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
>>> <6>[    0.317703] pci_bus 0000:00: root bus resource [bus 00-ff]
>>>
>>> So in the boot with the clipped messages we are getting 3 windows from _CRS
>>> where as before we were getting only 2?  I know that we are now applying
>>> the clipping directly when we are parsing the resources. So I guess that
>>> before we somehow also merged the 2 resources which are back to back together
>>> before the "root bus resource" messages get printed. This caused me to just
>>> see the "root bus resource [mem 0x7b800000-0xe0000000 window]" which is
>>> not fully covered which is why I focused on the ISA MMIO window.
>>
>> Yes, we do merge adjacent windows together.  See 7c3855c423b1 ("PCI:
>> Coalesce host bridge contiguous apertures") [1].  This is because our
>> BAR assignment isn't smart enough to assign space from two adjacent
>> resources to one BAR.
>>
>> We have (at least) three apertures, and the latter two would be merged
>> together:
>>
>>   acpi PNP0A08:00: ... [mem 0x000a0000-0x000bffff window] ...
>>   acpi PNP0A08:00: ... [mem 0x7b800000-0x7fffffff window] ...
>>   acpi PNP0A08:00: ... [mem 0x80000000-0xe0000000 window] ...
>>
>> The boot at [2] was with 5.17.0-rc7-next-20220310, which includes
>> 7f7b4236f204 ("x86/PCI: Ignore E820 reservations for bridge windows on
>> newer systems") [3], so we ignored E820 completely and we found two
>> windows (the VGA framebuffer and the big merged window):
>>
>>   Linux version 5.17.0-rc7-next-20220310 (KernelCI@build-j608383-x86-64-gcc-10-x86-64-defconfig-x86-chromebooc26pc) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT Fri Mar 11 17:23:28 UTC 2022
>>   pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>   pci_bus 0000:00: root bus resource [mem 0x7b800000-0xe0000000 window]
>>
>> The boot at [4] was with d13f73e9108a ("x86/PCI: Log host bridge
>> window clipping for E820 regions") [5].  In addition to logging,
>> d13f73e9108a also does the clipping *before* the merging:
>>
>>   Linux version 5.17.0-rc7 (KernelCI@0bd4b548bde7) (gcc (Debian 10.2.1-6) 
>>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
>>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
>>
>> Here we clipped the VGA framebuffer and [mem 0x7b800000-0x7fffffff]
>> completely out, so we ignored them, and we clipped the big window to
>> avoid [mem 0xd0000000-0xd0ffffff], so all we have left is
>> [mem 0x80000000-0xcfffffff].
>>
>>>> The Asus log of the last good commit shows:
>>>>
>>>>   PCI: 00:0d.0 [8086/5a92] enabled
>>>>   constrain_resources: PCI: 00:0d.0 10 base d0000000 limit d0ffffff mem (fixed)
>>>>   ...
>>>>   BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>>>>   BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>>>>   BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>>>>   ...
>>>>   acpi PNP0A08:00: clipped [mem 0x000a0000-0x000bffff window] to [mem 0x00100000-0x000bffff window] for e820 entry [mem 0x000a0000-0x000fffff]
>>>>   acpi PNP0A08:00: clipped [mem 0x7b800000-0x7fffffff window] to [mem 0x80000000-0x7fffffff window] for e820 entry [mem 0x7b000000-0x7fffffff]
>>>>   acpi PNP0A08:00: clipped [mem 0x80000000-0xe0000000 window] to [mem 0x80000000-0xcfffffff window] for e820 entry [mem 0xd0000000-0xd0ffffff]
>>>>   acpi PNP0A08:00: ignoring host bridge window [mem 0x00100000-0x000bffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>>>>   acpi PNP0A08:00: ignoring host bridge window [mem 0x80000000-0x7fffffff window] (conflicts with PCI mem [mem 0x00000000-0x7fffffffff])
>>
>>>> From the firmware part of the log, it looks like 00:0d.0 is a hidden
>>>> device that consumes [mem d0000000-0xd0ffffff].  Linux doesn't
>>>> enumerate 00:0d.0, so firmware should have carved that out of the [mem
>>>> 0x80000000-0xe0000000 window] in _CRS.
>>>>
>>>> We don't have a log with 5949965ec934 ("x86/PCI: Preserve host bridge
>>>> windows completely covered by E820") applied, but I think it would
>>>> show this:
>>>>
>>>>   acpi PNP0A08:00: resource [mem 0x000a0000-0x000bffff window] fully covered by e820 entry [mem 0x000a0000-0x000fffff]
>>>>   acpi PNP0A08:00: resource [mem 0x7b800000-0x7fffffff window] fully covered by e820 entry [mem 0x7b000000-0x7fffffff]
>>>>
>>>> instead of clipping those windows.  But none of the devices we
>>>> enumerate appears to be using either of those windows.
>>>
>>> Not with a working kernel, no, because they are clipped off, but
>>> with the don't clip fully-covered _CRS windows change, the 
>>> [mem 0x7b000000-0x7fffffff] all of a sudden becomes fair game
>>> to assign BARs to.
>>>
>>> I agree that we will get a fully-covered msg for that one with
>>> the patch, which would change:
>>>
>>> [    0.317298] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
>>>
>>> to:
>>>
>>> [    0.317298] pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
>>>
>>> and I believe that likely is our culprit.
>>
>> I think you're probably right.  We started with this:
>>
>>   BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>>   BIOS-e820: [mem 0x000000007b000000-0x000000007fffffff] reserved
>>   BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
>>   BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>>   acpi PNP0A08:00: ... [mem 0x000a0000-0x000bffff window] ...
>>   acpi PNP0A08:00: ... [mem 0x7b800000-0x7fffffff window] ...
>>   acpi PNP0A08:00: ... [mem 0x80000000-0xe0000000 window] ...
>>
>> After 5949965ec934, clipping will give us this:
>>
>>   pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>   pci_bus 0000:00: root bus resource [mem 0x7b800000-0x7fffffff window]
>>   pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window]
>>
>> and merging will give us this:
>>
>>   pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>   pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
>>
>> BIOS left a 00:18.2 BAR here [6]:
>>
>>   pci 0000:00:18.2: reg 0x10: [mem 0xde000000-0xde000fff 64bit]
>>
>> That BAR is outside the windows we know about, so we'll move it,
>> probably to 0x7b800000 and maybe it doesn't work there.
>>
>>> So to fix this I guess that we first need to merge back-to-back
>>> windows coming from _CRS into a single window, before calling
>>> remove_e820_regions()
>>>
>>> That would pass [mem 0x7b800000-0xe0000000 window] to
>>> remove_e820_regions() in a single call (as I expected from the
>>> logs), which should result in both the top and the bottom still
>>> getting clipped as before.
>>
>> So I think the progression is:
>>
>>   1) Remove anything mentioned in E820 from _CRS (4dc2287c1805 [7]).
>>      This worked around some issues on Dell systems.
>>
>>   2) Remove things mentioned in E820 unless they cover the entire
>>      window (5949965ec934 [8])
>>
>>   3) Coalesce adjacent _CRS windows, *then* remove things mentioned in
>>      E820 unless they cover the entire (coalesced) window (current
>>      proposal)
>>
>> Even 3) leaves us with the 00:18.2 BAR above that will be moved when
>> it doesn't need to be.
> 
> Right, but we already move it today, right? So this would not
> be a regression?
> 
>> That could lead us to something like this:
>>
>>   4) Coalesce adjacent _CRS windows, *then* remove things mentioned in
>>      E820 unless they cover the entire (coalesced) window (current
>>      proposal), but punch holes instead of lopping entire sections, so 
>>      we would end up with these windows:
>>
>>       pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>>       pci_bus 0000:00: root bus resource [mem 0x7b800000-0xcfffffff window]
>>       pci_bus 0000:00: root bus resource [mem 0xd0100000-0xdfffffff window]
>>
>> But I don't think this is leading to a maintainable result.  We
>> shouldn't be using E820 at all in an ACPI system (and again, the fact
>> that we *do* use it is my fault, and I'll take my beatings).  We need
>> to *reduce* or at least contain that E820 usage instead of expanding
>> it.
> 
> The problem is that, as both the Lenovo X1 Carbon 3rd gen (IIRC)
> regression and this regression show, not taking the E820
> reservations into account at all leads to regressions left
> and right.
> 
> So it seems that not removing them is not really an option.
> 
> Also note that:
> 
>>   2) Remove things mentioned in E820 unless they cover the entire
>>      window (5949965ec934 [8])
>>
> 
> and:
> 
>>   3) Coalesce adjacent _CRS windows, *then* remove things mentioned in
>>      E820 unless they cover the entire (coalesced) window (current
>>      proposal)
> 
> both make us use the E820 reservations less, since we now skip them in
> the covers-the-entire-window case. So this does follow your direction
> of reducing E820 usage, but in a fine-grained manner so as not to
> cause regressions.

p.s.

Another option would be to go back to one of my initial patches for
this, where we completely disable clipping of _CRS windows based on
the E820 reservations on select models, identified by DMI matching.
This would at least allow us to finally fix the touchpad /
Thunderbolt hotplug issues plaguing various Lenovo laptops, without
risking regressions elsewhere.

I'm starting to think that going the DMI quirk route here is not
such a bad idea after all...

Regards,

Hans

 



end of thread, other threads:[~2022-04-11  9:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <623c13ec.1c69fb81.8cbdb.5a7a@mx.google.com>
2022-03-24 17:52 ` next/master bisection: baseline.login on asus-C523NA-A20057-coral Mark Brown
2022-03-24 20:34   ` Hans de Goede
2022-03-24 22:19     ` Mark Brown
2022-03-28 12:54       ` Hans de Goede
2022-03-29 18:44         ` Guillaume Tucker
2022-04-04 19:44           ` Guillaume Tucker
2022-04-05  8:13             ` Hans de Goede
2022-04-05 17:57             ` Bjorn Helgaas
     [not found]           ` <16E2C910B4947F17.5433@groups.io>
2022-04-04 19:48             ` Guillaume Tucker
2022-03-30 11:35         ` Bjorn Helgaas
2022-04-04  8:45           ` Hans de Goede
2022-04-06  0:19             ` Bjorn Helgaas
2022-04-11  9:54               ` Hans de Goede
2022-04-11  9:57                 ` Hans de Goede
2022-03-24 23:08     ` Bjorn Helgaas
2022-03-29 22:14   ` Bjorn Helgaas
2022-04-05 23:53   ` Bjorn Helgaas
2022-04-06 18:59     ` Bjorn Helgaas
2022-04-06 19:37       ` Mark Brown
2022-04-06 20:11         ` Guillaume Tucker
2022-04-07 15:17           ` Denys Fedoryshchenko
2022-04-06 20:56         ` Guenter Roeck
