From: Guenter Roeck <linux@roeck-us.net>
To: Ard Biesheuvel <ardb@kernel.org>, "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Jiahui Cen" <cenjiahui@huawei.com>,
	"Ard Biesheuvel" <ardb+tianocore@kernel.org>,
	qemu-devel@nongnu.org, "Bjorn Helgaas" <bhelgaas@google.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>
Subject: Re: aarch64 efi boot failures with qemu 6.0+
Date: Wed, 28 Jul 2021 07:03:35 -0700
Message-ID: <80674caa-817a-8be0-2122-fe543ec08a50@roeck-us.net>
In-Reply-To: <CAMj1kXFi43BiaG3pheqDLp_uqFpiS327mMaoc-NOt3HuoS5xsw@mail.gmail.com>

On 7/28/21 6:25 AM, Ard Biesheuvel wrote:
> On Wed, 28 Jul 2021 at 15:11, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Tue, Jul 27, 2021 at 12:36:03PM +0200, Igor Mammedov wrote:
>>> On Tue, 27 Jul 2021 05:01:23 -0400
>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>
>>>> On Mon, Jul 26, 2021 at 10:12:38PM -0700, Guenter Roeck wrote:
>>>>> On 7/26/21 9:45 PM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:
>>>>>>> (cc Bjorn)
>>>>>>>
>>>>>>> On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 7/26/21 12:56 AM, Guenter Roeck wrote:
>>>>>>>>> On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> starting with qemu v6.0, some of my aarch64 efi boot tests no longer
>>>>>>>>>>> work. Analysis shows that PCI devices with IO ports do not instantiate
>>>>>>>>>>> in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
>>>>>>>>>>> (at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
>>>>>>>>>>> affects aarch64, not x86/x86_64.
>>>>>>>>>>>
>>>>>>>>>>> I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
>>>>>>>>>>> keep firmware resource map"). Since this commit, PCI device BAR
>>>>>>>>>>> allocation has changed. Taking tulip as example, the kernel reports
>>>>>>>>>>> the following PCI bar assignments when running qemu v5.2.
>>>>>>>>>>>
>>>>>>>>>>> [    3.921801] pci 0000:00:01.0: [1011:0019] type 00 class 0x020000
>>>>>>>>>>> [    3.922207] pci 0000:00:01.0: reg 0x10: [io  0x0000-0x007f]
>>>>>>>>>>> [    3.922505] pci 0000:00:01.0: reg 0x14: [mem 0x10000000-0x1000007f]
>>>>>>>
>>>>>>> IIUC, these lines are read back from the BARs
>>>>>>>
>>>>>>>>>>> [    3.927111] pci 0000:00:01.0: BAR 0: assigned [io  0x1000-0x107f]
>>>>>>>>>>> [    3.927455] pci 0000:00:01.0: BAR 1: assigned [mem 0x10000000-0x1000007f]
>>>>>>>>>>>
>>>>>>>
>>>>>>> ... and this is the assignment created by the kernel.
>>>>>>>
>>>>>>>>>>> With qemu v6.0, the assignment is reported as follows.
>>>>>>>>>>>
>>>>>>>>>>> [    3.922887] pci 0000:00:01.0: [1011:0019] type 00 class 0x020000
>>>>>>>>>>> [    3.923278] pci 0000:00:01.0: reg 0x10: [io  0x0000-0x007f]
>>>>>>>>>>> [    3.923451] pci 0000:00:01.0: reg 0x14: [mem 0x10000000-0x1000007f]
>>>>>>>>>>>
>>>>>>>
>>>>>>> The problem here is that Linux, for legacy reasons, does not support
>>>>>>> I/O ports below 0x1000 on PCI, so the I/O assignment created by EFI is
>>>>>>> rejected.
>>>>>>>
>>>>>>> This might make sense on x86, where legacy I/O ports may exist, but on
>>>>>>> other architectures, this makes no sense.
>>>>>>
>>>>>>
>>>>>> Fixing Linux makes sense but OTOH EFI probably shouldn't create mappings
>>>>>> that trip up existing guests, right?
>>>>>>
>>>>>
>>>>> I think it is difficult to draw a line. Sure, maybe EFI should not create
>>>>> such mappings, but then maybe qemu should not suddenly start to enforce
>>>>> those mappings for existing guests either.
>>>>
>>>> I would say both. But about QEMU actually I think you have a point here.
>>>> Re-reading the spec:
>>>>
>>>> 0: No (The operating system shall not ignore the PCI configuration that firmware has done
>>>> at boot time. However, the operating system is free to configure the devices in this hierarchy
>>>> that have not been configured by the firmware. There may be a reduced level of hot plug
>>>> capability support in this hierarchy due to resource constraints. This situation is the same as
>>>> the legacy situation where this _DSM is not provided.)
>>>> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
>>>> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
>>>>
>>>>
>>>> I think I misread the spec previously, and understood it to mean that
>>>> 1 means must ignore. In fact 1 gives the most flexibility.
>>>> So why are we suddenly telling the guest it must not override
>>>> firmware?
>>>>
>>>> The commit log says
>>>>      The differences could result in resource assignment failure.
>>>>
>>>> which is kind of vague ...
>>>>
>>>> Jiahui Cen, Igor, what do you think about it?
>>>> I'm inclined to revert 0cf8882fd06ba0aeb1e90fa6f23fce85504d7e14
>>>> at least for now.
>>> Looking at patch history, it seems consensus was that it's better to
>>> enforce firmware allocations.
>>>
>>> Also, letting the OS do as it pleases might break PCI devices that
>>> don't tolerate reallocation. For example, firmware initializes PCI device
>>> IO/BARs and then fetches ACPI tables, which get patched with
>>> assigned resources.
>>>
>>> To me, returning 0 seems to be the correct choice.
>>> In addition, resource hinting also works via firmware allocations;
>>> if we revert the commit, it might change those configs.
>>
>>
>> Well if firmware people now tell us their allocations were never
>> intended for guest OS use then maybe we should not intervene.
>>
> 
> DSM #5 was introduced to permit firmware running on x86_64 systems to
> boot 32-bit OSes (read Windows) unmodified, while still leaving
> enlightened, 64-bit OSes the opportunity to reorganize the BARs if
> there is sufficient space in the resource windows, and if the OS runs
> in long mode so it can address all of it.
> 
> This is why the default-if-absent according to the spec is '0', and I
> already explained up-thread why arm64 deviates from this.
> 
> But Igor has a point: there are cases where especially bus numbers
> should not be touched, as firmware tables consumed by the OS may carry
> b/d/f identifiers for things like SMMU pass through, where changing
> the bus numbers obviously invalidates this information.
> 
> These are exceptional cases, though, and I would argue that these
> should be considered individually, rather than setting DSM #5 to 0x0
> simply because there might be cases where not doing so could
> theoretically break things, given that doing so has proven to break
> things.
> 
> 
>> As others noted the original commit was kind of vague:
>>
>> 1. it said "Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>> that firmware has done at boot time could handle the differences."
>> which is not what the spec says and not what the patch did -
>> guest os does not ignore configuration even without this,
>> it is just allowed to change it.
>>
>>
>> 2. it says "could result" but does not report whether that happened in the
>> field.
>>
>>
>> Given this causes a regression I'm inclined to just revert for now.
>> We can figure it out for the next release.
>>
> 
> For a revert of commit 0cf8882fd06ba0aeb1e90fa6f23fce85504d7e14, feel
> free to include
> 
> Acked-by: Ard Biesheuvel <ardb@kernel.org>
> 

and:

Tested-by: Guenter Roeck <linux@roeck-us.net>

> and please also involve me if any future debates on this subject flare up again.
> 

Same here.

Thanks,
Guenter
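
For reference, the failure mode discussed above can be sketched as a small
model. This is only an illustration under stated assumptions, not kernel
or QEMU code: the function name claim_io_bar, the MIN_IO constant, and the
"first acceptable address" allocation are hypothetical simplifications. The
0x1000 floor and the 0x80-byte tulip I/O BAR are taken from the discussion
and the kernel log quoted above.

```python
# Illustrative model only. MIN_IO and claim_io_bar are hypothetical names;
# the 0x1000 floor comes from the discussion, not from kernel source.
MIN_IO = 0x1000  # lowest I/O port address the OS is assumed to accept


def claim_io_bar(start, size, preserve_firmware_config):
    """Return the (first, last) port range the OS ends up using,
    or None if the BAR is left unassigned."""
    if start >= MIN_IO:
        # Firmware value is acceptable, keep it as-is.
        return (start, start + size - 1)
    if preserve_firmware_config:
        # Firmware value is rejected and the OS may not pick another one.
        return None
    # OS is free to reassign: simplified to "first acceptable address".
    return (MIN_IO, MIN_IO + size - 1)


# qemu v5.2 case: OS is free to reassign the BAR.
print(claim_io_bar(0x0000, 0x80, preserve_firmware_config=False))
# qemu v6.0 case: OS keeps firmware config, BAR stays unassigned.
print(claim_io_bar(0x0000, 0x80, preserve_firmware_config=True))
```

In this model the first call yields the 0x1000-0x107f range seen with
v5.2, and the second yields no assignment, which corresponds to the missing
"BAR 0: assigned" line with v6.0.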

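The _DSM #5 return values quoted from the spec above can also be written
down as a small decision helper. Again, this is a sketch under assumptions:
BootConfig, os_may_reassign, and the absent_means_preserve parameter are
hypothetical names introduced here for illustration. The parameter exists
because the spec treats an absent _DSM like 0, while the thread notes that
arm64 deviates from that default.

```python
from enum import Enum
from typing import Optional


class BootConfig(Enum):
    """Return values of _DSM function 5 as quoted from the spec above."""
    PRESERVE = 0      # OS shall not ignore the firmware configuration
    MAY_REASSIGN = 1  # OS may (not must) ignore it and rebalance resources


def os_may_reassign(dsm5_result: Optional[int],
                    absent_means_preserve: bool = True) -> bool:
    """Return True if the OS is permitted to reconfigure the hierarchy.

    dsm5_result is None when the _DSM function is not provided at all.
    absent_means_preserve models the spec default; pass False to model a
    platform that reassigns by default (hypothetical switch).
    """
    if dsm5_result is None:
        return not absent_means_preserve
    return BootConfig(dsm5_result) is BootConfig.MAY_REASSIGN


print(os_may_reassign(0))     # firmware config must be kept
print(os_may_reassign(1))     # OS is allowed, not obliged, to reassign
print(os_may_reassign(None))  # spec default when _DSM #5 is absent
```

Note that a result of 1 only grants permission: an OS that keeps the
firmware configuration still complies, which is why returning 1 is the
more flexible choice.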
