qemu-devel.nongnu.org archive mirror
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: "Jiahui Cen" <cenjiahui@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Ard Biesheuvel" <ardb+tianocore@kernel.org>,
	qemu-devel@nongnu.org, "Bjorn Helgaas" <bhelgaas@google.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Guenter Roeck" <linux@roeck-us.net>
Subject: Re: aarch64 efi boot failures with qemu 6.0+
Date: Fri, 18 Mar 2022 11:48:00 +0000
Message-ID: <YjRxTJINgRsGYuAH@lpieralisi>
In-Reply-To: <CAMj1kXHyV2Vp60AuqM+9a5jyW_K2=KNUp4NqyFNGBshmFmhkQg@mail.gmail.com>

On Tue, Jul 27, 2021 at 12:14:48PM +0200, Ard Biesheuvel wrote:
> (+ Lorenzo)
> 
> On Tue, 27 Jul 2021 at 12:07, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jul 27, 2021 at 11:50:23AM +0200, Ard Biesheuvel wrote:
> > > On Tue, 27 Jul 2021 at 11:30, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Jul 27, 2021 at 09:04:20AM +0200, Ard Biesheuvel wrote:
> > > > > On Tue, 27 Jul 2021 at 07:12, Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > >
> > > > > > On 7/26/21 9:45 PM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:
> > > > > > >> (cc Bjorn)
> > > > > > >>
> > > > > > >> On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> > > > > > >>>
> > > > > > >>> On 7/26/21 12:56 AM, Guenter Roeck wrote:
> > > > > > >>>> On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:
> > > > > > >>>>> On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:
> > > > > > >>>>>> Hi all,
> > > > > > >>>>>>
> > > > > > >>>>>> starting with qemu v6.0, some of my aarch64 efi boot tests no longer
> > > > > > >>>>>> work. Analysis shows that PCI devices with IO ports do not instantiate
> > > > > > >>>>>> in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
> > > > > > >>>>>> (at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
> > > > > > >>>>>> affects aarch64, not x86/x86_64.
> > > > > > >>>>>>
> > > > > > >>>>>> I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
> > > > > > >>>>>> keep firmware resource map"). Since this commit, PCI device BAR
> > > > > > >>>>>> allocation has changed. Taking tulip as example, the kernel reports
> > > > > > >>>>>> the following PCI bar assignments when running qemu v5.2.
> > > > > > >>>>>>
> > > > > > >>>>>> [    3.921801] pci 0000:00:01.0: [1011:0019] type 00 class 0x020000
> > > > > > >>>>>> [    3.922207] pci 0000:00:01.0: reg 0x10: [io  0x0000-0x007f]
> > > > > > >>>>>> [    3.922505] pci 0000:00:01.0: reg 0x14: [mem 0x10000000-0x1000007f]
> > > > > > >>
> > > > > > >> IIUC, these lines are read back from the BARs
> > > > > > >>
> > > > > > >>>>>> [    3.927111] pci 0000:00:01.0: BAR 0: assigned [io  0x1000-0x107f]
> > > > > > >>>>>> [    3.927455] pci 0000:00:01.0: BAR 1: assigned [mem 0x10000000-0x1000007f]
> > > > > > >>>>>>
> > > > > > >>
> > > > > > >> ... and this is the assignment created by the kernel.
> > > > > > >>
> > > > > > >>>>>> With qemu v6.0, the assignment is reported as follows.
> > > > > > >>>>>>
> > > > > > >>>>>> [    3.922887] pci 0000:00:01.0: [1011:0019] type 00 class 0x020000
> > > > > > >>>>>> [    3.923278] pci 0000:00:01.0: reg 0x10: [io  0x0000-0x007f]
> > > > > > >>>>>> [    3.923451] pci 0000:00:01.0: reg 0x14: [mem 0x10000000-0x1000007f]
> > > > > > >>>>>>
> > > > > > >>
> > > > > > >> The problem here is that Linux, for legacy reasons, does not support
> > > > > > >> I/O ports below 0x1000 on PCI, so the I/O assignment created by EFI is
> > > > > > >> rejected.
> > > > > > >>
> > > > > > >> This might make sense on x86, where legacy I/O ports may exist, but on
> > > > > > >> other architectures, this makes no sense.
> > > > > > >
> > > > > > >
> > > > > > > Fixing Linux makes sense but OTOH EFI probably shouldn't create mappings
> > > > > > > that trip up existing guests, right?
> > > > > > >
> > > > > >
> > > > > > I think it is difficult to draw a line. Sure, maybe EFI should not create
> > > > > > such mappings, but then maybe qemu should not suddenly start to enforce
> > > > > > those mappings for existing guests either.
> > > > > >
> > > > >
> > > > > EFI creates the mappings primarily for itself, and up until DSM #5
> > > > > started to be enforced, all PCI resource allocations that existed at
> > > > > boot were ignored by Linux and recreated from scratch.
> > > > >
> > > > > Also, the commit in question looks dubious to me. I don't think it is
> > > > > likely that Linux would fail to create a resource tree. What does
> > > > > happen is that BARs get moved around, which may cause trouble in some
> > > > > cases: for instance, we had to add special code to the EFI framebuffer
> > > > > driver to cope with framebuffer BARs being relocated.
> > > > >
> > > > > > For my own testing, I simply reverted commit 0cf8882fd0 in my copy of
> > > > > > qemu. That solves my immediate problem, giving us time to find a solution
> > > > > > that is acceptable for everyone. After all, it doesn't look like anyone
> > > > > > else has noticed the problem, so there is no real urgency.
> > > > > >
> > > > >
> > > > > I would argue that it is better to revert that commit. DSM #5 has a
> > > > > long history of debate and misinterpretation, and while I think we
> > > > > ended up with something sane, I don't think we should be using it in
> > > > > this particular case.
> > > >
> > > > I think revert might make sense, however:
> > > >
> > > > 0: No (The operating system shall not ignore the PCI configuration that firmware has done
> > > > at boot time. However, the operating system is free to configure the devices in this hierarchy
> > > > that have not been configured by the firmware. There may be a reduced level of hot plug
> > > > capability support in this hierarchy due to resource constraints. This situation is the same as
> > > > the legacy situation where this _DSM is not provided.)
> > > >
> > > > ^^^^ doesn't this imply that returning 0, as we currently do,
> > > >      should be mostly a NOP?
> > > >
> > >
> > > Not really. The resource allocation strategies are different between
> > > EDK2 and Linux, and as Guenter's testing proves, EDK2 may lay out PCI
> > > resources in a way that interferes with Linux's expectations. The I/O
> > > port 0x0 problem is just one potential issue here: another issue is
> > > resource padding for hotplug, which is important for VMs, not only the
> > > IO/MEM resource allocations, but the bus ranges as well.
> >
> > Hmm, not sure I understand the answer. The text above seems to say
> > that returning 0 should be the same as not providing _DSM 5, does it not?
> 
> That is what the spec says, but it has never been what Linux/arm64
> does. Its PCI arch code is based on 32-bit ARM, which uses simple
> bootloaders that are completely unaware of the existence of PCI, and
> so from the beginning, we have always reassigned all resources (but
> not bus numbers IIRC) because that is what ARM did. On arm64, of
> course, we often do have rich firmware that initializes PCI, but by
> that time, the cat was out of the bag already, and we could not simply
> stop reassigning resources without running a substantial regression
> risk, even when booting in ACPI mode.
> 
> So the default behavior on arm64 is '1' not '0' in terms of DSM #5.

So, to summarize, what happens here is that if _DSM #5 returns
0, we try to keep the BAR content, which in Linux terms means we
"claim" the resources and succeed, correct?

In other words, pci_bus_claim_resources() succeeds for IO BARs;
it is not the claiming that fails, but kernel code that tries to
access IO resources at 0x0 and deems that inappropriate on arm64.

The issue, then, is that the kernel rejects PCI IO port addresses
starting at 0x0. Can we pinpoint where exactly the bug kicks in
in this specific case?

Thanks,
Lorenzo

> 
> > Why did behaviour change when we switched from not providing _DSM 5
> > to providing but returning 0?
> >
> >
> > > >
> > > > 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
> > > > at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> > > >
> > > >
> > > > So I am debating with myself whether this should be a plain revert or
> > > > return 1 here:
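> > > >      /*
> > > > -     * 0 - The operating system must not ignore the PCI configuration that
> > > > -     *     firmware has done at boot time.
> > > > +     * 1 - The operating system may ignore the PCI configuration that
> > > > +     *     firmware has done at boot time, and reconfigure/rebalance
> > > > +     *     the resources in the hierarchy.
> > > >       */
> > > > -    aml_append(ifctx1, aml_return(aml_int(0)));
> > > > +    aml_append(ifctx1, aml_return(aml_int(1)));
> > > >      aml_append(ifctx, ifctx1);
> > > >      aml_append(method, ifctx);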
> > > >
> > >
> > > I agree that returning '1' here is a better choice, as it explicitly
> > > gives the OS license to reassign all resources, which is what we have
> > > been relying on to begin with.
> > >
> > > OTOH, I do think we should fix arbitrary zero checks in Linux that
> > > make no sense on !x86.
> > >
> > > >
> > > >
> > > > Guenter what happens if we return 1? Do things work well?
> > > >
> > > > --
> > > > MST
> > > >
> >


Thread overview: 28+ messages
2021-07-24 18:52 aarch64 efi boot failures with qemu 6.0+ Guenter Roeck
2021-07-25 22:14 ` Michael S. Tsirkin
2021-07-25 22:56   ` Guenter Roeck
2021-07-26  9:08     ` Philippe Mathieu-Daudé
2021-07-26 16:00       ` Ard Biesheuvel
2021-07-26 21:16         ` Bjorn Helgaas
2021-07-26 21:31           ` Bjorn Helgaas
2021-07-27  4:22             ` Guenter Roeck
2021-07-27 14:25               ` Bjorn Helgaas
2021-07-27  4:45         ` Michael S. Tsirkin
2021-07-27  5:12           ` Guenter Roeck
2021-07-27  7:04             ` Ard Biesheuvel
2021-07-27  9:02               ` Michael S. Tsirkin
2021-07-27  9:30               ` Michael S. Tsirkin
2021-07-27  9:50                 ` Ard Biesheuvel
2021-07-27 10:07                   ` Michael S. Tsirkin
2021-07-27 10:14                     ` Ard Biesheuvel
2022-03-18 11:48                       ` Lorenzo Pieralisi [this message]
2021-07-27 11:18                 ` Guenter Roeck
2021-07-27  9:01             ` Michael S. Tsirkin
2021-07-27 10:36               ` Igor Mammedov
2021-07-27 11:32                 ` Guenter Roeck
2021-07-28 13:11                 ` Michael S. Tsirkin
2021-07-28 13:25                   ` Ard Biesheuvel
2021-07-28 14:03                     ` Guenter Roeck
2021-07-29  8:08                       ` Philippe Mathieu-Daudé
2021-07-29 14:42                         ` Bjorn Helgaas
2021-07-29 15:59                           ` Michael S. Tsirkin
