qemu-devel.nongnu.org archive mirror
From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
To: qemu-devel@nongnu.org
Cc: pedro.principeza@canonical.com, ehabkost@redhat.com,
	dann.frazier@canonical.com, dgilbert@redhat.com,
	christian.ehrhardt@canonical.com, kraxel@redhat.com,
	lersek@redhat.com, fw@gpiccoli.net
Subject: ovmf / PCI passthrough impaired due to very limiting PCI64 aperture
Date: Tue, 16 Jun 2020 12:16:16 -0300
Message-ID: <99779e9c-f05f-501b-b4be-ff719f140a88@canonical.com>

Hello folks, I'd like to start a discussion (or bump it, in case it was
already discussed) about an "issue", or rather a limitation, that we've
been observing (and receiving reports about) in QEMU/OVMF with regard to
PCI passthrough of devices with large BARs.

After OVMF commit 7e5b1b670c38 ("OvmfPkg: PlatformPei: determine the
64-bit PCI host aperture for X64 DXE"), the 64-bit PCI aperture is a
hardcoded value passed to the guest via the ACPI _CRS which, in practical
terms, prevents PCI devices with 32G+ BARs from being correctly passed
through to guests.

There was a very informative discussion on the edk2 groups list [0],
started by my colleague Dann, to which some edk2 and QEMU developers
responded with a good amount of information and rationale about this
limitation and the problems that increasing the limit would bring. All
the colleagues who responded in that discussion are CC'ed here.

The summary (in my understanding) is:

- The main rationale for the current limit is simplicity: OVMF needs to
take the 64-bit aperture into account when setting up its memory map,
and for common scenarios the current 32G limit accommodates the majority
of use cases.

- On top of that, increasing the 64-bit aperture would increase the
memory required for the page tables that OVMF builds in PEI (Pre-EFI
Initialization).

- The current aperture also accounts for the 36 physical address bits
common in old processors and in some generic QEMU vCPU models. This
"helps" with live migration: 36 bits seems to be the LCD (lowest common
denominator) among all 64-bit processors, so the restrictive PCI64
aperture is not yet another factor that makes live migration difficult
or impossible.

- Finally, there's an _experimental_ parameter that gives users some
flexibility in the PCI64 aperture calculation: "X-PciMmio64Mb".
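
To put the page-table point from the summary in rough numbers, here is a
small sketch. It assumes plain x86-64 4-level paging with an identity map
of the whole range [0, 2**phys_bits), using either 2 MiB or 1 GiB leaf
pages. It is not OVMF's actual page-table code, and `page_table_bytes` is
a hypothetical helper; it only shows how the footprint scales with the
size of the mapped address space.

```python
# Simplified model (assumption): x86-64 4-level paging identity-mapping
# [0, 2**phys_bits) with 2 MiB or 1 GiB leaf pages. NOT OVMF's code.

PAGE_SIZE = 4096          # every paging structure occupies one 4 KiB page
GIB = 1 << 30
PML4E_SPAN = 512 * GIB    # one PML4 entry covers 512 GiB
PDPTE_SPAN = GIB          # one PDPT entry covers 1 GiB


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def page_table_bytes(phys_bits: int, use_1g_pages: bool) -> int:
    """Bytes of paging structures needed to identity-map 2**phys_bits bytes."""
    span = 1 << phys_bits
    pml4 = 1
    pdpts = ceil_div(span, PML4E_SPAN)
    # With 1 GiB leaves the PDPT entries map memory directly; with 2 MiB
    # leaves every 1 GiB of address space needs its own page directory.
    pds = 0 if use_1g_pages else ceil_div(span, PDPTE_SPAN)
    return (pml4 + pdpts + pds) * PAGE_SIZE


if __name__ == "__main__":
    for bits in (36, 40, 46):
        kib_2m = page_table_bytes(bits, use_1g_pages=False) // 1024
        kib_1g = page_table_bytes(bits, use_1g_pages=True) // 1024
        print(f"{bits} bits: {kib_2m} KiB (2 MiB pages), {kib_1g} KiB (1 GiB pages)")
```

Under these assumptions, going from 36 to 40 bits grows the 2 MiB-page
footprint from 264 KiB to about 4 MiB, while 1 GiB pages keep it to a few
KiB.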

The point is that there are more and more devices out there with big
BARs (mostly GPUs) that either exceed 32G by themselves or come close to
it (16G), and OVMF doesn't allow users to pass such devices through.
Relying on "X-PciMmio64Mb" is problematic due to the experimental/unstable
nature of that parameter.
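
For reference, the knob is usually set through QEMU's fw_cfg, e.g.
`-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536` for a 64 GiB aperture
(the value being in MiB). The helper below, `pci_mmio64_fw_cfg_args`, is
hypothetical and only formats that argument; since the knob is
experimental, the fw_cfg name should be double-checked against the OVMF
build in use.

```python
# Hypothetical helper: format the QEMU fw_cfg arguments for OVMF's
# experimental X-PciMmio64Mb knob. The fw_cfg name and the MiB unit are
# assumptions based on common usage; verify against your OVMF build.

MIB_PER_GIB = 1024


def pci_mmio64_fw_cfg_args(aperture_gib: int) -> list[str]:
    """Return ['-fw_cfg', 'name=...,string=<MiB>'] for the given aperture size."""
    if aperture_gib <= 0:
        raise ValueError("aperture size must be positive")
    size_mb = aperture_gib * MIB_PER_GIB
    return ["-fw_cfg", f"name=opt/ovmf/X-PciMmio64Mb,string={size_mb}"]


if __name__ == "__main__":
    print(" ".join(pci_mmio64_fw_cfg_args(64)))
    # -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
```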

The Linux kernel allows bypassing the ACPI _CRS with "pci=nocrs" (there's
some discussion about that in [1]), but other OSes may not have such an
option, which effectively prevents PCI passthrough of such large devices
from succeeding or forces users to rely on the experimental parameter.

I'd like to discuss a definitive solution here; I started this
discussion on the TianoCore Bugzilla [2], but Laszlo wisely suggested
moving it here to gather input from the QEMU community.
Currently I see 2 options, (a) being my preferred one:

(a) We could rely on the guest phys-bits to calculate the PCI64 aperture.
If users are passing through the host phys-bits (or setting them manually
with the "phys-bits" CPU property), they are already risking a live
migration failure. And if users are not setting the phys-bits in the
guest, there is still a default (40 bits, according to my experiments),
so it seems a good idea to rely on that.
If the guest phys-bits is 40, why have OVMF limit it to 36, right?

(b) Making the experimental "X-PciMmio64Mb" not experimental anymore is
also an option, allowing users to rely on it without the risk of support
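
To make option (a) a bit more concrete, here is a sketch of sizing the
aperture from the guest phys-bits. The placement model is an assumption
for illustration only (a power-of-two aperture, aligned to its own size,
starting at or above the end of RAM and ending at or below
2**phys_bits); it is not OVMF's algorithm, and `largest_aperture` is a
hypothetical name.

```python
# Sketch of option (a) under a simplified, hypothetical placement model:
# the aperture is a power of two, starts at the first address >= ram_top
# aligned to its own size, and must end at or below 2**phys_bits.
# This is NOT OVMF's actual algorithm.

GIB = 1 << 30


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def largest_aperture(phys_bits: int, ram_top: int) -> int:
    """Largest power-of-two aperture (bytes) that fits between ram_top and 2**phys_bits."""
    limit = 1 << phys_bits
    size = limit >> 1
    while size >= GIB:
        base = align_up(ram_top, size)
        if base + size <= limit:
            return size
        size >>= 1
    return 0  # nothing of at least 1 GiB fits


if __name__ == "__main__":
    for bits in (36, 40):
        size = largest_aperture(bits, ram_top=8 * GIB)
        print(f"phys-bits={bits}: aperture {size // GIB} GiB")
```

With RAM ending at 8 GiB, this model gives 32 GiB for 36 phys-bits (the
current limit) and 512 GiB for 40 phys-bits.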

Please let me know your thoughts on this limitation and how we could
improve it. Other ideas are also welcome, of course. Thanks for the


[0] edk2.groups.io/g/discuss/topic/ovmf_resource_assignment/59340711
[1] bugs.launchpad.net/bugs/1849563
[2] bugzilla.tianocore.org/show_bug.cgi?id=2796
