From: Laszlo Ersek <lersek@redhat.com>
To: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: Zihan Yang <whois.zihan.yang@gmail.com>,
	qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Eric Auger <eauger@redhat.com>, Drew Jones <drjones@redhat.com>,
	Wei Huang <wei@redhat.com>
Subject: Re: [Qemu-devel] [RFC 3/3] acpi-build: allocate mcfg for multiple host bridges
Date: Wed, 23 May 2018 19:25:04 +0200
Message-ID: <aa3e1859-bfd0-7fc6-bc9e-c35dd68588a8@redhat.com>
In-Reply-To: <6b3efcdf-c953-b2bc-e8f7-ac172f143d07@gmail.com>

On 05/23/18 19:11, Marcel Apfelbaum wrote:
> On 05/23/2018 10:32 AM, Laszlo Ersek wrote:
>> On 05/23/18 01:40, Michael S. Tsirkin wrote:
>>> On Wed, May 23, 2018 at 12:42:09AM +0200, Laszlo Ersek wrote:

>>>> If we figure out a placement strategy or an easy to consume
>>>> representation of these data for the firmware, it might be possible
>>>> for OVMF to hook them into the edk2 core (although not in the
>>>> earliest firmware phases, such as SEC and PEI).
>
> Can you please remind me how OVMF places the 64-bit PCI hotplug
> window?

If you mean the 64-bit PCI MMIO aperture, I described it here in detail:

  https://bugzilla.redhat.com/show_bug.cgi?id=1353591#c8

I'll also quote it inline, before returning to your email:

On 03/26/18 16:10, bugzilla@redhat.com wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1353591
>
> --- Comment #8 from Laszlo Ersek <lersek@redhat.com> ---
> Sure, I can attempt :) The function to look at is GetFirstNonAddress()
> in "OvmfPkg/PlatformPei/MemDetect.c". I'll try to write it up here in
> natural language (although I commented the function heavily as well).
>
> As an introduction, the "number of address bits" is a quantity that
> the firmware itself needs to know, so that the page tables used in
> the DXE phase actually map that address space. The
> GetFirstNonAddress() function (in the PEI phase) calculates the
> highest *exclusive* address that the firmware might want or need to
> use (in the DXE phase).
>
> (1) First we get the highest exclusive cold-plugged RAM address.
> (There are two methods for this, the more robust one is to read QEMU's
> E820 map, the older / less robust one is to calculate it from the
> CMOS.) If the result would be <4GB, then we take exactly 4GB from this
> step, because the firmware always needs to be able to address up to
> 4GB. Note that this is already somewhat non-intuitive; for example, if
> you have 4GB of RAM (as in, *amount*), it will go up to 6GB in the
> guest-phys address space (because [0x8000_0000..0xFFFF_FFFF] is not
> RAM but MMIO on q35).
>
> (2) If the DXE phase is 32-bit, then we're done. (No addresses >=4GB
> can be accessed, either for RAM or MMIO.) For RHEL this is never the
> case.
>
> (3) Grab the size of the 64-bit PCI MMIO aperture. This defaults to
> 32GB, but a custom (OVMF specific) fw_cfg file (from the QEMU command
> line) can resize it or even disable it. This aperture is relevant
> because it's going to be the top of the address space that the
> firmware is interested in. If the aperture is disabled (on the QEMU
> cmdline), then we're done, and only the value from point (1) matters
> -- that determines the address width we need.
>
> (4) OK, so we have a 64-bit PCI MMIO aperture (for allocating BARs out
> of, later); we have to place it somewhere. The base cannot match the
> value from (1) directly, because that would not leave room for the
> DIMM hotplug area. So the end of that area is read from the fw_cfg
> file "etc/reserved-memory-end". DIMM hotplug is enabled iff
> "etc/reserved-memory-end" exists. If "etc/reserved-memory-end" exists,
> then it is guaranteed to be larger than the value from (1) -- i.e.,
> top of cold-plugged RAM.
>
> (5) We round up the size of the 64-bit PCI aperture to a multiple of
> 1GB. We also round up the base of the same -- i.e., from (4) or (1),
> as appropriate -- to a 1GB boundary. This is inspired by SeaBIOS,
> because this lets the host map the aperture with 1GB hugepages.
>
> (6) The base address of the aperture is then rounded up so that it
> ends up aligned "naturally". "Natural" alignment means that we take
> the largest whole power of two (i.e., BAR size) that can fit *within*
> the aperture (whose size comes from (3) and (5)) and use that BAR size
> as alignment requirement. This is because the PciBusDxe driver sorts
> the BARs in decreasing size order (and equivalently, decreasing
> alignment order), for allocation in increasing address order, so if
> our aperture base is aligned sufficiently for the largest BAR that can
> theoretically fit into the aperture, then the base will be aligned
> correctly for *any* other BAR that fits.
>
> For example, if you have a 32GB aperture size, then the largest BAR
> that can fit is 32GB, so the alignment requirement in step (6) will be
> 32GB. Whereas, if the user configures a 48GB aperture size in (3),
> then your alignment will remain 32GB in step (6), because a 64GB BAR
> would not fit, and a 32GB BAR (which fits) dictates a 32GB alignment.
>
> Thus we have the following "ladder" of ranges:
>
> (a) cold-plugged RAM (low, <2GB)
> (b) 32-bit PCI MMIO aperture, ECAM/MMCONFIG, APIC, pflash, etc (<4GB)
> (c) cold-plugged RAM (high, >=4GB)
> (d) DIMM hot-plug area
> (e) padding up to 1GB alignment (for hugepages)
> (f) padding up to the natural alignment of the 64-bit PCI MMIO
>    aperture size (32GB by default)
> (g) 64-bit PCI MMIO aperture
>
> To my understanding, "maxmem" determines the end of (d). And, the
> address width is dictated by the end of (g).
>
> Two more examples.
>
> - If you have 36 phys address bits, that doesn't let you use
>   maxmem=32G. This is because maxmem=32G puts the end of the DIMM
>   hotplug area (d) strictly *above* 32GB (due to the "RAM gap" (b)),
>   and then the padding (f) places the 64-bit PCI MMIO aperture at
>   64GB. So 36 phys address bits don't suffice.
>
> - On the other hand, if you have 37 phys address bits, that *should*
>   let you use maxmem=64G. While the DIMM hot-plug area will end
>   strictly above 64GB, the 64-bit PCI MMIO aperture (of size 32GB) can
>   be placed at 96GB, so it will all fit into 128GB (i.e. 37 address
>   bits).
>
> Sorry if this is confusing, I got very little sleep last night.
>
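The placement steps quoted above can be sketched roughly as follows. This is
a minimal Python illustration, not the OVMF code: the names align_up(),
natural_alignment() and place_aperture() and their parameters are
hypothetical, step (2) (32-bit DXE) is left out, and "hotplug_end" merely
stands in for the value read from "etc/reserved-memory-end". The 34GB and
66GB hotplug ends in the usage note are assumed values; the text only says
that the area ends strictly above 32GB and 64GB, respectively.

```python
GIB = 1 << 30

def align_up(value, alignment):
    """Round value up to a multiple of alignment (alignment is a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def natural_alignment(size):
    """Largest power of two that fits within size, i.e. the biggest BAR that fits."""
    return 1 << (size.bit_length() - 1)

def place_aperture(ram_end, hotplug_end=None, aperture_size=32 * GIB):
    """Return (base, exclusive end) of the 64-bit PCI MMIO aperture, or None.

    ram_end       -- exclusive top of cold-plugged RAM, step (1)
    hotplug_end   -- hypothetical stand-in for "etc/reserved-memory-end",
                     None if the file is absent, step (4)
    aperture_size -- 0 means the aperture is disabled, step (3)
    """
    # Step (1): the firmware always needs to address at least 4GB.
    first_non_address = max(ram_end, 4 * GIB)
    # Step (3): a disabled aperture leaves only the value from step (1).
    if aperture_size == 0:
        return None
    # Step (4): leave room for the DIMM hotplug area if there is one.
    base = hotplug_end if hotplug_end is not None else first_non_address
    # Step (5): 1GB granularity for both size and base.
    size = align_up(aperture_size, GIB)
    base = align_up(base, GIB)
    # Step (6): align the base naturally for the largest BAR that fits.
    base = align_up(base, natural_alignment(size))
    return base, base + size
```

With these assumptions, natural_alignment(48 * GIB) gives 32GB, matching the
48GB example; place_aperture(6 * GIB, 34 * GIB) puts the aperture at
[64GB, 96GB), and place_aperture(6 * GIB, 66 * GIB) puts it at [96GB, 128GB).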

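The two address-width examples in the quoted comment can be checked with a
few lines of arithmetic. This is only a sketch under the assumption that the
required width is the smallest number of bits covering every address below
the exclusive end of range (g); the helper names are hypothetical and the
actual firmware logic may differ in detail.

```python
GIB = 1 << 30

def address_bits_needed(first_non_address):
    """Smallest bit width that can address every byte below the exclusive top."""
    return (first_non_address - 1).bit_length()

def fits(first_non_address, phys_bits):
    """True if the exclusive top address lies within a phys_bits-wide space."""
    return first_non_address <= (1 << phys_bits)

# maxmem=32G example: 32GB aperture based at 64GB, so it ends at 96GB.
end_32g = 64 * GIB + 32 * GIB
# maxmem=64G example: 32GB aperture based at 96GB, so it ends at 128GB.
end_64g = 96 * GIB + 32 * GIB

print(address_bits_needed(end_32g), fits(end_32g, 36))
print(address_bits_needed(end_64g), fits(end_64g, 37))
```

The first case needs more than 36 bits because the aperture ends past 64GB;
the second ends exactly at 128GB and so fits into 37 bits.
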
Back to your email:

On 05/23/18 19:11, Marcel Apfelbaum wrote:
> I think we may be able to succeed with "standard" ACPI declarations of
> the PCI segments + placing the extra MMCONFIG ranges before the 64-bit
> PCI hotplug area.

That idea could work, but firmware will need hints about it.

Thanks!
Laszlo
