From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	qemu-devel@nongnu.org, Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Edmondson <david.edmondson@oracle.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Ani Sinha <ani@anisinha.ca>,
	Igor Mammedov <imammedo@redhat.com>,
	Joao Martins <joao.m.martins@oracle.com>
Subject: Re: [PATCH RFCv2 2/4] i386/pc: relocate 4g start to 1T where applicable
Date: Wed, 16 Feb 2022 09:51:15 +0000
Message-ID: <YgzJE7ufEYm6OFyg@redhat.com>
In-Reply-To: <20220215095358.5qcrgwlasheu63uj@sirius.home.kraxel.org>

On Tue, Feb 15, 2022 at 10:53:58AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > I don't know what behavior should be if firmware tries to program
> > PCI64 hole beyond supported phys-bits.
> 
> Well, you are basically f*cked.
> 
> Unfortunately there is no reliable way to figure what phys-bits actually
> is.  Because of that the firmware (both seabios and edk2) tries to place
> the pci64 hole as low as possible.
> 
> The long version:
> 
> qemu advertises phys-bits=40 to the guest by default.  Probably because
> this is what the first amd opteron processors had, assuming that it
> would be a safe default.  Then intel came, releasing processors with
> phys-bits=36, even recent (desktop-class) hardware has phys-bits=39.
> Boom.
> 
> End result is that edk2 uses a 32G pci64 window by default, which is
> placed at the first 32G border beyond normal ram.  So for virtual
> machines with up to ~ 30G ram (including reservations for memory
> hotplug) the pci64 hole covers 32G -> 64G in guest physical address
> space, which is low enough that it works on hardware with phys-bits=36.
> 
> If your VM has more than 32G of memory the pci64 hole will move and
> phys-bits=36 isn't enough any more, but given that you probably only do
> that on more beefy hosts which can take >= 64G of RAM and have a larger
> physical address space this heuristic works good enough in practice.
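
To make the arithmetic above concrete, here is a rough sketch in Python.
It is not the edk2 code: the function names are made up for illustration,
and it assumes the window is simply aligned up to a multiple of its own
size above the top of guest RAM (hotplug reservations included).

```python
GIB = 1 << 30


def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary."""
    return (addr + boundary - 1) // boundary * boundary


def pci64_window(top_of_ram, size=32 * GIB):
    """Return (base, end) of a pci64 window placed at the first
    size-aligned border at or above top_of_ram (assumed policy)."""
    base = align_up(top_of_ram, size)
    return base, base + size


def phys_bits_needed(end):
    """Smallest phys-bits value that can address every byte below end."""
    return (end - 1).bit_length()


# Top of RAM just under 32G: window is 32G -> 64G, which fits in 36 bits.
base, end = pci64_window(31 * GIB)
print(base // GIB, end // GIB, phys_bits_needed(end))

# Top of RAM at 40G: window moves to 64G -> 96G and 36 bits is too few.
base, end = pci64_window(40 * GIB)
print(base // GIB, end // GIB, phys_bits_needed(end))
```

The second case is the "more than 32G" situation described above, where
the window no longer fits below the 64G limit of phys-bits=36.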
> 
> Changing phys-bits behavior has been discussed on and off since years.
> It's tricky to change for live migration compatibility reasons.
> 
> We got the host-phys-bits and host-phys-bits-limit properties, which
> solve some of the phys-bits problems.
> 
>  * host-phys-bits=on makes sure the phys-bits advertised to the guest
>    actually works.  It's off by default though for backward
>    compatibility reasons (except microvm).  Also because turning it on
>    breaks live migration of machines between hosts with different
>    phys-bits.

RHEL has shipped with host-phys-bits=on in its machine types
since RHEL-7. If it has been good enough for RHEL machine types
for 8 years, IMHO, that is a sign that it's reasonable to do the
same upstream for new machine types.


>  * host-phys-bits-limit can be used to tweak phys-bits to
>    be lower than what the host supports.  Which can be used for
>    live migration compatibility, i.e. if you have a pool of machines
>    where some have 36 and some 39 you can limit phys-bits to 36 so
>    live migration from 39 hosts to 36 hosts works.

RHEL machine types have set this to host-phys-bits-limit=48
since RHEL-8 days, to avoid accidentally enabling 5-level
paging in guests without explicit user opt-in.
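
A minimal sketch of how the two properties combine, in Python. The
helper names are hypothetical and this only models the clamping
described above (a limit of 0 is assumed to mean "no limit"); it is
not QEMU's actual code.

```python
def effective_phys_bits(host_bits, limit=0):
    """phys-bits the guest would see with host-phys-bits=on and an
    optional host-phys-bits-limit (0 is assumed to mean no limit)."""
    if limit and host_bits > limit:
        return limit
    return host_bits


def pool_limit(host_bits_list):
    """Largest limit that every host in a migration pool can honour."""
    return min(host_bits_list)


# A hypothetical host reporting 52 bits is capped to 48 by limit=48.
print(effective_phys_bits(52, limit=48))

# A 39-bit host is unaffected by the same limit.
print(effective_phys_bits(39, limit=48))

# Mixed pool of 36-bit and 39-bit hosts: limit everything to 36.
print(pool_limit([36, 39, 39]))
```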

> What is missing:
> 
>  * Some way for the firmware to get a phys-bits value it can actually
>    use.  One possible way would be to have a paravirtual bit somewhere
>    telling whenever host-phys-bits is enabled or not.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
