From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	qemu-devel@nongnu.org,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"David Edmondson" <david.edmondson@oracle.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Suravee Suthikulpanit" <suravee.suthikulpanit@amd.com>,
	"Ani Sinha" <ani@anisinha.ca>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Joao Martins" <joao.m.martins@oracle.com>
Subject: Re: [PATCH RFCv2 2/4] i386/pc: relocate 4g start to 1T where applicable
Date: Tue, 22 Feb 2022 09:30:43 +0000
Message-ID: <YhStQ1SVY9YhMJpp@work-vm>
In-Reply-To: <20220222094602.66d55613@redhat.com>

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Mon, 21 Feb 2022 13:15:40 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Tue, Feb 15, 2022 at 10:53:58AM +0100, Gerd Hoffmann wrote:  
> > > >   Hi,
> > > >   
> > > > > I don't know what behavior should be if firmware tries to program
> > > > > PCI64 hole beyond supported phys-bits.  
> > > > 
> > > > Well, you are basically f*cked.
> > > > 
> > > > Unfortunately there is no reliable way to figure what phys-bits actually
> > > > is.  Because of that the firmware (both seabios and edk2) tries to place
> > > > the pci64 hole as low as possible.
> > > > 
> > > > The long version:
> > > > 
> > > > qemu advertises phys-bits=40 to the guest by default.  Probably because
> > > > this is what the first amd opteron processors had, assuming that it
> > > > would be a safe default.  Then intel came, releasing processors with
> > > > phys-bits=36, even recent (desktop-class) hardware has phys-bits=39.
> > > > Boom.
> > > > 
> > > > End result is that edk2 uses a 32G pci64 window by default, which is
> > > > placed at the first 32G border beyond normal ram.  So for virtual
> > > > machines with up to ~ 30G ram (including reservations for memory
> > > > hotplug) the pci64 hole covers 32G -> 64G in guest physical address
> > > > space, which is low enough that it works on hardware with phys-bits=36.
> > > > 
> > > > If your VM has more than 32G of memory the pci64 hole will move and
> > > > phys-bits=36 isn't enough any more, but given that you probably only do
> > > > that on more beefy hosts which can take >= 64G of RAM and have a larger
> > > > > physical address space this heuristic works well enough in practice.
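
To make the arithmetic in the quoted explanation concrete, here is a
minimal sketch of the placement rule as described there: a 32G window at
the first 32G border at or above the end of RAM. It is not taken from
edk2 or seabios. The function names are hypothetical, and `ram_end` is
assumed to be the highest guest-physical address used by RAM, including
hotplug reservations.

```python
GIB = 1 << 30
WINDOW = 32 * GIB  # default pci64 window size mentioned in the mail


def pci64_window(ram_end, window=WINDOW):
    """Return (base, end) of a window placed at the first window-aligned
    border at or above ram_end. Simplified model, not firmware code."""
    base = -(-ram_end // window) * window  # round up to a multiple of window
    return base, base + window


def required_phys_bits(end):
    """Smallest phys-bits value that can address every byte below end."""
    return (end - 1).bit_length()


if __name__ == "__main__":
    for ram_gib in (30, 40):
        base, end = pci64_window(ram_gib * GIB)
        print(ram_gib, base // GIB, end // GIB, required_phys_bits(end))
```

With RAM ending at 30G the window covers 32G -> 64G and fits in 36 bits;
with RAM ending at 40G it moves to 64G -> 96G and needs 37 bits.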
> > > > 
> > > > > Changing phys-bits behavior has been discussed on and off for years.
> > > > It's tricky to change for live migration compatibility reasons.
> > > > 
> > > > We got the host-phys-bits and host-phys-bits-limit properties, which
> > > > solve some of the phys-bits problems.
> > > > 
> > > >  * host-phys-bits=on makes sure the phys-bits advertised to the guest
> > > >    actually works.  It's off by default though for backward
> > > >    compatibility reasons (except microvm).  Also because turning it on
> > > >    breaks live migration of machines between hosts with different
> > > >    phys-bits.  
> > > 
> > > RHEL has shipped with host-phys-bits=on in its machine types
> > > > since RHEL-7. If it is good enough for RHEL machine types
> > > > for 8 years, IMHO, it is a sign that it's reasonable to do the
> > > same with upstream for new machine types.  
> > 
> > And the upstream code is now pretty much identical except for the
> > default;  note that for TCG you do need to keep to 40 I think.
> 
> Will TCG work with 40 bits on a host that supports less than that?
> 
> Also, a quick look at host-phys-bits shows that it affects only the 'host'
> cpu model and is a NOP for all other models.
> If that's so, then we probably need to expand its scope to other cpu
> models to cap them at the actually supported range.

(We shouldn't really bring TCG oddities into this series!)

As I remember, host-phys-bits effectively gets its value from the
accelerator, and with TCG being portable, there's no portable way of
reading the host's phys-bits.

Whether it would work, hmm.  I'm assuming the host OS would stop you
allocating a huge ram block, so it shouldn't break from that.
But then the guest address translation is done in software, not using
the host MMU, so I think the guest's view of addressing should be able
to be larger than the host's. (Unless you try things like vfio/iommu on
tcg, which I'm told does work in some combos.)
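
To illustrate the portability point, here is a minimal sketch of one
host-specific way to read phys-bits: parsing the "address sizes" line
that Linux exposes in /proc/cpuinfo on x86. This is not how QEMU obtains
the value, the function names are hypothetical, and it returns None on
hosts that do not provide that line.

```python
import re

# Matches e.g. "address sizes   : 39 bits physical, 48 bits virtual"
_ADDR_RE = re.compile(r"^address sizes\s*:\s*(\d+) bits physical", re.MULTILINE)


def parse_phys_bits(cpuinfo_text):
    """Return the physical address width found in cpuinfo text, or None."""
    match = _ADDR_RE.search(cpuinfo_text)
    return int(match.group(1)) if match else None


def host_phys_bits(path="/proc/cpuinfo"):
    """Read phys-bits from a Linux-style cpuinfo file; None if unavailable."""
    try:
        with open(path) as f:
            return parse_phys_bits(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(host_phys_bits())
```

On a host without /proc/cpuinfo, or on an architecture whose cpuinfo
lacks that line, the helper has nothing to report.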

Dave


> > 
> > Dave
> > >   
> > > >  * host-phys-bits-limit can be used to tweak phys-bits to
> > > >    be lower than what the host supports.  Which can be used for
> > > >    live migration compatibility, i.e. if you have a pool of machines
> > > >    where some have 36 and some 39 you can limit phys-bits to 36 so
> > > >    live migration from 39 hosts to 36 hosts works.  
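
The pool example in the quoted bullet can be sketched as below. This is
only a model of the described behaviour, not QEMU code: the helper names
are hypothetical, and "no limit" is represented as None here.

```python
def effective_phys_bits(host_bits, limit=None):
    """phys-bits the guest would see: the host value, capped by the limit."""
    if limit is None:
        return host_bits
    return min(host_bits, limit)


def pool_limit(host_bits_list):
    """Largest limit that every host in a migration pool can honour."""
    return min(host_bits_list)


if __name__ == "__main__":
    pool = [36, 39, 39]
    limit = pool_limit(pool)
    print(limit, [effective_phys_bits(bits, limit) for bits in pool])
```

With a pool of 36-bit and 39-bit hosts the limit comes out as 36, so
every guest sees the same phys-bits wherever it lands.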
> > > 
> > > RHEL machine types have set this to host-phys-bits-limit=48
> > > since RHEL-8 days, to avoid accidentally enabling 5-level
> > > paging in guests without explicit user opt-in.
> > >   
> > > > What is missing:
> > > > 
> > > >  * Some way for the firmware to get a phys-bits value it can actually
> > > >    use.  One possible way would be to have a paravirtual bit somewhere
> > > >    telling whenever host-phys-bits is enabled or not.  
> > > 
> > > 
> > > Regards,
> > > Daniel
> > > -- 
> > > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> > > 
> > >   
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
