From: Andrea Bolognani <abologna@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: groug@kaod.org, aik@ozlabs.ru, qemu-ppc@nongnu.org,
	qemu-devel@nongnu.org, clg@kaod.org
Subject: Re: [Qemu-devel] [RFC for-2.13 0/7] spapr: Clean up pagesize handling
Date: Fri, 20 Apr 2018 11:31:10 +0200
Message-ID: <1524216670.3017.11.camel@redhat.com>
In-Reply-To: <20180420023542.GD2434@umbus.fritz.box>

On Fri, 2018-04-20 at 12:35 +1000, David Gibson wrote:
> On Thu, Apr 19, 2018 at 05:30:04PM +0200, Andrea Bolognani wrote:
> > On Thu, 2018-04-19 at 16:29 +1000, David Gibson wrote:
> > > This means that in order to use hugepages in a PAPR guest it's
> > > necessary to add a "cap-hpt-mps=24" machine parameter as well as
> > > setting the mem-path correctly.  This is a bit more work on the user
> > > and/or management side, but results in consistent behaviour so I think
> > > it's worth it.
> > 
> > libvirt guests already need to explicitly opt-in to hugepages, so
> > adding this new option automagically based on that shouldn't be too
> > difficult.
> 
> Right.  We have to be a bit careful with automagic though, because
> treating hugepage as a boolean is one of the problems that this
> parameter is there to address.
> 
> If libvirt were to set the parameter based on the pagesize of the
> hugepage mount, then it might not be consistent across a migration
> (e.g. p8 to p9).  Now the new code would at least catch that and
> safely fail the migration, but that might be confusing to users.

Good point.

I'll have to look into it to be sure, but I think it should be
possible for libvirt to convert a generic

  <memoryBacking>
    <hugepages/>
  </memoryBacking>

to a more specific

  <memoryBacking>
    <hugepages>
      <page size="16384" unit="KiB"/>
    </hugepages>
  </memoryBacking>

by figuring out the page size for the default hugepage mount,
which actually sounds like a good idea regardless. Of course users
would still be able to provide the page size themselves in the
first place.
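
As a rough sketch of the mapping this would involve (the helper names
are hypothetical and this is not libvirt code), the page size from the
XML above can be turned into the shift value that the "cap-hpt-mps"
machine parameter from this RFC expects:

```python
def page_size_kib_to_shift(size_kib: int) -> int:
    """Return log2 of a page size given in KiB, e.g. 16384 KiB -> 24."""
    size_bytes = size_kib * 1024
    # A page size must be a positive power of two; reject anything else.
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError(f"{size_kib} KiB is not a power-of-two page size")
    return size_bytes.bit_length() - 1


def hpt_mps_option(size_kib: int) -> str:
    """Build the machine parameter named in this thread (RFC spelling)."""
    return f"cap-hpt-mps={page_size_kib_to_shift(size_kib)}"


# <page size="16384" unit="KiB"/> from the XML above
print(hpt_mps_option(16384))  # prints "cap-hpt-mps=24"
```

The power-of-two check is only there so that a malformed size fails
loudly instead of being silently rounded.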

Is the 16 MiB page size available for both POWER8 and POWER9?

> > A couple of questions:
> > 
> >   * I see the option accepts values 12, 16, 24 and 34, with 16
> >     being the default.
> 
> In fact it should accept any value >= 12, though the ones that you
> list are the interesting ones.

Well, I copied them from the QEMU help text, and I kinda assumed
that you wouldn't just list completely random values there O:-)

> This does mean, for example, that if
> it was just set to the hugepage size on a p9, 21 (2MiB) things should
> work correctly (in practice it would act identically to setting it to
> 16).

Wouldn't that lead to different behavior depending on whether you
start the guest on a POWER9 or POWER8 machine? The former would be
able to use 2 MiB hugepages, while the latter would be stuck using
regular 64 KiB pages. Migration of such a guest from POWER9 to
POWER8 wouldn't work because the hugepage allocation couldn't be
fulfilled, but the other way around would probably work and lead to
different page sizes being available inside the guest after a power
cycle, no?
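
To make the "21 acts identically to 16" remark concrete, here is a
minimal model of it. It assumes the guest can only use the shifts
listed in the QEMU help text (12, 16, 24, 34) and that the cap simply
selects the largest one not exceeding it; this illustrates the
statement above and is not taken from the patches.

```python
# Shifts quoted from the QEMU help text in this thread. Treating them as
# the complete set of usable HPT page sizes is an assumption.
HPT_PAGE_SHIFTS = (12, 16, 24, 34)


def effective_max_shift(cap_shift: int, supported=HPT_PAGE_SHIFTS) -> int:
    """Return the largest supported shift that does not exceed the cap."""
    candidates = [s for s in supported if s <= cap_shift]
    if not candidates:
        raise ValueError(f"cap {cap_shift} is below the smallest page size")
    return max(candidates)


print(effective_max_shift(21))  # prints 16: a 2 MiB cap behaves like 64 KiB
print(effective_max_shift(30))  # prints 24: a 1 GiB cap behaves like 16 MiB
```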

> > I guess 34 corresponds to 1 GiB hugepages?
> 
> No, 16GiB hugepages, which is the "colossal page" size on HPT POWER
> machines.  It's a simple shift, (1 << 34) == 16 GiB, 1GiB pages would
> be 30 (but wouldn't let the guest do any more than 24 ~ 16 MiB in
> practice).

Isn't support for 1 GiB hugepages at least being worked on[1]?
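
For reference, the shift-to-size arithmetic described above can be
checked with a few lines of Python (shift_to_size is a made-up helper
name, used here only for illustration):

```python
def shift_to_size(shift: int) -> str:
    """Format the page size (1 << shift) using binary units."""
    size = 1 << shift
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at GiB, which is the largest unit handled here.
        if size < 1024 or unit == "GiB":
            return f"{size} {unit}"
        size //= 1024


for shift in (12, 16, 21, 24, 30, 34):
    print(shift, "->", shift_to_size(shift))
```

This gives 4 KiB, 64 KiB, 2 MiB, 16 MiB, 1 GiB and 16 GiB respectively,
matching the values mentioned in this thread.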

> >     Also, in what scenario would 12 be used?
> 
> So RHEL, at least, generally configures ppc64 kernels to use 64kiB
> pages, but 4kiB pages are still supported upstream (not sure if there
> are any distros that still use that mode).  If your host uses 4kiB
> pages you wouldn't be able to start a (KVM HV) guest without setting
> this to 12 (or using a 64kiB hugepage mount).

Mh, that's annoying, as needing to support 4 KiB pages would most
likely mean we'd have to turn this into a stand-alone configuration
knob rather than deriving it entirely from existing ones. I'd prefer
the latter, as it's clearly much more user-friendly.

I'll check out what other distros are doing: if all the major ones
are defaulting to 64 KiB pages these days, it might be reasonable
to do the same and pretend smaller page sizes don't exist at all in
order to avoid the pain of having to tweak yet another knob, even
if that means leaving people compiling their own custom kernels
with 4 KiB page size in the dust.
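
A sketch of the check a management layer would need if 4 KiB hosts were
to be supported: compare the host's base page size against the default
cap of 16 mentioned earlier. The function name is hypothetical, and the
rule that the cap must not exceed the backing page size is my reading
of the explanation quoted above.

```python
import mmap

DEFAULT_CAP_SHIFT = 16  # default value of the option, per this thread


def needs_lower_cap(host_page_size: int,
                    cap_shift: int = DEFAULT_CAP_SHIFT) -> bool:
    """Return True if the host's pages are smaller than the cap allows."""
    host_shift = host_page_size.bit_length() - 1
    return host_shift < cap_shift


# mmap.PAGESIZE is the base page size of the machine running this script.
print(needs_lower_cap(mmap.PAGESIZE))
```

On a host with 64 KiB pages this prints False; with 4 KiB pages it
prints True, which is the case where the option would have to be set
to 12.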

> >   * The name of the property suggests this setting is only relevant
> >     for HPT guests. libvirt doesn't really have the notion of HPT
> >     and RPT, and I'm not really itching to introduce it. Can we
> >     safely use this option for all guests, even RPT ones?
> 
> Yes.  The "hpt" in the main is meant to imply that its restriction
> only applies when the guest is in HPT mode, but it can be safely set
> in any mode.  In RPT mode guest and host pagesizes are independent of
> each other, so we don't have to deal with this mess.

Good :)


[1] https://patchwork.kernel.org/patch/9729991/
-- 
Andrea Bolognani / Red Hat / Virtualization
