From: Eduardo Habkost <ehabkost@redhat.com>
To: Anthony Liguori <aliguori@us.ibm.com>
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org,
	Wanlong Gao <gaowanlong@cn.fujitsu.com>,
	andre.przywara@amd.com
Subject: Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes
Date: Fri, 14 Jun 2013 10:56:09 -0300
Message-ID: <20130614135609.GM2895@otherpad.lan.raisama.net>
In-Reply-To: <87ppvptse7.fsf@codemonkey.ws>

On Thu, Jun 13, 2013 at 08:04:00PM -0500, Anthony Liguori wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Wed, Jun 05, 2013 at 07:57:42AM -0500, Anthony Liguori wrote:
> >> Wanlong Gao <gaowanlong@cn.fujitsu.com> writes:
> >> 
> >> > Add monitor command mem-nodes to show the huge mapped
> >> > memory nodes locations.
> >> >
> >> > (qemu) info mem-nodes
> >> > /proc/14132/fd/13: 00002aaaaac00000-00002aaaeac00000: node0
> >> > /proc/14132/fd/13: 00002aaaeac00000-00002aab2ac00000: node1
> >> > /proc/14132/fd/14: 00002aab2ac00000-00002aab2b000000: node0
> >> > /proc/14132/fd/14: 00002aab2b000000-00002aab2b400000: node1
> >> 
> >> This creates an ABI that we don't currently support.  Memory hotplug or
> >> a variety of things can break this mapping and then we'd have to provide
> >> an interface to describe that the mapping was broken.
> >
> > What do you mean by "breaking this mapping", exactly? Would the backing
> > file of existing guest RAM ever change? (It would require a memory copy
> > from one file to another, why would QEMU ever do that?)
> 
> Memory hot-add will change the mapping.  hot-remove (if ever
> implemented) would break it.

So, would the backing-file/offset of existing guest RAM ever change? (It
would require a memory copy from one file to another, why would QEMU
ever do that?)
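
For reference, here is a minimal sketch of how an external tool might read the example output quoted at the top of this thread. It assumes every line has the form "<path>: <start>-<end>: node<N>" with hexadecimal addresses, which is inferred only from the four sample lines in the patch description. The names parse_mem_nodes and MemNodeLine are hypothetical, and a proper QMP interface would make this kind of text parsing unnecessary.

```python
import re
from typing import List, NamedTuple


class MemNodeLine(NamedTuple):
    """One line of the example 'info mem-nodes' output (hypothetical type)."""
    path: str   # e.g. /proc/<pid>/fd/<n>
    start: int  # start address of the range
    end: int    # end address of the range
    node: int   # host NUMA node number


# Assumed line format, taken from the sample output only:
#   /proc/14132/fd/13: 00002aaaaac00000-00002aaaeac00000: node0
_LINE_RE = re.compile(
    r"^(?P<path>\S+): (?P<start>[0-9a-fA-F]+)-(?P<end>[0-9a-fA-F]+): node(?P<node>\d+)$"
)


def parse_mem_nodes(text: str) -> List[MemNodeLine]:
    """Parse the sample output format; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        m = _LINE_RE.match(line.strip())
        if m:
            entries.append(
                MemNodeLine(
                    path=m.group("path"),
                    start=int(m.group("start"), 16),
                    end=int(m.group("end"), 16),
                    node=int(m.group("node")),
                )
            )
    return entries
```

Note that such a parser sees only a snapshot of the ranges, which is exactly why the question of whether the backing file and offset of existing guest RAM can change matters to any consumer of this data.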


[...]
> >
> > Does THP work with tmpfs, already?
> 
> No.

OK, that's a real problem.


> > If it does, people who doesn't want
> > hugetlbfs and want numa tuning to work with THP could just use tmpfs for
> > -mem-path.
> >
> >> 
> >> I had hoped that we would get proper userspace interfaces for describing
> >> memory groups but that appears to have stalled out.
> >
> > I would love to have it. But while we don't have it, sharing the
> > tmpfs/hugetlbfs backing files seem to work just fine as a mechanism to
> > let other tools manipulate guest memory policy.  We just need to let
> > external tools know where the backing files are.
> 
> Is this meant for numad?  Wouldn't you want numad to work without
> hugetlbfs?
> 
> You have to preallocate pages to hugetlbfs.  It's very difficult to use
> in practice.

If you don't want hugetlbfs you could use tmpfs, and set the policy on
the tmpfs files. What I am asking is: why do we need to ask the kernel
folks for interfaces to define and set policies on memory groups if we
can (in theory) do exactly the same using tmpfs and hugetlbfs files?

(But the fact that THP doesn't work with tmpfs is a real problem, as I
said above)

> >> 
> >> Does anyone know if this is still on the table?
> >> 
> >> If we can't get a proper kernel interface, then perhaps we need to add
> >> full libnuma support but that would really be unfortunate...
> >
> > Why isn't the "info mem-nodes" solution (I mean: not this version, but a
> > proper QMP version that exposes all the information we need) an
> > option?
> 
> We're exposing internal QEMU information (the HVA -> GPA mapping) as an
> external stable interface.

I never wanted to expose the HVA -> GPA mapping. What I want to expose
is:

 * The virtual-NUMA-node -> GPA-range mapping
 * The GPA -> mem-path file/offset mapping

(Alternatively, a simple virtual-NUMA-node -> mem-path file/offset
mapping would be enough, too)

We could even replace "mem-path file/offset mapping" with "memory
groups", if the kernel already had interfaces to deal with memory
groups.
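
To make the two mappings above concrete, here is a minimal sketch of how they compose into the single virtual-NUMA-node -> mem-path file/offset mapping. It illustrates the data model only and is not a proposed QMP schema: the type names, the function node_file_extents, and the idea of returning (path, offset, length) tuples are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GpaRange:
    start: int  # guest-physical start address (inclusive)
    end: int    # guest-physical end address (exclusive)


@dataclass(frozen=True)
class BackingExtent:
    gpa: GpaRange  # guest-physical range covered by this extent
    path: str      # backing file under -mem-path (hypothetical name)
    offset: int    # file offset that corresponds to gpa.start


def node_file_extents(
    node_ranges: Dict[int, List[GpaRange]],
    backing: List[BackingExtent],
) -> Dict[int, List[Tuple[str, int, int]]]:
    """Compose node -> GPA range with GPA -> file/offset.

    Returns, per virtual NUMA node, a list of (path, offset, length)
    tuples describing which parts of which backing file hold that
    node's memory.
    """
    result: Dict[int, List[Tuple[str, int, int]]] = {}
    for node, ranges in node_ranges.items():
        extents = []
        for r in ranges:
            for b in backing:
                # Intersect the node's GPA range with the extent's GPA range.
                lo = max(r.start, b.gpa.start)
                hi = min(r.end, b.gpa.end)
                if lo < hi:
                    extents.append((b.path, b.offset + (lo - b.gpa.start), hi - lo))
        result[node] = extents
    return result
```

With this information an external tool would know which byte ranges of which file belong to each virtual node, and could map those ranges and apply a memory policy to them without knowing anything about QEMU's host virtual addresses.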

-- 
Eduardo
