From: Andrea Arcangeli <aarcange@redhat.com>
To: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	kvm list <kvm@vger.kernel.org>,
	qemu-devel Developers <qemu-devel@nongnu.org>,
	Alexander Graf <agraf@suse.de>,
	Chris Wright <chrisw@sous-sol.org>,
	bharata@linux.vnet.ibm.com, Vaidyanathan S <svaidy@in.ibm.com>
Subject: Re: [RFC PATCH] Exporting Guest RAM information for NUMA binding
Date: Thu, 1 Dec 2011 18:36:23 +0100
Message-ID: <20111201173623.GV23466@redhat.com>
In-Reply-To: <20111201172520.GA26737@in.ibm.com>

On Thu, Dec 01, 2011 at 10:55:20PM +0530, Dipankar Sarma wrote:
> On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> > > create the guest topology correctly and optimize for NUMA. This
> > > would work for us.
> > 
> > Even in the case of 1 guest that fits in one node, you're not going to
> > max out the full bandwidth of all memory channels with this.
> > 
> > All qemu can do with ms_mbind/tbind is to create a vtopology that
> > matches the hardware topology. It has these limits:
> > 
> > 1) requires all userland applications to be modified to scan either
> >    the physical topology if run on host, or the vtopology if run on
> >    guest to get the full benefit.
> 
> Not sure why you would need that. qemu can reflect the
> topology based on -numa specifications and the corresponding
> ms_tbind/mbind in FDT (in the case of Power, I guess ACPI
> tables for x86) and guest kernel would detect this virtualized
> topology. So there is no need for two types of topologies afaics.
> It will all be reflected in /sys/devices/system/node in the guest.

The point is: what does a vtopology give you if you don't modify all
the apps running in the guest to use it? A vtopology in the guest helps
exactly as much as the topology on the host does: very little, unless
you modify qemu on the host to use ms_tbind/mbind.
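
To make "scanning the topology" concrete, here is a rough sketch of the
minimum a NUMA-aware application would have to do, on the host or in the
guest, before it can place its own threads and memory. It only reads the
per-node cpulist files under /sys/devices/system/node. The function names
parse_cpulist and read_node_topology are made up for this illustration,
and it assumes the usual "0-3,8-11" range format of the cpulist files;
it is not code from any of the patches discussed here.

```python
import os
import re


def parse_cpulist(text):
    """Expand a cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            # An empty list (e.g. a memory-only node) yields no CPUs.
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_node_topology(root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} for every nodeN directory found under root."""
    topology = {}
    for entry in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            # Skip non-node entries such as "possible" or "online".
            continue
        with open(os.path.join(root, entry, "cpulist")) as f:
            topology[int(match.group(1))] = parse_cpulist(f.read())
    return topology
```

Reading the map is the easy part. The application still has to decide
which threads and which memory go on which node and bind them there
itself, and that per-application work is the cost I mean above.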

> > 2) breaks across live migration if host physical topology changes
> 
> That is indeed an issue. Either VM placement software needs to
> be really smart to migrate VMs that fit well or, more likely,
> we will have to find a way to make guest kernels aware of
> topology changes. But the latter has impact on userspace
> as well for applications that might have optimized for NUMA.

Making the guest kernel aware of "memory" topology changes is going to
be a real mess, or at least harder than memory hotplug.

> I agree. Specifying NUMA topology for guest can result in
> sub-optimal performance in some cases, it is a tradeoff.

I see it more as a limit of this solution, one common to all hard
bindings, than as a tradeoff.

> Agreed.

Yep, I just wanted to make clear that these limits remain with this solution.

I'll try to teach knumad to detect thread<->memory affinity too, with
some logic; we'll see how well that works.
