From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding
Date: Thu, 1 Dec 2011 18:36:23 +0100
Message-ID: <20111201173623.GV23466@redhat.com>
In-Reply-To: <20111201172520.GA26737@in.ibm.com>
References: <20111121150054.GA3602@in.ibm.com> <1321889126.28118.5.camel@twins> <20111121160001.GB3602@in.ibm.com> <1321894980.28118.16.camel@twins> <4ECB0019.7020800@codemonkey.ws> <20111123150300.GH8397@redhat.com> <4ECD3CBD.7010902@suse.de> <20111130162237.GC27308@in.ibm.com> <20111130174113.GM23466@redhat.com> <20111201172520.GA26737@in.ibm.com>
To: Dipankar Sarma
Cc: Peter Zijlstra, kvm list, qemu-devel Developers, Alexander Graf, Chris Wright, bharata@linux.vnet.ibm.com, Vaidyanathan S

On Thu, Dec 01, 2011 at 10:55:20PM +0530, Dipankar Sarma wrote:
> On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> > > create the guest topology correctly and optimize for NUMA. This
> > > would work for us.
> >
> > Even in the case of one guest that fits in one node, you're not going
> > to max out the full bandwidth of all memory channels with this.
> >
> > All qemu can do with ms_mbind/tbind is create a vtopology that
> > matches the hardware topology. It has these limits:
> >
> > 1) it requires all userland applications to be modified to scan
> >    either the physical topology if run on the host, or the vtopology
> >    if run in the guest, to get the full benefit.
>
> Not sure why you would need that. qemu can reflect the
> topology based on -numa specifications and the corresponding
> ms_tbind/mbind in the FDT (in the case of Power; I guess ACPI
> tables for x86), and the guest kernel would detect this virtualized
> topology. So there is no need for two types of topologies afaics.
> It will all be reflected in /sys/devices/system/node in the guest.

The point is: what does a vtopology give you if you don't modify all
apps running in the guest to use it? A vtopology in the guest helps
exactly as much as the topology on the host does: very little, unless
you also modify qemu on the host to use ms_tbind/mbind.

> > 2) it breaks across live migration if the host physical topology
> >    changes
>
> That is indeed an issue. Either VM placement software needs to
> be really smart to migrate VMs that fit well or, more likely,
> we will have to find a way to make guest kernels aware of
> topology changes. But the latter has impact on userspace
> as well for applications that might have optimized for NUMA.

Making the guest kernel aware of "memory" topology changes is going to
be a whole mess, or at least harder than memory hotplug.

> I agree. Specifying NUMA topology for the guest can result in
> sub-optimal performance in some cases; it is a tradeoff.

I see it more as a limit of this solution, a limit common to all hard
bindings, than a tradeoff.

> Agreed.

Yep, I just wanted to make clear that the limits remain with this
solution. I'll try to teach knumad to detect thread<->memory affinity
too with some logic; we'll see how well that can work.
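For concreteness, point (1) in practice means something like the sketch
below for every application running in the guest: it has to scan the
(v)topology reflected in /sys/devices/system/node and hard-place its
threads and memory itself. This is only an illustrative sketch using the
stock libnuma API (nothing specific to the proposed ms_mbind/tbind; node
choice and sizes are made up). Build with -lnuma.

/* Sketch only: explicit per-application NUMA placement via libnuma.
 * On the host this reads the physical topology; in the guest, the
 * vtopology reflected in /sys/devices/system/node. Either way the
 * application must be modified to do this to see the full benefit. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	if (numa_available() < 0) {
		fprintf(stderr, "kernel exposes no NUMA (v)topology\n");
		return 1;
	}

	int nr_nodes = numa_max_node() + 1;	/* reflected from /sys/devices/system/node */
	printf("visible nodes: %d\n", nr_nodes);

	/* Hard-bind this thread and its working set to node 0: the kind
	 * of explicit placement every NUMA-aware application would need
	 * to repeat on top of qemu's own ms_tbind/mbind on the host. */
	int node = 0;
	size_t len = 64UL << 20;		/* illustrative 64MB working set */
	void *buf = numa_alloc_onnode(len, node);
	if (!buf) {
		fprintf(stderr, "numa_alloc_onnode failed\n");
		return 1;
	}
	numa_run_on_node(node);
	memset(buf, 0, len);			/* fault pages in on the chosen node */

	/* ... node-local work on buf ... */

	numa_free(buf, len);
	return 0;
}

Every NUMA-sensitive application in the guest would need some equivalent
of this, in addition to qemu doing the corresponding placement on the
host side; that is the per-application cost point (1) refers to.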