From: Don Morris
Date: Fri, 13 Jul 2012 07:45:18 -0700
To: Rik van Riel
Cc: Peter Zijlstra, Linus Torvalds, Andrew Morton, Thomas Gleixner, Ingo Molnar, Paul Turner, Suresh Siddha, Mike Galbraith, "Paul E. McKenney", Lai Jiangshan, Dan Smith, Bharata B Rao, Lee Schermerhorn, Andrea Arcangeli, Johannes Weiner, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC][PATCH 14/26] sched, numa: Numa balancer
Message-ID: <5000347E.1050301@hp.com>
In-Reply-To: <4FFF4987.4050205@redhat.com>
References: <20120316144028.036474157@chello.nl> <20120316144241.012558280@chello.nl> <4FFF4987.4050205@redhat.com>

On 07/12/2012 03:02 PM, Rik van Riel wrote:
> On 03/16/2012 10:40 AM, Peter Zijlstra wrote:
>
> At LSF/MM, there was a presentation comparing Peter's
> NUMA code with Andrea's NUMA code. I believe this is
> the main reason why Andrea's code performed better in
> that particular test...
>
>> +	if (sched_feat(NUMA_BALANCE_FILTER)) {
>> +		/*
>> +		 * Avoid moving ne's when we create a larger imbalance
>> +		 * on the other end.
>> +		 */
>> +		if ((imb->type & NUMA_BALANCE_CPU) &&
>> +		    imb->cpu - cpu_moved < ne_cpu / 2)
>> +			goto next;
>> +
>> +		/*
>> +		 * Avoid migrating ne's when we know we'll push our
>> +		 * node over the memory limit.
>> +		 */
>> +		if (max_mem_load &&
>> +		    imb->mem_load + mem_moved + ne_mem > max_mem_load)
>> +			goto next;
>> +	}
>
> IIRC the test consisted of a 16GB NUMA system with two 8GB nodes.
> It was running 3 KVM guests: two guests of 3GB memory each, and
> one guest of 6GB.

How many CPUs per guest (host threads), and how many physical/logical
CPUs per node on the host?

Are there any comparisons with a situation where the memory would fit
within the nodes but the scheduling load would be too high?

Don

> With autonuma, the 6GB guest ended up on one node, and the
> 3GB guests on the other.
>
> With sched numa, each node had a 3GB guest, and part of the 6GB guest.
>
> There is a fundamental difference in the balancing between autonuma
> and sched numa.
>
> In sched numa, a process is moved over to the current node only if
> the current node has space for it.
>
> Autonuma, on the other hand, operates more of a "hostage exchange"
> policy, where a thread on one node is exchanged with a thread on
> another node, if it looks like that will reduce the overall number
> of cross-node NUMA faults in the system.
>
> I am not sure how to do a "hostage exchange" algorithm with
> sched numa, but it seems it could be necessary in order
> for some workloads to converge on a sane configuration.
>
> After all, with only about 2GB free on each node, you will never
> get to move either a 3GB guest, or parts of a 6GB guest...
>
> Any ideas?
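
To make the trade-off above concrete, below is a minimal standalone userspace
sketch. It is not the sched/numa or autonuma code: the names (try_move(),
try_exchange()) and the fault counts are invented for illustration. It models
the two 8GB nodes holding two 3GB guests plus a 6GB guest split into two 3GB
halves, and shows that a one-way move gated by a max_mem_load-style check can
never fire (each node has only 2GB free), while a pairwise exchange of two
equally sized tasks keeps both nodes within their limit and reduces the
modeled remote NUMA faults.

/*
 * Standalone illustration only: invented names and fault counts, not the
 * sched/numa or autonuma code. Build: gcc -Wall -o sketch sketch.c
 */
#include <stdio.h>
#include <stdbool.h>

#define NNODES		2
#define NODE_MEM_GB	8	/* two 8GB nodes, as in the test above */
#define NTASKS		4

struct task {
	const char *name;
	int mem_gb;		/* resident memory of the task */
	int node;		/* node it currently runs on */
	int faults[NNODES];	/* recent NUMA faults against each node */
};

/*
 * Two 3GB guests (A, B) plus a 6GB guest modelled as halves C1 and C2.
 * Most of the big guest's pages sit on node 0 (so C2 faults remotely),
 * and some of A's pages have spilled over to node 1.
 */
static struct task tasks[NTASKS] = {
	{ "A",  3, 0, { 40, 60 } },
	{ "C1", 3, 0, { 90, 10 } },
	{ "B",  3, 1, { 10, 90 } },
	{ "C2", 3, 1, { 90, 10 } },
};

static int node_used(int node)
{
	int i, used = 0;

	for (i = 0; i < NTASKS; i++)
		if (tasks[i].node == node)
			used += tasks[i].mem_gb;
	return used;
}

static int remote_faults(void)
{
	int i, n, total = 0;

	for (i = 0; i < NTASKS; i++)
		for (n = 0; n < NNODES; n++)
			if (n != tasks[i].node)
				total += tasks[i].faults[n];
	return total;
}

/* One-way move, gated the way the quoted max_mem_load check is. */
static bool try_move(struct task *t, int dst)
{
	if (node_used(dst) + t->mem_gb > NODE_MEM_GB)
		return false;	/* would push the node over its limit */
	t->node = dst;
	return true;
}

/* Pairwise exchange: keep it only if it lowers system-wide remote faults. */
static bool try_exchange(struct task *a, struct task *b)
{
	int before, after, tmp;

	if (a->node == b->node)
		return false;
	if (node_used(a->node) - a->mem_gb + b->mem_gb > NODE_MEM_GB ||
	    node_used(b->node) - b->mem_gb + a->mem_gb > NODE_MEM_GB)
		return false;

	before = remote_faults();
	tmp = a->node; a->node = b->node; b->node = tmp;	/* tentative swap */
	after = remote_faults();
	if (after >= before) {
		tmp = a->node; a->node = b->node; b->node = tmp;	/* undo */
		return false;
	}
	return true;
}

int main(void)
{
	printf("remote faults before: %d\n", remote_faults());
	printf("move C2 -> node 0:    %s\n",
	       try_move(&tasks[3], 0) ? "ok" : "rejected");
	printf("exchange A <-> C2:    %s\n",
	       try_exchange(&tasks[0], &tasks[3]) ? "ok" : "rejected");
	printf("remote faults after:  %d\n", remote_faults());
	return 0;
}

Built with gcc -Wall, the one-way move is rejected and the exchange is
accepted, cutting the modeled remote-fault total from 170 to 70. That is the
kind of "hostage exchange" that would let the 3GB/3GB/6GB layout converge
without ever needing 3GB of free space on either node.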