From: Srikar Dronamraju
To: Rik van Riel
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, mgorman@suse.de
Subject: Re: [PATCH] sched,numa: document and fix numa_preferred_nid setting
Date: Mon, 22 Jun 2015 21:43:22 +0530
Message-ID: <20150622161322.GA32412@linux.vnet.ibm.com>
In-Reply-To: <20150616155450.62ec234b@cuia.usersys.redhat.com>

> +	 * migrating the task to where it really belongs.
> +	 * The exception is a task that belongs to a large numa_group, which
> +	 * spans multiple NUMA nodes. If that task migrates into one of the
> +	 * workload's active nodes, remember that node as the task's
> +	 * numa_preferred_nid, so the workload can settle down.
>  	 */
>  	if (p->numa_group) {
>  		if (env.best_cpu == -1)
> @@ -1513,7 +1520,7 @@ static int task_numa_migrate(struct task_struct *p)
>  			nid = env.dst_nid;
>
>  		if (node_isset(nid, p->numa_group->active_nodes))
> -			sched_setnuma(p, env.dst_nid);
> +			sched_setnuma(p, nid);
>  	}
>
>  	/* No better CPU than the current one was found. */
>

When I refer to the modified Rik's patch, I mean Rik's patch with the
node_isset() check before sched_setnuma() removed. With that change, we
somewhat reduce the numa02 and 1 JVM per System regression, while
getting numbers as good as Rik's patch with 2 JVM and 4 JVM per System.

The idea behind removing the node_isset() check is this: node_isset()
is mostly used to track memory movement to nodes where CPUs are
running, and not vice versa. This is as per the comment in
update_numa_active_node_mask().

There could be a situation where a task's memory is all on one node,
and that node has the capacity to accommodate the task, but no tasks
associated with the task have run enough on that node. In such a case,
we shouldn't rule out migrating the task to that node.

-- 
Thanks and Regards
Srikar Dronamraju
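
To make the difference between the two variants concrete, here is a rough userspace sketch, not kernel code. It assumes the group's active_nodes can be modelled as a plain bitmask; struct group_model, model_node_isset() and the two preferred_*() helpers are hypothetical names made up for this illustration. It models only the decision of whether the node picked by task_numa_migrate() gets remembered as the preferred node.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for numa_group->active_nodes: bit n set means
 * node n is currently counted as an active node of the group. */
struct group_model {
	unsigned long active_nodes;
};

/* Userspace model of a node_isset()-style test on the bitmask above. */
static bool model_node_isset(int nid, unsigned long mask)
{
	assert(nid >= 0 && nid < (int)(sizeof(mask) * CHAR_BIT));
	return (mask >> nid) & 1UL;
}

/* Variant with the check: nid is remembered as the preferred node only
 * if it is one of the group's active nodes; otherwise the old
 * preference is kept. */
static int preferred_with_check(const struct group_model *g,
				int cur_pref, int nid)
{
	if (model_node_isset(nid, g->active_nodes))
		return nid;
	return cur_pref;
}

/* Variant without the check (the "modified" patch as described in the
 * mail): nid is always remembered, active node or not. */
static int preferred_without_check(int cur_pref, int nid)
{
	(void)cur_pref;	/* the old preference is simply overwritten */
	return nid;
}
```

For example, with active_nodes covering only nodes 0 and 1, a task whose memory sits on node 2 keeps its old preferred node under the first variant and gets node 2 under the second. That is the case described above, where the node has room but the group has not run enough there.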