From: Peter Zijlstra <peterz@infradead.org>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH] sched: Fix numabalancing to work with isolated cpus
Date: Thu, 6 Apr 2017 09:36:59 +0200
Message-ID: <20170406073659.y6ubqriyshax4v4m@hirez.programming.kicks-ass.net>
In-Reply-To: <1491326848-5748-1-git-send-email-srikar@linux.vnet.ibm.com>

On Tue, Apr 04, 2017 at 10:57:28PM +0530, Srikar Dronamraju wrote:
> When performing load balancing, numabalancing only looks at
> task->cpus_allowed to see if the task can run on the target cpu. If
> the isolcpus kernel parameter is set, isolated cpus will not be part
> of the task->cpus_allowed mask.
> 
> For example (on a POWER8 box running in SMT 1 mode):
> 
> isolcpus=56,64,72,80,88
> 
> Cpus_allowed_list:	0-55,57-63,65-71,73-79,81-87,89-175
> /proc/20996/task/20996/status:Cpus_allowed_list:	0-55,57-63,65-71,73-79,81-87,89-175
> /proc/20996/task/20997/status:Cpus_allowed_list:	0-55,57-63,65-71,73-79,81-87,89-175
> /proc/20996/task/20998/status:Cpus_allowed_list:	0-55,57-63,65-71,73-79,81-87,89-175
> 
> Note: offline cpus are excluded from cpus_allowed_list.
> 
> However, a task might call sched_setaffinity() with a mask that
> includes all possible cpus in the system, including the isolated cpus.
> 
> For example:
> perf bench numa mem --no-data_rand_walk -p 4 -t $THREADS -G 0 -P 3072 -T 0 -l 50 -c -s 1000
> calls sched_setaffinity(), which resets the cpus_allowed mask.
> 
> Cpus_allowed_list:	0-55,57-63,65-71,73-79,81-87,89-175
> Cpus_allowed_list:	0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> Cpus_allowed_list:	0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> Cpus_allowed_list:	0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> Cpus_allowed_list:	0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> 
> The isolated cpus are now part of the cpus_allowed list. In the above
> case, numabalancing ends up scheduling some of these tasks on isolated
> cpus.
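
A hypothetical userspace illustration of how such a reset happens:
setting every configured cpu in a mask and handing it to
sched_setaffinity(2). As the masks above show, the kernel does not
filter isolated cpus back out of the requested mask.

	#define _GNU_SOURCE
	#include <sched.h>

	/* Build an all-cpus mask and install it for the calling thread;
	 * isolcpus= cpus come back into cpus_allowed as a side effect. */
	static int allow_all_cpus(void)
	{
		cpu_set_t set;
		int cpu;

		CPU_ZERO(&set);
		for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
			CPU_SET(cpu, &set);

		return sched_setaffinity(0, sizeof(set), &set);	/* pid 0 == self */
	}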
> 
> To avoid this, check for isolated cpus before choosing a target cpu.
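
A minimal sketch of the guard being asked for, assuming the 4.11-era
cpu_isolated_map that kernel/sched/core.c maintains for isolcpus=
(the placement inside the target-cpu scan is illustrative, not the
patch itself):

	if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
		continue;
	/* affinity allows this cpu, but isolcpus= took it out of balancing */
	if (cpumask_test_cpu(cpu, cpu_isolated_map))
		continue;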

Is there anything stopping the numa balancer from taking tasks off an
isolated CPU?

It's been too long since I've looked at the NUMA bits, but from a quick
reading we almost completely ignore the sched domain stuff.

That means there are likely to be more holes here, and just plugging
them as we find them doesn't appear to be the best approach.

For example, if we use cpusets to partition the scheduler but somehow
leave a task in the root group, the numa balancer looks like it will
happily migrate tasks between the partitions.

So please try and fix the bigger problem, then I think this one will go
away as well.
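
As a rough sketch of what consulting the topology instead could look
like: for_each_domain() and sched_domain_span() are real 4.11-era
helpers, but the wrapper below is a hypothetical illustration, not a
proposed patch. The idea is to reject a destination cpu that no sched
domain above the source cpu spans, which covers both isolated cpus and
foreign cpuset partitions.

	/*
	 * Hypothetical helper: a dst cpu that no sched domain above
	 * src_cpu spans (isolcpus= cpus, other cpuset partitions) is
	 * not a valid NUMA migration target.
	 */
	static bool numa_dst_in_same_partition(int src_cpu, int dst_cpu)
	{
		struct sched_domain *sd;
		bool spanned = false;

		rcu_read_lock();
		for_each_domain(src_cpu, sd) {
			if (cpumask_test_cpu(dst_cpu, sched_domain_span(sd))) {
				spanned = true;
				break;
			}
		}
		rcu_read_unlock();

		return spanned;
	}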

