From: Valentin Schneider <valentin.schneider@arm.com>
To: "Song Bao Hua \(Barry Song\)" <song.bao.hua@hisilicon.com>
Cc: "linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"vincent.guittot\@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann\@arm.com" <dietmar.eggemann@arm.com>,
	"morten.rasmussen\@arm.com" <morten.rasmussen@arm.com>,
	Linuxarm <linuxarm@huawei.com>
Subject: Re: [RFC] sched/topology: NUMA topology limitations
Date: Tue, 01 Sep 2020 10:40:47 +0100
Message-ID: <jhjmu29omw0.mognet@arm.com>
In-Reply-To: <f9c1012800844c5dbaa049e05006c131@hisilicon.com>


On 31/08/20 11:45, Barry Song wrote:
>> From: Valentin Schneider [mailto:valentin.schneider@arm.com]
>>
>> Ignoring corner cases where task affinity gets in the way, load balance
>> will always pull tasks to the local CPU (i.e. the CPU whose sched_domain
>> we are working on).
>>
>> If we're balancing load for CPU0-domain1, we would be looking at which CPUs
>> in [0-2] (i.e. the domain's span) we could (if we should) pull tasks from
>> to migrate them over to CPU0.
>>
>> We'll first try to figure out which sched_group has the most load (see
>> find_busiest_group() & friends), and that's where we may hit issues.
>>
>> Consider a scenario where CPU3 is noticeably busier than the other
>> CPUs. We'll end up marking CPU0-domain1-group2 (1-3) as the busiest group,
>> and compute an imbalance (i.e. amount of load to pull) mostly based on the
>> status of CPU3.
>>
>> We'll then go to find_busiest_queue(); the mask of CPUs we iterate over is
>> restricted by the sched_domain_span (i.e. doesn't include CPU3 here), so
>> we'll pull things from either CPU1 or CPU2 based on stats we built looking
>> at CPU3, which is bound to be pretty bogus.
>>
>> To summarise: we won't pull from the "outsider" node(s) (i.e., nodes
>> included in the sched_groups but not covered by the sched_domain), but they
>> will influence the stats and heuristics of the load balance.
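
To make that concrete, here is a toy userspace sketch of the two masks
involved (plain C, not actual kernel code: the local group span and the
load numbers are made up for illustration; only the (1-3) group and the
(0-2) domain span come from the example above):

#include <stdio.h>

#define NR_CPUS 4

/* Synthetic per-CPU load: CPU3 is noticeably busier than the rest. */
static const int cpu_load[NR_CPUS] = { 0, 10, 10, 40 };

static const int domain_span   = 0x7;		/* CPUs 0-2 */
static const int group_spans[] = { 0x3, 0xe };	/* (0-1), (1-3) */

/* Sum the load over every CPU in a span. */
static int group_load(int span)
{
	int cpu, load = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (span & (1 << cpu))
			load += cpu_load[cpu];
	return load;
}

int main(void)
{
	int busiest, candidates, cpu, best = -1;

	/*
	 * Step 1: the busiest group is picked over the full group span,
	 * so CPU3's load is counted even though it's outside the domain.
	 */
	busiest = group_load(group_spans[1]) > group_load(group_spans[0]) ?
			group_spans[1] : group_spans[0];
	printf("busiest group 0x%x, load %d (inflated by CPU3)\n",
	       busiest, group_load(busiest));

	/*
	 * Step 2: the queue we pull from is restricted to the domain
	 * span, which excludes CPU3.
	 */
	candidates = busiest & domain_span;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(candidates & (1 << cpu)))
			continue;
		if (best < 0 || cpu_load[cpu] > cpu_load[best])
			best = cpu;
	}
	printf("pulling from CPU%d, load %d (stats built from CPU3)\n",
	       best, cpu_load[best]);
	return 0;
}

The point of the sketch is that step 1 and step 2 disagree on which CPUs
they consider, which is exactly the "outsider" problem.
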
>
> Hi Valentin,
> Thanks for your clarification. In many scenarios, to achieve good performance,
> people would pin processes to a NUMA node. The priority for pinning would be
> the local node first, then domain0 with one hop; domain1 with two hops is
> actually too far, and domain2 with three hops would be a disaster. If cpu0
> pulls a task from cpu2 but its memory is still on cpu2's node, 3 hops would
> be a big problem for memory access and page migration.
>

Did you mean CPU3 here?

> However, for automatic NUMA balancing, I would agree we need to fix the group
> layout so that groups stay within the span of their sched_domain. Otherwise,
> it seems the scheduler cannot correctly find the right cpu to pull a task from.
>
> In case we have:
> 0 tasks on cpu0
> 1 task on cpu1
> 1 task on cpu2
> 4 tasks on cpu3
>
> In sched_domain1, cpu1+cpu3 are busy, so cpu0 would try to pull a task from
> cpu2 of the group (1-3) because cpu3 is busy; meanwhile, cpu3 is an outsider.
>

Right, we'd pull from either CPU1 or CPU2 (in this case via a tentative
active load balance) because they are in the same group as CPU3, which
inflates the sched_group load stats; but we can't pull from CPU3 itself
at this domain because it isn't included in the domain span.
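
Plugging your 0/1/1/4 task distribution into that same picture, counting
one unit of load per task (a gross simplification of the real load
metrics) and again assuming a (0-1) local group:

	group (0-1) load: 0 + 1     = 1
	group (1-3) load: 1 + 1 + 4 = 6   -> marked busiest
	candidates: (1-3) & domain span (0-2) = {cpu1, cpu2}
	-> pull from cpu1 or cpu2 (load 1 each), with an imbalance
	   computed mostly from cpu3's load of 4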

> Thanks
> Barry
