From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Valentin Schneider <valentin.schneider@arm.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
"linuxarm@openeuler.org" <linuxarm@openeuler.org>
Subject: RE: [RFC PATCH] sched/fair: first try to fix the scheduling impact of NUMA diameter > 2
Date: Mon, 25 Jan 2021 21:55:40 +0000
Message-ID: <803c439c1d1f435bb22a6ef6c0c2d99e@hisilicon.com>
In-Reply-To: <jhjwnw11ak2.mognet@arm.com>
> -----Original Message-----
> From: Valentin Schneider [mailto:valentin.schneider@arm.com]
> Sent: Tuesday, January 26, 2021 1:11 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Vincent Guittot
> <vincent.guittot@linaro.org>; Mel Gorman <mgorman@suse.de>
> Cc: Ingo Molnar <mingo@kernel.org>; Peter Zijlstra <peterz@infradead.org>;
> Dietmar Eggemann <dietmar.eggemann@arm.com>; Morten Rasmussen
> <morten.rasmussen@arm.com>; linux-kernel <linux-kernel@vger.kernel.org>;
> linuxarm@openeuler.org
> Subject: RE: [RFC PATCH] sched/fair: first try to fix the scheduling impact
> of NUMA diameter > 2
>
> On 25/01/21 03:13, Song Bao Hua (Barry Song) wrote:
> > As long as NUMA diameter > 2, building sched_domain by sibling's child domain
> > will definitely create a sched_domain with sched_group which will span
> > out of the sched_domain
> > +------+ +------+ +-------+ +------+
> > | node | 12 |node | 20 | node | 12 |node |
> > | 0 +---------+1 +--------+ 2 +-------+3 |
> > +------+ +------+ +-------+ +------+
> >
> > domain0 node0 node1 node2 node3
> >
> > domain1 node0+1 node0+1 node2+3 node2+3
> > +
> > domain2 node0+1+2 |
> > group: node0+1 |
> > group:node2+3 <-------------------+
> >
> > when node2 is added into the domain2 of node0, kernel is using the child
> > domain of node2's domain2, which is domain1(node2+3). Node 3 is outside
> > the span of node0+1+2.
> >
> > Will we move to use the *child* domain of the *child* domain of node2's
> > domain2 to build the sched_group?
> >
> > I mean:
> > +------+ +------+ +-------+ +------+
> > | node | 12 |node | 20 | node | 12 |node |
> > | 0 +---------+1 +--------+ 2 +-------+3 |
> > +------+ +------+ +-------+ +------+
> >
> > domain0 node0 node1 +- node2 node3
> > |
> > domain1 node0+1 node0+1 | node2+3 node2+3
> > |
> > domain2 node0+1+2 |
> > group: node0+1 |
> > group:node2 <-------------------+
> >
> > In this way, it seems we don't have to create a new group as we are just
> > reusing the existing group?
> >
>
> One thing I've been musing over is pretty much this; that is to say we
> would make all non-local NUMA sched_groups span a single node. This would
> let us reuse an existing span+sched_group_capacity: the local group of that
> node at its first NUMA topology level.
>
> Essentially this means getting rid of the overlapping groups, and the
> balance mask is handled the same way as for !NUMA, i.e. it's the local
> group span. I've not gone far enough through the thought experiment to see
> where does it miserably fall apart... It is at the very least violating the
> expectation that a group span is a child domain's span - here it can be a
> grand^x children domain's span.
>
>
> If we take your topology, we currently have:
>
> | tl\node | 0 | 1 | 2 | 3 |
> |---------+--------------+---------------+---------------+--------------|
> | NUMA0 | (0)->(1) | (1)->(2)->(0) | (2)->(3)->(1) | (3)->(2) |
> | NUMA1 | (0-1)->(1-3) | (0-2)->(2-3) | (1-3)->(0-1) | (2-3)->(0-2) |
> | NUMA2 | (0-2)->(1-3) | N/A | N/A | (1-3)->(0-2) |
>
> With the current overlapping group scheme, we would need to make it look
> like so:
>
> | tl\node | 0 | 1 | 2 | 3 |
> |---------+---------------+---------------+---------------+---------------|
> | NUMA0 | (0)->(1) | (1)->(2)->(0) | (2)->(3)->(1) | (3)->(2) |
> | NUMA1 | (0-1)->(1-2)* | (0-2)->(2-3) | (1-3)->(0-1) | (2-3)->(1-2)* |
> | NUMA2 | (0-2)->(1-3) | N/A | N/A | (1-3)->(0-2) |
>
> But as already discussed, that's tricky to make work. With the node-span
> groups thing, we would turn this into:
>
> | tl\node | 0 | 1 | 2 | 3 |
> |---------+------------+---------------+---------------+------------|
> | NUMA0 | (0)->(1) | (1)->(2)->(0) | (2)->(3)->(1) | (3)->(2) |
> | NUMA1 | (0-1)->(2) | (0-2)->(3) | (1-3)->(0) | (2-3)->(1) |
> | NUMA2 | (0-2)->(3) | N/A | N/A | (1-3)->(0) |

Actually I didn't mean to go that far. What I was thinking is that we
only fix a sched_domain when one of its sched_groups isn't a subset of
the sched_domain's span. For sched_domains that don't have the group
span issue, we just don't touch them. For NUMA1, we change it as in your
diagram, but NUMA2 won't be changed. The concept is like:

--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1040,6 +1040,19 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		}
 		sg_span = sched_group_span(sg);
+#if 1
+		if (sibling->child && !cpumask_subset(sg_span, span)) {
+			sg = build_group_from_child_sched_domain(sibling->child, cpu);
+			...
+			sg_span = sched_group_span(sg);
+		}
+#endif
 		cpumask_or(covered, covered, sg_span);

Thanks
Barry