From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Valentin Schneider <valentin.schneider@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "mingo@kernel.org" <mingo@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"morten.rasmussen@arm.com" <morten.rasmussen@arm.com>,
	"mgorman@suse.de" <mgorman@suse.de>
Subject: RE: [PATCH 1/1] sched/topology: Make sched_init_numa() use a set for the deduplicating sort
Date: Fri, 29 Jan 2021 02:02:58 +0000	[thread overview]
Message-ID: <e12ec4f50c6c41db84f601038d3ee39c@hisilicon.com> (raw)
In-Reply-To: <jhjo8h915l2.mognet@arm.com>



> -----Original Message-----
> From: Valentin Schneider [mailto:valentin.schneider@arm.com]
> Sent: Friday, January 29, 2021 3:47 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>;
> linux-kernel@vger.kernel.org
> Cc: mingo@kernel.org; peterz@infradead.org; vincent.guittot@linaro.org;
> dietmar.eggemann@arm.com; morten.rasmussen@arm.com; mgorman@suse.de
> Subject: RE: [PATCH 1/1] sched/topology: Make sched_init_numa() use a set
> for the deduplicating sort
> 
> On 25/01/21 21:35, Song Bao Hua (Barry Song) wrote:
> > I was using 5.11-rc1. One thing I'd like to mention is that:
> >
> > For the below topology:
> > +-------+          +-----+
> > | node1 |  20      |node2|
> > |       +----------+     |
> > +---+---+          +-----+
> >     |                  |12
> > 12  |                  |
> > +---+---+          +---+-+
> > |       |          |node3|
> > | node0 |          |     |
> > +-------+          +-----+
> >
> > with node0-node2 as 22, node0-node3 as 24, node1-node3 as 22.
> >
> > I will get the below sched_domains_numa_distance[]:
> > 10, 12, 22, 24
> > As you can see there is *no* 20. So the node1 and node2 will
> > only get two-level numa sched_domain:
> >
> 
> 
> So that's
> 
>     -numa node,cpus=0-1,nodeid=0 -numa node,cpus=2-3,nodeid=1, \
>     -numa node,cpus=4-5,nodeid=2, -numa node,cpus=6-7,nodeid=3, \
>     -numa dist,src=0,dst=1,val=12, \
>     -numa dist,src=0,dst=2,val=22, \
>     -numa dist,src=0,dst=3,val=24, \
>     -numa dist,src=1,dst=2,val=20, \
>     -numa dist,src=1,dst=3,val=22, \
>     -numa dist,src=2,dst=3,val=12
> 
> but running this still doesn't get me a splat. Debugging
> sched_domains_numa_distance[] still gives me
> {10, 12, 20, 22, 24}
> 
> >
> > But for the below topology:
> > +-------+          +-----+
> > | node0 |  20      |node2|
> > |       +----------+     |
> > +---+---+          +-----+
> >     |                  |12
> > 12  |                  |
> > +---+---+          +---+-+
> > |       |          |node3|
> > | node1 |          |     |
> > +-------+          +-----+
> >
> > with node1-node2 as 22, node1-node3 as 24,node0-node3 as 22.
> >
> > I will get the below sched_domains_numa_distance[]:
> > 10, 12, 20, 22, 24
> >
> > What I have seen is the performance will be better if we
> > drop the 20 as we will get a sched_domain hierarchy with less
> > levels, and two intermediate nodes won't have the group span
> > issue.
> >
> 
> That is another thing that's worth considering. Morten was arguing that if
> the distance between two nodes is so tiny, it might not be worth
> representing it at all in the scheduler topology.

Yes. I agree it is a different thing. Anyway, I saw your patch has been
merged into the sched tree. One side effect of your patch is that one more
sched_domain level is introduced for this topology:

 node0 --12-- node1 --20-- node2 --12-- node3

 remaining distances:
   node0 - node2 : 22
   node1 - node3 : 22
   node0 - node3 : 24

Without the patch, Linux will use 10,12,22,24 to build the sched_domains;
with your patch, Linux will use 10,12,20,22,24 to build the sched_domains.

So one more layer is added. What I have seen is that:

For node0, the sched_domain covering distance <= 12 and the one covering
distance <= 20 span the same range (node0, node1), so one of them is
redundant. Then, in cpu_attach_domain(), the redundant one is dropped due to
"remove the sched domains which do not contribute to scheduling".
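
To illustrate what I mean by "redundant", here is a much simplified sketch
of that kind of check. The helper name is made up and this is only an
illustration of the idea, not the actual degeneration code in
kernel/sched/topology.c:

/* Illustrative helper only, not a real kernel function. */
static bool parent_is_redundant(struct sched_domain *sd,
                                struct sched_domain *parent)
{
        /*
         * A parent that spans exactly the same CPUs as its child adds
         * nothing to load balancing, so cpu_attach_domain() can drop it.
         * (The real code also checks the SD_* flags; omitted here.)
         */
        return cpumask_equal(sched_domain_span(sd),
                             sched_domain_span(parent));
}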

For node1 and node2, the original code had no "20", so it built one less
sched_domain level.

What is really interesting is that removing the 20 actually gives better
benchmark results in SPEC CPU :-)
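
For anyone who wants to see why the "20" now survives the deduplication,
here is a tiny standalone userspace sketch of the set-based approach, using
the distance table of the diagram above. It is only an illustration of the
idea, not the kernel implementation:

#include <stdbool.h>
#include <stdio.h>

/* Distance table of the topology drawn above (nodes 0..3). */
static const int dist[4][4] = {
        { 10, 12, 22, 24 },
        { 12, 10, 20, 22 },
        { 22, 20, 10, 12 },
        { 24, 22, 12, 10 },
};

int main(void)
{
        bool seen[256] = { false };     /* the "set": one slot per distance */

        for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                        seen[dist[i][j]] = true;

        /* Walking the set in index order yields the sorted, unique list. */
        for (int d = 0; d < 256; d++)
                if (seen[d])
                        printf("%d ", d);
        printf("\n");                   /* prints: 10 12 20 22 24 */

        return 0;
}

If I read the old code correctly, it effectively assumed that node0's row of
the distance table already contained every distinct value, and node0's row
here (10, 12, 22, 24) has no 20, which is why the pre-patch list is shorter.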


> 
> > Thanks
> > Barry

Thanks
Barry



Thread overview: 15+ messages
2021-01-22 12:39 [PATCH 0/1] sched/topology: NUMA distance deduplication Valentin Schneider
2021-01-22 12:39 ` [PATCH 1/1] sched/topology: Make sched_init_numa() use a set for the deduplicating sort Valentin Schneider
2021-01-25  2:23   ` Song Bao Hua (Barry Song)
2021-01-25  9:26     ` Valentin Schneider
2021-01-25 16:45       ` Valentin Schneider
2021-01-25 21:35         ` Song Bao Hua (Barry Song)
2021-01-28 14:47           ` Valentin Schneider
2021-01-29  2:02             ` Song Bao Hua (Barry Song) [this message]
2021-02-01 12:03               ` Valentin Schneider
2021-02-01  9:53   ` Dietmar Eggemann
2021-02-01 10:19     ` Vincent Guittot
2021-02-01 10:35     ` Song Bao Hua (Barry Song)
2021-02-01 11:55     ` Valentin Schneider
2021-02-02 10:03     ` [tip: sched/core] sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa() tip-bot2 for Dietmar Eggemann
2021-02-17 13:17     ` tip-bot2 for Dietmar Eggemann
