From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>,
Vincent Guittot <vincent.guittot@linaro.org>
Cc: "tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"will@kernel.org" <will@kernel.org>,
"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
"bp@alien8.de" <bp@alien8.de>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"lenb@kernel.org" <lenb@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"bsegall@google.com" <bsegall@google.com>,
"mgorman@suse.de" <mgorman@suse.de>,
"msys.mizuma@gmail.com" <msys.mizuma@gmail.com>,
"valentin.schneider@arm.com" <valentin.schneider@arm.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
"mark.rutland@arm.com" <mark.rutland@arm.com>,
"sudeep.holla@arm.com" <sudeep.holla@arm.com>,
"aubrey.li@linux.intel.com" <aubrey.li@linux.intel.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"xuwei (O)" <xuwei5@huawei.com>,
"Zengtao (B)" <prime.zeng@hisilicon.com>,
"guodong.xu@linaro.org" <guodong.xu@linaro.org>,
yangyicong <yangyicong@huawei.com>,
"Liguozhu (Kenneth)" <liguozhu@hisilicon.com>,
"linuxarm@openeuler.org" <linuxarm@openeuler.org>,
"hpa@zytor.com" <hpa@zytor.com>
Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC
Date: Mon, 3 May 2021 06:19:00 +0000
Message-ID: <51df51d84c764d9292146e11d1031b08@hisilicon.com>
In-Reply-To: <d31a65af-d1d5-5fd1-276c-d2318cdba078@arm.com>
> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> Sent: Friday, April 30, 2021 10:43 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Vincent Guittot
> <vincent.guittot@linaro.org>
> Cc: tim.c.chen@linux.intel.com; catalin.marinas@arm.com; will@kernel.org;
> rjw@rjwysocki.net; bp@alien8.de; tglx@linutronix.de; mingo@redhat.com;
> lenb@kernel.org; peterz@infradead.org; rostedt@goodmis.org;
> bsegall@google.com; mgorman@suse.de; msys.mizuma@gmail.com;
> valentin.schneider@arm.com; gregkh@linuxfoundation.org; Jonathan Cameron
> <jonathan.cameron@huawei.com>; juri.lelli@redhat.com; mark.rutland@arm.com;
> sudeep.holla@arm.com; aubrey.li@linux.intel.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-acpi@vger.kernel.org; x86@kernel.org; xuwei (O) <xuwei5@huawei.com>;
> Zengtao (B) <prime.zeng@hisilicon.com>; guodong.xu@linaro.org; yangyicong
> <yangyicong@huawei.com>; Liguozhu (Kenneth) <liguozhu@hisilicon.com>;
> linuxarm@openeuler.org; hpa@zytor.com
> Subject: Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks
> within one LLC
>
> On 29/04/2021 00:41, Song Bao Hua (Barry Song) wrote:
> >
> >
> >> -----Original Message-----
> >> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
>
> [...]
>
> >>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> >>
> >> [...]
> >>
> >>>>> On 20/04/2021 02:18, Barry Song wrote:
>
> [...]
>
> > Though we will never go to slow path, wake_wide() will affect want_affine,
> > so eventually affect the "new_cpu"?
>
> yes.
>
> >
> > 	for_each_domain(cpu, tmp) {
> > 		/*
> > 		 * If both 'cpu' and 'prev_cpu' are part of this domain,
> > 		 * cpu is a valid SD_WAKE_AFFINE target.
> > 		 */
> > 		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > 		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > 			if (cpu != prev_cpu)
> > 				new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
> >
> > 			sd = NULL; /* Prefer wake_affine over balance flags */
> > 			break;
> > 		}
> >
> > 		if (tmp->flags & sd_flag)
> > 			sd = tmp;
> > 		else if (!want_affine)
> > 			break;
> > 	}
> >
> > If want_affine is false, the wake_affine() branch above won't execute, so
> > new_cpu (the target) will always be "prev_cpu"? So when task size >
> > cluster size in wake_wide(), we won't pull the wakee to the cluster of
> > the waker? It seems sensible.
>
> What is `task size` here?
>
> The criterion is `!(slave < factor || master < slave * factor)` or
> `slave >= factor && master >= slave * factor` to wake wide.
>
Yes. By "task size" I actually mean a bundle of waker-wakee tasks which
can make "slave >= factor && master >= slave * factor" either true or
false, and which therefore changes the target cpu we start scanning from.

Now that I have moved the scan to cluster level for tasks which are
already within the same LLC, it seems more sensible to use "cluster_size"
as the factor?
> I see that since you effectively change the sched domain size from LLC
> to CLUSTER (e.g. 24->6) for wakeups with cpu and prev_cpu sharing LLC
> (hence the `numactl -N 0` in your workload), wake_wide() has to take
> CLUSTER size into consideration.
>
> I was wondering if you saw wake_wide() returning 1 with your use cases:
>
> numactl -N 0 /usr/lib/lmbench/bin/stream -P [6,12] -M 1024M -N 5
I couldn't make wake_wide() return 1 with the stream command above.
And I can't reproduce it with a 1:1 (monogamous) hackbench "-f 1" either.

But I am able to reproduce it with an M:N hackbench, for example:

numactl -N 0 hackbench -p -T -f 10 -l 20000 -g 1

hackbench will create 10 senders which send messages to 10 receivers.
(Each sender can send messages to all 10 receivers.)
I've often seen flips like:

waker   wakee
1501    39
1509    17
11      1320
13      2016

11, 13 and 17 are smaller than the LLC size but larger than the cluster
size. So wake_wide() using the cluster size as factor will return 1; on
the other hand, if we always use llc_size as factor, it will return 0.
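
To make the effect of the factor concrete, here is a minimal userspace sketch of the wake_wide() criterion discussed above. It is not the kernel function: `wake_wide_sketch()` and its parameters are hypothetical names, the flip counts are passed in directly instead of being read from `current->wakee_flips` and `p->wakee_flips`, and the factor is a plain argument. The factors 24 (LLC) and 6 (cluster) used in the note below are assumptions taken from the "24->6" example quoted earlier, not measured values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch of the wake_wide() test (hypothetical helper, not the
 * kernel code). waker_flips and wakee_flips stand in for the two
 * wakee_flips counters; factor stands in for llc_size or a cluster size.
 */
static bool wake_wide_sketch(unsigned int waker_flips,
			     unsigned int wakee_flips,
			     unsigned int factor)
{
	unsigned int master = waker_flips;
	unsigned int slave = wakee_flips;

	assert(factor > 0);

	/* master is always the larger of the two flip counts */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	/* wake wide only if slave >= factor && master >= slave * factor */
	if (slave < factor || master < slave * factor)
		return false;

	return true;
}
```

Under those assumed factors, the pairs 1509/17, 11/1320 and 13/2016 do not wake wide with factor 24 but do with factor 6, while 1501/39 wakes wide with either factor because 39 is already larger than 24.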
However, it seems the change in wake_wide() could have a negative
influence on the M:N relationship (-f 10), according to tests made today
with:

numactl -N 0 hackbench -p -T -f 10 -l 20000 -g $1

g =            1       2       3       4
cluster_size   0.5768  0.6578  0.8117  1.0119
LLC_size       0.5479  0.6162  0.6922  0.7754

Always using llc_size as the factor in wake_wide() still shows better
results in the 10:10 polygamous hackbench.
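
As a rough way to read the table, the sketch below computes the ratio of the cluster_size result to the LLC_size result for each -g value. The helper name `cluster_to_llc_ratio()` and the two arrays are hypothetical and only restate the numbers from the table; nothing here is measured anew.

```c
#include <assert.h>

/* Results copied from the table above, for g = 1, 2, 3, 4. */
static const double cluster_results[] = { 0.5768, 0.6578, 0.8117, 1.0119 };
static const double llc_results[]     = { 0.5479, 0.6162, 0.6922, 0.7754 };

/*
 * Ratio of the cluster_size result to the LLC_size result for one -g
 * value (hypothetical helper). A ratio above 1.0 means the cluster_size
 * factor produced the larger number.
 */
static double cluster_to_llc_ratio(double cluster_result, double llc_result)
{
	assert(llc_result > 0.0);
	return cluster_result / llc_result;
}
```

Applied to the table, the ratio grows from roughly 1.05 at g = 1 to roughly 1.30 at g = 4, so the gap between the two factors widens as the number of groups increases.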

So it seems that `slave >= factor && master >= slave * factor` isn't
a suitable criterion when the factor is the cluster size?

Thanks
Barry