* [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
@ 2021-03-20 22:14 ` Barry Song
0 siblings, 0 replies; 13+ messages in thread
From: Barry Song @ 2021-03-20 22:14 UTC (permalink / raw)
To: vincent.guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman
Cc: valentin.schneider, aubrey.li, linux-arm-kernel, linux-kernel,
xuwei5, prime.zeng, guodong.xu, yangyicong, liguozhu, linuxarm,
Barry Song
update_idle_core() is only done for the case of sched_smt_present.
but test_idle_cores() is done for all machines even those without
smt.
this could contribute to up 8%+ hackbench performance loss on a
machine like kunpeng 920 which has no smt. this patch removes the
redundant test_idle_cores() for non-smt machines.
we run the below hackbench with different -g parameter from 2 to
14, for each different g, we run the command 10 times and get the
average time:
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
hackbench will report the time which is needed to complete a certain
number of messages transmissions between a certain number of tasks,
for example:
$ numactl -N 0 hackbench -p -T -l 20000 -g 10
Running in threaded mode with 10 groups using 40 file descriptors each
(== 400 tasks)
Each sender will pass 20000 messages of 100 bytes
The below is the result of hackbench w/ and w/o this patch:
g= 2 4 6 8 10 12 14
w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
+4.1% +8.3% +7.3% +6.3%
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
kernel/sched/fair.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2e2ab1e..de42a32 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
{
struct sched_domain_shared *sds;
- sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
- if (sds)
- return READ_ONCE(sds->has_idle_cores);
+ if (static_branch_likely(&sched_smt_present)) {
+ sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (sds)
+ return READ_ONCE(sds->has_idle_cores);
+ }
return def;
}
--
1.8.3.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
2021-03-20 22:14 ` Barry Song
@ 2021-03-22 4:36 ` Li, Aubrey
-1 siblings, 0 replies; 13+ messages in thread
From: Li, Aubrey @ 2021-03-22 4:36 UTC (permalink / raw)
To: Barry Song, vincent.guittot, mingo, peterz, juri.lelli,
dietmar.eggemann, rostedt, bsegall, mgorman
Cc: valentin.schneider, linux-arm-kernel, linux-kernel, xuwei5,
prime.zeng, guodong.xu, yangyicong, liguozhu, linuxarm
Hi Barry,
On 2021/3/21 6:14, Barry Song wrote:
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
The patch looks good to me.
May I know for what case we need to keep CONFIG_SCHED_SMT for non-smt
machines?
Thanks,
-Aubrey
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> ---
> kernel/sched/fair.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2e2ab1e..de42a32 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
> {
> struct sched_domain_shared *sds;
>
> - sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> - if (sds)
> - return READ_ONCE(sds->has_idle_cores);
> + if (static_branch_likely(&sched_smt_present)) {
> + sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> + if (sds)
> + return READ_ONCE(sds->has_idle_cores);
> + }
>
> return def;
> }
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
@ 2021-03-22 4:36 ` Li, Aubrey
0 siblings, 0 replies; 13+ messages in thread
From: Li, Aubrey @ 2021-03-22 4:36 UTC (permalink / raw)
To: Barry Song, vincent.guittot, mingo, peterz, juri.lelli,
dietmar.eggemann, rostedt, bsegall, mgorman
Cc: valentin.schneider, linux-arm-kernel, linux-kernel, xuwei5,
prime.zeng, guodong.xu, yangyicong, liguozhu, linuxarm
Hi Barry,
On 2021/3/21 6:14, Barry Song wrote:
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
The patch looks good to me.
May I know for what case we need to keep CONFIG_SCHED_SMT for non-smt
machines?
Thanks,
-Aubrey
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> ---
> kernel/sched/fair.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2e2ab1e..de42a32 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
> {
> struct sched_domain_shared *sds;
>
> - sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> - if (sds)
> - return READ_ONCE(sds->has_idle_cores);
> + if (static_branch_likely(&sched_smt_present)) {
> + sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> + if (sds)
> + return READ_ONCE(sds->has_idle_cores);
> + }
>
> return def;
> }
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [Linuxarm] Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
2021-03-22 4:36 ` Li, Aubrey
@ 2021-03-22 5:08 ` Song Bao Hua (Barry Song)
-1 siblings, 0 replies; 13+ messages in thread
From: Song Bao Hua (Barry Song) @ 2021-03-22 5:08 UTC (permalink / raw)
To: Li, Aubrey, vincent.guittot, mingo, peterz, juri.lelli,
dietmar.eggemann, rostedt, bsegall, mgorman
Cc: valentin.schneider, linux-arm-kernel, linux-kernel, xuwei (O),
Zengtao (B), guodong.xu, yangyicong, Liguozhu (Kenneth),
linuxarm
> -----Original Message-----
> From: Li, Aubrey [mailto:aubrey.li@linux.intel.com]
> Sent: Monday, March 22, 2021 5:37 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>;
> vincent.guittot@linaro.org; mingo@redhat.com; peterz@infradead.org;
> juri.lelli@redhat.com; dietmar.eggemann@arm.com; rostedt@goodmis.org;
> bsegall@google.com; mgorman@suse.de
> Cc: valentin.schneider@arm.com; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; xuwei (O) <xuwei5@huawei.com>; Zengtao (B)
> <prime.zeng@hisilicon.com>; guodong.xu@linaro.org; yangyicong
> <yangyicong@huawei.com>; Liguozhu (Kenneth) <liguozhu@hisilicon.com>;
> linuxarm@openeuler.org
> Subject: [Linuxarm] Re: [PATCH] sched/fair: remove redundant test_idle_cores
> for non-smt
>
> Hi Barry,
>
> On 2021/3/21 6:14, Barry Song wrote:
> > update_idle_core() is only done for the case of sched_smt_present.
> > but test_idle_cores() is done for all machines even those without
> > smt.
>
> The patch looks good to me.
> May I know for what case we need to keep CONFIG_SCHED_SMT for non-smt
> machines?
Hi Aubrey,
I think the defconfig of arm64 has always enabled
CONFIG_SCHED_SMT:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/configs/defconfig
it is probably true for x86 as well.
I don't think Linux distribution will build a separate kernel
for machines without smt. so basically the kernel depends on
runtime topology parse to figure out if smt is present rather
than depending on a rebuild.
>
> Thanks,
> -Aubrey
>
>
> > this could contribute to up 8%+ hackbench performance loss on a
> > machine like kunpeng 920 which has no smt. this patch removes the
> > redundant test_idle_cores() for non-smt machines.
> >
> > we run the below hackbench with different -g parameter from 2 to
> > 14, for each different g, we run the command 10 times and get the
> > average time:
> > $ numactl -N 0 hackbench -p -T -l 20000 -g $1
> >
> > hackbench will report the time which is needed to complete a certain
> > number of messages transmissions between a certain number of tasks,
> > for example:
> > $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> > Running in threaded mode with 10 groups using 40 file descriptors each
> > (== 400 tasks)
> > Each sender will pass 20000 messages of 100 bytes
> >
> > The below is the result of hackbench w/ and w/o this patch:
> > g= 2 4 6 8 10 12 14
> > w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> > w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> > +4.1% +8.3% +7.3% +6.3%
> >
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > ---
> > kernel/sched/fair.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 2e2ab1e..de42a32 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
> > {
> > struct sched_domain_shared *sds;
> >
> > - sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > - if (sds)
> > - return READ_ONCE(sds->has_idle_cores);
> > + if (static_branch_likely(&sched_smt_present)) {
> > + sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > + if (sds)
> > + return READ_ONCE(sds->has_idle_cores);
> > + }
> >
> > return def;
> > }
Thanks
Barry
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [Linuxarm] Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
@ 2021-03-22 5:08 ` Song Bao Hua (Barry Song)
0 siblings, 0 replies; 13+ messages in thread
From: Song Bao Hua (Barry Song) @ 2021-03-22 5:08 UTC (permalink / raw)
To: Li, Aubrey, vincent.guittot, mingo, peterz, juri.lelli,
dietmar.eggemann, rostedt, bsegall, mgorman
Cc: valentin.schneider, linux-arm-kernel, linux-kernel, xuwei (O),
Zengtao (B), guodong.xu, yangyicong, Liguozhu (Kenneth),
linuxarm
> -----Original Message-----
> From: Li, Aubrey [mailto:aubrey.li@linux.intel.com]
> Sent: Monday, March 22, 2021 5:37 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>;
> vincent.guittot@linaro.org; mingo@redhat.com; peterz@infradead.org;
> juri.lelli@redhat.com; dietmar.eggemann@arm.com; rostedt@goodmis.org;
> bsegall@google.com; mgorman@suse.de
> Cc: valentin.schneider@arm.com; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; xuwei (O) <xuwei5@huawei.com>; Zengtao (B)
> <prime.zeng@hisilicon.com>; guodong.xu@linaro.org; yangyicong
> <yangyicong@huawei.com>; Liguozhu (Kenneth) <liguozhu@hisilicon.com>;
> linuxarm@openeuler.org
> Subject: [Linuxarm] Re: [PATCH] sched/fair: remove redundant test_idle_cores
> for non-smt
>
> Hi Barry,
>
> On 2021/3/21 6:14, Barry Song wrote:
> > update_idle_core() is only done for the case of sched_smt_present.
> > but test_idle_cores() is done for all machines even those without
> > smt.
>
> The patch looks good to me.
> May I know for what case we need to keep CONFIG_SCHED_SMT for non-smt
> machines?
Hi Aubrey,
I think the defconfig of arm64 has always enabled
CONFIG_SCHED_SMT:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/configs/defconfig
it is probably true for x86 as well.
I don't think Linux distribution will build a separate kernel
for machines without smt. so basically the kernel depends on
runtime topology parse to figure out if smt is present rather
than depending on a rebuild.
>
> Thanks,
> -Aubrey
>
>
> > this could contribute to up 8%+ hackbench performance loss on a
> > machine like kunpeng 920 which has no smt. this patch removes the
> > redundant test_idle_cores() for non-smt machines.
> >
> > we run the below hackbench with different -g parameter from 2 to
> > 14, for each different g, we run the command 10 times and get the
> > average time:
> > $ numactl -N 0 hackbench -p -T -l 20000 -g $1
> >
> > hackbench will report the time which is needed to complete a certain
> > number of messages transmissions between a certain number of tasks,
> > for example:
> > $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> > Running in threaded mode with 10 groups using 40 file descriptors each
> > (== 400 tasks)
> > Each sender will pass 20000 messages of 100 bytes
> >
> > The below is the result of hackbench w/ and w/o this patch:
> > g= 2 4 6 8 10 12 14
> > w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> > w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> > +4.1% +8.3% +7.3% +6.3%
> >
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > ---
> > kernel/sched/fair.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 2e2ab1e..de42a32 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
> > {
> > struct sched_domain_shared *sds;
> >
> > - sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > - if (sds)
> > - return READ_ONCE(sds->has_idle_cores);
> > + if (static_branch_likely(&sched_smt_present)) {
> > + sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > + if (sds)
> > + return READ_ONCE(sds->has_idle_cores);
> > + }
> >
> > return def;
> > }
Thanks
Barry
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
2021-03-20 22:14 ` Barry Song
@ 2021-03-22 10:15 ` Mel Gorman
-1 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2021-03-22 10:15 UTC (permalink / raw)
To: Barry Song
Cc: vincent.guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, valentin.schneider, aubrey.li,
linux-arm-kernel, linux-kernel, xuwei5, prime.zeng, guodong.xu,
yangyicong, liguozhu, linuxarm
On Sun, Mar 21, 2021 at 11:14:32AM +1300, Barry Song wrote:
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Mel Gorman <mgorman@suse.de>
That said, the numa_idle_core() function then becomes slightly
redundant. A possible follow up is to move the "idle_core >= 0" check
in numa_idle_core() to its caller in update_numa_stats() and then remove
the redundant check in !static_branch_likely(&sched_smt_present) check
in numa_idle_core.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
@ 2021-03-22 10:15 ` Mel Gorman
0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2021-03-22 10:15 UTC (permalink / raw)
To: Barry Song
Cc: vincent.guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, valentin.schneider, aubrey.li,
linux-arm-kernel, linux-kernel, xuwei5, prime.zeng, guodong.xu,
yangyicong, liguozhu, linuxarm
On Sun, Mar 21, 2021 at 11:14:32AM +1300, Barry Song wrote:
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Mel Gorman <mgorman@suse.de>
That said, the numa_idle_core() function then becomes slightly
redundant. A possible follow up is to move the "idle_core >= 0" check
in numa_idle_core() to its caller in update_numa_stats() and then remove
the redundant check in !static_branch_likely(&sched_smt_present) check
in numa_idle_core.
--
Mel Gorman
SUSE Labs
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
2021-03-20 22:14 ` Barry Song
@ 2021-03-22 11:09 ` Peter Zijlstra
-1 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2021-03-22 11:09 UTC (permalink / raw)
To: Barry Song
Cc: vincent.guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, valentin.schneider, aubrey.li,
linux-arm-kernel, linux-kernel, xuwei5, prime.zeng, guodong.xu,
yangyicong, liguozhu, linuxarm
On Sun, Mar 21, 2021 at 11:14:32AM +1300, Barry Song wrote:
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
The patch looks obvious, but the Changelog and Subject needed a lot of
help.
I've changed it like so:
---
Subject: sched/fair: Optimize test_idle_cores() for !SMT
From: Barry Song <song.bao.hua@hisilicon.com>
Date: Sun, 21 Mar 2021 11:14:32 +1300
From: Barry Song <song.bao.hua@hisilicon.com>
update_idle_core() is only done for the case of sched_smt_present.
but test_idle_cores() is done for all machines even those without
SMT.
This can contribute to up 8%+ hackbench performance loss on a
machine like kunpeng 920 which has no SMT. This patch removes the
redundant test_idle_cores() for !SMT machines.
Hackbench is ran with -g {2..14}, for each g it is ran 10 times to get
an average.
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
The below is the result of hackbench w/ and w/o this patch:
g= 2 4 6 8 10 12 14
w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
+4.1% +8.3% +7.3% +6.3%
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
---
kernel/sched/fair.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int c
{
struct sched_domain_shared *sds;
- sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
- if (sds)
- return READ_ONCE(sds->has_idle_cores);
+ if (static_branch_likely(&sched_smt_present)) {
+ sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (sds)
+ return READ_ONCE(sds->has_idle_cores);
+ }
return def;
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
@ 2021-03-22 11:09 ` Peter Zijlstra
0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2021-03-22 11:09 UTC (permalink / raw)
To: Barry Song
Cc: vincent.guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, valentin.schneider, aubrey.li,
linux-arm-kernel, linux-kernel, xuwei5, prime.zeng, guodong.xu,
yangyicong, liguozhu, linuxarm
On Sun, Mar 21, 2021 at 11:14:32AM +1300, Barry Song wrote:
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
The patch looks obvious, but the Changelog and Subject needed a lot of
help.
I've changed it like so:
---
Subject: sched/fair: Optimize test_idle_cores() for !SMT
From: Barry Song <song.bao.hua@hisilicon.com>
Date: Sun, 21 Mar 2021 11:14:32 +1300
From: Barry Song <song.bao.hua@hisilicon.com>
update_idle_core() is only done for the case of sched_smt_present.
but test_idle_cores() is done for all machines even those without
SMT.
This can contribute to up 8%+ hackbench performance loss on a
machine like kunpeng 920 which has no SMT. This patch removes the
redundant test_idle_cores() for !SMT machines.
Hackbench is ran with -g {2..14}, for each g it is ran 10 times to get
an average.
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
The below is the result of hackbench w/ and w/o this patch:
g= 2 4 6 8 10 12 14
w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
+4.1% +8.3% +7.3% +6.3%
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
---
kernel/sched/fair.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int c
{
struct sched_domain_shared *sds;
- sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
- if (sds)
- return READ_ONCE(sds->has_idle_cores);
+ if (static_branch_likely(&sched_smt_present)) {
+ sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (sds)
+ return READ_ONCE(sds->has_idle_cores);
+ }
return def;
}
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
2021-03-20 22:14 ` Barry Song
@ 2021-03-22 13:11 ` Vincent Guittot
-1 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-03-22 13:11 UTC (permalink / raw)
To: Barry Song
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Aubrey Li, LAK, linux-kernel, xuwei (O), Zengtao (B),
Guodong Xu, yangyicong, Liguozhu (Kenneth),
linuxarm
On Sat, 20 Mar 2021 at 23:21, Barry Song <song.bao.hua@hisilicon.com> wrote:
>
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2e2ab1e..de42a32 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
> {
> struct sched_domain_shared *sds;
>
> - sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> - if (sds)
> - return READ_ONCE(sds->has_idle_cores);
> + if (static_branch_likely(&sched_smt_present)) {
> + sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> + if (sds)
> + return READ_ONCE(sds->has_idle_cores);
> + }
>
> return def;
> }
> --
> 1.8.3.1
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] sched/fair: remove redundant test_idle_cores for non-smt
@ 2021-03-22 13:11 ` Vincent Guittot
0 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-03-22 13:11 UTC (permalink / raw)
To: Barry Song
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Aubrey Li, LAK, linux-kernel, xuwei (O), Zengtao (B),
Guodong Xu, yangyicong, Liguozhu (Kenneth),
linuxarm
On Sat, 20 Mar 2021 at 23:21, Barry Song <song.bao.hua@hisilicon.com> wrote:
>
> update_idle_core() is only done for the case of sched_smt_present.
> but test_idle_cores() is done for all machines even those without
> smt.
> this could contribute to up 8%+ hackbench performance loss on a
> machine like kunpeng 920 which has no smt. this patch removes the
> redundant test_idle_cores() for non-smt machines.
>
> we run the below hackbench with different -g parameter from 2 to
> 14, for each different g, we run the command 10 times and get the
> average time:
> $ numactl -N 0 hackbench -p -T -l 20000 -g $1
>
> hackbench will report the time which is needed to complete a certain
> number of messages transmissions between a certain number of tasks,
> for example:
> $ numactl -N 0 hackbench -p -T -l 20000 -g 10
> Running in threaded mode with 10 groups using 40 file descriptors each
> (== 400 tasks)
> Each sender will pass 20000 messages of 100 bytes
>
> The below is the result of hackbench w/ and w/o this patch:
> g= 2 4 6 8 10 12 14
> w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2e2ab1e..de42a32 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
> {
> struct sched_domain_shared *sds;
>
> - sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> - if (sds)
> - return READ_ONCE(sds->has_idle_cores);
> + if (static_branch_likely(&sched_smt_present)) {
> + sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> + if (sds)
> + return READ_ONCE(sds->has_idle_cores);
> + }
>
> return def;
> }
> --
> 1.8.3.1
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 13+ messages in thread
* [tip: sched/core] sched/fair: Optimize test_idle_cores() for !SMT
2021-03-20 22:14 ` Barry Song
` (4 preceding siblings ...)
(?)
@ 2021-03-23 15:08 ` tip-bot2 for Barry Song
-1 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Barry Song @ 2021-03-23 15:08 UTC (permalink / raw)
To: linux-tip-commits
Cc: Barry Song, Peter Zijlstra (Intel),
Vincent Guittot, Mel Gorman, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: c8987ae5af793a73e2c0d6ce804d8ff454ea377c
Gitweb: https://git.kernel.org/tip/c8987ae5af793a73e2c0d6ce804d8ff454ea377c
Author: Barry Song <song.bao.hua@hisilicon.com>
AuthorDate: Sun, 21 Mar 2021 11:14:32 +13:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 23 Mar 2021 16:01:59 +01:00
sched/fair: Optimize test_idle_cores() for !SMT
update_idle_core() is only done for the case of sched_smt_present.
but test_idle_cores() is done for all machines even those without
SMT.
This can contribute to up 8%+ hackbench performance loss on a
machine like kunpeng 920 which has no SMT. This patch removes the
redundant test_idle_cores() for !SMT machines.
Hackbench is ran with -g {2..14}, for each g it is ran 10 times to get
an average.
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
The below is the result of hackbench w/ and w/o this patch:
g= 2 4 6 8 10 12 14
w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
+4.1% +8.3% +7.3% +6.3%
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
---
kernel/sched/fair.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6aad028..aaa0dfa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
{
struct sched_domain_shared *sds;
- sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
- if (sds)
- return READ_ONCE(sds->has_idle_cores);
+ if (static_branch_likely(&sched_smt_present)) {
+ sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (sds)
+ return READ_ONCE(sds->has_idle_cores);
+ }
return def;
}
^ permalink raw reply related [flat|nested] 13+ messages in thread