linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
From: Barry Song @ 2021-05-26  9:10 UTC
  To: vincent.guittot, mingo, peterz, dietmar.eggemann, rostedt,
	bsegall, mgorman
  Cc: valentin.schneider, juri.lelli, bristot, linux-kernel,
	guodong.xu, yangyicong, tangchengchang, linuxarm, Barry Song

When the waker and wakee are already in the same LLC, it is pointless to
worry about the competition caused by pulling the wakee into the waker's
LLC domain.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 kernel/sched/fair.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3248e24a90b0..cfb1bd47acc3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6795,7 +6795,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 			new_cpu = prev_cpu;
 		}
 
-		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
+		/*
+		 * wake_wide() enables a smarter pull decision and avoids
+		 * fierce competition from tasks packed into the waker's
+		 * LLC domain. But if the waker and wakee are already in
+		 * the same LLC domain, there is no point in depending
+		 * on wake_wide().
+		 */
+		want_affine = (cpus_share_cache(cpu, prev_cpu) || !wake_wide(p)) &&
+				cpumask_test_cpu(cpu, p->cpus_ptr);
 	}
 
 	rcu_read_lock();
-- 
2.25.1
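
For reference, the cpus_share_cache() test used above reduces to an
LLC-id comparison; roughly, in kernels of this era (kernel/sched/core.c):

bool cpus_share_cache(int this_cpu, int that_cpu)
{
	if (this_cpu == that_cpu)
		return true;

	/* two CPUs share a cache iff they report the same LLC id */
	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
}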



* Re: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
From: Peter Zijlstra @ 2021-05-26 12:15 UTC
  To: Barry Song
  Cc: vincent.guittot, mingo, dietmar.eggemann, rostedt, bsegall,
	mgorman, valentin.schneider, juri.lelli, bristot, linux-kernel,
	guodong.xu, yangyicong, tangchengchang, linuxarm


$subject is weird; sched/fair: is the right tag, and then start with a
capital letter.

On Wed, May 26, 2021 at 09:10:57PM +1200, Barry Song wrote:
> When the waker and wakee are already in the same LLC, it is pointless to
> worry about the competition caused by pulling the wakee into the waker's
> LLC domain.

But there's more than LLC.

> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> ---
>  kernel/sched/fair.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3248e24a90b0..cfb1bd47acc3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6795,7 +6795,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>  			new_cpu = prev_cpu;
>  		}
>  
> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
> +		/*
> +		 * wake_wide() enables a smarter pull decision and avoids
> +		 * fierce competition from tasks packed into the waker's
> +		 * LLC domain. But if the waker and wakee are already in
> +		 * the same LLC domain, there is no point in depending
> +		 * on wake_wide().
> +		 */
> +		want_affine = (cpus_share_cache(cpu, prev_cpu) || !wake_wide(p)) &&
> +				cpumask_test_cpu(cpu, p->cpus_ptr);
>  	}

And no supportive numbers...


* RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
From: Song Bao Hua (Barry Song) @ 2021-05-26 21:38 UTC
  To: Peter Zijlstra
  Cc: vincent.guittot, mingo, dietmar.eggemann, rostedt, bsegall,
	mgorman, valentin.schneider, juri.lelli, bristot, linux-kernel,
	guodong.xu, yangyicong, tangchengchang, Linuxarm



> 
> $subject is weird; sched/fair: is the right tag, and then start with a
> capital letter.
> 
> On Wed, May 26, 2021 at 09:10:57PM +1200, Barry Song wrote:
> > When the waker and wakee are already in the same LLC, it is pointless to
> > worry about the competition caused by pulling the wakee into the waker's
> > LLC domain.
> 
> But there's more than LLC.

I suppose the other concerns might be about the "idle" state and "load"
of the waker's CPU and the wakee's prev_cpu. Here, even though we disable
wake_wide(), wake_affine() still has a chance to select the wakee's
prev_cpu rather than pulling to the waker. So disabling wake_wide()
doesn't mean we will pull 100% of the time.

static int wake_affine(struct sched_domain *sd, struct task_struct *p,
		       int this_cpu, int prev_cpu, int sync)
{
	int target = nr_cpumask_bits;

	if (sched_feat(WA_IDLE))
		target = wake_affine_idle(this_cpu, prev_cpu, sync);

	if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

	if (target == nr_cpumask_bits)
		return prev_cpu;

	..
	return target;
}

Furthermore, select_idle_sibling() can also pick the wakee's prev_cpu
if it is idle:

static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
	...

	/*
	 * If the previous CPU is cache affine and idle, don't be stupid:
	 */
	if (prev != target && cpus_share_cache(prev, target) &&
	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
	    asym_fits_capacity(task_util, prev))
		return prev;
	...
}

Apart from those, could you please give me some clue about what
else concerns you?
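
For reference, wake_wide(), the heuristic this patch bypasses, compares
the waker/wakee flip counts against the LLC size; roughly, in kernels
of this era:

static int wake_wide(struct task_struct *p)
{
	unsigned int master = current->wakee_flips;
	unsigned int slave = p->wakee_flips;
	int factor = __this_cpu_read(sd_llc_size);

	/* treat the heavier flipper as the master */
	if (master < slave)
		swap(master, slave);
	/* only wake wide when both flip counts scale past the LLC size */
	if (slave < factor || master < slave * factor)
		return 0;
	return 1;
}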

> 
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > ---
> >  kernel/sched/fair.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3248e24a90b0..cfb1bd47acc3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6795,7 +6795,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu,
> int wake_flags)
> >  			new_cpu = prev_cpu;
> >  		}
> >
> > -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
> > +		/*
> > +		 * wake_wide() enables a smarter pull decision and avoids
> > +		 * fierce competition from tasks packed into the waker's
> > +		 * LLC domain. But if the waker and wakee are already in
> > +		 * the same LLC domain, there is no point in depending
> > +		 * on wake_wide().
> > +		 */
> > +		want_affine = (cpus_share_cache(cpu, prev_cpu) || !wake_wide(p)) &&
> > +				cpumask_test_cpu(cpu, p->cpus_ptr);
> >  	}
> 
> And no supportive numbers...

Sorry for the confusion.

I actually put some supportive numbers in the thread below, which led
to this patch:
https://lore.kernel.org/lkml/bbc339cef87e4009b6d56ee37e202daf@hisilicon.com/

When I tried to give Dietmar some pgbench data in that thread, I found
that on kunpeng920, with the software running in one die/NUMA node whose
24 cores share an LLC, disabling wake_wide() brought the best pgbench
result.

                llc_as_factor          don't_use_wake_wide
Hmean     1     10869.27 (   0.00%)    10723.08 *  -1.34%*
Hmean     8     19580.59 (   0.00%)    19469.34 *  -0.57%*
Hmean     12    29643.56 (   0.00%)    29520.16 *  -0.42%*
Hmean     24    43194.47 (   0.00%)    43774.78 *   1.34%*
Hmean     32    40163.23 (   0.00%)    40742.93 *   1.44%*
Hmean     48    42249.29 (   0.00%)    48329.00 *  14.39%*

The test was done by https://github.com/gormanm/mmtests
and
./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag

Commit "sched: Implement smarter wake-affine logic"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
says pgbench can improve by wake_wide(), but I've actually
seen the opposite result while waker and wakee are already
in one LLC.

Not quite sure if it is specific to kunpeng920, perhaps
I need to run the same test on some x86 machines.
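
For context, the wakee_flips count that wake_wide() compares against
the LLC size is maintained by record_wakee(); roughly, in kernels of
this era:

static void record_wakee(struct task_struct *p)
{
	/* decay at most once per second; infrequent wakers build few flips */
	if (time_after(jiffies, current->wakee_flip_decay_ts + HZ)) {
		current->wakee_flips >>= 1;
		current->wakee_flip_decay_ts = jiffies;
	}

	/* a flip is recorded whenever current wakes a different task */
	if (current->last_wakee != p) {
		current->last_wakee = p;
		current->wakee_flips++;
	}
}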

Thanks
Barry


* Re: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
From: Mel Gorman @ 2021-05-27 12:14 UTC
  To: Song Bao Hua (Barry Song)
  Cc: Peter Zijlstra, vincent.guittot, mingo, dietmar.eggemann,
	rostedt, bsegall, valentin.schneider, juri.lelli, bristot,
	linux-kernel, guodong.xu, yangyicong, tangchengchang, Linuxarm

On Wed, May 26, 2021 at 09:38:19PM +0000, Song Bao Hua (Barry Song) wrote:
> > And no supportive numbers...
> 
> Sorry for the confusion.
>
> I actually put some supportive numbers in the thread below, which led
> to this patch:
> https://lore.kernel.org/lkml/bbc339cef87e4009b6d56ee37e202daf@hisilicon.com/
>
> When I tried to give Dietmar some pgbench data in that thread, I found
> that on kunpeng920, with the software running in one die/NUMA node whose
> 24 cores share an LLC, disabling wake_wide() brought the best pgbench
> result.
> 
>                 llc_as_factor          don't_use_wake_wide
> Hmean     1     10869.27 (   0.00%)    10723.08 *  -1.34%*
> Hmean     8     19580.59 (   0.00%)    19469.34 *  -0.57%*
> Hmean     12    29643.56 (   0.00%)    29520.16 *  -0.42%*
> Hmean     24    43194.47 (   0.00%)    43774.78 *   1.34%*
> Hmean     32    40163.23 (   0.00%)    40742.93 *   1.44%*
> Hmean     48    42249.29 (   0.00%)    48329.00 *  14.39%*
> 
> The test was done by https://github.com/gormanm/mmtests
> and
> ./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag
> 

Out of curiosity, I briefly tested this on a Zen2 machine, which also
has multiple LLCs per node. Only tbench4 was executed; I cancelled
the other tests because of results like this:

tbench4
                          5.13.0-rc2             5.13.0-rc2
                             vanilla sched-nowakewidellc-v1r1
Hmean     1        349.34 (   0.00%)      334.18 *  -4.34%*
Hmean     2        668.49 (   0.00%)      659.12 *  -1.40%*
Hmean     4       1307.90 (   0.00%)     1274.35 *  -2.57%*
Hmean     8       2482.08 (   0.00%)     2377.84 *  -4.20%*
Hmean     16      4460.06 (   0.00%)     4656.28 *   4.40%*
Hmean     32      9463.76 (   0.00%)     8909.61 *  -5.86%*
Hmean     64     15865.30 (   0.00%)    19682.77 *  24.06%*
Hmean     128    24350.06 (   0.00%)    21593.20 * -11.32%*
Hmean     256    39593.90 (   0.00%)    31389.33 * -20.72%*
Hmean     512    37851.54 (   0.00%)    30260.23 * -20.06%*

-- 
Mel Gorman
SUSE Labs


* RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
From: Song Bao Hua (Barry Song) @ 2021-05-31 22:21 UTC
  To: Mel Gorman
  Cc: Peter Zijlstra, vincent.guittot, mingo, dietmar.eggemann,
	rostedt, bsegall, valentin.schneider, juri.lelli, bristot,
	linux-kernel, guodong.xu, yangyicong, tangchengchang, Linuxarm



> On Wed, May 26, 2021 at 09:38:19PM +0000, Song Bao Hua (Barry Song) wrote:
> > > And no supportive numbers...
> >
> > Sorry for the confusion.
> >
> > I actually put some supportive numbers in the thread below, which led
> > to this patch:
> > https://lore.kernel.org/lkml/bbc339cef87e4009b6d56ee37e202daf@hisilicon.com/
> >
> > When I tried to give Dietmar some pgbench data in that thread, I found
> > that on kunpeng920, with the software running in one die/NUMA node whose
> > 24 cores share an LLC, disabling wake_wide() brought the best pgbench
> > result.
> >
> >                 llc_as_factor          don't_use_wake_wide
> > Hmean     1     10869.27 (   0.00%)    10723.08 *  -1.34%*
> > Hmean     8     19580.59 (   0.00%)    19469.34 *  -0.57%*
> > Hmean     12    29643.56 (   0.00%)    29520.16 *  -0.42%*
> > Hmean     24    43194.47 (   0.00%)    43774.78 *   1.34%*
> > Hmean     32    40163.23 (   0.00%)    40742.93 *   1.44%*
> > Hmean     48    42249.29 (   0.00%)    48329.00 *  14.39%*
> >
> > The test was done by https://github.com/gormanm/mmtests
> > and
> > ./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag
> >
> 
> Out of curiosity, I briefly tested this on a Zen2 machine, which also
> has multiple LLCs per node. Only tbench4 was executed; I cancelled
> the other tests because of results like this:
> 
> tbench4
>                           5.13.0-rc2             5.13.0-rc2
>                              vanilla sched-nowakewidellc-v1r1
> Hmean     1        349.34 (   0.00%)      334.18 *  -4.34%*
> Hmean     2        668.49 (   0.00%)      659.12 *  -1.40%*
> Hmean     4       1307.90 (   0.00%)     1274.35 *  -2.57%*
> Hmean     8       2482.08 (   0.00%)     2377.84 *  -4.20%*
> Hmean     16      4460.06 (   0.00%)     4656.28 *   4.40%*
> Hmean     32      9463.76 (   0.00%)     8909.61 *  -5.86%*
> Hmean     64     15865.30 (   0.00%)    19682.77 *  24.06%*
> Hmean     128    24350.06 (   0.00%)    21593.20 * -11.32%*
> Hmean     256    39593.90 (   0.00%)    31389.33 * -20.72%*
> Hmean     512    37851.54 (   0.00%)    30260.23 * -20.06%*

Thanks, Mel.

I guess a major difference between your testing and mine
is that I was actually running the tests on CPUs sharing an LLC
rather than on CPUs spread across several LLCs.

On the tested kunpeng920 machine, 24 cores share an LLC and
form one NUMA node; although there are 4 NUMA nodes, the
benchmark ran within one LLC domain and one NUMA node only.
This is supposed to benefit use cases that bind tasks with
"numactl -N x tasks", which is quite common.

I also tried to reproduce tbench4 on my desktop with an Intel
i9, 10 cores (20 threads) sharing a single 20MB LLC:

$ lstopo -c
Machine (15GB total) cpuset=0x000fffff
  Package L#0 cpuset=0x000fffff
    NUMANode L#0 (P#0 15GB) cpuset=0x000fffff
    L3 L#0 (20MB) cpuset=0x000fffff
      L2 L#0 (256KB) cpuset=0x00000401
        L1d L#0 (32KB) cpuset=0x00000401
          L1i L#0 (32KB) cpuset=0x00000401
            Core L#0 cpuset=0x00000401
              PU L#0 (P#0) cpuset=0x00000001
              PU L#1 (P#10) cpuset=0x00000400
      L2 L#1 (256KB) cpuset=0x00000802
        L1d L#1 (32KB) cpuset=0x00000802
          L1i L#1 (32KB) cpuset=0x00000802
            Core L#1 cpuset=0x00000802
              PU L#2 (P#1) cpuset=0x00000002
              PU L#3 (P#11) cpuset=0x00000800
      L2 L#2 (256KB) cpuset=0x00001004
        L1d L#2 (32KB) cpuset=0x00001004
          L1i L#2 (32KB) cpuset=0x00001004
            Core L#2 cpuset=0x00001004
              PU L#4 (P#2) cpuset=0x00000004
              PU L#5 (P#12) cpuset=0x00001000
      L2 L#3 (256KB) cpuset=0x00002008
        L1d L#3 (32KB) cpuset=0x00002008
          L1i L#3 (32KB) cpuset=0x00002008
            Core L#3 cpuset=0x00002008
              PU L#6 (P#3) cpuset=0x00000008
              PU L#7 (P#13) cpuset=0x00002000
      L2 L#4 (256KB) cpuset=0x00004010
        L1d L#4 (32KB) cpuset=0x00004010
          L1i L#4 (32KB) cpuset=0x00004010
            Core L#4 cpuset=0x00004010
              PU L#8 (P#4) cpuset=0x00000010
              PU L#9 (P#14) cpuset=0x00004000
      L2 L#5 (256KB) cpuset=0x00008020
        L1d L#5 (32KB) cpuset=0x00008020
          L1i L#5 (32KB) cpuset=0x00008020
            Core L#5 cpuset=0x00008020
              PU L#10 (P#5) cpuset=0x00000020
              PU L#11 (P#15) cpuset=0x00008000
      L2 L#6 (256KB) cpuset=0x00010040
        L1d L#6 (32KB) cpuset=0x00010040
          L1i L#6 (32KB) cpuset=0x00010040
            Core L#6 cpuset=0x00010040
              PU L#12 (P#6) cpuset=0x00000040
              PU L#13 (P#16) cpuset=0x00010000
      L2 L#7 (256KB) cpuset=0x00020080
        L1d L#7 (32KB) cpuset=0x00020080
          L1i L#7 (32KB) cpuset=0x00020080
            Core L#7 cpuset=0x00020080
              PU L#14 (P#7) cpuset=0x00000080
              PU L#15 (P#17) cpuset=0x00020000
      L2 L#8 (256KB) cpuset=0x00040100
        L1d L#8 (32KB) cpuset=0x00040100
          L1i L#8 (32KB) cpuset=0x00040100
            Core L#8 cpuset=0x00040100
              PU L#16 (P#8) cpuset=0x00000100
              PU L#17 (P#18) cpuset=0x00040000
      L2 L#9 (256KB) cpuset=0x00080200
        L1d L#9 (32KB) cpuset=0x00080200
          L1i L#9 (32KB) cpuset=0x00080200
            Core L#9 cpuset=0x00080200
              PU L#18 (P#9) cpuset=0x00000200
              PU L#19 (P#19) cpuset=0x00080000

The tbench4 benchmark result is still positive:

tbench4

                           5.13-rc4               5.13-rc4
                                     disable-llc-wakewide/

Hmean     1       514.87 (   0.00%)      505.17 *  -1.88%*
Hmean     2       914.45 (   0.00%)      918.45 *   0.44%*
Hmean     4      1483.81 (   0.00%)     1485.38 *   0.11%*
Hmean     8      2211.62 (   0.00%)     2236.02 *   1.10%*
Hmean     16     2129.80 (   0.00%)     2450.81 *  15.07%*
Hmean     32     5098.35 (   0.00%)     5085.20 *  -0.26%*
Hmean     64     4797.62 (   0.00%)     4801.34 *   0.08%*
Hmean     80     4802.89 (   0.00%)     4780.40 *  -0.47%*

I guess something that works across several LLC domains
causes the performance regression.

I wonder what your results would look like if you pinned
the test to CPUs within one LLC?

Thanks
Barry



* Re: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
From: Mel Gorman @ 2021-06-01  7:59 UTC
  To: Song Bao Hua (Barry Song)
  Cc: Peter Zijlstra, vincent.guittot, mingo, dietmar.eggemann,
	rostedt, bsegall, valentin.schneider, juri.lelli, bristot,
	linux-kernel, guodong.xu, yangyicong, tangchengchang, Linuxarm

On Mon, May 31, 2021 at 10:21:55PM +0000, Song Bao Hua (Barry Song) wrote:
> The tbench4 benchmark result is still positive:
> 
> tbench4
> 
>                            5.13-rc4               5.13-rc4
>                                      disable-llc-wakewide/
> 
> Hmean     1       514.87 (   0.00%)      505.17 *  -1.88%*
> Hmean     2       914.45 (   0.00%)      918.45 *   0.44%*
> Hmean     4      1483.81 (   0.00%)     1485.38 *   0.11%*
> Hmean     8      2211.62 (   0.00%)     2236.02 *   1.10%*
> Hmean     16     2129.80 (   0.00%)     2450.81 *  15.07%*
> Hmean     32     5098.35 (   0.00%)     5085.20 *  -0.26%*
> Hmean     64     4797.62 (   0.00%)     4801.34 *   0.08%*
> Hmean     80     4802.89 (   0.00%)     4780.40 *  -0.47%*
> 
> I guess something that works across several LLC domains
> causes the performance regression.
> 
> I wonder what your results would look like if you pinned
> the test to CPUs within one LLC?
> 

While I could do this, what would be the benefit? Running within one LLC
would be running the test in one small fraction of the entire machine as
the machine has multiple LLCs per NUMA node. A patch dealing with how the
scheduler works with respect to LLC should take different configurations
into consideration as best as possible.

-- 
Mel Gorman
SUSE Labs


* RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
From: Song Bao Hua (Barry Song) @ 2021-06-01  8:09 UTC
  To: Mel Gorman
  Cc: Peter Zijlstra, vincent.guittot, mingo, dietmar.eggemann,
	rostedt, bsegall, valentin.schneider, juri.lelli, bristot,
	linux-kernel, guodong.xu, yangyicong, tangchengchang, Linuxarm



> On Mon, May 31, 2021 at 10:21:55PM +0000, Song Bao Hua (Barry Song) wrote:
> > The tbench4 benchmark result is still positive:
> >
> > tbench4
> >
> >                            5.13-rc4               5.13-rc4
> >                                      disable-llc-wakewide/
> >
> > Hmean     1       514.87 (   0.00%)      505.17 *  -1.88%*
> > Hmean     2       914.45 (   0.00%)      918.45 *   0.44%*
> > Hmean     4      1483.81 (   0.00%)     1485.38 *   0.11%*
> > Hmean     8      2211.62 (   0.00%)     2236.02 *   1.10%*
> > Hmean     16     2129.80 (   0.00%)     2450.81 *  15.07%*
> > Hmean     32     5098.35 (   0.00%)     5085.20 *  -0.26%*
> > Hmean     64     4797.62 (   0.00%)     4801.34 *   0.08%*
> > Hmean     80     4802.89 (   0.00%)     4780.40 *  -0.47%*
> >
> > I guess something that works across several LLC domains
> > causes the performance regression.
> >
> > I wonder what your results would look like if you pinned
> > the test to CPUs within one LLC?
> >
> 
> While I could do this, what would be the benefit? Running within one LLC
> would be running the test in one small fraction of the entire machine as
> the machine has multiple LLCs per NUMA node. A patch dealing with how the
> scheduler works with respect to LLC should take different configurations
> into consideration as best as possible.

I do agree with this. And I admit this patch lacks consideration
of, and testing on, various configurations. But more numbers will
help in figuring out a better v2 solution, one that either extends
to wider configurations or shrinks to specific machines, such as
those whose whole NUMA node shares an LLC, or desktops where all
CPUs share the LLC; e.g. my PC with the newest Intel i9 has all
10 cores (20 threads) sharing one LLC.
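
A minimal sketch of that narrowing, assuming a hypothetical helper
(llc_spans_node() is made up for illustration and is not an existing
kernel API):

/* hypothetical: true if the LLC covers the whole NUMA node */
static inline bool llc_spans_node(int cpu)
{
	return per_cpu(sd_llc_size, cpu) >=
	       cpumask_weight(cpumask_of_node(cpu_to_node(cpu)));
}

		/* bypass wake_wide() only where a pull cannot cross an LLC anyway */
		want_affine = ((llc_spans_node(cpu) &&
				cpus_share_cache(cpu, prev_cpu)) || !wake_wide(p)) &&
				cpumask_test_cpu(cpu, p->cpus_ptr);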

> 
> --
> Mel Gorman
> SUSE Labs

Thanks
Barry


