linux-kernel.vger.kernel.org archive mirror
From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"bsegall@google.com" <bsegall@google.com>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"valentin.schneider@arm.com" <valentin.schneider@arm.com>,
	"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"bristot@redhat.com" <bristot@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"guodong.xu@linaro.org" <guodong.xu@linaro.org>,
	yangyicong <yangyicong@huawei.com>,
	tangchengchang <tangchengchang@huawei.com>,
	Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
Date: Wed, 26 May 2021 21:38:19 +0000
Message-ID: <7dd00a98d6454d5e92a7d9b936d1aa1c@hisilicon.com>
In-Reply-To: <YK474+4xpYlAha+2@hirez.programming.kicks-ass.net>



> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Thursday, May 27, 2021 12:16 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: vincent.guittot@linaro.org; mingo@redhat.com; dietmar.eggemann@arm.com;
> rostedt@goodmis.org; bsegall@google.com; mgorman@suse.de;
> valentin.schneider@arm.com; juri.lelli@redhat.com; bristot@redhat.com;
> linux-kernel@vger.kernel.org; guodong.xu@linaro.org; yangyicong
> <yangyicong@huawei.com>; tangchengchang <tangchengchang@huawei.com>;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee
> are already in same LLC
> 
> 
> $subject is weird; sched/fair: is the right tag, and then start with a
> capital letter.
> 
> On Wed, May 26, 2021 at 09:10:57PM +1200, Barry Song wrote:
> > when waker and wakee are already in the same LLC, it is pointless to worry
> > about the competition caused by pulling wakee to waker's LLC domain.
> 
> But there's more than LLC.

I suppose the other concerns might be about the "idle" state and "load"
of the waker's CPU and the wakee's prev_cpu. Even if we disable
wake_wide() here, wake_affine() still has a chance to select the
wakee's prev_cpu rather than pulling the wakee to the waker. So
disabling wake_wide() doesn't mean we will pull 100% of the time.

static int wake_affine(struct sched_domain *sd, struct task_struct *p,
		       int this_cpu, int prev_cpu, int sync)
{
	int target = nr_cpumask_bits;

	if (sched_feat(WA_IDLE))
		target = wake_affine_idle(this_cpu, prev_cpu, sync);

	if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

	if (target == nr_cpumask_bits)
		return prev_cpu;

	..
	return target;
}

Furthermore, select_idle_sibling() can also pick the wakee's prev_cpu
if it is idle:

static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
	...

	/*
	 * If the previous CPU is cache affine and idle, don't be stupid:
	 */
	if (prev != target && cpus_share_cache(prev, target) &&
	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
	    asym_fits_capacity(task_util, prev))
		return prev;
	...
}

Apart from those, could you please give me some hints about what other
concerns you have?

> 
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > ---
> >  kernel/sched/fair.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3248e24a90b0..cfb1bd47acc3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6795,7 +6795,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >  			new_cpu = prev_cpu;
> >  		}
> >
> > -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
> > +		/*
> > +		 * we use wake_wide to make smarter pull and avoid cruel
> > +		 * competition because of jam-packed tasks in waker's LLC
> > +		 * domain. But if waker and wakee have been already in
> > +		 * same LLC domain, it seems it is pointless to depend
> > +		 * on wake_wide
> > +		 */
> > +		want_affine = (cpus_share_cache(cpu, prev_cpu) || !wake_wide(p)) &&
> > +				cpumask_test_cpu(cpu, p->cpus_ptr);
> >  	}
> 
> And no supportive numbers...

Sorry for the confusion.

I actually posted some supporting numbers in the thread below, from
which this patch was derived:
https://lore.kernel.org/lkml/bbc339cef87e4009b6d56ee37e202daf@hisilicon.com/

When I tried to give Dietmar some pgbench data in that thread,
I found that on Kunpeng 920, with the software running within one
die/NUMA node whose 24 cores share an LLC, disabling wake_wide()
gave the best pgbench result.

                llc_as_factor          don't_use_wake_wide
Hmean     1     10869.27 (   0.00%)    10723.08 *  -1.34%*
Hmean     8     19580.59 (   0.00%)    19469.34 *  -0.57%*
Hmean     12    29643.56 (   0.00%)    29520.16 *  -0.42%*
Hmean     24    43194.47 (   0.00%)    43774.78 *   1.34%*
Hmean     32    40163.23 (   0.00%)    40742.93 *   1.44%*
Hmean     48    42249.29 (   0.00%)    48329.00 *  14.39%*

The test was run with mmtests (https://github.com/gormanm/mmtests)
using:
./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag

The commit "sched: Implement smarter wake-affine logic"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
says pgbench can benefit from wake_wide(), but I've actually
seen the opposite result when the waker and wakee are already
in one LLC.

I'm not quite sure whether this is specific to Kunpeng 920; perhaps
I need to run the same test on some x86 machines.

Thanks
Barry

Thread overview: 7+ messages
2021-05-26  9:10 [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC Barry Song
2021-05-26 12:15 ` Peter Zijlstra
2021-05-26 21:38   ` Song Bao Hua (Barry Song) [this message]
2021-05-27 12:14     ` Mel Gorman
2021-05-31 22:21       ` Song Bao Hua (Barry Song)
2021-06-01  7:59         ` Mel Gorman
2021-06-01  8:09           ` Song Bao Hua (Barry Song)
