From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Dietmar Eggemann, Vincent Guittot
Cc: tim.c.chen@linux.intel.com, catalin.marinas@arm.com, will@kernel.org,
 rjw@rjwysocki.net, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
 lenb@kernel.org, peterz@infradead.org, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, msys.mizuma@gmail.com,
 valentin.schneider@arm.com, gregkh@linuxfoundation.org,
 Jonathan Cameron, juri.lelli@redhat.com, mark.rutland@arm.com,
 sudeep.holla@arm.com, aubrey.li@linux.intel.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-acpi@vger.kernel.org, x86@kernel.org, xuwei (O), Zengtao (B),
 guodong.xu@linaro.org, yangyicong, Liguozhu (Kenneth),
 linuxarm@openeuler.org, hpa@zytor.com
Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC
Date: Mon, 3 May 2021 11:35:18 +0000
Message-ID: <4d1f063504b1420c9f836d1f1a7f8e77@hisilicon.com>
References: <20210420001844.9116-1-song.bao.hua@hisilicon.com>
 <20210420001844.9116-4-song.bao.hua@hisilicon.com>
 <80f489f9-8c88-95d8-8241-f0cfd2c2ac66@arm.com>
 <8b5277d9-e367-566d-6bd1-44ac78d21d3f@arm.com>
 <185746c4d02a485ca8f3509439328b26@hisilicon.com>

> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Monday, May 3, 2021 6:12 PM
> To: 'Dietmar Eggemann'; Vincent Guittot
> Cc: [...]
> Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster
> for tasks within one LLC
>
> > -----Original Message-----
> > From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > Sent: Friday, April 30, 2021 10:43 PM
> > To: Song Bao Hua (Barry Song); Vincent Guittot
> > Cc: [...]
> > Subject: Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster
> > for tasks within one LLC
> >
> > On 29/04/2021 00:41, Song Bao Hua (Barry Song) wrote:
> > >
> > >> -----Original Message-----
> > >> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> >
> > [...]
> >
> > >>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > >>
> > >> [...]
> > >>
> > >>>>> On 20/04/2021 02:18, Barry Song wrote:
> >
> > [...]
> >
> > > Though we will never go to the slow path, wake_wide() will affect
> > > want_affine, so it eventually affects the "new_cpu"?
> >
> > yes.
> >
> > > for_each_domain(cpu, tmp) {
> > >         /*
> > >          * If both 'cpu' and 'prev_cpu' are part of this domain,
> > >          * cpu is a valid SD_WAKE_AFFINE target.
> > >          */
> > >         if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > >             cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > >                 if (cpu != prev_cpu)
> > >                         new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
> > >
> > >                 sd = NULL; /* Prefer wake_affine over balance flags */
> > >                 break;
> > >         }
> > >
> > >         if (tmp->flags & sd_flag)
> > >                 sd = tmp;
> > >         else if (!want_affine)
> > >                 break;
> > > }
> > >
> > > If want_affine is false, the wake-affine branch above won't execute,
> > > so new_cpu (the target) will always be "prev_cpu"? So when task size >
> > > cluster size in wake_wide(), this means we won't pull the wakee to the
> > > cluster of the waker? It seems sensible.
> >
> > What is `task size` here?
> >
> > The criterion is `!(slave < factor || master < slave * factor)` or
> > `slave >= factor && master >= slave * factor` to wake wide.
>
> Yes. By "task size" I actually mean a bundle of waker-wakee tasks
> which can make "slave >= factor && master >= slave * factor" either
> true or false, and thereby change the target cpu we start scanning
> from.
> Now that I have moved to the cluster level when tasks are within one
> LLC, it seems it would be more sensible to use "cluster_size" as the
> factor?
>
> > I see that since you effectively change the sched domain size from LLC
> > to CLUSTER (e.g. 24->6) for wakeups with cpu and prev_cpu sharing LLC
> > (hence the `numactl -N 0` in your workload), wake_wide() has to take
> > CLUSTER size into consideration.
> >
> > I was wondering if you saw wake_wide() returning 1 with your use cases:
> >
> > numactl -N 0 /usr/lib/lmbench/bin/stream -P [6,12] -M 1024M -N 5
>
> I couldn't make wake_wide() return 1 with the above stream command,
> and I can't reproduce it with a 1:1 (monogamous) hackbench "-f 1".
>
> But I am able to reproduce the issue with an M:N hackbench, for example:
>
> numactl -N 0 hackbench -p -T -f 10 -l 20000 -g 1
>
> hackbench will create 10 senders which will send messages to 10
> receivers. (Each sender can send messages to all 10 receivers.)
>
> I've often seen flips like:
>
>   waker   wakee
>    1501      39
>    1509      17
>      11    1320
>      13    2016
>
> 11, 13 and 17 are smaller than the LLC size but larger than the cluster
> size. So wake_wide() using the cluster factor will return 1; on the
> other hand, if we always use llc_size as the factor, it will return 0.
>
> However, it seems the change in wake_wide() could have some negative
> influence on the M:N relationship (-f 10), according to tests made
> today with:
>
> numactl -N 0 hackbench -p -T -f 10 -l 20000 -g $1
>
> g            =      1      2      3      4
> cluster_size    0.5768 0.6578 0.8117 1.0119
> LLC_size        0.5479 0.6162 0.6922 0.7754
>
> Always using llc_size as the factor in wake_wide() still shows better
> results in the 10:10 polygamous hackbench.
>
> So it seems `slave >= factor && master >= slave * factor` isn't
> a suitable criterion for the cluster size?
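For reference, here is the heuristic in question: a minimal sketch of
mainline wake_wide() from kernel/sched/fair.c with the cluster-size
factor swapped in. `sd_cluster_size` is an illustrative per-CPU variable
analogous to the existing `sd_llc_size`; it is not necessarily what this
patch set actually adds.

/* assumed: DEFINE_PER_CPU(int, sd_cluster_size), like sd_llc_size */
static int wake_wide(struct task_struct *p)
{
	unsigned int master = current->wakee_flips;	/* waker flip count */
	unsigned int slave = p->wakee_flips;		/* wakee flip count */
	/* mainline reads the LLC span here: __this_cpu_read(sd_llc_size) */
	int factor = __this_cpu_read(sd_cluster_size);

	if (master < slave)
		swap(master, slave);
	/* wake wide only if both sides flip often enough for the domain */
	if (slave < factor || master < slave * factor)
		return 0;
	return 1;
}

With factor = 6 (cluster) rather than 24 (LLC), flip counts such as 11,
13 and 17 above satisfy `slave >= factor`, so the M:N hackbench takes
the wake-wide path; with factor = 24 they don't.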
On the other hand, according to "sched: Implement smarter wake-affine
logic",
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
a proper factor in wake_wide() is mainly beneficial for 1:N tasks like
postgresql/pgbench. So using the smaller cluster size as the factor
might help make wake_affine false and thus improve pgbench.

From the commit log, the commit made its biggest improvement when
clients = 2 * cpus. In my case, that should be clients=48 for a machine
whose LLC size is 24.

In Linux, I created a 240MB database and ran "pgbench -c 48 -S -T 20
pgbench" under two different scenarios:

1. page cache always hit, so no real I/O for database reads
2. echo 3 > /proc/sys/vm/drop_caches

For case 1, using cluster_size and using llc_size result in a similar
tps of ~108000, with all 24 cpus at 100% utilization.

For case 2, using llc_size still shows better performance.

tps for each test round (cluster size as factor in wake_wide):
1398.450887 1275.020401 1632.542437 1412.241627 1611.095692 1381.354294
1539.877146
avg tps = 1464

tps for each test round (llc size as factor in wake_wide):
1718.402983 1443.169823 1502.353823 1607.415861 1597.396924 1745.651814
1876.802168
avg tps = 1641 (+12%)

So it seems using cluster_size as the factor in "slave >= factor &&
master >= slave * factor" isn't a good choice for my machine, at least.

Thanks
Barry